f88ff9a2912a06b12a30ecf77189049a29c834ef
1471 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
535c72f40b |
Merge 4.19.240 into android-4.19-stable
Changes in 4.19.240 etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead mm: page_alloc: fix building error on -Werror=array-compare tracing: Dump stacktrace trigger to the corresponding instance can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path dm integrity: fix memory corruption when tag_size is less than digest size gfs2: assign rgrp glock before compute_bitstructs ALSA: usb-audio: Clear MIDI port active flag after draining tcp: fix race condition when creating child sockets from syncookies tcp: Fix potential use-after-free due to double kfree() ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek ASoC: msm8916-wcd-digital: Check failure for devm_snd_soc_register_component dmaengine: imx-sdma: Fix error checking in sdma_event_remap rxrpc: Restore removed timer deletion net/packet: fix packet_sock xmit return value checking net/sched: cls_u32: fix possible leak in u32_init_knode() netlink: reset network and mac headers in netlink_dump() ARM: vexpress/spc: Avoid negative array index when !SMP reset: tegra-bpmp: Restore Handle errors in BPMP response platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant vxlan: fix error return code in vxlan_fdb_append cifs: Check the IOCB_DIRECT flag, not O_DIRECT mt76: Fix undefined behavior due to shift overflowing the constant brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info() drm/msm/mdp5: check the return of kzalloc() net: macb: Restart tx only if queue pointer is lagging stat: fix inconsistency between struct stat and struct compat_stat ata: pata_marvell: Check the 'bmdma_addr' beforing reading dma: at_xdmac: fix a missing check on list iterator drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare powerpc/perf: Fix power9 event alternatives openvswitch: fix OOB access in reserve_sfa_size() ASoC: soc-dapm: fix two incorrect uses of list iterator e1000e: Fix possible overflow in LTR decoding ARC: entry: fix syscall_trace_exit argument arm_pmu: Validate single/group leader events ext4: fix symlink file size not match to file content ext4: limit length to bitmap_maxbytes - blocksize in punch_hole ext4: fix overhead calculation to account for the reserved gdt blocks ext4: force overhead calculation if the s_overhead_cluster makes no sense staging: ion: Prevent incorrect reference counting behavour block/compat_ioctl: fix range check in BLKGETSIZE ax25: add refcount in ax25_dev to avoid UAF bugs ax25: fix reference count leaks of ax25_dev ax25: fix UAF bugs of net_device caused by rebinding operation ax25: Fix refcount leaks caused by ax25_cb_del() ax25: fix UAF bug in ax25_send_control() ax25: fix NPD bug in ax25_disconnect ax25: Fix NULL pointer dereferences in ax25 timers ax25: Fix UAF bugs in ax25 timers Revert "net: micrel: fix KS8851_MLL Kconfig" Linux 4.19.240 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I8e14eaee84f2ca2ffa54214bd3e86fa1f5ea7954 |
||
|
|
06ad6b54ce |
mm: page_alloc: fix building error on -Werror=array-compare
commit ca831f29f8f25c97182e726429b38c0802200c8f upstream.
Arthur Marsh reported we would hit the error below when building kernel
with gcc-12:
CC mm/page_alloc.o
mm/page_alloc.c: In function `mem_init_print_info':
mm/page_alloc.c:8173:27: error: comparison between two arrays [-Werror=array-compare]
8173 | if (start <= pos && pos < end && size > adj) \
|
In C++20, the comparision between arrays should be warned.
Link: https://lkml.kernel.org/r/20211125130928.32465-1-sxwjean@me.com
Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
dbfc3fbbe0 |
Merge 4.19.239 into android-4.19-stable
Changes in 4.19.239 memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe net/sched: flower: fix parsing of ethertype following VLAN header veth: Ensure eth header is in skb's linear part gpiolib: acpi: use correct format characters mlxsw: i2c: Fix initialization error flow net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link sctp: Initialize daddr on peeled off socket testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set nfc: nci: add flush_workqueue to prevent uaf cifs: potential buffer overflow in handling symlinks drm/amd: Add USBC connector ID drm/amdkfd: Check for potential null return of kmalloc_array() Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer scsi: target: tcmu: Fix possible page UAF scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024 net: micrel: fix KS8851_MLL Kconfig ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs gpu: ipu-v3: Fix dev_dbg frequency output arm64: alternatives: mark patch_alternative() as `noinstr` drm/amd/display: Fix allocate_mst_payload assert on resume scsi: mvsas: Add PCI ID of RocketRaid 2640 drivers: net: slip: fix NPD bug in sl_tx_timeout() mm, page_alloc: fix build_zonerefs_node() mm: kmemleak: take a full lowmem check in kmemleak_*_phys() gcc-plugins: latent_entropy: use /dev/urandom ALSA: hda/realtek: Add quirk for Clevo PD50PNT ALSA: pcm: Test for "silence" field in struct "pcm_format_data" ipv6: fix panic when forwarding a pkt with no in6 dev ARM: davinci: da850-evm: Avoid NULL pointer dereference smp: Fix offline cpu check in flush_smp_call_function_queue() i2c: pasemi: Wait for write xfers to finish Linux 4.19.239 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia9e9320ce18a1d1b1f943adbafc1b80f33a9c3e0 |
||
|
|
c0ca2da695 |
mm, page_alloc: fix build_zonerefs_node()
commit e553f62f10d93551eb883eca227ac54d1a4fad84 upstream. Since commit |
||
|
|
ce7025b713 |
Merge 4.19.238 into android-4.19-stable
Changes in 4.19.238 USB: serial: pl2303: add IBM device IDs USB: serial: simple: add Nokia phone driver netdevice: add the case if dev is NULL xfrm: fix tunnel model fragmentation behavior virtio_console: break out of buf poll on remove ethernet: sun: Free the coherent when failing in probing spi: Fix invalid sgs value net:mcf8390: Use platform_get_irq() to get the interrupt spi: Fix erroneous sgs value with min_t() af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register fuse: fix pipe buffer lifetime for direct_io tpm: fix reference counting for struct tpm_chip block: Add a helper to validate the block size virtio-blk: Use blk_validate_block_size() to validate block size USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c xhci: make xhci_handshake timeout for xhci_reset() adjustable coresight: Fix TRCCONFIGR.QE sysfs interface iio: afe: rescale: use s64 for temporary scale calculations iio: inkern: apply consumer scale on IIO_VAL_INT cases iio: inkern: apply consumer scale when no channel scale is available iio: inkern: make a best effort on offset calculation clk: uniphier: Fix fixed-rate initialization ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE Documentation: add link to stable release candidate tree Documentation: update stable tree link SUNRPC: avoid race between mod_timer() and del_timer_sync() NFSD: prevent underflow in nfssvc_decode_writeargs() NFSD: prevent integer overflow on 32 bit systems f2fs: fix to unlock page correctly in error path of is_alive() pinctrl: samsung: drop pin banks references on error paths can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path jffs2: fix use-after-free in jffs2_clear_xattr_subsystem jffs2: fix memory leak in jffs2_do_mount_fs jffs2: fix memory leak in jffs2_scan_medium mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node mm: invalidate hwpoison page cache page in fault path mempolicy: mbind_range() set_policy() after vma_merge() scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands qed: display VF trust config qed: validate and restrict untrusted VFs vlan promisc mode Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads" ALSA: cs4236: fix an incorrect NULL check on list iterator ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020 mm,hwpoison: unmap poisoned page before invalidation drbd: fix potential silent data corruption powerpc/kvm: Fix kvm_use_magic_page ACPI: properties: Consistently return -ENOENT if there are no more references drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() block: don't merge across cgroup boundaries if blkcg is enabled drm/edid: check basic audio support on CEA extension block video: fbdev: sm712fb: Fix crash in smtcfb_read() video: fbdev: atari: Atari 2 bpp (STe) palette bugfix ARM: dts: at91: sama5d2: Fix PMERRLOC resource size ARM: dts: exynos: fix UART3 pins configuration in Exynos5250 ARM: dts: exynos: add missing HDMI supplies on SMDK5250 ARM: dts: exynos: add missing HDMI supplies on SMDK5420 carl9170: fix missing bit-wise or operator for tx_params thermal: int340x: Increase bitmap size lib/raid6/test: fix multiple definition linking error DEC: Limit PMAX memory probing to R3k systems media: davinci: vpif: fix unbalanced runtime PM get brcmfmac: firmware: Allocate space for default boardrev in nvram brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio PCI: pciehp: Clear cmd_busy bit in polling mode regulator: qcom_smd: fix for_each_child.cocci warnings crypto: authenc - Fix sleep in atomic context in decrypt_tail crypto: mxs-dcp - Fix scatterlist processing spi: tegra114: Add missing IRQ check in tegra_spi_probe selftests/x86: Add validity check and allow field splitting spi: pxa2xx-pci: Balance reference count for PCI DMA device hwmon: (pmbus) Add mutex to regulator ops hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING block: don't delete queue kobject before its children PM: hibernate: fix __setup handler error handling PM: suspend: fix return value of __setup handler hwrng: atmel - disable trng on failure path crypto: vmx - add missing dependencies clocksource/drivers/timer-of: Check return value of of_iomap in timer_of_base_init() ACPI: APEI: fix return value of __setup handlers crypto: ccp - ccp_dmaengine_unregister release dma channels hwmon: (pmbus) Add Vin unit off handling clocksource: acpi_pm: fix return value of __setup handler sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa perf/core: Fix address filter parser for multiple filters perf/x86/intel/pt: Fix address filter config for 32-bit kernel media: coda: Fix missing put_device() call in coda_get_vdoa_data video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe() video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name() ARM: dts: qcom: ipq4019: fix sleep clock soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe media: em28xx: initialize refcount before kref_get media: usb: go7007: s2250-board: fix leak in probe() ASoC: rt5663: check the return value of devm_kzalloc() in rt5663_parse_dp() ASoC: ti: davinci-i2s: Add check for clk_enable() ALSA: spi: Add check for clk_enable() arm64: dts: ns2: Fix spi-cpol and spi-cpha property arm64: dts: broadcom: Fix sata nodename printk: fix return value of printk.devkmsg __setup handler ASoC: mxs-saif: Handle errors for clk_enable ASoC: atmel_ssc_dai: Handle errors for clk_enable memory: emif: Add check for setup_interrupts memory: emif: check the pointer temp in get_device_details() ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe ASoC: wm8350: Handle error for wm8350_register_irq ASoC: fsi: Add check for clk_enable video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of ASoC: dmaengine: do not use a NULL prepare_slave_config() callback ASoC: mxs: Fix error handling in mxs_sgtl5000_probe ASoC: imx-es8328: Fix error return code in imx_es8328_probe() ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe mmc: davinci_mmc: Handle error for clk_enable drm/bridge: Fix free wrong object in sii8620_init_rcp_input_dev ath10k: fix memory overwrite of the WoWLAN wakeup packet pattern Bluetooth: hci_serdev: call init_rwsem() before p->open() mtd: onenand: Check for error irq drm/edid: Don't clear formats if using deep color drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes() ath9k_htc: fix uninit value bugs KVM: PPC: Fix vmx/vsx mixup in mmio emulation power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe ray_cs: Check ioremap return value power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports iwlwifi: Fix -EIO error code that is never returned dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS scsi: pm8001: Fix command initialization in pm80XX_send_read_log() scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req() scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config() scsi: pm8001: Fix abort all task initialization TOMOYO: fix __setup handlers return values ext2: correct max file size computing drm/tegra: Fix reference leak in tegra_dsi_ganged_probe power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return drm/bridge: cdns-dsi: Make sure to to create proper aliases for dt powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit KVM: x86: Fix emulation in writing cr8 KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor() hv_balloon: rate-limit "Unhandled message" warning i2c: xiic: Make bus names unique power: supply: wm8350-power: Handle error for wm8350_register_irq power: supply: wm8350-power: Add missing free in free_charger_irq PCI: Reduce warnings on possible RW1C corruption powerpc/sysdev: fix incorrect use to determine if list is empty mfd: mc13xxx: Add check for mc13xxx_irq_request vxcan: enable local echo for sent CAN frames MIPS: RB532: fix return value of __setup handler mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init USB: storage: ums-realtek: fix error code in rts51x_read_mem() af_netlink: Fix shift out of bounds in group mask calculation i2c: mux: demux-pinctrl: do not deactivate a master that is not active selftests/bpf/test_lirc_mode2.sh: Exit with proper code tcp: ensure PMTU updates are processed during fastopen mfd: asic3: Add missing iounmap() on error asic3_mfd_probe mxser: fix xmit_buf leak in activate when LSR == 0xff pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add() staging:iio:adc:ad7280a: Fix handing of device address bit reversing. clk: qcom: ipq8074: Use floor ops for SDCC1 clock serial: 8250_mid: Balance reference count for PCI DMA device serial: 8250: Fix race condition in RTS-after-send handling iio: adc: Add check for devm_request_threaded_irq dma-debug: fix return value of __setup handlers clk: qcom: clk-rcg2: Update the frac table for pixel clock remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region clk: actions: Terminate clk_div_table with sentinel element clk: loongson1: Terminate clk_div_table with sentinel element clk: clps711x: Terminate clk_div_table with sentinel element clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver NFS: remove unneeded check in decode_devicenotify_args() pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe tty: hvc: fix return value of __setup handler kgdboc: fix return value of __setup handler kgdbts: fix return value of __setup handler jfs: fix divide error in dbNextAG netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options clk: qcom: gcc-msm8994: Fix gpll4 width xen: fix is_xen_pmu() net: phy: broadcom: Fix brcm_fet_config_init() qlcnic: dcb: default to returning -EOPNOTSUPP net/x25: Fix null-ptr-deref caused by x25_disconnect NFSv4/pNFS: Fix another issue with a list iterator pointing to the head lib/test: use after free in register_test_dev_kmod() selinux: use correct type for context length loop: use sysfs_emit() in the sysfs xxx show() Fix incorrect type in assignment of ipv6 port for audit irqchip/qcom-pdc: Fix broken locking irqchip/nvic: Release nvic_base upon failure bfq: fix use-after-free in bfq_dispatch_request ACPICA: Avoid walking the ACPI Namespace if it is not there lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3 Revert "Revert "block, bfq: honor already-setup queue merges"" ACPI/APEI: Limit printable size of BERT table data PM: core: keep irq flags in device_pm_check_callbacks() spi: tegra20: Use of_device_get_match_data() ext4: don't BUG if someone dirty pages without asking ext4 first ntfs: add sanity check on allocation size video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow video: fbdev: w100fb: Reset global state video: fbdev: cirrusfb: check pixclock to avoid divide by zero video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960 ARM: dts: bcm2837: Add the missing L1/L2 cache information video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf() video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf() video: fbdev: udlfb: replace snprintf in show functions with sysfs_emit ASoC: soc-core: skip zero num_dai component in searching dai name media: cx88-mpeg: clear interrupt status register before streaming video ARM: tegra: tamonten: Fix I2C3 pad setting ARM: mmp: Fix failure to remove sram device video: fbdev: sm712fb: Fix crash in smtcfb_write() media: Revert "media: em28xx: add missing em28xx_close_extension" media: hdpvr: initialize dev->worker at hdpvr_register_videodev mmc: host: Return an error when ->enable_sdio_irq() ops is missing powerpc/lib/sstep: Fix 'sthcx' instruction powerpc/lib/sstep: Fix build errors with newer binutils powerpc: Fix build errors with newer binutils scsi: qla2xxx: Fix stuck session in gpdb scsi: qla2xxx: Fix warning for missing error code scsi: qla2xxx: Check for firmware dump already collected scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair() scsi: qla2xxx: Fix incorrect reporting of task management failure scsi: qla2xxx: Fix hang due to session stuck scsi: qla2xxx: Reduce false trigger to login scsi: qla2xxx: Use correct feature type field during RFF_ID processing KVM: Prevent module exit until all VMs are freed KVM: x86: fix sending PV IPI ubifs: rename_whiteout: Fix double free for whiteout_ui->data ubifs: Fix deadlock in concurrent rename whiteout and inode writeback ubifs: Add missing iput if do_tmpfile() failed in rename whiteout ubifs: setflags: Make dirtied_ino_d 8 bytes aligned ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock() ubifs: rename_whiteout: correct old_dir size computing can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path can: mcba_usb: properly check endpoint type gfs2: Make sure FITRIM minlen is rounded up to fs block size pinctrl: pinconf-generic: Print arguments for bias-pull-* ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl ACPI: CPPC: Avoid out of bounds access when parsing _CPC data mm/mmap: return 1 from stack_guard_gap __setup() handler mm/memcontrol: return 1 from cgroup.memory __setup() handler mm/usercopy: return 1 from hardened_usercopy __setup() handler bpf: Fix comment for helper bpf_current_task_under_cgroup() ubi: fastmap: Return error code if memory allocation fails in add_aeb() ASoC: topology: Allow TLV control to be either read or write ARM: dts: spear1340: Update serial node properties ARM: dts: spear13xx: Update SPI dma properties um: Fix uml_mconsole stop/go openvswitch: Fixed nd target mask field in the flow dump. KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated ubifs: Rectify space amount budget for mkdir/tmpfile operations rtc: wm8350: Handle error for wm8350_register_irq riscv module: remove (NOLOAD) ARM: 9187/1: JIVE: fix return value of __setup handler KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs drm: Add orientation quirk for GPD Win Max ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111 drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj ptp: replace snprintf with sysfs_emit powerpc: dts: t104xrdb: fix phy type for FMAN 4/5 scsi: mvsas: Replace snprintf() with sysfs_emit() scsi: bfa: Replace snprintf() with sysfs_emit() power: supply: axp20x_battery: properly report current when discharging powerpc: Set crashkernel offset to mid of RMA region PCI: aardvark: Fix support for MSI interrupts iommu/arm-smmu-v3: fix event handling soft lockup usb: ehci: add pci device support for Aspeed platforms PCI: pciehp: Add Qualcomm quirk for Command Completed erratum ipv4: Invalidate neighbour for broadcast address upon address addition dm ioctl: prevent potential spectre v1 gadget drm/amdkfd: make CRAT table missing message informational only scsi: pm8001: Fix pm8001_mpi_task_abort_resp() scsi: aha152x: Fix aha152x_setup() __setup handler return value net/smc: correct settings of RMB window update limit macvtap: advertise link netns via netlink bnxt_en: Eliminate unintended link toggle during FW reset MIPS: fix fortify panic when copying asm exception handlers scsi: libfc: Fix use after free in fc_exch_abts_resp() usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm xtensa: fix DTC warning unit_address_format Bluetooth: Fix use after free in hci_send_acl init/main.c: return 1 from handled __setup() functions minix: fix bug when opening a file with O_DIRECT w1: w1_therm: fixes w1_seq for ds28ea00 sensors NFSv4: Protect the state recovery thread against direct reclaim xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 clk: Enforce that disjoints limits are invalid SUNRPC/call_alloc: async tasks mustn't block waiting for memory NFS: swap IO handling is slightly different for O_DIRECT IO NFS: swap-out must always use STABLE writes. serial: samsung_tty: do not unlock port->lock for uart_write_wakeup() virtio_console: eliminate anonymous module_init & module_exit jfs: prevent NULL deref in diFree parisc: Fix CPU affinity for Lasi, WAX and Dino chips net: add missing SOF_TIMESTAMPING_OPT_ID support mm: fix race between MADV_FREE reclaim and blkdev direct IO read KVM: arm64: Check arm64_get_bp_hardening_data() didn't return NULL drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() Drivers: hv: vmbus: Fix potential crash on module unload scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one() net: stmmac: Fix unset max_speed difference between DT and non-DT platforms drm/imx: Fix memory leak in imx_pd_connector_get_modes net: openvswitch: don't send internal clone attribute to the userspace. rxrpc: fix a race in rxrpc_exit_net() qede: confirm skb is allocated before using spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op() drbd: Fix five use after free bugs in get_initial_state Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning" mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) mm/mempolicy: fix mpol_new leak in shared_policy_replace x86/pm: Save the MSR validity status at context setup x86/speculation: Restore speculation related MSRs during S3 resume btrfs: fix qgroup reserve overflow the qgroup limit arm64: patch_text: Fixup last cpu should be master ata: sata_dwc_460ex: Fix crash due to OOB write perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator irqchip/gic-v3: Fix GICR_CTLR.RWP polling tools build: Filter out options and warnings not supported by clang tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error" mm: don't skip swap entry even if zap_details specified arm64: module: remove (NOLOAD) from linker script mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning cgroup: Use open-time credentials for process migraton perm checks cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv cgroup: Use open-time cgroup namespace for process migration perm checks selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644 selftests: cgroup: Test open-time credential usage for migration checks selftests: cgroup: Test open-time cgroup namespace usage for migration checks xfrm: policy: match with both mark and mask on user interfaces drm/amdgpu: Check if fd really is an amdgpu fd. drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu Linux 4.19.238 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I55a3615d2fbf9bde9ac152456701b36a6c9d20b6 |
||
|
|
956bb10961 |
mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
commit ddbc84f3f595cf1fc8234a191193b5d20ad43938 upstream.
ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn
is also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining
memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
not enough room for ZONE_MOVABLE on that node.
Unfortunately this condition is not checked for. This leads to
zone_movable_pfn[] getting set to a pfn greater than the last pfn in a
node.
calculate_node_totalpages() then sets zone->present_pages to be greater
than zone->spanned_pages which is invalid, as spanned_pages represents
the maximum number of pages in a zone assuming no holes.
Subsequently it is possible free_area_init_core() will observe a zone of
size zero with present pages. In this case it will skip setting up the
zone, including the initialisation of free_lists[].
However populated_zone() checks zone->present_pages to see if a zone has
memory available. This is used by iterators such as
walk_zones_in_node(). pagetypeinfo_showfree() uses this to walk the
free_list of each zone in each node, which are assumed to be initialised
due to the zone not being empty.
As free_area_init_core() never initialised the free_lists[] this results
in the following kernel crash when trying to read /proc/pagetypeinfo:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:pagetypeinfo_show+0x163/0x460
Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82
RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003
RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b
RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e
R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00
R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000
FS: 00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0
Call Trace:
seq_read_iter+0x128/0x460
proc_reg_read_iter+0x51/0x80
new_sync_read+0x113/0x1a0
vfs_read+0x136/0x1d0
ksys_read+0x70/0xf0
__x64_sys_read+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix this by checking that the aligned zone_movable_pfn[] does not exceed
the end of the node, and if it does skip creating a movable zone on this
node.
Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com
Fixes:
|
||
|
|
11156bde8d |
Merge 4.19.207 into android-4.19-stable
Changes in 4.19.207 ext4: fix race writing to an inline_data file while its xattrs are changing xtensa: fix kconfig unmet dependency warning for HAVE_FUTEX_CMPXCHG gpu: ipu-v3: Fix i.MX IPU-v3 offset calculations for (semi)planar U/V formats qed: Fix the VF msix vectors flow net: macb: Add a NULL check on desc_ptp qede: Fix memset corruption perf/x86/intel/pt: Fix mask of num_address_ranges perf/x86/amd/ibs: Work around erratum #1197 cryptoloop: add a deprecation warning ARM: 8918/2: only build return_address() if needed ALSA: pcm: fix divide error in snd_pcm_lib_ioctl clk: fix build warning for orphan_list media: stkwebcam: fix memory leak in stk_camera_probe ARM: imx: add missing clk_disable_unprepare() ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init igmp: Add ip_mc_list lock in ip_check_mc_rcu USB: serial: mos7720: improve OOM-handling in read_mos_reg() ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) SUNRPC/nfs: Fix return value for nfs4_callback_compound() crypto: talitos - reduce max key size for SEC1 powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Delete unneeded .globl _zimage_start net: ll_temac: Remove left-over debug message mm/page_alloc: speed up the iteration of max_order Revert "btrfs: compression: don't try to compress if we don't have enough pages" ALSA: usb-audio: Add registration quirk for JBL Quantum 800 usb: host: xhci-rcar: Don't reload firmware after the completion usb: mtu3: use @mult for HS isoc or intr usb: mtu3: fix the wrong HS mult value x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions PCI: Call Max Payload Size-related fixup quirks early locking/mutex: Fix HANDOFF condition regmap: fix the offset of register error log crypto: mxs-dcp - Check for DMA mapping errors sched/deadline: Fix reset_on_fork reporting of DL tasks power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop() sched/deadline: Fix missing clock update in migrate_task_rq_dl() hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns() udf: Check LVID earlier isofs: joliet: Fix iocharset=utf8 mount option bcache: add proper error unwinding in bcache_device_init nvme-rdma: don't update queue count when failing to set io queues power: supply: max17042_battery: fix typo in MAx17042_TOFF s390/cio: add dev_busid sysfs entry for each subchannel libata: fix ata_host_start() crypto: qat - do not ignore errors from enable_vf2pf_comms() crypto: qat - handle both source of interrupt in VF ISR crypto: qat - fix reuse of completion variable crypto: qat - fix naming for init/shutdown VF to PF notifications crypto: qat - do not export adf_iov_putmsg() fcntl: fix potential deadlock for &fasync_struct.fa_lock udf_get_extendedattr() had no boundary checks. m68k: emu: Fix invalid free in nfeth_cleanup() spi: spi-fsl-dspi: Fix issue with uninitialized dma_slave_config spi: spi-pic32: Fix issue with uninitialized dma_slave_config lib/mpi: use kcalloc in mpi_resize clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel crypto: qat - use proper type for vf_mask certs: Trigger creation of RSA module signing key if it's not an RSA key spi: sprd: Fix the wrong WDG_LOAD_VAL media: TDA1997x: enable EDID support soc: rockchip: ROCKCHIP_GRF should not default to y, unconditionally media: dvb-usb: fix uninit-value in dvb_usb_adapter_dvb_init media: dvb-usb: fix uninit-value in vp702x_read_mac_addr media: go7007: remove redundant initialization Bluetooth: sco: prevent information leak in sco_conn_defer_accept() tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos net: cipso: fix warnings in netlbl_cipsov4_add_std i2c: highlander: add IRQ check media: em28xx-input: fix refcount bug in em28xx_usb_disconnect media: venus: venc: Fix potential null pointer dereference on pointer fmt PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently PCI: PM: Enable PME if it can be signaled from D3cold soc: qcom: smsm: Fix missed interrupts if state changes while masked Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflow drm/msm/dpu: make dpu_hw_ctl_clear_all_blendstages clear necessary LMs arm64: dts: exynos: correct GIC CPU interfaces address range on Exynos7 Bluetooth: fix repeated calls to sco_sock_kill drm/msm/dsi: Fix some reference counted resource leaks usb: gadget: udc: at91: add IRQ check usb: phy: fsl-usb: add IRQ check usb: phy: twl6030: add IRQ checks Bluetooth: Move shutdown callback before flushing tx and rx queue usb: host: ohci-tmio: add IRQ check usb: phy: tahvo: add IRQ check mac80211: Fix insufficient headroom issue for AMSDU usb: gadget: mv_u3d: request_irq() after initializing UDC Bluetooth: add timeout sanity check to hci_inquiry i2c: iop3xx: fix deferred probing i2c: s3c2410: fix IRQ check mmc: dw_mmc: Fix issue with uninitialized dma_slave_config mmc: moxart: Fix issue with uninitialized dma_slave_config CIFS: Fix a potencially linear read overflow i2c: mt65xx: fix IRQ check usb: ehci-orion: Handle errors of clk_prepare_enable() in probe usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available tty: serial: fsl_lpuart: fix the wrong mapbase value ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point() bcma: Fix memory leak for internally-handled cores ipv4: make exception cache less predictible net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed net: qualcomm: fix QCA7000 checksum handling ipv4: fix endianness issue in inet_rtm_getroute_build_skb() netns: protect netns ID lookups with RCU fscrypt: add fscrypt_symlink_getattr() for computing st_size ext4: report correct st_size for encrypted symlinks f2fs: report correct st_size for encrypted symlinks ubifs: report correct st_size for encrypted symlinks tty: Fix data race between tiocsti() and flush_to_ldisc() x86/resctrl: Fix a maybe-uninitialized build warning treated as error KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted IMA: remove -Wmissing-prototypes warning IMA: remove the dependency on CRYPTO_MD5 fbmem: don't allow too huge resolutions backlight: pwm_bl: Improve bootloader/kernel device handover clk: kirkwood: Fix a clocking boot regression rtc: tps65910: Correct driver module alias btrfs: reset replace target device to allocation state on close blk-zoned: allow zone management send operations without CAP_SYS_ADMIN blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN PCI/MSI: Skip masking MSI-X on Xen PV powerpc/perf/hv-gpci: Fix counter value parsing xen: fix setting of max_pfn in shared_info include/linux/list.h: add a macro to test if entry is pointing to the head 9p/xen: Fix end of loop tests for list_for_each_entry bpf/verifier: per-register parent pointers bpf: correct slot_type marking logic to allow more stack slot sharing bpf: Support variable offset stack access from helpers bpf: Reject indirect var_off stack access in raw mode bpf: Reject indirect var_off stack access in unpriv mode bpf: Sanity check max value for var_off stack access selftests/bpf: Test variable offset stack access bpf: track spill/fill of constants selftests/bpf: fix tests due to const spill/fill bpf: Introduce BPF nospec instruction for mitigating Spectre v4 bpf: Fix leakage due to insufficient speculative store bypass mitigation bpf: verifier: Allocate idmap scratch in verifier env bpf: Fix pointer arithmetic mask tightening under state pruning tools/thermal/tmon: Add cross compiling support soc: aspeed: lpc-ctrl: Fix boundary check for mmap arm64: head: avoid over-mapping in map_memory crypto: public_key: fix overflow during implicit conversion block: bfq: fix bfq_set_next_ioprio_data() power: supply: max17042: handle fails of reading status register dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc() VMCI: fix NULL pointer dereference when unmapping queue pair media: uvc: don't do DMA on stack media: rc-loopback: return number of emitters rather than error libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs ARM: 9105/1: atags_to_fdt: don't warn about stack size PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure PCI: xilinx-nwl: Enable the clock through CCF PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response PCI: aardvark: Fix masking and unmasking legacy INTx interrupts HID: input: do not report stylus battery state as "full" RDMA/iwcm: Release resources if iw_cm module initialization fails docs: Fix infiniband uverbs minor number pinctrl: samsung: Fix pinctrl bank pin count vfio: Use config not menuconfig for VFIO_NOIOMMU powerpc/stacktrace: Include linux/delay.h openrisc: don't printk() unconditionally pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry() scsi: qedi: Fix error codes in qedi_alloc_global_queues() platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call fscache: Fix cookie key hashing f2fs: fix to account missing .skipped_gc_rwsem f2fs: fix to unmap pages from userspace process in punch_hole() MIPS: Malta: fix alignment of the devicetree buffer userfaultfd: prevent concurrent API initialization media: dib8000: rewrite the init prbs logic crypto: mxs-dcp - Use sg_mapping_iter to copy data PCI: Use pci_update_current_state() in pci_enable_device_flags() tipc: keep the skb in rcv queue until the whole data is read iio: dac: ad5624r: Fix incorrect handling of an optional regulator. ARM: dts: qcom: apq8064: correct clock names video: fbdev: kyro: fix a DoS bug by restricting user input netlink: Deal with ESRCH error in nlmsg_notify() Smack: Fix wrong semantics in smk_access_entry() usb: host: fotg210: fix the endpoint's transactional opportunities calculation usb: host: fotg210: fix the actual_length of an iso packet usb: gadget: u_ether: fix a potential null pointer dereference usb: gadget: composite: Allow bMaxPower=0 if self-powered staging: board: Fix uninitialized spinlock when attaching genpd tty: serial: jsm: hold port lock when reporting modem line changes drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex bpf/tests: Fix copy-and-paste error in double word test bpf/tests: Do not PASS tests without actually testing the result video: fbdev: asiliantfb: Error out if 'pixclock' equals zero video: fbdev: kyro: Error out if 'pixclock' equals zero video: fbdev: riva: Error out if 'pixclock' equals zero ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs() flow_dissector: Fix out-of-bounds warnings s390/jump_label: print real address in a case of a jump label bug serial: 8250: Define RX trigger levels for OxSemi 950 devices xtensa: ISS: don't panic in rs_init hvsi: don't panic on tty_register_driver failure serial: 8250_pci: make setup_port() parameters explicitly unsigned staging: ks7010: Fix the initialization of the 'sleep_status' structure samples: bpf: Fix tracex7 error raised on the missing argument ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init() Bluetooth: skip invalid hci_sync_conn_complete_evt bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output media: imx258: Rectify mismatch of VTS value media: imx258: Limit the max analogue gain to 480 media: v4l2-dv-timings.c: fix wrong condition in two for-loops media: TDA1997x: fix tda1997x_query_dv_timings() return value media: tegra-cec: Handle errors of clk_prepare_enable() ARM: dts: imx53-ppd: Fix ACHC entry arm64: dts: qcom: sdm660: use reg value for memory node net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe() Bluetooth: schedule SCO timeouts with delayed_work Bluetooth: avoid circular locks in sco_sock_connect gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port() ARM: tegra: tamonten: Fix UART pad setting Bluetooth: Fix handling of LE Enhanced Connection Complete serial: sh-sci: fix break handling for sysrq tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD rpc: fix gss_svc_init cleanup on failure staging: rts5208: Fix get_ms_information() heap buffer size gfs2: Don't call dlm after protocol is unmounted of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS mmc: sdhci-of-arasan: Check return value of non-void funtions mmc: rtsx_pci: Fix long reads when clock is prescaled selftests/bpf: Enlarge select() timeout for test_maps mmc: core: Return correct emmc response in case of ioctl error cifs: fix wrong release in sess_alloc_buffer() failed path Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set" usb: musb: musb_dsps: request_irq() after initializing musb usbip: give back URBs for unsent unlink requests during cleanup usbip:vhci_hcd USB port can get stuck in the disabled state ASoC: rockchip: i2s: Fix regmap_ops hang ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B parport: remove non-zero check on count ath9k: fix OOB read ar9300_eeprom_restore_internal ath9k: fix sleeping in atomic context net: fix NULL pointer reference in cipso_v4_doi_free net: w5100: check return value after calling platform_get_resource() parisc: fix crash with signals and alloca ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup() scsi: BusLogic: Fix missing pr_cont() use scsi: qla2xxx: Sync queue idx with queue_pair_map idx cpufreq: powernv: Fix init_chip_info initialization in numa=off mm/hugetlb: initialize hugetlb_usage in mm_init memcg: enable accounting for pids in nested pid namespaces platform/chrome: cros_ec_proto: Send command again when timeout occurs drm/amdgpu: Fix BUG_ON assert dm thin metadata: Fix use-after-free in dm_bm_set_read_only xen: reset legacy rtc flag for PV domU bnx2x: Fix enabling network interfaces without VFs arm64/sve: Use correct size when reinitialising SVE state PM: base: power: don't try to use non-existing RTC for storing data PCI: Add AMD GPU multi-function power dependencies x86/mm: Fix kern_addr_valid() to cope with existing but not present entries tipc: fix an use-after-free issue in tipc_recvmsg net-caif: avoid user-triggerable WARN_ON(1) ptp: dp83640: don't define PAGE0 dccp: don't duplicate ccid when cloning dccp sock net/l2tp: Fix reference count leak in l2tp_udp_recv_core r6040: Restore MDIO clock frequency after MAC reset tipc: increase timeout in tipc_sk_enqueue() perf machine: Initialize srcline string member in add_location struct net/mlx5: Fix potential sleeping in atomic context events: Reuse value read using READ_ONCE instead of re-reading it net/af_unix: fix a data-race in unix_dgram_poll net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() qed: Handle management FW error ibmvnic: check failover_pending in login response net: hns3: pad the short tunnel frame before sending to hardware mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() KVM: s390: index kvm->arch.idle_mask by vcpu_idx dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation mfd: Don't use irq_create_mapping() to resolve a mapping PCI: Add ACS quirks for Cavium multi-function devices net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920 block, bfq: honor already-setup queue merges ethtool: Fix an error code in cxgb2.c NTB: perf: Fix an error code in perf_setup_inbuf() mfd: axp20x: Update AXP288 volatile ranges PCI: Fix pci_dev_str_match_path() alloc while atomic bug KVM: arm64: Handle PSCI resets before userspace touches vCPU state PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()' ARC: export clear_user_page() for modules net: dsa: b53: Fix calculating number of switch ports netfilter: socket: icmp6: fix use-after-scope fq_codel: reject silly quantum parameters qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom ip_gre: validate csum_start only on pull net: renesas: sh_eth: Fix freeing wrong tx descriptor s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant Linux 4.19.207 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I18108cb47ba9e95838ebe55aaabe34de345ee846 |
||
|
|
dd8b408964 |
mm/page_alloc: speed up the iteration of max_order
commit 7ad69832f37e3cea8557db6df7c793905f1135e8 upstream.
When we free a page whose order is very close to MAX_ORDER and greater
than pageblock_order, it wastes some CPU cycles to increase max_order to
MAX_ORDER one by one and check the pageblock migratetype of that page
repeatedly especially when MAX_ORDER is much larger than pageblock_order.
We also should not be checking migratetype of buddy when "order ==
MAX_ORDER - 1" as the buddy pfn may be invalid, so adjust the condition.
With the new check, we don't need the max_order check anymore, so we
replace it.
Also adjust max_order initialization so that it's lower by one than
previously, which makes the code hopefully more clear.
Link: https://lkml.kernel.org/r/20201204155109.55451-1-songmuchun@bytedance.com
Fixes:
|
||
|
|
f8d9d560b9 |
Merge 4.19.160 into android-4.19-stable
Changes in 4.19.160
ah6: fix error return code in ah6_input()
atm: nicstar: Unmap DMA on send error
bnxt_en: read EEPROM A2h address using page 0
devlink: Add missing genlmsg_cancel() in devlink_nl_sb_port_pool_fill()
inet_diag: Fix error path to cancel the meseage in inet_req_diag_fill()
lan743x: fix issue causing intermittent kernel log warnings
lan743x: prevent entire kernel HANG on open, for some platforms
mlxsw: core: Use variable timeout for EMAD retries
net: b44: fix error return code in b44_init_one()
net: bridge: add missing counters to ndo_get_stats64 callback
net: dsa: mv88e6xxx: Avoid VTU corruption on 6097
net: Have netpoll bring-up DSA management interface
netlabel: fix our progress tracking in netlbl_unlabel_staticlist()
netlabel: fix an uninitialized warning in netlbl_unlabel_staticlist()
net/mlx4_core: Fix init_hca fields offset
net: qualcomm: rmnet: Fix incorrect receive packet handling during cleanup
net: x25: Increase refcnt of "struct x25_neigh" in x25_rx_call_request
page_frag: Recover from memory pressure
qed: fix error return code in qed_iwarp_ll2_start()
qlcnic: fix error return code in qlcnic_83xx_restart_hw()
sctp: change to hold/put transport for proto_unreach_timer
tcp: only postpone PROBE_RTT if RTT is < current min_rtt estimate
net/mlx5: Disable QoS when min_rates on all VFs are zero
net: usb: qmi_wwan: Set DTR quirk for MR400
net/ncsi: Fix netlink registration
net: ftgmac100: Fix crash when removing driver
pinctrl: rockchip: enable gpio pclk for rockchip_gpio_to_irq
scsi: ufs: Fix unbalanced scsi_block_reqs_cnt caused by ufshcd_hold()
selftests: kvm: Fix the segment descriptor layout to match the actual layout
ACPI: button: Add DMI quirk for Medion Akoya E2228T
arm64: psci: Avoid printing in cpu_psci_cpu_die()
vfs: remove lockdep bogosity in __sb_start_write
arm64: dts: allwinner: a64: Pine64 Plus: Fix ethernet node
arm64: dts: allwinner: h5: OrangePi PC2: Fix ethernet node
ARM: dts: sun8i: r40: bananapi-m2-ultra: Fix ethernet node
Revert "arm: sun8i: orangepi-pc-plus: Set EMAC activity LEDs to active high"
ARM: dts: sun8i: h3: orangepi-plus2e: Enable RGMII RX/TX delay on Ethernet PHY
ARM: dts: sun8i: a83t: Enable both RGMII RX/TX delay on Ethernet PHY
arm64: dts: allwinner: a64: bananapi-m64: Enable RGMII RX/TX delay on PHY
Input: adxl34x - clean up a data type in adxl34x_probe()
MIPS: export has_transparent_hugepage() for modules
arm64: dts: allwinner: h5: OrangePi Prime: Fix ethernet node
arm: dts: imx6qdl-udoo: fix rgmii phy-mode for ksz9031 phy
ARM: dts: imx50-evk: Fix the chip select 1 IOMUX
Input: resistive-adc-touch - fix kconfig dependency on IIO_BUFFER
perf lock: Don't free "lock_seq_stat" if read_count isn't zero
ip_tunnels: Set tunnel option flag when tunnel metadata is present
can: af_can: prevent potential access of uninitialized member in can_rcv()
can: af_can: prevent potential access of uninitialized member in canfd_rcv()
can: dev: can_restart(): post buffer from the right context
can: ti_hecc: Fix memleak in ti_hecc_probe
can: mcba_usb: mcba_usb_start_xmit(): first fill skb, then pass to can_put_echo_skb()
can: peak_usb: fix potential integer overflow on shift of a int
can: m_can: m_can_handle_state_change(): fix state change
ASoC: qcom: lpass-platform: Fix memory leak
MIPS: Alchemy: Fix memleak in alchemy_clk_setup_cpu
drm/sun4i: dw-hdmi: fix error return code in sun8i_dw_hdmi_bind()
can: kvaser_usb: kvaser_usb_hydra: Fix KCAN bittiming limits
xfs: fix the minrecs logic when dealing with inode root child blocks
xfs: strengthen rmap record flags checking
regulator: ti-abb: Fix array out of bound read access on the first transition
fail_function: Remove a redundant mutex unlock
xfs: revert "xfs: fix rmap key and record comparison functions"
efi/x86: Free efi_pgd with free_pages()
libfs: fix error cast of negative value in simple_attr_write()
speakup: Do not let the line discipline be used several times
ALSA: firewire: Clean up a locking issue in copy_resp_to_buf()
ALSA: usb-audio: Add delay quirk for all Logitech USB devices
ALSA: ctl: fix error path at adding user-defined element set
ALSA: mixart: Fix mutex deadlock
ALSA: hda/realtek: Add some Clove SSID in the ALC293(ALC1220)
tty: serial: imx: keep console clocks always on
efivarfs: fix memory leak in efivarfs_create()
staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
ext4: fix bogus warning in ext4_update_dx_flag()
iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
regulator: pfuze100: limit pfuze-support-disable-sw to pfuze{100,200}
regulator: fix memory leak with repeated set_machine_constraints()
regulator: avoid resolve_supply() infinite recursion
regulator: workaround self-referent regulators
xtensa: disable preemption around cache alias management calls
mac80211: minstrel: remove deferred sampling code
mac80211: minstrel: fix tx status processing corner case
mac80211: free sta in sta_info_insert_finish() on errors
s390/cpum_sf.c: fix file permission for cpum_sfb_size
s390/dasd: fix null pointer dereference for ERP requests
ptrace: Set PF_SUPERPRIV when checking capability
seccomp: Set PF_SUPERPRIV when checking capability
x86/microcode/intel: Check patch signature before saving microcode for early loading
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Linux 4.19.160
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3a7304be6687f4ffe96f0e765da0c0ec7dcb971d
|
||
|
|
082b21d034 |
page_frag: Recover from memory pressure
[ Upstream commit d8c19014bba8f565d8a2f1f46b4e38d1d97bf1a7 ]
The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
This ends up to page_frag_alloc() to allocate skb->data from
page_frag_cache->va.
During the memory pressure, page_frag_cache->va may be allocated as
pfmemalloc page. As a result, the skb->pfmemalloc is always true as
skb->data is from page_frag_cache->va. The skb will be dropped if the
sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
under memory pressure.
However, once kernel is not under memory pressure any longer (suppose large
amount of memory pages are just reclaimed), the page_frag_alloc() may still
re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a
result, the skb->pfmemalloc is always true unless page_frag_cache->va is
re-allocated, even if the kernel is not under memory pressure any longer.
Here is how kernel runs into issue.
1. The kernel is under memory pressure and allocation of
PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
the pfmemalloc page is allocated for page_frag_cache->va.
2: All skb->data from page_frag_cache->va (pfmemalloc) will have
skb->pfmemalloc=true. The skb will always be dropped by sock without
SOCK_MEMALLOC. This is an expected behaviour.
3. Suppose a large amount of pages are reclaimed and kernel is not under
memory pressure any longer. We expect skb->pfmemalloc drop will not happen.
4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_alloc->va and will always re-use the prior pfmemalloc page. The
skb->pfmemalloc is always true even kernel is not under memory pressure any
longer.
Fix this by freeing and re-allocating the page instead of recycling it.
References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Bert Barbe <bert.barbe@oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: SRINIVAS <srinivas.eeda@oracle.com>
Fixes:
|
||
|
|
1091b1b93c |
UPSTREAM: mm/page_alloc: silence a KASAN false positive
kernel_init_free_pages() will use memset() on s390 to clear all pages from
kmalloc_order() which will override KASAN redzones because a redzone was
setup from the end of the allocation size to the end of the last page.
Silence it by not reporting it there. An example of the report is,
BUG: KASAN: slab-out-of-bounds in __free_pages_ok
Write of size 4096 at addr 000000014beaa000
Call Trace:
show_stack+0x152/0x210
dump_stack+0x1f8/0x248
print_address_description.isra.13+0x5e/0x4d0
kasan_report+0x130/0x178
check_memory_region+0x190/0x218
memset+0x34/0x60
__free_pages_ok+0x894/0x12f0
kfree+0x4f2/0x5e0
unpack_to_rootfs+0x60e/0x650
populate_rootfs+0x56/0x358
do_one_initcall+0x1f4/0xa20
kernel_init_freeable+0x758/0x7e8
kernel_init+0x1c/0x170
ret_from_fork+0x24/0x28
Memory state around the buggy address:
000000014bea9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000014bea9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>000000014beaa000: 03 fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
^
000000014beaa080: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
000000014beaa100: fe fe fe fe fe fe fe fe fe fe fe fe fe fe
Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: http://lkml.kernel.org/r/20200610052154.5180-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 9e15afa5a87a3bf969a3b33c3dadfb8b46df42c0)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9375f117c1aa08c0d5d0e881bb5a917458f97c99
|
||
|
|
9f80205d66 |
Merge 4.19.151 into android-4.19-stable
Changes in 4.19.151 fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts fbcon: Fix global-out-of-bounds read in fbcon_get_font() Revert "ravb: Fixed to be able to unload modules" net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() drm/nouveau/mem: guard against NULL pointer access in mem_del usermodehelper: reset umask to default before executing user process platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 platform/x86: thinkpad_acpi: initialize tp_nvram_state variable platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse driver core: Fix probe_count imbalance in really_probe() perf top: Fix stdio interface input handling with glibc 2.28+ i2c: i801: Exclude device from suspend direct complete optimization mtd: rawnand: sunxi: Fix the probe error path arm64: dts: stratix10: add status to qspi dts node nvme-core: put ctrl ref when module ref get fail macsec: avoid use-after-free in macsec_handle_frame() mm/khugepaged: fix filemap page_to_pgoff(page) != offset xfrmi: drop ignore_df check before updating pmtu cifs: Fix incomplete memory allocation on setxattr path i2c: meson: fix clock setting overwrite i2c: meson: fixup rate calculation with filter delay i2c: owl: Clear NACK and BUS error bits sctp: fix sctp_auth_init_hmacs() error path team: set dev->needed_headroom in team_setup_by_port() net: team: fix memory leak in __team_options_register openvswitch: handle DNAT tuple collision drm/amdgpu: prevent double kfree ttm->sg xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate xfrm: clone whole liftime_cur structure in xfrm_do_migrate net: stmmac: removed enabling eee in EEE set callback platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP xfrm: Use correct address family in xfrm_state_find bonding: set dev->needed_headroom in bond_setup_by_slave() mdio: fix mdio-thunder.c dependency & build error net: usb: ax88179_178a: fix missing stop entry in driver_info net/mlx5e: Fix VLAN cleanup flow net/mlx5e: Fix VLAN create flow rxrpc: Fix rxkad token xdr encoding rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() rxrpc: Fix some missing _bh annotations on locking conn->state_lock rxrpc: Fix server keyring leak perf: Fix task_function_call() error handling mmc: core: don't set limits.discard_granularity as 0 mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails Linux 4.19.151 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I9ee2b0fc4fc39f27be6ae680529e1046f249a3e6 |
||
|
|
94c5167581 |
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
commit 4aab2be0983031a05cb4a19696c9da5749523426 upstream.
When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged. Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.
This change restores min_free_kbytes as expected for THP consumers.
[vijayb@linux.microsoft.com: v5]
Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com
Fixes:
|
||
|
|
2dce03a5c2 |
Merge 4.19.150 into android-4.19-stable
Changes in 4.19.150 mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS models USB: gadget: f_ncm: Fix NDP16 datagram validation gpio: mockup: fix resource leak in error path gpio: tc35894: fix up tc35894 interrupt configuration clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock vsock/virtio: stop workers during the .remove() vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() net: virtio_vsock: Enhance connection semantics Input: i8042 - add nopnp quirk for Acer Aspire 5 A515 ftrace: Move RCU is watching check after recursion check drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices drm/sun4i: mixer: Extend regmap max_register net: dec: de2104x: Increase receive ring size for Tulip rndis_host: increase sleep time in the query-response loop nvme-core: get/put ctrl and transport module in nvme_dev_open/release() drivers/net/wan/lapbether: Make skb->protocol consistent with the header drivers/net/wan/hdlc: Set skb->protocol before transmitting mac80211: do not allow bigger VHT MPDUs than the hardware supports spi: fsl-espi: Only process interrupts for expected events nvme-fc: fail new connections to a deleted host or remote port gpio: sprd: Clear interrupt when setting the type as edge pinctrl: mvebu: Fix i2c sda definition for 98DX3236 nfs: Fix security label length not being reset clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate() i2c: cpm: Fix i2c_ram structure Input: trackpoint - enable Synaptics trackpoints random32: Restore __latent_entropy attribute on net_rand_state mm: replace memmap_context by meminit_context mm: don't rely on system state to detect hot-plug operations net/packet: fix overflow in tpacket_rcv epoll: do not insert into poll queues until all sanity checks are done epoll: replace ->visited/visited_list with generation count epoll: EPOLL_CTL_ADD: close the race in decision to take fast path ep_create_wakeup_source(): dentry name can change under you... netfilter: ctnetlink: add a range check for l3/l4 protonum Linux 4.19.150 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib6f1b6fce01bec80efd4a905d03903ff20ca89be |
||
|
|
25eaea1b33 |
mm: replace memmap_context by meminit_context
commit c1d0da83358a2316d9be7f229f26126dbaa07468 upstream. Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
599bf02de4 |
Merge 4.19.142 into android-4.19-stable
Changes in 4.19.142
drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset()
perf probe: Fix memory leakage when the probe point is not found
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
btrfs: export helpers for subvolume name/id resolution
btrfs: don't show full path of bind mounts in subvol=
btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range
btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
btrfs: sysfs: use NOFS for device creation
romfs: fix uninitialized memory leak in romfs_dev_read()
kernel/relay.c: fix memleak on destroy relay channel
mm: include CMA pages in lowmem_reserve at boot
mm, page_alloc: fix core hung in free_pcppages_bulk()
ext4: fix checking of directory entry validity for inline directories
jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock()
scsi: zfcp: Fix use-after-free in request timeout handlers
drm/amd/display: fix pow() crashing when given base 0
kthread: Do not preempt current task if it is going to call schedule()
spi: Prevent adding devices below an unregistering controller
scsi: ufs: Add DELAY_BEFORE_LPM quirk for Micron devices
scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM
media: budget-core: Improve exception handling in budget_register()
rtc: goldfish: Enable interrupt in set_alarm() when necessary
media: vpss: clean up resources in init
Input: psmouse - add a newline when printing 'proto' by sysfs
m68knommu: fix overwriting of bits in ColdFire V3 cache control
svcrdma: Fix another Receive buffer leak
xfs: fix inode quota reservation checks
jffs2: fix UAF problem
ceph: fix use-after-free for fsc->mdsc
cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0
scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
virtio_ring: Avoid loop when vq is broken in virtqueue_poll
tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference
xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init
alpha: fix annotation of io{read,write}{16,32}be()
fs/signalfd.c: fix inconsistent return codes for signalfd4
ext4: fix potential negative array index in do_split()
ext4: don't allow overlapping system zones
ASoC: q6routing: add dummy register read/write function
i40e: Set RX_ONLY mode for unicast promiscuous on VLAN
i40e: Fix crash during removing i40e driver
net: fec: correct the error path for regulator disable in probe
bonding: show saner speed for broadcast mode
bonding: fix a potential double-unregister
s390/runtime_instrumentation: fix storage key handling
s390/ptrace: fix storage key handling
ASoC: msm8916-wcd-analog: fix register Interrupt offset
ASoC: intel: Fix memleak in sst_media_open
vfio/type1: Add proper error unwind for vfio_iommu_replay()
kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
kconfig: qconf: do not limit the pop-up menu to the first row
kconfig: qconf: fix signal connection to invalid slots
efi: avoid error message when booting under Xen
Fix build error when CONFIG_ACPI is not set/enabled:
RDMA/bnxt_re: Do not add user qps to flushlist
afs: Fix NULL deref in afs_dynroot_depopulate()
bonding: fix active-backup failover for current ARP slave
net: ena: Prevent reset after device destruction
net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
net: dsa: b53: check for timeout
powerpc/pseries: Do not initiate shutdown when system is running on UPS
efi: add missed destroy_workqueue when efisubsys_init fails
epoll: Keep a reference on files added to the check list
do_epoll_ctl(): clean the failure exits up a bit
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
xen: don't reschedule in preemption off sections
clk: Evict unregistered clks from parent caches
KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
Linux 4.19.142
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibfe4a0a4249f76ab35076f4b003e32cd6f9788a5
|
||
|
|
c666936d8d |
mm, page_alloc: fix core hung in free_pcppages_bulk()
commit 88e8ac11d2ea3acc003cf01bb5a38c8aa76c3cfd upstream.
The following race is observed with the repeated online, offline and a
delay between two successive online of memory blocks of movable zone.
P1 P2
Online the first memory block in
the movable zone. The pcp struct
values are initialized to default
values,i.e., pcp->high = 0 &
pcp->batch = 1.
Allocate the pages from the
movable zone.
Try to Online the second memory
block in the movable zone thus it
entered the online_pages() but yet
to call zone_pcp_update().
This process is entered into
the exit path thus it tries
to release the order-0 pages
to pcp lists through
free_unref_page_commit().
As pcp->high = 0, pcp->count = 1
proceed to call the function
free_pcppages_bulk().
Update the pcp values thus the
new pcp values are like, say,
pcp->high = 378, pcp->batch = 63.
Read the pcp's batch value using
READ_ONCE() and pass the same to
free_pcppages_bulk(), pcp values
passed here are, batch = 63,
count = 1.
Since num of pages in the pcp
lists are less than ->batch,
then it will stuck in
while(list_empty(list)) loop
with interrupts disabled thus
a core hung.
Avoid this by ensuring free_pcppages_bulk() is called with proper count of
pcp list pages.
The mentioned race is some what easily reproducible without [1] because
pcp's are not updated for the first memory block online and thus there is
a enough race window for P2 between alloc+free and pcp struct values
update through onlining of second memory block.
With [1], the race still exists but it is very narrow as we update the pcp
struct values for the first memory block online itself.
This is not limited to the movable zone, it could also happen in cases
with the normal zone (e.g., hotplug to a node that only has DMA memory, or
no other memory yet).
[1]: https://patchwork.kernel.org/patch/11696389/
Fixes:
|
||
|
|
84b8dc232a |
mm: include CMA pages in lowmem_reserve at boot
commit e08d3fdfe2dafa0331843f70ce1ff6c1c4900bf4 upstream.
The lowmem_reserve arrays provide a means of applying pressure against
allocations from lower zones that were targeted at higher zones. Its
values are a function of the number of pages managed by higher zones and
are assigned by a call to the setup_per_zone_lowmem_reserve() function.
The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses of the
/proc/sys/vm/lowmem_reserve_ratio sysctl file.
The function init_per_zone_wmark_min() was moved up from a module_init to
a core_initcall to resolve a sequencing issue with khugepaged.
Unfortunately this created a sequencing issue with CMA page accounting.
The CMA pages are added to the managed page count of a zone when
cma_init_reserved_areas() is called at boot also as a core_initcall. This
makes it uncertain whether the CMA pages will be added to the managed page
counts of their zones before or after the call to
init_per_zone_wmark_min() as it becomes dependent on link order. With the
current link order the pages are added to the managed count after the
lowmem_reserve arrays are initialized at boot.
This means the lowmem_reserve values at boot may be lower than the values
used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
ratio values are unchanged.
In many cases the difference is not significant, but for example
an ARM platform with 1GB of memory and the following memory layout
cma: Reserved 256 MiB at 0x0000000030000000
Zone ranges:
DMA [mem 0x0000000000000000-0x000000002fffffff]
Normal empty
HighMem [mem 0x0000000030000000-0x000000003fffffff]
would result in 0 lowmem_reserve for the DMA zone. This would allow
userspace to deplete the DMA zone easily.
Funnily enough
$ cat /proc/sys/vm/lowmem_reserve_ratio
would fix up the situation because as a side effect it forces
setup_per_zone_lowmem_reserve.
This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
have the chance to be properly accounted in their zone(s) and allowing
the lowmem_reserve arrays to receive consistent values.
Fixes:
|
||
|
|
4d01d462e6 |
Merge 4.19.129 into android-4.19-stable
Changes in 4.19.129 ipv6: fix IPV6_ADDRFORM operation logic net_failover: fixed rollback in net_failover_open() bridge: Avoid infinite loop when suppressing NS messages with invalid options vxlan: Avoid infinite loop when suppressing NS messages with invalid options tun: correct header offsets in napi frags mode selftests: bpf: fix use of undeclared RET_IF macro make 'user_access_begin()' do 'access_ok()' Fix 'acccess_ok()' on alpha and SH arch/openrisc: Fix issues with access_ok() x86: uaccess: Inhibit speculation past access_ok() in user_access_begin() lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user() btrfs: merge btrfs_find_device and find_device btrfs: Detect unbalanced tree with empty leaf before crashing btree operations crypto: talitos - fix ECB and CBC algs ivsize Input: mms114 - fix handling of mms345l ARM: 8977/1: ptrace: Fix mask for thumb breakpoint hook sched/fair: Don't NUMA balance for kthreads Input: synaptics - add a second working PNP_ID for Lenovo T470s drivers/net/ibmvnic: Update VNIC protocol version reporting powerpc/xive: Clear the page tables for the ESB IO mapping ath9k_htc: Silence undersized packet warnings RDMA/uverbs: Make the event_queue fds return POLLERR when disassociated x86/cpu/amd: Make erratum #1054 a legacy erratum perf probe: Accept the instance number of kretprobe event mm: add kvfree_sensitive() for freeing sensitive data objects aio: fix async fsync creds btrfs: tree-checker: Check level for leaves and nodes x86_64: Fix jiffies ODR violation x86/PCI: Mark Intel C620 MROMs as having non-compliant BARs x86/speculation: Prevent rogue cross-process SSBD shutdown x86/reboot/quirks: Add MacBook6,1 reboot quirk efi/efivars: Add missing kobject_put() in sysfs entry creation error path ALSA: es1688: Add the missed snd_card_free() ALSA: hda/realtek - add a pintbl quirk for several Lenovo machines ALSA: usb-audio: Fix inconsistent card PM state after resume ALSA: usb-audio: Add vendor, product and profile name for HP Thunderbolt Dock ACPI: sysfs: Fix reference count leak in acpi_sysfs_add_hotplug_profile() ACPI: CPPC: Fix reference count leak in acpi_cppc_processor_probe() ACPI: GED: add support for _Exx / _Lxx handler methods ACPI: PM: Avoid using power resources if there are none for D0 cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usages nilfs2: fix null pointer dereference at nilfs_segctor_do_construct() spi: dw: Fix controller unregister order spi: bcm2835aux: Fix controller unregister order spi: bcm-qspi: when tx/rx buffer is NULL set to 0 PM: runtime: clk: Fix clk_pm_runtime_get() error path crypto: cavium/nitrox - Fix 'nitrox_get_first_device()' when ndevlist is fully iterated ALSA: pcm: disallow linking stream to itself x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned KVM: x86: Fix APIC page invalidation race kvm: x86: Fix L1TF mitigation for shadow MMU KVM: x86/mmu: Consolidate "is MMIO SPTE" code KVM: x86: only do L1TF workaround on affected processors x86/speculation: Change misspelled STIPB to STIBP x86/speculation: Add support for STIBP always-on preferred mode x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS. x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches. spi: No need to assign dummy value in spi_unregister_controller() spi: Fix controller unregister order spi: pxa2xx: Fix controller unregister order spi: bcm2835: Fix controller unregister order spi: pxa2xx: Balance runtime PM enable/disable on error spi: pxa2xx: Fix runtime PM ref imbalance on probe error crypto: virtio: Fix use-after-free in virtio_crypto_skcipher_finalize_req() crypto: virtio: Fix src/dst scatterlist calculation in __virtio_crypto_skcipher_do_req() crypto: virtio: Fix dest length calculation in __virtio_crypto_skcipher_do_req() selftests/net: in rxtimestamp getopt_long needs terminating null entry ovl: initialize error in ovl_copy_xattr proc: Use new_inode not new_inode_pseudo video: fbdev: w100fb: Fix a potential double free. KVM: nSVM: fix condition for filtering async PF KVM: nSVM: leave ASID aside in copy_vmcb_control_area KVM: nVMX: Consult only the "basic" exit reason when routing nested exit KVM: MIPS: Define KVM_ENTRYHI_ASID to cpu_asid_mask(&boot_cpu_data) KVM: MIPS: Fix VPN2_MASK definition for variable cpu_vmbits KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts scsi: megaraid_sas: TM command refire leads to controller firmware crash ath9k: Fix use-after-free Read in ath9k_wmi_ctrl_rx ath9k: Fix use-after-free Write in ath9k_htc_rx_msg ath9x: Fix stack-out-of-bounds Write in ath9k_hif_usb_rx_cb ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb Smack: slab-out-of-bounds in vsscanf drm/vkms: Hold gem object while still in-use mm/slub: fix a memory leak in sysfs_slab_add() fat: don't allow to mount if the FAT length == 0 perf: Add cond_resched() to task_function_call() agp/intel: Reinforce the barrier after GTT updates mmc: sdhci-msm: Clear tuning done flag while hs400 tuning ARM: dts: at91: sama5d2_ptc_ek: fix sdmmc0 node description mmc: sdio: Fix potential NULL pointer error in mmc_sdio_init_card() xen/pvcalls-back: test for errors when calling backend_connect() KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception ACPI: GED: use correct trigger type field in _Exx / _Lxx handling drm: bridge: adv7511: Extend list of audio sample rates crypto: ccp -- don't "select" CONFIG_DMADEVICES media: si2157: Better check for running tuner in init objtool: Ignore empty alternatives spi: pxa2xx: Apply CS clk quirk to BXT net: atlantic: make hw_get_regs optional net: ena: fix error returning in ena_com_get_hash_function() efi/libstub/x86: Work around LLVM ELF quirk build regression arm64: cacheflush: Fix KGDB trap detection spi: dw: Zero DMA Tx and Rx configurations on stack arm64: insn: Fix two bugs in encoding 32-bit logical immediates ixgbe: Fix XDP redirect on archs with PAGE_SIZE above 4K MIPS: Loongson: Build ATI Radeon GPU driver as module Bluetooth: Add SCO fallback for invalid LMP parameters error kgdb: Disable WARN_CONSOLE_UNLOCKED for all kgdb kgdb: Prevent infinite recursive entries to the debugger spi: dw: Enable interrupts in accordance with DMA xfer mode clocksource: dw_apb_timer: Make CPU-affiliation being optional clocksource: dw_apb_timer_of: Fix missing clockevent timers btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums ARM: 8978/1: mm: make act_mm() respect THREAD_SIZE batman-adv: Revert "disable ethtool link speed detection when auto negotiation off" mmc: meson-mx-sdio: trigger a soft reset after a timeout or CRC error spi: dw: Fix Rx-only DMA transfers x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit net: vmxnet3: fix possible buffer overflow caused by bad DMA value in vmxnet3_get_rss() staging: android: ion: use vmap instead of vm_map_ram brcmfmac: fix wrong location to get firmware feature tools api fs: Make xxx__mountpoint() more scalable e1000: Distribute switch variables for initialization dt-bindings: display: mediatek: control dpi pins mode to avoid leakage audit: fix a net reference leak in audit_send_reply() media: dvb: return -EREMOTEIO on i2c transfer failure. media: platform: fcp: Set appropriate DMA parameters MIPS: Make sparse_init() using top-down allocation Bluetooth: btbcm: Add 2 missing models to subver tables audit: fix a net reference leak in audit_list_rules_send() netfilter: nft_nat: return EOPNOTSUPP if type or flags are not supported selftests/bpf: Fix memory leak in extract_build_id() net: bcmgenet: set Rx mode before starting netif lib/mpi: Fix 64-bit MIPS build with Clang exit: Move preemption fixup up, move blocking operations down sched/core: Fix illegal RCU from offline CPUs drivers/perf: hisi: Fix typo in events attribute array net: lpc-enet: fix error return code in lpc_mii_init() media: cec: silence shift wrapping warning in __cec_s_log_addrs() net: allwinner: Fix use correct return type for ndo_start_xmit() powerpc/spufs: fix copy_to_user while atomic xfs: clean up the error handling in xfs_swap_extents Crypto/chcr: fix for ccm(aes) failed test MIPS: Truncate link address into 32bit for 32bit kernel mips: cm: Fix an invalid error code of INTVN_*_ERR kgdb: Fix spurious true from in_dbg_master() xfs: reset buffer write failure state on successful completion xfs: fix duplicate verification from xfs_qm_dqflush() platform/x86: intel-vbtn: Use acpi_evaluate_integer() platform/x86: intel-vbtn: Split keymap into buttons and switches parts platform/x86: intel-vbtn: Do not advertise switches to userspace if they are not there platform/x86: intel-vbtn: Also handle tablet-mode switch on "Detachable" and "Portable" chassis-types nvme: refine the Qemu Identify CNS quirk ath10k: Remove msdu from idr when management pkt send fails wcn36xx: Fix error handling path in 'wcn36xx_probe()' net: qed*: Reduce RX and TX default ring count when running inside kdump kernel mt76: avoid rx reorder buffer overflow md: don't flush workqueue unconditionally in md_open veth: Adjust hard_start offset on redirect XDP frames net/mlx5e: IPoIB, Drop multicast packets that this interface sent rtlwifi: Fix a double free in _rtl_usb_tx_urb_setup() mwifiex: Fix memory corruption in dump_station x86/boot: Correct relocation destination on old linkers mips: MAAR: Use more precise address mask mips: Add udelay lpj numbers adjustment crypto: stm32/crc32 - fix ext4 chksum BUG_ON() crypto: stm32/crc32 - fix run-time self test issue. crypto: stm32/crc32 - fix multi-instance x86/mm: Stop printing BRK addresses m68k: mac: Don't call via_flush_cache() on Mac IIfx btrfs: qgroup: mark qgroup inconsistent if we're inherting snapshot to a new qgroup macvlan: Skip loopback packets in RX handler PCI: Don't disable decoding when mmio_always_on is set MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe() bcache: fix refcount underflow in bcache_device_free() mmc: sdhci-msm: Set SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 quirk staging: greybus: sdio: Respect the cmd->busy_timeout from the mmc core mmc: via-sdmmc: Respect the cmd->busy_timeout from the mmc core ixgbe: fix signed-integer-overflow warning mmc: sdhci-esdhc-imx: fix the mask for tuning start point spi: dw: Return any value retrieved from the dma_transfer callback cpuidle: Fix three reference count leaks platform/x86: hp-wmi: Convert simple_strtoul() to kstrtou32() platform/x86: intel-hid: Add a quirk to support HP Spectre X2 (2015) platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type string.h: fix incompatibility between FORTIFY_SOURCE and KASAN btrfs: include non-missing as a qualifier for the latest_bdev btrfs: send: emit file capabilities after chown mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked() mm: initialize deferred pages with interrupts enabled ima: Fix ima digest hash table key calculation ima: Directly assign the ima_default_policy pointer to ima_rules evm: Fix possible memory leak in evm_calc_hmac_or_hash() ext4: fix EXT_MAX_EXTENT/INDEX to check for zeroed eh_max ext4: fix error pointer dereference ext4: fix race between ext4_sync_parent() and rename() PCI: Avoid Pericom USB controller OHCI/EHCI PME# defect PCI: Avoid FLR for AMD Matisse HD Audio & USB 3.0 PCI: Avoid FLR for AMD Starship USB 3.0 PCI: Add ACS quirk for iProc PAXB PCI: Add ACS quirk for Intel Root Complex Integrated Endpoints PCI: Remove unused NFP32xx IDs pci:ipmi: Move IPMI PCI class id defines to pci_ids.h hwmon/k10temp, x86/amd_nb: Consolidate shared device IDs x86/amd_nb: Add PCI device IDs for family 17h, model 30h PCI: add USR vendor id and use it in r8169 and w6692 driver PCI: Move Synopsys HAPS platform device IDs PCI: Move Rohm Vendor ID to generic list misc: pci_endpoint_test: Add the layerscape EP device support misc: pci_endpoint_test: Add support to test PCI EP in AM654x PCI: Add Synopsys endpoint EDDA Device ID PCI: Add NVIDIA GPU multi-function power dependencies PCI: Enable NVIDIA HDA controllers PCI: mediatek: Add controller support for MT7629 x86/amd_nb: Add PCI device IDs for family 17h, model 70h ALSA: lx6464es - add support for LX6464ESe pci express variant PCI: Add Genesys Logic, Inc. Vendor ID PCI: Add Amazon's Annapurna Labs vendor ID PCI: vmd: Add device id for VMD device 8086:9A0B x86/amd_nb: Add Family 19h PCI IDs PCI: Add Loongson vendor ID serial: 8250_pci: Move Pericom IDs to pci_ids.h PCI: Make ACS quirk implementations more uniform PCI: Unify ACS quirk desired vs provided checking PCI: Generalize multi-function power dependency device links btrfs: fix error handling when submitting direct I/O bio btrfs: fix wrong file range cleanup after an error filling dealloc range ima: Call ima_calc_boot_aggregate() in ima_eventdigest_init() PCI: Program MPS for RCiEP devices e1000e: Disable TSO for buffer overrun workaround e1000e: Relax condition to trigger reset for ME workaround carl9170: remove P2P_GO support media: go7007: fix a miss of snd_card_free Bluetooth: hci_bcm: fix freeing not-requested IRQ b43legacy: Fix case where channel status is corrupted b43: Fix connection problem with WPA3 b43_legacy: Fix connection problem with WPA3 media: ov5640: fix use of destroyed mutex igb: Report speed and duplex as unknown when device is runtime suspended power: vexpress: add suppress_bind_attrs to true pinctrl: samsung: Correct setting of eint wakeup mask on s5pv210 pinctrl: samsung: Save/restore eint_mask over suspend for EINT_TYPE GPIOs gnss: sirf: fix error return code in sirf_probe() sparc32: fix register window handling in genregs32_[gs]et() sparc64: fix misuses of access_process_vm() in genregs32_[sg]et() dm crypt: avoid truncating the logical block size alpha: fix memory barriers so that they conform to the specification kernel/cpu_pm: Fix uninitted local in cpu_pm ARM: tegra: Correct PL310 Auxiliary Control Register initialization ARM: dts: exynos: Fix GPIO polarity for thr GalaxyS3 CM36651 sensor's bus ARM: dts: at91: sama5d2_ptc_ek: fix vbus pin ARM: dts: s5pv210: Set keep-power-in-suspend for SDHCI1 on Aries drivers/macintosh: Fix memleak in windfarm_pm112 driver powerpc/64s: Don't let DT CPU features set FSCR_DSCR powerpc/64s: Save FSCR to init_task.thread.fscr after feature init kbuild: force to build vmlinux if CONFIG_MODVERSION=y sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations. sunrpc: clean up properly in gss_mech_unregister() mtd: rawnand: brcmnand: fix hamming oob layout mtd: rawnand: pasemi: Fix the probe error path w1: omap-hdq: cleanup to add missing newline for some dev_dbg perf probe: Do not show the skipped events perf probe: Fix to check blacklist address correctly perf probe: Check address correctness by map instead of _etext perf symbols: Fix debuginfo search for Ubuntu Linux 4.19.129 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I7b1108d90ee1109a28fe488a4358b7a3e101d9c9 |
||
|
|
88afa532c1 |
mm: initialize deferred pages with interrupts enabled
commit 3d060856adfc59afb9d029c233141334cfaba418 upstream. Initializing struct pages is a long task and keeping interrupts disabled for the duration of this operation introduces a number of problems. 1. jiffies are not updated for long period of time, and thus incorrect time is reported. See proposed solution and discussion here: lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com 2. It prevents farther improving deferred page initialization by allowing intra-node multi-threading. We are keeping interrupts disabled to solve a rather theoretical problem that was never observed in real world (See |
||
|
|
bedd88210d |
Merge 4.19.123 into android-4.19
Changes in 4.19.123 USB: serial: qcserial: Add DW5816e support tracing/kprobes: Fix a double initialization typo vt: fix unicode console freeing with a common interface dp83640: reverse arguments to list_add_tail fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks net: macsec: preserve ingress frame ordering net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc() net_sched: sch_skbprio: add message validation to skbprio_change() net: usb: qmi_wwan: add support for DW5816e sch_choke: avoid potential panic in choke_reset() sch_sfq: validate silly quantum values tipc: fix partial topology connection closure bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features(). net/mlx5: Fix forced completion access non initialized command entry net/mlx5: Fix command entry leak in Internal Error State bnxt_en: Improve AER slot reset. bnxt_en: Fix VF anti-spoof filter setup. net: stricter validation of untrusted gso packets HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices sctp: Fix bundling of SHUTDOWN with COOKIE-ACK HID: usbhid: Fix race between usbhid_close() and usbhid_stop() USB: uas: add quirk for LaCie 2Big Quadra USB: serial: garmin_gps: add sanity checking for data length tracing: Add a vmalloc_sync_mappings() for safe measure KVM: arm: vgic: Fix limit condition when writing to GICD_I[CS]ACTIVER KVM: arm64: Fix 32bit PC wrap-around arm64: hugetlb: avoid potential NULL dereference mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() staging: gasket: Check the return value of gasket_get_bar_index() coredump: fix crash when umh is disabled KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs KVM: VMX: Mark RCX, RDX and RSI as clobbered in vmx_vcpu_run()'s asm blob batman-adv: fix batadv_nc_random_weight_tq batman-adv: Fix refcnt leak in batadv_show_throughput_override batman-adv: Fix refcnt leak in batadv_store_throughput_override batman-adv: Fix refcnt leak in batadv_v_ogm_process x86/entry/64: Fix unwind hints in register clearing code x86/entry/64: Fix unwind hints in kernel exit path x86/entry/64: Fix unwind hints in rewind_stack_do_exit() x86/unwind/orc: Don't skip the first frame for inactive tasks x86/unwind/orc: Prevent unwinding before ORC initialization x86/unwind/orc: Fix error path for bad ORC entry type x86/unwind/orc: Fix premature unwind stoppage due to IRET frames netfilter: nat: never update the UDP checksum when it's 0 netfilter: nf_osf: avoid passing pointer to local var objtool: Fix stack offset tracking for indirect CFAs scripts/decodecode: fix trapping instruction formatting ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Linux 4.19.123 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib87d493c94816aa0a0754530669a8bd688964987 |
||
|
|
dfe810bd92 |
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
commit e84fe99b68ce353c37ceeecc95dce9696c976556 upstream. Without CONFIG_PREEMPT, it can happen that we get soft lockups detected, e.g., while booting up. watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: __pageblock_pfn_to_page+0x134/0x1c0 Call Trace: set_zone_contiguous+0x56/0x70 page_alloc_init_late+0x166/0x176 kernel_init_freeable+0xfa/0x255 kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The issue becomes visible when having a lot of memory (e.g., 4TB) assigned to a single NUMA node - a system that can easily be created using QEMU. Inside VMs on a hypervisor with quite some memory overcommit, this is fairly easy to trigger. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
95bff4cdab |
Merge 4.19.116 into android-4.19
Changes in 4.19.116
ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
net: vxge: fix wrong __VA_ARGS__ usage
hinic: fix a bug of waitting for IO stopped
hinic: fix wrong para of wait_for_completion_timeout
cxgb4/ptp: pass the sign of offset delta in FW CMD
qlcnic: Fix bad kzalloc null test
i2c: st: fix missing struct parameter description
cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
media: venus: hfi_parser: Ignore HEVC encoding for V1
firmware: arm_sdei: fix double-lock on hibernate with shared events
null_blk: Fix the null_add_dev() error path
null_blk: Handle null_add_dev() failures properly
null_blk: fix spurious IO errors after failed past-wp access
xhci: bail out early if driver can't accress host in resume
x86: Don't let pgprot_modify() change the page encryption bit
block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices
irqchip/versatile-fpga: Handle chained IRQs properly
sched: Avoid scale real weight down to zero
selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
PCI/switchtec: Fix init_completion race condition with poll_wait()
media: i2c: video-i2c: fix build errors due to 'imply hwmon'
libata: Remove extra scsi_host_put() in ata_scsi_add_hosts()
pstore/platform: fix potential mem leak if pstore_init_fs failed
gfs2: Don't demote a glock until its revokes are written
x86/boot: Use unsigned comparison for addresses
efi/x86: Ignore the memory attributes table on i386
genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy()
block: Fix use-after-free issue accessing struct io_cq
media: i2c: ov5695: Fix power on and off sequences
usb: dwc3: core: add support for disabling SS instances in park mode
irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency
md: check arrays is suspended in mddev_detach before call quiesce operations
firmware: fix a double abort case with fw_load_sysfs_fallback
locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
block, bfq: fix use-after-free in bfq_idle_slice_timer_body
btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
btrfs: remove a BUG_ON() from merge_reloc_roots()
btrfs: track reloc roots based on their commit root bytenr
IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
uapi: rename ext2_swab() to swab() and share globally in swab.h
slub: improve bit diffusion for freelist ptr obfuscation
ASoC: fix regwmask
ASoC: dapm: connect virtual mux with default value
ASoC: dpcm: allow start or stop during pause for backend
ASoC: topology: use name_prefix for new kcontrol
usb: gadget: f_fs: Fix use after free issue as part of queue failure
usb: gadget: composite: Inform controller driver of self-powered
ALSA: usb-audio: Add mixer workaround for TRX40 and co
ALSA: hda: Add driver blacklist
ALSA: hda: Fix potential access overflow in beep helper
ALSA: ice1724: Fix invalid access for enumerated ctl items
ALSA: pcm: oss: Fix regression by buffer overflow fix
ALSA: doc: Document PC Beep Hidden Register on Realtek ALC256
ALSA: hda/realtek - Set principled PC Beep configuration for ALC256
ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups
ALSA: hda/realtek - Add quirk for MSI GL63
media: ti-vpe: cal: fix disable_irqs to only the intended target
acpi/x86: ignore unspecified bit positions in the ACPI global lock field
thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
nvme-fc: Revert "add module to ops template to allow module references"
nvme: Treat discovery subsystems as unique subsystems
PCI: pciehp: Fix indefinite wait on sysfs requests
PCI/ASPM: Clear the correct bits when enabling L1 substates
PCI: Add boot interrupt quirk mechanism for Xeon chipsets
PCI: endpoint: Fix for concurrent memory allocation in OB address region
tpm: Don't make log failures fatal
tpm: tpm1_bios_measurements_next should increase position index
tpm: tpm2_bios_measurements_next should increase position index
KEYS: reaching the keys quotas correctly
irqchip/versatile-fpga: Apply clear-mask earlier
pstore: pstore_ftrace_seq_next should increase position index
MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3
MIPS: OCTEON: irq: Fix potential NULL pointer dereference
ath9k: Handle txpower changes even when TPC is disabled
signal: Extend exec_id to 64bits
x86/entry/32: Add missing ASM_CLAC to general_protection entry
KVM: nVMX: Properly handle userspace interrupt window request
KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
KVM: s390: vsie: Fix delivery of addressing exceptions
KVM: x86: Allocate new rmap and large page tracking when moving memslot
KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
KVM: x86: Gracefully handle __vmalloc() failure during VM allocation
KVM: VMX: fix crash cleanup when KVM wasn't used
CIFS: Fix bug which the return value by asynchronous read is error
mtd: spinand: Stop using spinand->oobbuf for buffering bad block markers
mtd: spinand: Do not erase the block before writing a bad block marker
Btrfs: fix crash during unmount due to race with delayed inode workers
btrfs: set update the uuid generation as soon as possible
btrfs: drop block from cache on error in relocation
btrfs: fix missing file extent item for hole after ranged fsync
btrfs: fix missing semaphore unlock in btrfs_sync_file
crypto: mxs-dcp - fix scatterlist linearization for hash
erofs: correct the remaining shrink objects
powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init()
x86/speculation: Remove redundant arch_smt_update() invocation
tools: gpio: Fix out-of-tree build regression
mm: Use fixed constant in page_frag_alloc instead of size + 1
net: qualcomm: rmnet: Allow configuration updates to existing devices
arm64: dts: allwinner: h6: Fix PMU compatible
dm writecache: add cond_resched to avoid CPU hangs
dm verity fec: fix memory leak in verity_fec_dtr
scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point
arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
selftests: vm: drop dependencies on page flags from mlock2 tests
rtc: omap: Use define directive for PIN_CONFIG_ACTIVE_HIGH
drm/etnaviv: rework perfmon query infrastructure
powerpc/pseries: Avoid NULL pointer dereference when drmem is unavailable
NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
ext4: fix a data race at inode->i_blocks
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
ocfs2: no need try to truncate file beyond i_size
perf tools: Support Python 3.8+ in Makefile
s390/diag: fix display of diagnose call statistics
Input: i8042 - add Acer Aspire 5738z to nomux list
clk: ingenic/jz4770: Exit with error if CGU init failed
kmod: make request_module() return an error when autoloading is disabled
cpufreq: powernv: Fix use-after-free
hfsplus: fix crash and filesystem corruption when deleting files
libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
ipmi: fix hung processes in __get_guid()
xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle
powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap PTE entries
powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
powerpc/kprobes: Ignore traps that happened in real mode
scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
powerpc: Add attributes for setjmp/longjmp
powerpc: Make setjmp/longjmp signature standard
btrfs: use nofs allocations for running delayed items
dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()
crypto: caam - update xts sector size for large input length
crypto: ccree - improve error handling
crypto: ccree - zero out internal struct before use
crypto: ccree - don't mangle the request assoclen
crypto: ccree - dec auth tag size from cryptlen map
crypto: ccree - only try to map auth tag if needed
Revert "drm/dp_mst: Remove VCPI while disabling topology mgr"
drm/dp_mst: Fix clearing payload state on topology disable
drm: Remove PageReserved manipulation from drm_pci_alloc
ftrace/kprobe: Show the maxactive number on kprobe_events
powerpc/fsl_booke: Avoid creating duplicate tlb1 entry
misc: echo: Remove unnecessary parentheses and simplify check for zero
etnaviv: perfmon: fix total and idle HI cyleces readout
mfd: dln2: Fix sanity checking for endpoints
efi/x86: Fix the deletion of variables in mixed mode
Linux 4.19.116
Change-Id: If09fbb53fcb11ea01eaaa7fee7ed21ed6234f352
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
695986163d |
mm: Use fixed constant in page_frag_alloc instead of size + 1
commit 8644772637deb121f7ac2df690cbf83fa63d3b70 upstream.
This patch replaces the size + 1 value introduced with the recent fix for 1
byte allocs with a constant value.
The idea here is to reduce code overhead as the previous logic would have
to read size into a register, then increment it, and write it back to
whatever field was being used. By using a constant we can avoid those
memory reads and arithmetic operations in favor of just encoding the
maximum value into the operation itself.
Fixes: 2c2ade81741c ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs")
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
82d4e597c4 |
Revert "UPSTREAM: mm, page_alloc: spread allocations across zones before introducing fragmentation"
This reverts commit
|
||
|
|
5e86f20f4f |
Revert "UPSTREAM: mm: use alloc_flags to record if kswapd can wake"
This reverts commit
|
||
|
|
fbc355c16d |
Revert "BACKPORT: mm: move zone watermark accesses behind an accessor"
This reverts commit
|
||
|
|
35be952ae0 |
Revert "BACKPORT: mm: reclaim small amounts of memory when an external fragmentation event occurs"
This reverts commit
|
||
|
|
0a81e68654 |
ANDROID: GKI: mm: Export symbols __next_zones_zonelist and zone_watermark_ok_safe
These symbols are required to modularize driver for qcom,msm-ion. Bug: 147914088 Change-Id: Ice7f76c6a52c6cc94ad7076a1c2396a90d6aa9ca Signed-off-by: Hridya Valsaraju <hridya@google.com> Signed-off-by: Saravana Kannan <saravanak@google.com> |
||
|
|
75acd190bd |
ANDROID: Fix kernelci build-break on !CONFIG_CMA builds
commit |
||
|
|
89666477b2 |
ANDROID: GKI: mm: fix cma accounting in zone_watermark_ok
Some cases were reported on 3.18 where atomic unmovable allocations of order 2 fails, but kswapd does not wakeup. And in such cases it was seen that, when zone_watermark_ok check is performed to decide whether to wake up kswapd, there were lot of CMA pages of order 2 and above. This makes the watermark check succeed resulting in kswapd not being woken up. But since these atomic unmovable allocations can't come from CMA region, further atomic allocations keeps failing, without kswapd trying to reclaim. Usually concurrent movable allocations result in reclaim and improves the situtation, but the case reported was from a network test which was resulting in only atomic skb allocations being attempted. On 3.18 this was fixed by adding a cma free page counter and accouting the cma free pages properly in watermark calculations. Later this issue was indirectly fixed by the commit "mm, page_alloc: only enforce watermarks for order-0 allocations". But the commit "mm: add cma pcp list" brought the problem back because it includes MIGRATE_CMA within MIGRATE_PCPTYPES, and thus watermark check erroneously returns success for !ALLOC_CMA by finding free pages in cma free list. Change-Id: Id0e48b5c2f9deea93c5875c10d5ec72bd360df5f Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Bug: 150808082 Test: build Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I232e5379797ae946e15127852d31c2d29ca35b30 |
||
|
|
c98dd3b1df |
ANDROID: GKI: mm: add cma pcp list
Add a cma pcp list in order to increase cma memory utilization.
Increased cma memory utilization will improve overall memory
utilization because free cma pages are ignored when memory reclaim
is done with gfp mask GFP_KERNEL.
Since most memory reclaim is done by kswapd, which uses a gfp mask
of GFP_KERNEL, by increasing cma memory utilization we are therefore
ensuring that less aggressive memory reclaim takes place.
Increased cma memory utilization will improve performance,
for example it will increase app concurrency.
Change-Id: I809589a25c6abca51f1c963f118adfc78e955cf9
Signed-off-by: Liam Mark <lmark@codeaurora.org>
[vinmenon@codeaurora.org: fix !CONFIG_CMA compile time issues]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
[swatsrid@codeaurora.org: Fix merge conflicts]
Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org>
(cherry picked from commit
|
||
|
|
c29070e5b9 |
ANDROID: GKI: cma: redirect page allocation to CMA
CMA pages are designed to be used as fallback for movable allocations
and cannot be used for non-movable allocations. If CMA pages are
utilized poorly, non-movable allocations may end up getting starved if
all regular movable pages are allocated and the only pages left are
CMA. Always using CMA pages first creates unacceptable performance
problems. As a midway alternative, use CMA pages for certain
userspace allocations. The userspace pages can be migrated or dropped
quickly which giving decent utilization.
Change-Id: I6165dda01b705309eebabc6dfa67146b7a95c174
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
[lauraa@codeaurora.org: Missing CONFIG_CMA guards, add commit text]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[lmark@codeaurora.org: resolve conflicts relating to MIGRATE_HIGHATOMIC]
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
[swatsrid@codeaurora.org: Fix merge conflicts]
Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org>
(cherry picked from commit
|
||
|
|
5cbbeadd5a |
BACKPORT: mm: reclaim small amounts of memory when an external fragmentation event occurs
An external fragmentation event was previously described as
When the page allocator fragments memory, it records the event using
the mm_page_alloc_extfrag event. If the fallback_order is smaller
than a pageblock order (order-9 on 64-bit x86) then it's considered
an event that will cause external fragmentation issues in the future.
The kernel reduces the probability of such events by increasing the
watermark sizes by calling set_recommended_min_free_kbytes early in the
lifetime of the system. This works reasonably well in general but if
there are enough sparsely populated pageblocks then the problem can still
occur as enough memory is free overall and kswapd stays asleep.
This patch introduces a watermark_boost_factor sysctl that allows a zone
watermark to be temporarily boosted when an external fragmentation causing
events occurs. The boosting will stall allocations that would decrease
free memory below the boosted low watermark and kswapd is woken if the
calling context allows to reclaim an amount of memory relative to the size
of the high watermark and the watermark_boost_factor until the boost is
cleared. When kswapd finishes, it wakes kcompactd at the pageblock order
to clean some of the pageblocks that may have been affected by the
fragmentation event. kswapd avoids any writeback, slab shrinkage and swap
from reclaim context during this operation to avoid excessive system
disruption in the name of fragmentation avoidance. Care is taken so that
kswapd will do normal reclaim work if the system is really low on memory.
This was evaluated using the same workloads as "mm, page_alloc: Spread
allocations across zones before introducing fragmentation".
1-socket Skylake machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 1 THP allocating thread
--------------------------------------
4.20-rc3 extfrag events < order 9: 804694
4.20-rc3+patch: 408912 (49% reduction)
4.20-rc3+patch1-4: 18421 (98% reduction)
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Amean fault-base-1 653.58 ( 0.00%) 652.71 ( 0.13%)
Amean fault-huge-1 0.00 ( 0.00%) 178.93 * -99.00%*
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Percentage huge-1 0.00 ( 0.00%) 5.12 ( 100.00%)
Note that external fragmentation causing events are massively reduced by
this path whether in comparison to the previous kernel or the vanilla
kernel. The fault latency for huge pages appears to be increased but that
is only because THP allocations were successful with the patch applied.
1-socket Skylake machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 291392
4.20-rc3+patch: 191187 (34% reduction)
4.20-rc3+patch1-4: 13464 (95% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Min fault-base-1 912.00 ( 0.00%) 905.00 ( 0.77%)
Min fault-huge-1 127.00 ( 0.00%) 135.00 ( -6.30%)
Amean fault-base-1 1467.55 ( 0.00%) 1481.67 ( -0.96%)
Amean fault-huge-1 1127.11 ( 0.00%) 1063.88 * 5.61%*
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Percentage huge-1 77.64 ( 0.00%) 83.46 ( 7.49%)
As before, massive reduction in external fragmentation events, some jitter
on latencies and an increase in THP allocation success rates.
2-socket Haswell machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 5 THP allocating threads
----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 215698
4.20-rc3+patch: 200210 (7% reduction)
4.20-rc3+patch1-4: 14263 (93% reduction)
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Amean fault-base-5 1346.45 ( 0.00%) 1306.87 ( 2.94%)
Amean fault-huge-5 3418.60 ( 0.00%) 1348.94 ( 60.54%)
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Percentage huge-5 0.78 ( 0.00%) 7.91 ( 910.64%)
There is a 93% reduction in fragmentation causing events, there is a big
reduction in the huge page fault latency and allocation success rate is
higher.
2-socket Haswell machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 166352
4.20-rc3+patch: 147463 (11% reduction)
4.20-rc3+patch1-4: 11095 (93% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Amean fault-base-5 6217.43 ( 0.00%) 7419.67 * -19.34%*
Amean fault-huge-5 3163.33 ( 0.00%) 3263.80 ( -3.18%)
4.20.0-rc3 4.20.0-rc3
lowzone-v5r8 boost-v5r8
Percentage huge-5 95.14 ( 0.00%) 87.98 ( -7.53%)
There is a large reduction in fragmentation events with some jitter around
the latencies and success rates. As before, the high THP allocation
success rate does mean the system is under a lot of pressure. However, as
the fragmentation events are reduced, it would be expected that the
long-term allocation success rate would be higher.
Link: http://lkml.kernel.org/r/20181123114528.28802-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ied06272defcdbf3fff07b7ebccb46c68ce081e1e
Git-commit: 1c30844d2dfe272d58c8fc000960b835d13aa2ac
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
(cherry picked from commit 1c30844d2dfe272d58c8fc000960b835d13aa2ac)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 150378964
|
||
|
|
acfb1c608b |
BACKPORT: mm: move zone watermark accesses behind an accessor
This is a preparation patch only, no functional change. Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: Iccef5f02348ab59d75cbbe2b48f40017391b88b1 Git-commit: a921444382b49cc7fdeca3fba3e278bc09484a27 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git [vinmenon@codeaurora.org: trivial merge conflict fixes] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> (cherry picked from commit a921444382b49cc7fdeca3fba3e278bc09484a27) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 150378964 |
||
|
|
112ced56ce |
UPSTREAM: mm: use alloc_flags to record if kswapd can wake
This is a preparation patch that copies the GFP flag __GFP_KSWAPD_RECLAIM into alloc_flags. This is a preparation patch only that avoids having to pass gfp_mask through a long callchain in a future patch. Note that the setting in the fast path happens in alloc_flags_nofragment() and it may be claimed that this has nothing to do with ALLOC_NO_FRAGMENT. That's true in this patch but is not true later so it's done now for easier review to show where the flag needs to be recorded. No functional change. [mgorman@techsingularity.net: ALLOC_KSWAPD flag needs to be applied in the !CONFIG_ZONE_DMA32 case] Link: http://lkml.kernel.org/r/20181126143503.GO23260@techsingularity.net Link: http://lkml.kernel.org/r/20181123114528.28802-4-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I3f54fbfd87f02bd9f926a3913d88ba3055dde33c Git-commit: 0a79cdad5eb213b3a629e624565b1b3bf9192b7c Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> (cherry picked from commit 0a79cdad5eb213b3a629e624565b1b3bf9192b7c) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 150378964 |
||
|
|
8ad4b225e8 |
UPSTREAM: mm, page_alloc: spread allocations across zones before introducing fragmentation
Patch series "Fragmentation avoidance improvements", v5.
It has been noted before that fragmentation avoidance (aka
anti-fragmentation) is not perfect. Given sufficient time or an adverse
workload, memory gets fragmented and the long-term success of high-order
allocations degrades. This series defines an adverse workload, a definition
of external fragmentation events (including serious) ones and a series
that reduces the level of those fragmentation events.
The details of the workload and the consequences are described in more
detail in the changelogs. However, from patch 1, this is a high-level
summary of the adverse workload. The exact details are found in the
mmtests implementation.
The broad details of the workload are as follows;
1. Create an XFS filesystem (not specified in the configuration but done
as part of the testing for this patch)
2. Start 4 fio threads that write a number of 64K files inefficiently.
Inefficiently means that files are created on first access and not
created in advance (fio parameterr create_on_open=1) and fallocate
is not used (fallocate=none). With multiple IO issuers this creates
a mix of slab and page cache allocations over time. The total size
of the files is 150% physical memory so that the slabs and page cache
pages get mixed
3. Warm up a number of fio read-only threads accessing the same files
created in step 2. This part runs for the same length of time it
took to create the files. It'll fault back in old data and further
interleave slab and page cache allocations. As it's now low on
memory due to step 2, fragmentation occurs as pageblocks get
stolen.
4. While step 3 is still running, start a process that tries to allocate
75% of memory as huge pages with a number of threads. The number of
threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP
threads contending with fio, any other threads or forcing cross-NUMA
scheduling. Note that the test has not been used on a machine with less
than 8 cores. The benchmark records whether huge pages were allocated
and what the fault latency was in microseconds
5. Measure the number of events potentially causing external fragmentation,
the fault latency and the huge page allocation success rate.
6. Cleanup
Overall the series reduces external fragmentation causing events by over 94%
on 1 and 2 socket machines, which in turn impacts high-order allocation
success rates over the long term. There are differences in latencies and
high-order allocation success rates. Latencies are a mixed bag as they
are vulnerable to exact system state and whether allocations succeeded
so they are treated as a secondary metric.
Patch 1 uses lower zones if they are populated and have free memory
instead of fragmenting a higher zone. It's special cased to
handle a Normal->DMA32 fallback with the reasons explained
in the changelog.
Patch 2-4 boosts watermarks temporarily when an external fragmentation
event occurs. kswapd wakes to reclaim a small amount of old memory
and then wakes kcompactd on completion to recover the system
slightly. This introduces some overhead in the slowpath. The level
of boosting can be tuned or disabled depending on the tolerance
for fragmentation vs allocation latency.
Patch 5 stalls some movable allocation requests to let kswapd from patch 4
make some progress. The duration of the stalls is very low but it
is possible to tune the system to avoid fragmentation events if
larger stalls can be tolerated.
The bulk of the improvement in fragmentation avoidance is from patches
1-4 but patch 5 can deal with a rare corner case and provides the option
of tuning a system for THP allocation success rates in exchange for
some stalls to control fragmentation.
This patch (of 5):
The page allocator zone lists are iterated based on the watermarks of each
zone which does not take anti-fragmentation into account. On x86, node 0
may have multiple zones while other nodes have one zone. A consequence is
that tasks running on node 0 may fragment ZONE_NORMAL even though
ZONE_DMA32 has plenty of free memory. This patch special cases the
allocator fast path such that it'll try an allocation from a lower local
zone before fragmenting a higher zone. In this case, stealing of
pageblocks or orders larger than a pageblock are still allowed in the fast
path as they are uninteresting from a fragmentation point of view.
This was evaluated using a benchmark designed to fragment memory before
attempting THP allocations. It's implemented in mmtests as the following
configurations
configs/config-global-dhp__workload_thpfioscale
configs/config-global-dhp__workload_thpfioscale-defrag
configs/config-global-dhp__workload_thpfioscale-madvhugepage
e.g. from mmtests
./run-mmtests.sh --run-monitor --config configs/config-global-dhp__workload_thpfioscale test-run-1
The broad details of the workload are as follows;
1. Create an XFS filesystem (not specified in the configuration but done
as part of the testing for this patch).
2. Start 4 fio threads that write a number of 64K files inefficiently.
Inefficiently means that files are created on first access and not
created in advance (fio parameter create_on_open=1) and fallocate
is not used (fallocate=none). With multiple IO issuers this creates
a mix of slab and page cache allocations over time. The total size
of the files is 150% physical memory so that the slabs and page cache
pages get mixed.
3. Warm up a number of fio read-only processes accessing the same files
created in step 2. This part runs for the same length of time it
took to create the files. It'll refault old data and further
interleave slab and page cache allocations. As it's now low on
memory due to step 2, fragmentation occurs as pageblocks get
stolen.
4. While step 3 is still running, start a process that tries to allocate
75% of memory as huge pages with a number of threads. The number of
threads is based on a (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP
threads contending with fio, any other threads or forcing cross-NUMA
scheduling. Note that the test has not been used on a machine with less
than 8 cores. The benchmark records whether huge pages were allocated
and what the fault latency was in microseconds.
5. Measure the number of events potentially causing external fragmentation,
the fault latency and the huge page allocation success rate.
6. Cleanup the test files.
Note that due to the use of IO and page cache that this benchmark is not
suitable for running on large machines where the time to fragment memory
may be excessive. Also note that while this is one mix that generates
fragmentation that it's not the only mix that generates fragmentation.
Differences in workload that are more slab-intensive or whether SLUB is
used with high-order pages may yield different results.
When the page allocator fragments memory, it records the event using the
mm_page_alloc_extfrag ftrace event. If the fallback_order is smaller than
a pageblock order (order-9 on 64-bit x86) then it's considered to be an
"external fragmentation event" that may cause issues in the future.
Hence, the primary metric here is the number of external fragmentation
events that occur with order < 9. The secondary metric is allocation
latency and huge page allocation success rates but note that differences
in latencies and what the success rate also can affect the number of
external fragmentation event which is why it's a secondary metric.
1-socket Skylake machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 1 THP allocating thread
--------------------------------------
4.20-rc3 extfrag events < order 9: 804694
4.20-rc3+patch: 408912 (49% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-1 662.92 ( 0.00%) 653.58 * 1.41%*
Amean fault-huge-1 0.00 ( 0.00%) 0.00 ( 0.00%)
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-1 0.00 ( 0.00%) 0.00 ( 0.00%)
Fault latencies are slightly reduced while allocation success rates remain
at zero as this configuration does not make any special effort to allocate
THP and fio is heavily active at the time and either filling memory or
keeping pages resident. However, a 49% reduction of serious fragmentation
events reduces the changes of external fragmentation being a problem in
the future.
Vlastimil asked during review for a breakdown of the allocation types
that are falling back.
vanilla
3816 MIGRATE_UNMOVABLE
800845 MIGRATE_MOVABLE
33 MIGRATE_UNRECLAIMABLE
patch
735 MIGRATE_UNMOVABLE
408135 MIGRATE_MOVABLE
42 MIGRATE_UNRECLAIMABLE
The majority of the fallbacks are due to movable allocations and this is
consistent for the workload throughout the series so will not be presented
again as the primary source of fallbacks are movable allocations.
Movable fallbacks are sometimes considered "ok" to fallback because they
can be migrated. The problem is that they can fill an
unmovable/reclaimable pageblock causing those allocations to fallback
later and polluting pageblocks with pages that cannot move. If there is a
movable fallback, it is pretty much guaranteed to affect an
unmovable/reclaimable pageblock and while it might not be enough to
actually cause a unmovable/reclaimable fallback in the future, we cannot
know that in advance so the patch takes the only option available to it.
Hence, it's important to control them. This point is also consistent
throughout the series and will not be repeated.
1-socket Skylake machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 291392
4.20-rc3+patch: 191187 (34% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-1 1495.14 ( 0.00%) 1467.55 ( 1.85%)
Amean fault-huge-1 1098.48 ( 0.00%) 1127.11 ( -2.61%)
thpfioscale Percentage Faults Huge
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-1 78.57 ( 0.00%) 77.64 ( -1.18%)
Fragmentation events were reduced quite a bit although this is known
to be a little variable. The latencies and allocation success rates
are similar but they were already quite high.
2-socket Haswell machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 5 THP allocating threads
----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 215698
4.20-rc3+patch: 200210 (7% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-5 1350.05 ( 0.00%) 1346.45 ( 0.27%)
Amean fault-huge-5 4181.01 ( 0.00%) 3418.60 ( 18.24%)
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-5 1.15 ( 0.00%) 0.78 ( -31.88%)
The reduction of external fragmentation events is slight and this is
partially due to the removal of __GFP_THISNODE in commit ac5b2c18911f
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") as THP
allocations can now spill over to remote nodes instead of fragmenting
local memory.
2-socket Haswell machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------
4.20-rc3 extfrag events < order 9: 166352
4.20-rc3+patch: 147463 (11% reduction)
thpfioscale Fault Latencies
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Amean fault-base-5 6138.97 ( 0.00%) 6217.43 ( -1.28%)
Amean fault-huge-5 2294.28 ( 0.00%) 3163.33 * -37.88%*
thpfioscale Percentage Faults Huge
4.20.0-rc3 4.20.0-rc3
vanilla lowzone-v5r8
Percentage huge-5 96.82 ( 0.00%) 95.14 ( -1.74%)
There was a slight reduction in external fragmentation events although the
latencies were higher. The allocation success rate is high enough that
the system is struggling and there is quite a lot of parallel reclaim and
compaction activity. There is also a certain degree of luck on whether
processes start on node 0 or not for this patch but the relevance is
reduced later in the series.
Overall, the patch reduces the number of external fragmentation causing
events so the success of THP over long periods of time would be improved
for this adverse workload.
Link: http://lkml.kernel.org/r/20181123114528.28802-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If804b49c46fe359ca6addacd4c3a8b36d8571ca6
Git-commit: 6bb154504f8b496780ec53ec81aba957a12981fa
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
(cherry picked from commit 6bb154504f8b496780ec53ec81aba957a12981fa)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 150378964
|
||
|
|
3389e56d31 |
Merge 4.19.103 into android-4.19
Changes in 4.19.103
Revert "drm/sun4i: dsi: Change the start delay calculation"
ovl: fix lseek overflow on 32bit
kernel/module: Fix memleak in module_add_modinfo_attrs()
media: iguanair: fix endpoint sanity check
ocfs2: fix oops when writing cloned file
x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
udf: Allow writing to 'Rewritable' partitions
printk: fix exclusive_console replaying
iwlwifi: mvm: fix NVM check for 3168 devices
sparc32: fix struct ipc64_perm type definition
cls_rsvp: fix rsvp_policy
gtp: use __GFP_NOWARN to avoid memalloc warning
l2tp: Allow duplicate session creation with UDP
net: hsr: fix possible NULL deref in hsr_handle_frame()
net_sched: fix an OOB access in cls_tcindex
net: stmmac: Delete txtimer in suspend()
bnxt_en: Fix TC queue mapping.
tcp: clear tp->total_retrans in tcp_disconnect()
tcp: clear tp->delivered in tcp_disconnect()
tcp: clear tp->data_segs{in|out} in tcp_disconnect()
tcp: clear tp->segs_{in|out} in tcp_disconnect()
rxrpc: Fix use-after-free in rxrpc_put_local()
rxrpc: Fix insufficient receive notification generation
rxrpc: Fix missing active use pinning of rxrpc_local object
rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect
media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
mfd: dln2: More sanity checking for endpoints
ipc/msg.c: consolidate all xxxctl_down() functions
tracing: Fix sched switch start/stop refcount racy updates
rcu: Avoid data-race in rcu_gp_fqs_check_wake()
brcmfmac: Fix memory leak in brcmf_usbdev_qinit
usb: typec: tcpci: mask event interrupts when remove driver
usb: gadget: legacy: set max_speed to super-speed
usb: gadget: f_ncm: Use atomic_t to track in-flight request
usb: gadget: f_ecm: Use atomic_t to track in-flight request
ALSA: usb-audio: Fix endianess in descriptor validation
ALSA: dummy: Fix PCM format loop in proc output
mm/memory_hotplug: fix remove_memory() lockdep splat
mm: move_pages: report the number of non-attempted pages
media/v4l2-core: set pages dirty upon releasing DMA buffers
media: v4l2-core: compat: ignore native command codes
media: v4l2-rect.h: fix v4l2_rect_map_inside() top/left adjustments
lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()
irqdomain: Fix a memory leak in irq_domain_push_irq()
platform/x86: intel_scu_ipc: Fix interrupt support
ALSA: hda: Add Clevo W65_67SB the power_save blacklist
KVM: arm64: Correct PSTATE on exception entry
KVM: arm/arm64: Correct CPSR on exception entry
KVM: arm/arm64: Correct AArch32 SPSR on exception entry
KVM: arm64: Only sign-extend MMIO up to register width
MIPS: fix indentation of the 'RELOCS' message
MIPS: boot: fix typo in 'vmlinux.lzma.its' target
s390/mm: fix dynamic pagetable upgrade for hugetlbfs
powerpc/xmon: don't access ASDR in VMs
powerpc/pseries: Advance pfn if section is not present in lmb_is_removable()
smb3: fix signing verification of large reads
PCI: tegra: Fix return value check of pm_runtime_get_sync()
mmc: spi: Toggle SPI polarity, do not hardcode it
ACPI: video: Do not export a non working backlight interface on MSI MS-7721 boards
ACPI / battery: Deal with design or full capacity being reported as -1
ACPI / battery: Use design-cap for capacity calculations if full-cap is not available
ACPI / battery: Deal better with neither design nor full capacity not being reported
alarmtimer: Unregister wakeup source when module get fails
ubifs: Reject unsupported ioctl flags explicitly
ubifs: don't trigger assertion on invalid no-key filename
ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag
ubifs: Fix deadlock in concurrent bulk-read and writepage
crypto: geode-aes - convert to skcipher API and make thread-safe
PCI: keystone: Fix link training retries initiation
mmc: sdhci-of-at91: fix memleak on clk_get failure
hv_balloon: Balloon up according to request page number
mfd: axp20x: Mark AXP20X_VBUS_IPSOUT_MGMT as volatile
crypto: api - Check spawn->alg under lock in crypto_drop_spawn
crypto: ccree - fix backlog memory leak
crypto: ccree - fix pm wrongful error reporting
crypto: ccree - fix PM race condition
scripts/find-unused-docs: Fix massive false positives
scsi: qla2xxx: Fix mtcp dump collection failure
power: supply: ltc2941-battery-gauge: fix use-after-free
ovl: fix wrong WARN_ON() in ovl_cache_update_ino()
f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
f2fs: fix miscounted block limit in f2fs_statfs_project()
f2fs: code cleanup for f2fs_statfs_project()
PM: core: Fix handling of devices deleted during system-wide resume
of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
dm zoned: support zone sizes smaller than 128MiB
dm space map common: fix to ensure new block isn't already in use
dm crypt: fix benbi IV constructor crash if used in authenticated mode
dm: fix potential for q->make_request_fn NULL pointer
dm writecache: fix incorrect flush sequence when doing SSD mode commit
padata: Remove broken queue flushing
tracing: Annotate ftrace_graph_hash pointer with __rcu
tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
ftrace: Add comment to why rcu_dereference_sched() is open coded
ftrace: Protect ftrace_graph_hash with ftrace_sync
samples/bpf: Don't try to remove user's homedir on clean
crypto: ccp - set max RSA modulus size for v3 platform devices as well
crypto: pcrypt - Do not clear MAY_SLEEP flag in original request
crypto: atmel-aes - Fix counter overflow in CTR mode
crypto: api - Fix race condition in crypto_spawn_alg
crypto: picoxcell - adjust the position of tasklet_init and fix missed tasklet_kill
scsi: qla2xxx: Fix unbound NVME response length
NFS: Fix memory leaks and corruption in readdir
NFS: Directory page cache pages need to be locked when read
jbd2_seq_info_next should increase position index
Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES
btrfs: set trans->drity in btrfs_commit_transaction
Btrfs: fix race between adding and putting tree mod seq elements and nodes
ARM: tegra: Enable PLLP bypass during Tegra124 LP1
iwlwifi: don't throw error when trying to remove IGTK
mwifiex: fix unbalanced locking in mwifiex_process_country_ie()
sunrpc: expiry_time should be seconds not timeval
gfs2: move setting current->backing_dev_info
gfs2: fix O_SYNC write handling
drm/rect: Avoid division by zero
media: rc: ensure lirc is initialized before registering input device
tools/kvm_stat: Fix kvm_exit filter name
xen/balloon: Support xend-based toolstack take two
watchdog: fix UAF in reboot notifier handling in watchdog core code
bcache: add readahead cache policy options via sysfs interface
eventfd: track eventfd_signal() recursion depth
aio: prevent potential eventfd recursion on poll
KVM: x86: Refactor picdev_write() to prevent Spectre-v1/L1TF attacks
KVM: x86: Refactor prefix decoding to prevent Spectre-v1/L1TF attacks
KVM: x86: Protect pmu_intel.c from Spectre-v1/L1TF attacks
KVM: x86: Protect DR-based index computations from Spectre-v1/L1TF attacks
KVM: x86: Protect kvm_lapic_reg_write() from Spectre-v1/L1TF attacks
KVM: x86: Protect kvm_hv_msr_[get|set]_crash_data() from Spectre-v1/L1TF attacks
KVM: x86: Protect ioapic_write_indirect() from Spectre-v1/L1TF attacks
KVM: x86: Protect MSR-based index computations in pmu.h from Spectre-v1/L1TF attacks
KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks
KVM: x86: Protect MSR-based index computations from Spectre-v1/L1TF attacks in x86.c
KVM: x86: Protect x86_decode_insn from Spectre-v1/L1TF attacks
KVM: x86: Protect MSR-based index computations in fixed_msr_to_seg_unit() from Spectre-v1/L1TF attacks
KVM: x86: Fix potential put_fpu() w/o load_fpu() on MPX platform
KVM: PPC: Book3S HV: Uninit vCPU if vcore creation fails
KVM: PPC: Book3S PR: Free shared page if mmu initialization fails
x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
KVM: x86: Don't let userspace set host-reserved cr4 bits
KVM: x86: Free wbinvd_dirty_mask if vCPU creation fails
KVM: s390: do not clobber registers during guest reset/store status
clk: tegra: Mark fuse clock as critical
drm/amd/dm/mst: Ignore payload update failures
percpu: Separate decrypted varaibles anytime encryption can be enabled
scsi: qla2xxx: Fix the endianness of the qla82xx_get_fw_size() return type
scsi: csiostor: Adjust indentation in csio_device_reset
scsi: qla4xxx: Adjust indentation in qla4xxx_mem_free
scsi: ufs: Recheck bkops level if bkops is disabled
phy: qualcomm: Adjust indentation in read_poll_timeout
ext2: Adjust indentation in ext2_fill_super
powerpc/44x: Adjust indentation in ibm4xx_denali_fixup_memsize
drm: msm: mdp4: Adjust indentation in mdp4_dsi_encoder_enable
NFC: pn544: Adjust indentation in pn544_hci_check_presence
ppp: Adjust indentation into ppp_async_input
net: smc911x: Adjust indentation in smc911x_phy_configure
net: tulip: Adjust indentation in {dmfe, uli526x}_init_module
IB/mlx5: Fix outstanding_pi index for GSI qps
IB/core: Fix ODP get user pages flow
nfsd: fix delay timer on 32-bit architectures
nfsd: fix jiffies/time_t mixup in LRU list
nfsd: Return the correct number of bytes written to the file
ubi: fastmap: Fix inverted logic in seen selfcheck
ubi: Fix an error pointer dereference in error handling code
mfd: da9062: Fix watchdog compatible string
mfd: rn5t618: Mark ADC control register volatile
bonding/alb: properly access headers in bond_alb_xmit()
net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port
net: mvneta: move rx_dropped and rx_errors in per-cpu stats
net_sched: fix a resource leak in tcindex_set_parms()
net: systemport: Avoid RBUF stuck in Wake-on-LAN mode
net/mlx5: IPsec, Fix esp modify function attribute
net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx
net: macb: Remove unnecessary alignment check for TSO
net: macb: Limit maximum GEM TX length in TSO
net: dsa: b53: Always use dev->vlan_enabled in b53_configure_vlan()
ext4: fix deadlock allocating crypto bounce page from mempool
btrfs: use bool argument in free_root_pointers()
btrfs: free block groups after free'ing fs trees
drm: atmel-hlcdc: enable clock before configuring timing engine
drm/dp_mst: Remove VCPI while disabling topology mgr
btrfs: flush write bio if we loop in extent_write_cache_pages
KVM: x86/mmu: Apply max PA check for MMIO sptes to 32-bit KVM
KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM
KVM: VMX: Add non-canonical check on writes to RTIT address MSRs
KVM: nVMX: vmread should not set rflags to specify success in case of #PF
KVM: Use vcpu-specific gva->hva translation when querying host page size
KVM: Play nice with read-only memslots when querying host page size
mm: zero remaining unavailable struct pages
mm: return zero_resv_unavail optimization
mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
cifs: fail i/o on soft mounts if sessionsetup errors out
x86/apic/msi: Plug non-maskable MSI affinity race
clocksource: Prevent double add_timer_on() for watchdog_timer
perf/core: Fix mlock accounting in perf_mmap()
rxrpc: Fix service call disconnection
Linux 4.19.103
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0d7f09085c3541373e0fd6b2e3ffacc5e34f7d55
|
||
|
|
0a69047d82 |
mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
[ Upstream commit e822969cab48b786b64246aad1a3ba2a774f5d23 ]
Patch series "mm: fix max_pfn not falling on section boundary", v2.
Playing with different memory sizes for a x86-64 guest, I discovered that
some memmaps (highest section if max_mem does not fall on the section
boundary) are marked as being valid and online, but contain garbage. We
have to properly initialize these memmaps.
Looking at /proc/kpageflags and friends, I found some more issues,
partially related to this.
This patch (of 3):
If max_pfn is not aligned to a section boundary, we can easily run into
BUGs. This can e.g., be triggered on x86-64 under QEMU by specifying a
memory size that is not a multiple of 128MB (e.g., 4097MB, but also
4160MB). I was told that on real HW, we can easily have this scenario
(esp., one of the main reasons sub-section hotadd of devmem was added).
The issue is, that we have a valid memmap (pfn_valid()) for the whole
section, and the whole section will be marked "online".
pfn_to_online_page() will succeed, but the memmap contains garbage.
E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
4160M" - (see tools/vm/page-types.c):
[ 200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
[ 200.477500] #PF: supervisor read access in kernel mode
[ 200.478334] #PF: error_code(0x0000) - not-present page
[ 200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
[ 200.479557] Oops: 0000 [#4] SMP NOPTI
[ 200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G D W 5.5.0-rc1-next-20191209 #93
[ 200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[ 200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
[ 200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[ 200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
[ 200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
[ 200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
[ 200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[ 200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
[ 200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
[ 200.487130] FS: 00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
[ 200.487804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
[ 200.488897] Call Trace:
[ 200.489115] kpageflags_read+0xe9/0x140
[ 200.489447] proc_reg_read+0x3c/0x60
[ 200.489755] vfs_read+0xc2/0x170
[ 200.490037] ksys_pread64+0x65/0xa0
[ 200.490352] do_syscall_64+0x5c/0xa0
[ 200.490665] entry_SYSCALL_64_after_hwframe+0x49/0xbe
But it can be triggered much easier via "cat /proc/kpageflags > /dev/null"
after cold/hot plugging a DIMM to such a system:
[root@localhost ~]# cat /proc/kpageflags > /dev/null
[ 111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
[ 111.517907] #PF: supervisor read access in kernel mode
[ 111.518333] #PF: error_code(0x0000) - not-present page
[ 111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0
This patch fixes that by at least zero-ing out that memmap (so e.g.,
page_to_pfn() will not crash). Commit 907ec5fca3dc ("mm: zero remaining
unavailable struct pages") tried to fix a similar issue, but forgot to
consider this special case.
After this patch, there are still problems to solve. E.g., not all of
these pages falling into a memory hole will actually get initialized later
and set PageReserved - they are only zeroed out - but at least the
immediate crashes are gone. A follow-up patch will take care of this.
Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
Fixes:
|
||
|
|
f19a50c1e3 |
mm: return zero_resv_unavail optimization
[ Upstream commit ec393a0f014eaf688a3dbe8c8a4cbb52d7f535f9 ] When checking for valid pfns in zero_resv_unavail(), it is not necessary to verify that pfns within pageblock_nr_pages ranges are valid, only the first one needs to be checked. This is because memory for pages are allocated in contiguous chunks that contain pageblock_nr_pages struct pages. Link: http://lkml.kernel.org/r/20181002143821.5112-3-msys.mizuma@gmail.com Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
9ac5917a1d |
mm: zero remaining unavailable struct pages
[ Upstream commit 907ec5fca3dc38d37737de826f06f25b063aa08e ] Patch series "mm: Fix for movable_node boot option", v3. This patch series contains a fix for the movable_node boot option issue which was introduced by commit |
||
|
|
2b618a16e7 |
UPSTREAM: mm: rename and change semantics of nr_indirectly_reclaimable_bytes
The vmstat counter NR_INDIRECTLY_RECLAIMABLE_BYTES was introduced by
commit
|
||
|
|
291d853dff |
Merge 4.19.88 into android-4.19
Changes in 4.19.88 clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate clocksource/drivers/mediatek: Fix error handling ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX ASoC: compress: fix unsigned integer overflow check reset: Fix memory leak in reset_control_array_put() clk: samsung: exynos5433: Fix error paths ASoC: kirkwood: fix external clock probe defer ASoC: kirkwood: fix device remove ordering clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume pinctrl: cherryview: Allocate IRQ chip dynamic ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts reset: fix reset_control_ops kerneldoc comment clk: at91: avoid sleeping early clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18 ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend samples/bpf: fix build by setting HAVE_ATTR_TEST to zero powerpc/bpf: Fix tail call implementation idr: Fix integer overflow in idr_for_each_entry idr: Fix idr_alloc_u32 on 32-bit systems x86/resctrl: Prevent NULL pointer dereference when reading mondata clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call clk: ti: clkctrl: Fix failed to enable error with double udelay timeout net: fec: add missed clk_disable_unprepare in remove bridge: ebtables: don't crash when using dnat target in output chains can: peak_usb: report bus recovery as well can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition watchdog: meson: Fix the wrong value of left time ASoC: stm32: sai: add restriction on mmap support scripts/gdb: fix debugging modules compiled with hot/cold partitioning net: bcmgenet: use RGMII loopback for MAC reset net: bcmgenet: reapply manual settings to the PHY net: mscc: ocelot: fix __ocelot_rmw_ix prototype ceph: return -EINVAL if given fsc mount option on kernel w/o support net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix station inactive_time shortly after boot block: drbd: remove a stray unlock in __drbd_send_protocol() pwm: bcm-iproc: Prevent unloading the driver module while in use scsi: target/tcmu: Fix queue_cmd_ring() declaration scsi: lpfc: Fix kernel Oops due to null pring pointers scsi: lpfc: Fix dif and first burst use in write commands ARM: dts: Fix up SQ201 flash access tracing: Lock event_mutex before synth_event_mutex ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed ARM: dts: imx51: Fix memory node duplication ARM: dts: imx53: Fix memory node duplication ARM: dts: imx31: Fix memory node duplication ARM: dts: imx35: Fix memory node duplication ARM: dts: imx7: Fix memory node duplication ARM: dts: imx6ul: Fix memory node duplication ARM: dts: imx6sx: Fix memory node duplication ARM: dts: imx6sl: Fix memory node duplication ARM: dts: imx50: Fix memory node duplication ARM: dts: imx23: Fix memory node duplication ARM: dts: imx1: Fix memory node duplication ARM: dts: imx27: Fix memory node duplication ARM: dts: imx25: Fix memory node duplication ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication parisc: Fix serio address output parisc: Fix HP SDC hpa address output ARM: dts: Fix hsi gdd range for omap4 arm64: mm: Prevent mismatched 52-bit VA support arm64: smp: Handle errors reported by the firmware bus: ti-sysc: Check for no-reset and no-idle flags at the child level platform/x86: mlx-platform: Fix LED configuration ARM: OMAP1: fix USB configuration for device-only setups RDMA/hns: Fix the bug while use multi-hop of pbl arm64: preempt: Fix big-endian when checking preempt count in assembly RDMA/vmw_pvrdma: Use atomic memory allocation in create AH PM / AVS: SmartReflex: NULL check before some freeing functions is not needed xfs: zero length symlinks are not valid ARM: ks8695: fix section mismatch warning ACPI / LPSS: Ignore acpi_device_fix_up_power() return value scsi: lpfc: Enable Management features for IF_TYPE=6 scsi: qla2xxx: Fix NPIV handling for FC-NVMe scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port nvme: provide fallback for discard alloc failure s390/zcrypt: make sysfs reset attribute trigger queue reset crypto: user - support incremental algorithm dumps arm64: dts: renesas: draak: Fix CVBS input mwifiex: fix potential NULL dereference and use after free mwifiex: debugfs: correct histogram spacing, formatting brcmfmac: set F2 watermark to 256 for 4373 brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373 rtl818x: fix potential use after free bcache: do not check if debug dentry is ERR or NULL explicitly on remove bcache: do not mark writeback_running too early xfs: require both realtime inodes to mount nvme: fix kernel paging oops ubifs: Fix default compression selection in ubifs ubi: Put MTD device after it is not used ubi: Do not drop UBI device reference before using microblaze: adjust the help to the real behavior microblaze: move "... is ready" messages to arch/microblaze/Makefile microblaze: fix multiple bugs in arch/microblaze/boot/Makefile iwlwifi: move iwl_nvm_check_version() into dvm iwlwifi: mvm: force TCM re-evaluation on TCM resume iwlwifi: pcie: fix erroneous print iwlwifi: pcie: set cmd_len in the correct place gpio: pca953x: Fix AI overflow on PCAL6524 gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB kvm: vmx: Set IA32_TSC_AUX for legacy mode guests Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS" Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()" crypto/chelsio/chtls: listen fails with multiadapt VSOCK: bind to random port for VMADDR_PORT_ANY mmc: meson-gx: make sure the descriptor is stopped on errors mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET usb: ehci-omap: Fix deferred probe for phy handling btrfs: Check for missing device before bio submission in btrfs_map_bio btrfs: fix ncopies raid_attr for RAID56 btrfs: dev-replace: set result code of cancel by status of scrub Btrfs: allow clear_extent_dirty() to receive a cached extent state record btrfs: only track ref_heads in delayed_ref_updates serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback HID: intel-ish-hid: fixes incorrect error handling gpio: raspberrypi-exp: decrease refcount on firmware dt node serial: 8250: Rate limit serial port rx interrupts during input overruns kprobes/x86/xen: blacklist non-attachable xen interrupt functions xen/pciback: Check dev_data before using it kprobes: Blacklist symbols in arch-defined prohibited area kprobes/x86: Show x86-64 specific blacklisted symbols correctly vfio-mdev/samples: Use u8 instead of char for handle functions memory: omap-gpmc: Get the header of the enum pinctrl: xway: fix gpio-hog related boot issues net/mlx5: Continue driver initialization despite debugfs failure netfilter: nf_nat_sip: fix RTP/RTCP source port translations exofs_mount(): fix leaks on failure exits bnxt_en: Return linux standard errors in bnxt_ethtool.c bnxt_en: Save ring statistics before reset. bnxt_en: query force speeds before disabling autoneg mode. KVM: s390: unregister debug feature on failing arch init pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10 HID: doc: fix wrong data structure reference for UHID_OUTPUT dm flakey: Properly corrupt multi-page bios. gfs2: take jdata unstuff into account in do_grow dm raid: fix false -EBUSY when handling check/repair message xfs: Align compat attrlist_by_handle with native implementation. xfs: Fix bulkstat compat ioctls on x32 userspace. IB/qib: Fix an error code in qib_sdma_verbs_send() clocksource/drivers/fttmr010: Fix invalid interrupt register access vxlan: Fix error path in __vxlan_dev_create() powerpc/book3s/32: fix number of bats in p/v_block_mapped() powerpc/xmon: fix dump_segments() drivers/regulator: fix a missing check of return value Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading serial: max310x: Fix tx_empty() callback openrisc: Fix broken paths to arch/or32 RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer scsi: qla2xxx: deadlock by configfs_depend_item scsi: csiostor: fix incorrect dma device in case of vport brcmfmac: Fix access point mode ath6kl: Only use match sets when firmware supports it ath6kl: Fix off by one error in scan completion powerpc/perf: Fix unit_sel/cache_sel checks powerpc/32: Avoid unsupported flags with clang powerpc/prom: fix early DEBUG messages powerpc/mm: Make NULL pointer deferences explicit on bad page faults. powerpc/44x/bamboo: Fix PCI range vfio/spapr_tce: Get rid of possible infinite loop powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status drbd: ignore "all zero" peer volume sizes in handshake drbd: reject attach of unsuitable uuids even if connected drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix print_st_err()'s prototype to match the definition IB/rxe: Make counters thread safe bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't regulator: tps65910: fix a missing check of return value powerpc/83xx: handle machine check caused by watchdog timer powerpc/pseries: Fix node leak in update_lmb_associativity_index() powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y crypto: mxc-scc - fix build warnings on ARM64 pwm: clps711x: Fix period calculation net/netlink_compat: Fix a missing check of nla_parse_nested net/net_namespace: Check the return value of register_pernet_subsys() f2fs: fix block address for __check_sit_bitmap f2fs: fix to dirty inode synchronously um: Include sys/uio.h to have writev() um: Make GCOV depend on !KCOV net: (cpts) fix a missing check of clk_prepare net: stmicro: fix a missing check of clk_prepare net: dsa: bcm_sf2: Propagate error value from mdio_write atl1e: checking the status of atl1e_write_phy_reg tipc: fix a missing check of genlmsg_put net: marvell: fix a missing check of acpi_match_device net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe() ocfs2: clear journal dirty flag after shutdown journal vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n mm/page_alloc.c: free order-0 pages through PCP in page_frag_free() mm/page_alloc.c: use a single function to free page mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures netfilter: nf_tables: fix a missing check of nla_put_failure xprtrdma: Prevent leak of rpcrdma_rep objects infiniband: bnxt_re: qplib: Check the return value of send_message infiniband/qedr: Potential null ptr dereference of qp firmware: arm_sdei: fix wrong of_node_put() in init function firmware: arm_sdei: Fix DT platform device creation lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk lib/genalloc.c: use vzalloc_node() to allocate the bitmap fork: fix some -Wmissing-prototypes warnings drivers/base/platform.c: kmemleak ignore a known leak lib/genalloc.c: include vmalloc.h mtd: Check add_mtd_device() ret code tipc: fix memory leak in tipc_nl_compat_publ_dump net/core/neighbour: tell kmemleak about hash tables ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity() net/core/neighbour: fix kmemleak minimal reference count for hash tables serial: 8250: Fix serial8250 initialization crash gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel decnet: fix DN_IFREQ_SIZE net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() net/smc: don't wait for send buffer space when data was already sent mm/hotplug: invalid PFNs from pfn_to_online_page() xfs: end sync buffer I/O properly on shutdown error net/smc: fix sender_free computation blktrace: Show requests without sector net/smc: fix byte_order for rx_curs_confirmed tipc: fix skb may be leaky in tipc_link_input ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI sfc: initialise found bitmap in efx_ef10_mtd_probe geneve: change NET_UDP_TUNNEL dependency to select net: fix possible overflow in __sk_mem_raise_allocated() net: ip_gre: do not report erspan_ver for gre or gretap net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap sctp: don't compare hb_timer expire date before starting it bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() mmc: core: align max segment size with logical block size net: dev: Use unsigned integer as an argument to left-shift kvm: properly check debugfs dentry before using it bpf: drop refcount if bpf_map_new_fd() fails in map_create() net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED net: hns3: fix PFC not setting problem for DCB module net: hns3: fix an issue for hclgevf_ae_get_hdev net: hns3: fix an issue for hns3_update_new_int_gl iommu/amd: Fix NULL dereference bug in match_hid_uid apparmor: delete the dentry in aafs_remove() to avoid a leak scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery ACPI / APEI: Don't wait to serialise with oops messages when panic()ing ACPI / APEI: Switch estatus pool to use vmalloc memory scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned scsi: libsas: Check SMP PHY control function result RDMA/hns: Fix the bug with updating rq head pointer when flush cqe RDMA/hns: Bugfix for the scene without receiver queue RDMA/hns: Fix the state of rereg mr RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() xdp: fix cpumap redirect SKB creation bug mtd: Remove a debug trace in mtdpart.c mm, gup: add missing refcount overflow checks on s390 clk: at91: fix update bit maps on CFG_MOR write clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated() usb: dwc2: use a longer core rest timeout in dwc2_core_reset() staging: rtl8192e: fix potential use after free staging: rtl8723bs: Drop ACPI device ids staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P mei: bus: prefix device names on bus with the bus name mei: me: add comet point V device id thunderbolt: Power cycle the router if NVM authentication fails xfrm: Fix memleak on xfrm state destroy media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE net: macb: fix error format in dev_err() pwm: Clear chip_data in pwm_put() media: atmel: atmel-isc: fix asd memory allocation media: atmel: atmel-isc: fix INIT_WORK misplacement macvlan: schedule bc_work even if error net: psample: fix skb_over_panic openvswitch: fix flow command message size sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook slip: Fix use-after-free Read in slip_open openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() openvswitch: remove another BUG_ON() selftests: bpf: test_sockmap: handle file creation failures gracefully tipc: fix link name length check sctp: cache netns in sctp_ep_common net: sched: fix `tc -s class show` no bstats on class with nolock subqueues net: macb: add missed tasklet_kill ext4: add more paranoia checking in ext4_expand_extra_isize handling watchdog: sama5d4: fix WDD value to be always set to max net: macb: Fix SUBNS increment and increase resolution net: macb driver, check for SKBTX_HW_TSTAMP mtd: rawnand: atmel: Fix spelling mistake in error message mtd: rawnand: atmel: fix possible object reference leak mtd: spi-nor: cast to u64 to avoid uint overflows drm/atmel-hlcdc: revert shift by 8 mailbox: stm32_ipcc: add spinlock to fix channels concurrent access tcp: exit if nothing to retransmit on RTO timeout HID: core: check whether Usage Page item is after Usage ID items crypto: stm32/hash - Fix hmac issue more than 256 bytes media: stm32-dcmi: fix DMA corruption when stopping streaming media: stm32-dcmi: fix check of pm_runtime_get_sync return value hwrng: stm32 - fix unbalanced pm_runtime_enable clk: stm32mp1: fix HSI divider flag clk: stm32mp1: fix mcu divider table clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks clk: stm32mp1: parent clocks update mailbox: mailbox-test: fix null pointer if no mmio pinctrl: stm32: fix memory leak issue ASoC: stm32: i2s: fix dma configuration ASoC: stm32: i2s: fix 16 bit format support ASoC: stm32: i2s: fix IRQ clearing ASoC: stm32: sai: add missing put_device() dmaengine: stm32-dma: check whether length is aligned on FIFO threshold platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size net: fec: fix clock count mis-match Linux 4.19.88 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifd3801a77cb551be72788031e7fcfc8a1d4fd197 |
||
|
|
9696c7656e |
mm/page_alloc.c: use a single function to free page
[ Upstream commit 742aa7fb52c56fb3b307e704f93e67b698959cc2 ] There are multiple places of freeing a page, they all do the same things so a common function can be used to reduce code duplicate. It also avoids bug fixed in one function but left in another. Link: http://lkml.kernel.org/r/20181119134834.17765-3-aaron.lu@intel.com Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pawel Staszewski <pstaszewski@itcare.pl> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
257ad5fbfe |
mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
[ Upstream commit 65895b67ad27df0f62bfaf82dd5622f95ea29196 ]
page_frag_free() calls __free_pages_ok() to free the page back to Buddy.
This is OK for high order page, but for order-0 pages, it misses the
optimization opportunity of using Per-Cpu-Pages and can cause zone lock
contention when called frequently.
Pawel Staszewski recently shared his result of 'how Linux kernel handles
normal traffic'[1] and from perf data, Jesper Dangaard Brouer found the
lock contention comes from page allocator:
mlx5e_poll_tx_cq
|
--16.34%--napi_consume_skb
|
|--12.65%--__free_pages_ok
| |
| --11.86%--free_one_page
| |
| |--10.10%--queued_spin_lock_slowpath
| |
| --0.65%--_raw_spin_lock
|
|--1.55%--page_frag_free
|
--1.44%--skb_release_data
Jesper explained how it happened: mlx5 driver RX-page recycle mechanism is
not effective in this workload and pages have to go through the page
allocator. The lock contention happens during mlx5 DMA TX completion
cycle. And the page allocator cannot keep up at these speeds.[2]
I thought that __free_pages_ok() are mostly freeing high order pages and
thought this is an lock contention for high order pages but Jesper
explained in detail that __free_pages_ok() here are actually freeing
order-0 pages because mlx5 is using order-0 pages to satisfy its page pool
allocation request.[3]
The free path as pointed out by Jesper is:
skb_free_head()
-> skb_free_frag()
-> page_frag_free()
And the pages being freed on this path are order-0 pages.
Fix this by doing similar things as in __page_frag_cache_drain() - send
the being freed page to PCP if it's an order-0 page, or directly to Buddy
if it is a high order page.
With this change, Paweł hasn't noticed lock contention yet in his
workload and Jesper has noticed a 7% performance improvement using a micro
benchmark and lock contention is gone. Ilias' test on a 'low' speed 1Gbit
interface on an cortex-a53 shows ~11% performance boost testing with
64byte packets and __free_pages_ok() disappeared from perf top.
[1]: https://www.spinics.net/lists/netdev/msg531362.html
[2]: https://www.spinics.net/lists/netdev/msg531421.html
[3]: https://www.spinics.net/lists/netdev/msg531556.html
[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/20181120014544.GB10657@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
9c265248fa |
FROMLIST: scs: add accounting
This change adds accounting for the memory allocated for shadow stacks. Bug: 145210207 Change-Id: I51157fe0b23b4cb28bb33c86a5dfe3ac911296a4 (am from https://lore.kernel.org/patchwork/patch/1149055/) Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> |
||
|
|
b777b6f211 |
Merge 4.19.84 into android-4.19
Changes in 4.19.84 bonding: fix state transition issue in link monitoring CDC-NCM: handle incomplete transfer of MTU ipv4: Fix table id reference in fib_sync_down_addr net: ethernet: octeon_mgmt: Account for second possible VLAN header net: fix data-race in neigh_event_send() net: qualcomm: rmnet: Fix potential UAF when unregistering net: usb: qmi_wwan: add support for DW5821e with eSIM support NFC: fdp: fix incorrect free object nfc: netlink: fix double device reference drop NFC: st21nfca: fix double free qede: fix NULL pointer deref in __qede_remove() net: mscc: ocelot: don't handle netdev events for other netdevs net: mscc: ocelot: fix NULL pointer on LAG slave removal ipv6: fixes rt6_probe() and fib6_nh->last_probe init net: hns: Fix the stray netpoll locks causing deadlock in NAPI path ALSA: timer: Fix incorrectly assigned timer instance ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series ALSA: hda/ca0132 - Fix possible workqueue stall mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges mm, meminit: recalculate pcpu batch and high limits after init completes mm: thp: handle page cache THP correctly in PageTransCompoundMap mm, vmstat: hide /proc/pagetypeinfo from normal users dump_stack: avoid the livelock of the dump_lock tools: gpio: Use !building_out_of_srctree to determine srctree perf tools: Fix time sorting drm/radeon: fix si_enable_smc_cac() failed issue HID: wacom: generic: Treat serial number and related fields as unsigned soundwire: depend on ACPI soundwire: bus: set initial value to port_status arm64: Do not mask out PTE_RDONLY in pte_same() ceph: fix use-after-free in __ceph_remove_cap() ceph: add missing check in d_revalidate snapdir handling iio: adc: stm32-adc: fix stopping dma iio: imu: adis16480: make sure provided frequency is positive iio: srf04: fix wrong limitation in distance measuring ARM: sunxi: Fix CPU powerdown on A83T netfilter: nf_tables: Align nft_expr private data to 64-bit netfilter: ipset: Fix an error code in ip_set_sockfn_get() intel_th: pci: Add Comet Lake PCH support intel_th: pci: Add Jasper Lake PCH support x86/apic/32: Avoid bogus LDR warnings SMB3: Fix persistent handles reconnect can: usb_8dev: fix use-after-free on disconnect can: flexcan: disable completely the ECC mechanism can: c_can: c_can_poll(): only read status register after status IRQ can: peak_usb: fix a potential out-of-sync while decoding packets can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak can: gs_usb: gs_can_open(): prevent memory leak can: dev: add missing of_node_put() after calling of_get_child_by_name() can: mcba_usb: fix use-after-free on disconnect can: peak_usb: fix slab info leak configfs: stash the data we need into configfs_buffer at open time configfs_register_group() shouldn't be (and isn't) called in rmdirable parts configfs: new object reprsenting tree fragments configfs: provide exclusion between IO and removals configfs: fix a deadlock in configfs_symlink() ALSA: usb-audio: More validations of descriptor units ALSA: usb-audio: Simplify parse_audio_unit() ALSA: usb-audio: Unify the release of usb_mixer_elem_info objects ALSA: usb-audio: Remove superfluous bLength checks ALSA: usb-audio: Clean up check_input_term() ALSA: usb-audio: Fix possible NULL dereference at create_yamaha_midi_quirk() ALSA: usb-audio: remove some dead code ALSA: usb-audio: Fix copy&paste error in the validator sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices sched/fair: Fix -Wunused-but-set-variable warnings usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path usbip: Implement SG support to vhci-hcd and stub driver PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30 HID: google: add magnemite/masterball USB ids dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config dmaengine: sprd: Fix the possible memory leak issue HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring() RDMA/mlx5: Clear old rate limit when closing QP iw_cxgb4: fix ECN check on the passive accept RDMA/qedr: Fix reported firmware version net/mlx5e: TX, Fix consumer index of error cqe dump net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq scsi: qla2xxx: fixup incorrect usage of host_byte RDMA/uverbs: Prevent potential underflow net: openvswitch: free vport unless register_netdevice() succeeds scsi: lpfc: Honor module parameter lpfc_use_adisc scsi: qla2xxx: Initialized mailbox to prevent driver load failure netfilter: nf_flow_table: set timeout before insertion into hashes ipvs: don't ignore errors in case refcounting ip_vs module fails ipvs: move old_secure_tcp into struct netns_ipvs bonding: fix unexpected IFF_BONDING bit unset macsec: fix refcnt leak in module exit routine usb: fsl: Check memory resource before releasing it usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode. usb: gadget: composite: Fix possible double free memory bug usb: dwc3: pci: prevent memory leak in dwc3_pci_probe usb: gadget: configfs: fix concurrent issue between composite APIs usb: dwc3: remove the call trace of USBx_GFLADJ perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) perf/x86/uncore: Fix event group support USB: Skip endpoints with 0 maxpacket length USB: ldusb: use unsigned size format specifiers usbip: tools: Fix read_usb_vudc_device() error path handling RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case RDMA/hns: Prevent memory leaks of eq->buf_list scsi: qla2xxx: stop timer in shutdown path nvme-multipath: fix possible io hang after ctrl reconnect fjes: Handle workqueue allocation failure net: hisilicon: Fix "Trying to free already-free IRQ" net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up net: mscc: ocelot: refuse to overwrite the port's native vlan iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41 drm/amdgpu: If amdgpu_ib_schedule fails return back the error. drm/amd/display: Passive DP->HDMI dongle detection fix hv_netvsc: Fix error handling in netvsc_attach() usb: dwc3: gadget: fix race when disabling ep with cancelled xfers NFSv4: Don't allow a cached open with a revoked delegation net: ethernet: arc: add the missed clk_disable_unprepare igb: Fix constant media auto sense switching when no cable is connected e1000: fix memory leaks pinctrl: intel: Avoid potential glitches if pin is in GPIO mode ocfs2: protect extent tree in ocfs2_prepare_inode_for_write() pinctrl: cherryview: Fix irq_valid_mask calculation blkcg: make blkcg_print_stat() print stats only for online blkgs iio: imu: mpu6050: Add support for the ICM 20602 IMU iio: imu: inv_mpu6050: fix no data on MPU6050 mm/filemap.c: don't initiate writeback if mapping has no dirty pages cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead usbip: Fix free of unallocated memory in vhci tx netfilter: ipset: Copy the right MAC address in hash:ip,mac IPv6 sets net: prevent load/store tearing on sk->sk_stamp iio: imu: mpu6050: Fix FIFO layout for ICM20602 vsock/virtio: fix sock refcnt holding during the shutdown drm/i915: Rename gen7 cmdparser tables drm/i915: Disable Secure Batches for gen6+ drm/i915: Remove Master tables from cmdparser drm/i915: Add support for mandatory cmdparsing drm/i915: Support ro ppgtt mapped cmdparser shadow buffers drm/i915: Allow parsing of unsized batches drm/i915: Add gen9 BCS cmdparsing drm/i915/cmdparser: Use explicit goto for error paths drm/i915/cmdparser: Add support for backward jumps drm/i915/cmdparser: Ignore Length operands during command matching drm/i915: Lower RM timeout to avoid DSI hard hangs drm/i915/gen8+: Add RC6 CTX corruption WA drm/i915/cmdparser: Fix jump whitelist clearing KVM: x86: use Intel speculation bugs and features as derived in generic x86 code x86/msr: Add the IA32_TSX_CTRL MSR x86/cpu: Add a helper function x86_read_arch_cap_msr() x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default x86/speculation/taa: Add mitigation for TSX Async Abort x86/speculation/taa: Add sysfs reporting for TSX Async Abort kvm/x86: Export MDS_NO=0 to guests when TSX is enabled x86/tsx: Add "auto" option to the tsx= cmdline parameter x86/speculation/taa: Add documentation for TSX Async Abort x86/tsx: Add config options to set tsx=on|off|auto x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs x86/bugs: Add ITLB_MULTIHIT bug infrastructure x86/cpu: Add Tremont to the cpu vulnerability whitelist cpu/speculation: Uninline and export CPU mitigations helpers Documentation: Add ITLB_MULTIHIT documentation kvm: x86, powerpc: do not allow clearing largepages debugfs entry kvm: Convert kvm_lock to a mutex kvm: mmu: Do not release the page inside mmu_set_spte() KVM: x86: make FNAME(fetch) and __direct_map more similar KVM: x86: remove now unneeded hugepage gfn adjustment KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON KVM: x86: add tracepoints around __direct_map and FNAME(fetch) KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active kvm: mmu: ITLB_MULTIHIT mitigation kvm: Add helper function for creating VM worker threads kvm: x86: mmu: Recovery of shattered NX large pages Linux 4.19.84 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I7a820f00c4b868ed677bb49613f835b7e67a3a06 |
||
|
|
7dfa51beac |
mm, meminit: recalculate pcpu batch and high limits after init completes
commit 3e8fc0075e24338b1117cdff6a79477427b8dbed upstream.
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page
allocator (pcpu) calculates the number of pages allocated/freed in
batches as well as the maximum number of pages allowed on a per-cpu
list. As zone->managed_pages is not up to date yet, the pcpu
initialisation calculates inappropriately low batch and high values.
This increases zone lock contention quite severely in some cases with
the degree of severity depending on how many CPUs share a local zone and
the size of the zone. A private report indicated that kernel build
times were excessive with extremely high system CPU usage. A perf
profile indicated that a large chunk of time was lost on zone->lock
contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system. It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.
mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.
kernbench
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%*
Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%*
Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%*
Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%)
Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%)
Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%)
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Duration User 39766.24 49221.79
Duration System 44298.10 13361.67
Duration Elapsed 519.11 388.87
The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.
Before, this was the breakdown of batch and high values over all zones
was:
256 batch: 1
256 batch: 63
512 batch: 7
256 high: 0
256 high: 378
512 high: 42
512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
the patch:
256 batch: 1
768 batch: 63
256 high: 0
768 high: 378
[mgorman@techsingularity.net: fix merge/linkage snafu]
Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
be21fa3044 |
UPSTREAM: kasan, mm, arm64: tag non slab memory allocated via pagealloc
(Upstream commit 2813b9c0296259fb11e75c839bab2d958ba4f96c). Tag-based KASAN doesn't check memory accesses through pointers tagged with 0xff. When page_address is used to get pointer to memory that corresponds to some page, the tag of the resulting pointer gets set to 0xff, even though the allocated memory might have been tagged differently. For slab pages it's impossible to recover the correct tag to return from page_address, since the page might contain multiple slab objects tagged with different values, and we can't know in advance which one of them is going to get accessed. For non slab pages however, we can recover the tag in page_address, since the whole page was marked with the same tag. This patch adds tagging to non slab memory allocated with pagealloc. To set the tag of the pointer returned from page_address, the tag gets stored to page->flags when the memory gets allocated. Link: http://lkml.kernel.org/r/d758ddcef46a5abc9970182b9137e2fbee202a2c.1544099024.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Bug: 128674696 Change-Id: I500bdf42462fee0ee14495a6be51815e7e44460f |
||
|
|
a5587d8fce |
UPSTREAM: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
Upstream commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1
and init_on_free=1 boot options").
Patch series "add init_on_alloc/init_on_free boot options", v10.
Provide init_on_alloc and init_on_free boot options.
These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.
Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AU]B is initialized with zeroes. SLOB allocator
isn't supported at the moment, as its emulation of kmem caches complicates
handling of SLAB_TYPESAFE_BY_RCU caches correctly.
Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.
As suggested by Michal Hocko, right now we don't let the heap users to
disable initialization for certain allocations. There's not enough
evidence that doing so can speed up real-life cases, and introducing ways
to opt-out may result in things going out of control.
This patch (of 2):
The new options are needed to prevent possible information leaks and make
control-flow bugs that depend on uninitialized values more deterministic.
This is expected to be on-by-default on Android and Chrome OS. And it
gives the opportunity for anyone else to use it under distros too via the
boot args. (The init_on_free feature is regularly requested by folks
where memory forensics is included in their threat models.)
init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes. Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.
init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion. This helps to ensure sensitive data
doesn't leak via use-after-free accesses.
Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory. The two exceptions are slab caches with
constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
zero-initialized to preserve their semantics.
Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.
If either SLUB poisoning or page poisoning is enabled, those options take
precedence over init_on_alloc and init_on_free: initialization is only
applied to unpoisoned allocations.
Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:
hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
is within the standard error.
The new features are also going to pave the way for hardware memory
tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
hooks to set the tags for heap objects. With MTE, tagging will have the
same cost as memory initialization.
Although init_on_free is rather costly, there are paranoid use-cases where
in-memory data lifetime is desired to be minimized. There are various
arguments for/against the realism of the associated threat models, but
given that we'll need the infrastructure for MTE anyway, and there are
people who want wipe-on-free behavior no matter what the performance cost,
it seems reasonable to include it in this series.
[glider@google.com: v8]
Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
[glider@google.com: v9]
Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
[glider@google.com: v10]
Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.cz> [page and dmapool parts
Acked-by: James Morris <jamorris@linux.microsoft.com>]
Cc: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If0620a6a8aed34c21e98458c965e94f5a9dfd297
Bug: 138435492
Test: Boot cuttlefish with and without
Test: CONFIG_INIT_ON_ALLOC_DEFAULT_ON/CONFIG_INIT_ON_FREE_DEFAULT_ON
Signed-off-by: Alexander Potapenko <glider@google.com>
|