cbfaab8aa34f210f49974a30bd160623a23f9b22
1429 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
291d853dff |
Merge 4.19.88 into android-4.19
Changes in 4.19.88 clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate clocksource/drivers/mediatek: Fix error handling ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX ASoC: compress: fix unsigned integer overflow check reset: Fix memory leak in reset_control_array_put() clk: samsung: exynos5433: Fix error paths ASoC: kirkwood: fix external clock probe defer ASoC: kirkwood: fix device remove ordering clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume pinctrl: cherryview: Allocate IRQ chip dynamic ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts reset: fix reset_control_ops kerneldoc comment clk: at91: avoid sleeping early clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18 ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend samples/bpf: fix build by setting HAVE_ATTR_TEST to zero powerpc/bpf: Fix tail call implementation idr: Fix integer overflow in idr_for_each_entry idr: Fix idr_alloc_u32 on 32-bit systems x86/resctrl: Prevent NULL pointer dereference when reading mondata clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call clk: ti: clkctrl: Fix failed to enable error with double udelay timeout net: fec: add missed clk_disable_unprepare in remove bridge: ebtables: don't crash when using dnat target in output chains can: peak_usb: report bus recovery as well can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition watchdog: meson: Fix the wrong value of left time ASoC: stm32: sai: add restriction on mmap support scripts/gdb: fix debugging modules compiled with hot/cold partitioning net: bcmgenet: use RGMII loopback for MAC reset net: bcmgenet: reapply manual settings to the PHY net: mscc: ocelot: fix __ocelot_rmw_ix prototype ceph: return -EINVAL if given fsc mount option on kernel w/o support net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix station inactive_time shortly after boot block: drbd: remove a stray unlock in __drbd_send_protocol() pwm: bcm-iproc: Prevent unloading the driver module while in use scsi: target/tcmu: Fix queue_cmd_ring() declaration scsi: lpfc: Fix kernel Oops due to null pring pointers scsi: lpfc: Fix dif and first burst use in write commands ARM: dts: Fix up SQ201 flash access tracing: Lock event_mutex before synth_event_mutex ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed ARM: dts: imx51: Fix memory node duplication ARM: dts: imx53: Fix memory node duplication ARM: dts: imx31: Fix memory node duplication ARM: dts: imx35: Fix memory node duplication ARM: dts: imx7: Fix memory node duplication ARM: dts: imx6ul: Fix memory node duplication ARM: dts: imx6sx: Fix memory node duplication ARM: dts: imx6sl: Fix memory node duplication ARM: dts: imx50: Fix memory node duplication ARM: dts: imx23: Fix memory node duplication ARM: dts: imx1: Fix memory node duplication ARM: dts: imx27: Fix memory node duplication ARM: dts: imx25: Fix memory node duplication ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication parisc: Fix serio address output parisc: Fix HP SDC hpa address output ARM: dts: Fix hsi gdd range for omap4 arm64: mm: Prevent mismatched 52-bit VA support arm64: smp: Handle errors reported by the firmware bus: ti-sysc: Check for no-reset and no-idle flags at the child level platform/x86: mlx-platform: Fix LED configuration ARM: OMAP1: fix USB configuration for device-only setups RDMA/hns: Fix the bug while use multi-hop of pbl arm64: preempt: Fix big-endian when checking preempt count in assembly RDMA/vmw_pvrdma: Use atomic memory allocation in create AH PM / AVS: SmartReflex: NULL check before some freeing functions is not needed xfs: zero length symlinks are not valid ARM: ks8695: fix section mismatch warning ACPI / LPSS: Ignore acpi_device_fix_up_power() return value scsi: lpfc: Enable Management features for IF_TYPE=6 scsi: qla2xxx: Fix NPIV handling for FC-NVMe scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port nvme: provide fallback for discard alloc failure s390/zcrypt: make sysfs reset attribute trigger queue reset crypto: user - support incremental algorithm dumps arm64: dts: renesas: draak: Fix CVBS input mwifiex: fix potential NULL dereference and use after free mwifiex: debugfs: correct histogram spacing, formatting brcmfmac: set F2 watermark to 256 for 4373 brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373 rtl818x: fix potential use after free bcache: do not check if debug dentry is ERR or NULL explicitly on remove bcache: do not mark writeback_running too early xfs: require both realtime inodes to mount nvme: fix kernel paging oops ubifs: Fix default compression selection in ubifs ubi: Put MTD device after it is not used ubi: Do not drop UBI device reference before using microblaze: adjust the help to the real behavior microblaze: move "... is ready" messages to arch/microblaze/Makefile microblaze: fix multiple bugs in arch/microblaze/boot/Makefile iwlwifi: move iwl_nvm_check_version() into dvm iwlwifi: mvm: force TCM re-evaluation on TCM resume iwlwifi: pcie: fix erroneous print iwlwifi: pcie: set cmd_len in the correct place gpio: pca953x: Fix AI overflow on PCAL6524 gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB kvm: vmx: Set IA32_TSC_AUX for legacy mode guests Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS" Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()" crypto/chelsio/chtls: listen fails with multiadapt VSOCK: bind to random port for VMADDR_PORT_ANY mmc: meson-gx: make sure the descriptor is stopped on errors mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET usb: ehci-omap: Fix deferred probe for phy handling btrfs: Check for missing device before bio submission in btrfs_map_bio btrfs: fix ncopies raid_attr for RAID56 btrfs: dev-replace: set result code of cancel by status of scrub Btrfs: allow clear_extent_dirty() to receive a cached extent state record btrfs: only track ref_heads in delayed_ref_updates serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback HID: intel-ish-hid: fixes incorrect error handling gpio: raspberrypi-exp: decrease refcount on firmware dt node serial: 8250: Rate limit serial port rx interrupts during input overruns kprobes/x86/xen: blacklist non-attachable xen interrupt functions xen/pciback: Check dev_data before using it kprobes: Blacklist symbols in arch-defined prohibited area kprobes/x86: Show x86-64 specific blacklisted symbols correctly vfio-mdev/samples: Use u8 instead of char for handle functions memory: omap-gpmc: Get the header of the enum pinctrl: xway: fix gpio-hog related boot issues net/mlx5: Continue driver initialization despite debugfs failure netfilter: nf_nat_sip: fix RTP/RTCP source port translations exofs_mount(): fix leaks on failure exits bnxt_en: Return linux standard errors in bnxt_ethtool.c bnxt_en: Save ring statistics before reset. bnxt_en: query force speeds before disabling autoneg mode. KVM: s390: unregister debug feature on failing arch init pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10 HID: doc: fix wrong data structure reference for UHID_OUTPUT dm flakey: Properly corrupt multi-page bios. gfs2: take jdata unstuff into account in do_grow dm raid: fix false -EBUSY when handling check/repair message xfs: Align compat attrlist_by_handle with native implementation. xfs: Fix bulkstat compat ioctls on x32 userspace. IB/qib: Fix an error code in qib_sdma_verbs_send() clocksource/drivers/fttmr010: Fix invalid interrupt register access vxlan: Fix error path in __vxlan_dev_create() powerpc/book3s/32: fix number of bats in p/v_block_mapped() powerpc/xmon: fix dump_segments() drivers/regulator: fix a missing check of return value Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading serial: max310x: Fix tx_empty() callback openrisc: Fix broken paths to arch/or32 RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer scsi: qla2xxx: deadlock by configfs_depend_item scsi: csiostor: fix incorrect dma device in case of vport brcmfmac: Fix access point mode ath6kl: Only use match sets when firmware supports it ath6kl: Fix off by one error in scan completion powerpc/perf: Fix unit_sel/cache_sel checks powerpc/32: Avoid unsupported flags with clang powerpc/prom: fix early DEBUG messages powerpc/mm: Make NULL pointer deferences explicit on bad page faults. powerpc/44x/bamboo: Fix PCI range vfio/spapr_tce: Get rid of possible infinite loop powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status drbd: ignore "all zero" peer volume sizes in handshake drbd: reject attach of unsuitable uuids even if connected drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix print_st_err()'s prototype to match the definition IB/rxe: Make counters thread safe bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't regulator: tps65910: fix a missing check of return value powerpc/83xx: handle machine check caused by watchdog timer powerpc/pseries: Fix node leak in update_lmb_associativity_index() powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y crypto: mxc-scc - fix build warnings on ARM64 pwm: clps711x: Fix period calculation net/netlink_compat: Fix a missing check of nla_parse_nested net/net_namespace: Check the return value of register_pernet_subsys() f2fs: fix block address for __check_sit_bitmap f2fs: fix to dirty inode synchronously um: Include sys/uio.h to have writev() um: Make GCOV depend on !KCOV net: (cpts) fix a missing check of clk_prepare net: stmicro: fix a missing check of clk_prepare net: dsa: bcm_sf2: Propagate error value from mdio_write atl1e: checking the status of atl1e_write_phy_reg tipc: fix a missing check of genlmsg_put net: marvell: fix a missing check of acpi_match_device net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe() ocfs2: clear journal dirty flag after shutdown journal vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n mm/page_alloc.c: free order-0 pages through PCP in page_frag_free() mm/page_alloc.c: use a single function to free page mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures netfilter: nf_tables: fix a missing check of nla_put_failure xprtrdma: Prevent leak of rpcrdma_rep objects infiniband: bnxt_re: qplib: Check the return value of send_message infiniband/qedr: Potential null ptr dereference of qp firmware: arm_sdei: fix wrong of_node_put() in init function firmware: arm_sdei: Fix DT platform device creation lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk lib/genalloc.c: use vzalloc_node() to allocate the bitmap fork: fix some -Wmissing-prototypes warnings drivers/base/platform.c: kmemleak ignore a known leak lib/genalloc.c: include vmalloc.h mtd: Check add_mtd_device() ret code tipc: fix memory leak in tipc_nl_compat_publ_dump net/core/neighbour: tell kmemleak about hash tables ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity() net/core/neighbour: fix kmemleak minimal reference count for hash tables serial: 8250: Fix serial8250 initialization crash gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel decnet: fix DN_IFREQ_SIZE net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() net/smc: don't wait for send buffer space when data was already sent mm/hotplug: invalid PFNs from pfn_to_online_page() xfs: end sync buffer I/O properly on shutdown error net/smc: fix sender_free computation blktrace: Show requests without sector net/smc: fix byte_order for rx_curs_confirmed tipc: fix skb may be leaky in tipc_link_input ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI sfc: initialise found bitmap in efx_ef10_mtd_probe geneve: change NET_UDP_TUNNEL dependency to select net: fix possible overflow in __sk_mem_raise_allocated() net: ip_gre: do not report erspan_ver for gre or gretap net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap sctp: don't compare hb_timer expire date before starting it bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() mmc: core: align max segment size with logical block size net: dev: Use unsigned integer as an argument to left-shift kvm: properly check debugfs dentry before using it bpf: drop refcount if bpf_map_new_fd() fails in map_create() net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED net: hns3: fix PFC not setting problem for DCB module net: hns3: fix an issue for hclgevf_ae_get_hdev net: hns3: fix an issue for hns3_update_new_int_gl iommu/amd: Fix NULL dereference bug in match_hid_uid apparmor: delete the dentry in aafs_remove() to avoid a leak scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery ACPI / APEI: Don't wait to serialise with oops messages when panic()ing ACPI / APEI: Switch estatus pool to use vmalloc memory scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned scsi: libsas: Check SMP PHY control function result RDMA/hns: Fix the bug with updating rq head pointer when flush cqe RDMA/hns: Bugfix for the scene without receiver queue RDMA/hns: Fix the state of rereg mr RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() xdp: fix cpumap redirect SKB creation bug mtd: Remove a debug trace in mtdpart.c mm, gup: add missing refcount overflow checks on s390 clk: at91: fix update bit maps on CFG_MOR write clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated() usb: dwc2: use a longer core rest timeout in dwc2_core_reset() staging: rtl8192e: fix potential use after free staging: rtl8723bs: Drop ACPI device ids staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P mei: bus: prefix device names on bus with the bus name mei: me: add comet point V device id thunderbolt: Power cycle the router if NVM authentication fails xfrm: Fix memleak on xfrm state destroy media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE net: macb: fix error format in dev_err() pwm: Clear chip_data in pwm_put() media: atmel: atmel-isc: fix asd memory allocation media: atmel: atmel-isc: fix INIT_WORK misplacement macvlan: schedule bc_work even if error net: psample: fix skb_over_panic openvswitch: fix flow command message size sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook slip: Fix use-after-free Read in slip_open openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() openvswitch: remove another BUG_ON() selftests: bpf: test_sockmap: handle file creation failures gracefully tipc: fix link name length check sctp: cache netns in sctp_ep_common net: sched: fix `tc -s class show` no bstats on class with nolock subqueues net: macb: add missed tasklet_kill ext4: add more paranoia checking in ext4_expand_extra_isize handling watchdog: sama5d4: fix WDD value to be always set to max net: macb: Fix SUBNS increment and increase resolution net: macb driver, check for SKBTX_HW_TSTAMP mtd: rawnand: atmel: Fix spelling mistake in error message mtd: rawnand: atmel: fix possible object reference leak mtd: spi-nor: cast to u64 to avoid uint overflows drm/atmel-hlcdc: revert shift by 8 mailbox: stm32_ipcc: add spinlock to fix channels concurrent access tcp: exit if nothing to retransmit on RTO timeout HID: core: check whether Usage Page item is after Usage ID items crypto: stm32/hash - Fix hmac issue more than 256 bytes media: stm32-dcmi: fix DMA corruption when stopping streaming media: stm32-dcmi: fix check of pm_runtime_get_sync return value hwrng: stm32 - fix unbalanced pm_runtime_enable clk: stm32mp1: fix HSI divider flag clk: stm32mp1: fix mcu divider table clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks clk: stm32mp1: parent clocks update mailbox: mailbox-test: fix null pointer if no mmio pinctrl: stm32: fix memory leak issue ASoC: stm32: i2s: fix dma configuration ASoC: stm32: i2s: fix 16 bit format support ASoC: stm32: i2s: fix IRQ clearing ASoC: stm32: sai: add missing put_device() dmaengine: stm32-dma: check whether length is aligned on FIFO threshold platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size net: fec: fix clock count mis-match Linux 4.19.88 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifd3801a77cb551be72788031e7fcfc8a1d4fd197 |
||
|
|
9696c7656e |
mm/page_alloc.c: use a single function to free page
[ Upstream commit 742aa7fb52c56fb3b307e704f93e67b698959cc2 ] There are multiple places of freeing a page, they all do the same things so a common function can be used to reduce code duplicate. It also avoids bug fixed in one function but left in another. Link: http://lkml.kernel.org/r/20181119134834.17765-3-aaron.lu@intel.com Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Pawel Staszewski <pstaszewski@itcare.pl> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
257ad5fbfe |
mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
[ Upstream commit 65895b67ad27df0f62bfaf82dd5622f95ea29196 ]
page_frag_free() calls __free_pages_ok() to free the page back to Buddy.
This is OK for high order page, but for order-0 pages, it misses the
optimization opportunity of using Per-Cpu-Pages and can cause zone lock
contention when called frequently.
Pawel Staszewski recently shared his result of 'how Linux kernel handles
normal traffic'[1] and from perf data, Jesper Dangaard Brouer found the
lock contention comes from page allocator:
mlx5e_poll_tx_cq
|
--16.34%--napi_consume_skb
|
|--12.65%--__free_pages_ok
| |
| --11.86%--free_one_page
| |
| |--10.10%--queued_spin_lock_slowpath
| |
| --0.65%--_raw_spin_lock
|
|--1.55%--page_frag_free
|
--1.44%--skb_release_data
Jesper explained how it happened: mlx5 driver RX-page recycle mechanism is
not effective in this workload and pages have to go through the page
allocator. The lock contention happens during mlx5 DMA TX completion
cycle. And the page allocator cannot keep up at these speeds.[2]
I thought that __free_pages_ok() are mostly freeing high order pages and
thought this is an lock contention for high order pages but Jesper
explained in detail that __free_pages_ok() here are actually freeing
order-0 pages because mlx5 is using order-0 pages to satisfy its page pool
allocation request.[3]
The free path as pointed out by Jesper is:
skb_free_head()
-> skb_free_frag()
-> page_frag_free()
And the pages being freed on this path are order-0 pages.
Fix this by doing similar things as in __page_frag_cache_drain() - send
the being freed page to PCP if it's an order-0 page, or directly to Buddy
if it is a high order page.
With this change, Paweł hasn't noticed lock contention yet in his
workload and Jesper has noticed a 7% performance improvement using a micro
benchmark and lock contention is gone. Ilias' test on a 'low' speed 1Gbit
interface on an cortex-a53 shows ~11% performance boost testing with
64byte packets and __free_pages_ok() disappeared from perf top.
[1]: https://www.spinics.net/lists/netdev/msg531362.html
[2]: https://www.spinics.net/lists/netdev/msg531421.html
[3]: https://www.spinics.net/lists/netdev/msg531556.html
[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/20181120014544.GB10657@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
9c265248fa |
FROMLIST: scs: add accounting
This change adds accounting for the memory allocated for shadow stacks. Bug: 145210207 Change-Id: I51157fe0b23b4cb28bb33c86a5dfe3ac911296a4 (am from https://lore.kernel.org/patchwork/patch/1149055/) Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> |
||
|
|
b777b6f211 |
Merge 4.19.84 into android-4.19
Changes in 4.19.84 bonding: fix state transition issue in link monitoring CDC-NCM: handle incomplete transfer of MTU ipv4: Fix table id reference in fib_sync_down_addr net: ethernet: octeon_mgmt: Account for second possible VLAN header net: fix data-race in neigh_event_send() net: qualcomm: rmnet: Fix potential UAF when unregistering net: usb: qmi_wwan: add support for DW5821e with eSIM support NFC: fdp: fix incorrect free object nfc: netlink: fix double device reference drop NFC: st21nfca: fix double free qede: fix NULL pointer deref in __qede_remove() net: mscc: ocelot: don't handle netdev events for other netdevs net: mscc: ocelot: fix NULL pointer on LAG slave removal ipv6: fixes rt6_probe() and fib6_nh->last_probe init net: hns: Fix the stray netpoll locks causing deadlock in NAPI path ALSA: timer: Fix incorrectly assigned timer instance ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series ALSA: hda/ca0132 - Fix possible workqueue stall mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges mm, meminit: recalculate pcpu batch and high limits after init completes mm: thp: handle page cache THP correctly in PageTransCompoundMap mm, vmstat: hide /proc/pagetypeinfo from normal users dump_stack: avoid the livelock of the dump_lock tools: gpio: Use !building_out_of_srctree to determine srctree perf tools: Fix time sorting drm/radeon: fix si_enable_smc_cac() failed issue HID: wacom: generic: Treat serial number and related fields as unsigned soundwire: depend on ACPI soundwire: bus: set initial value to port_status arm64: Do not mask out PTE_RDONLY in pte_same() ceph: fix use-after-free in __ceph_remove_cap() ceph: add missing check in d_revalidate snapdir handling iio: adc: stm32-adc: fix stopping dma iio: imu: adis16480: make sure provided frequency is positive iio: srf04: fix wrong limitation in distance measuring ARM: sunxi: Fix CPU powerdown on A83T netfilter: nf_tables: Align nft_expr private data to 64-bit netfilter: ipset: Fix an error code in ip_set_sockfn_get() intel_th: pci: Add Comet Lake PCH support intel_th: pci: Add Jasper Lake PCH support x86/apic/32: Avoid bogus LDR warnings SMB3: Fix persistent handles reconnect can: usb_8dev: fix use-after-free on disconnect can: flexcan: disable completely the ECC mechanism can: c_can: c_can_poll(): only read status register after status IRQ can: peak_usb: fix a potential out-of-sync while decoding packets can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak can: gs_usb: gs_can_open(): prevent memory leak can: dev: add missing of_node_put() after calling of_get_child_by_name() can: mcba_usb: fix use-after-free on disconnect can: peak_usb: fix slab info leak configfs: stash the data we need into configfs_buffer at open time configfs_register_group() shouldn't be (and isn't) called in rmdirable parts configfs: new object reprsenting tree fragments configfs: provide exclusion between IO and removals configfs: fix a deadlock in configfs_symlink() ALSA: usb-audio: More validations of descriptor units ALSA: usb-audio: Simplify parse_audio_unit() ALSA: usb-audio: Unify the release of usb_mixer_elem_info objects ALSA: usb-audio: Remove superfluous bLength checks ALSA: usb-audio: Clean up check_input_term() ALSA: usb-audio: Fix possible NULL dereference at create_yamaha_midi_quirk() ALSA: usb-audio: remove some dead code ALSA: usb-audio: Fix copy&paste error in the validator sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices sched/fair: Fix -Wunused-but-set-variable warnings usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path usbip: Implement SG support to vhci-hcd and stub driver PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30 HID: google: add magnemite/masterball USB ids dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config dmaengine: sprd: Fix the possible memory leak issue HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring() RDMA/mlx5: Clear old rate limit when closing QP iw_cxgb4: fix ECN check on the passive accept RDMA/qedr: Fix reported firmware version net/mlx5e: TX, Fix consumer index of error cqe dump net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq scsi: qla2xxx: fixup incorrect usage of host_byte RDMA/uverbs: Prevent potential underflow net: openvswitch: free vport unless register_netdevice() succeeds scsi: lpfc: Honor module parameter lpfc_use_adisc scsi: qla2xxx: Initialized mailbox to prevent driver load failure netfilter: nf_flow_table: set timeout before insertion into hashes ipvs: don't ignore errors in case refcounting ip_vs module fails ipvs: move old_secure_tcp into struct netns_ipvs bonding: fix unexpected IFF_BONDING bit unset macsec: fix refcnt leak in module exit routine usb: fsl: Check memory resource before releasing it usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode. usb: gadget: composite: Fix possible double free memory bug usb: dwc3: pci: prevent memory leak in dwc3_pci_probe usb: gadget: configfs: fix concurrent issue between composite APIs usb: dwc3: remove the call trace of USBx_GFLADJ perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) perf/x86/uncore: Fix event group support USB: Skip endpoints with 0 maxpacket length USB: ldusb: use unsigned size format specifiers usbip: tools: Fix read_usb_vudc_device() error path handling RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case RDMA/hns: Prevent memory leaks of eq->buf_list scsi: qla2xxx: stop timer in shutdown path nvme-multipath: fix possible io hang after ctrl reconnect fjes: Handle workqueue allocation failure net: hisilicon: Fix "Trying to free already-free IRQ" net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up net: mscc: ocelot: refuse to overwrite the port's native vlan iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41 drm/amdgpu: If amdgpu_ib_schedule fails return back the error. drm/amd/display: Passive DP->HDMI dongle detection fix hv_netvsc: Fix error handling in netvsc_attach() usb: dwc3: gadget: fix race when disabling ep with cancelled xfers NFSv4: Don't allow a cached open with a revoked delegation net: ethernet: arc: add the missed clk_disable_unprepare igb: Fix constant media auto sense switching when no cable is connected e1000: fix memory leaks pinctrl: intel: Avoid potential glitches if pin is in GPIO mode ocfs2: protect extent tree in ocfs2_prepare_inode_for_write() pinctrl: cherryview: Fix irq_valid_mask calculation blkcg: make blkcg_print_stat() print stats only for online blkgs iio: imu: mpu6050: Add support for the ICM 20602 IMU iio: imu: inv_mpu6050: fix no data on MPU6050 mm/filemap.c: don't initiate writeback if mapping has no dirty pages cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead usbip: Fix free of unallocated memory in vhci tx netfilter: ipset: Copy the right MAC address in hash:ip,mac IPv6 sets net: prevent load/store tearing on sk->sk_stamp iio: imu: mpu6050: Fix FIFO layout for ICM20602 vsock/virtio: fix sock refcnt holding during the shutdown drm/i915: Rename gen7 cmdparser tables drm/i915: Disable Secure Batches for gen6+ drm/i915: Remove Master tables from cmdparser drm/i915: Add support for mandatory cmdparsing drm/i915: Support ro ppgtt mapped cmdparser shadow buffers drm/i915: Allow parsing of unsized batches drm/i915: Add gen9 BCS cmdparsing drm/i915/cmdparser: Use explicit goto for error paths drm/i915/cmdparser: Add support for backward jumps drm/i915/cmdparser: Ignore Length operands during command matching drm/i915: Lower RM timeout to avoid DSI hard hangs drm/i915/gen8+: Add RC6 CTX corruption WA drm/i915/cmdparser: Fix jump whitelist clearing KVM: x86: use Intel speculation bugs and features as derived in generic x86 code x86/msr: Add the IA32_TSX_CTRL MSR x86/cpu: Add a helper function x86_read_arch_cap_msr() x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default x86/speculation/taa: Add mitigation for TSX Async Abort x86/speculation/taa: Add sysfs reporting for TSX Async Abort kvm/x86: Export MDS_NO=0 to guests when TSX is enabled x86/tsx: Add "auto" option to the tsx= cmdline parameter x86/speculation/taa: Add documentation for TSX Async Abort x86/tsx: Add config options to set tsx=on|off|auto x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs x86/bugs: Add ITLB_MULTIHIT bug infrastructure x86/cpu: Add Tremont to the cpu vulnerability whitelist cpu/speculation: Uninline and export CPU mitigations helpers Documentation: Add ITLB_MULTIHIT documentation kvm: x86, powerpc: do not allow clearing largepages debugfs entry kvm: Convert kvm_lock to a mutex kvm: mmu: Do not release the page inside mmu_set_spte() KVM: x86: make FNAME(fetch) and __direct_map more similar KVM: x86: remove now unneeded hugepage gfn adjustment KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON KVM: x86: add tracepoints around __direct_map and FNAME(fetch) KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active kvm: mmu: ITLB_MULTIHIT mitigation kvm: Add helper function for creating VM worker threads kvm: x86: mmu: Recovery of shattered NX large pages Linux 4.19.84 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I7a820f00c4b868ed677bb49613f835b7e67a3a06 |
||
|
|
7dfa51beac |
mm, meminit: recalculate pcpu batch and high limits after init completes
commit 3e8fc0075e24338b1117cdff6a79477427b8dbed upstream.
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page
allocator (pcpu) calculates the number of pages allocated/freed in
batches as well as the maximum number of pages allowed on a per-cpu
list. As zone->managed_pages is not up to date yet, the pcpu
initialisation calculates inappropriately low batch and high values.
This increases zone lock contention quite severely in some cases with
the degree of severity depending on how many CPUs share a local zone and
the size of the zone. A private report indicated that kernel build
times were excessive with extremely high system CPU usage. A perf
profile indicated that a large chunk of time was lost on zone->lock
contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system. It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.
mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.
kernbench
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%*
Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%*
Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%*
Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%)
Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%)
Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%)
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Duration User 39766.24 49221.79
Duration System 44298.10 13361.67
Duration Elapsed 519.11 388.87
The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.
Before, this was the breakdown of batch and high values over all zones
was:
256 batch: 1
256 batch: 63
512 batch: 7
256 high: 0
256 high: 378
512 high: 42
512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
the patch:
256 batch: 1
768 batch: 63
256 high: 0
768 high: 378
[mgorman@techsingularity.net: fix merge/linkage snafu]
Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
be21fa3044 |
UPSTREAM: kasan, mm, arm64: tag non slab memory allocated via pagealloc
(Upstream commit 2813b9c0296259fb11e75c839bab2d958ba4f96c). Tag-based KASAN doesn't check memory accesses through pointers tagged with 0xff. When page_address is used to get pointer to memory that corresponds to some page, the tag of the resulting pointer gets set to 0xff, even though the allocated memory might have been tagged differently. For slab pages it's impossible to recover the correct tag to return from page_address, since the page might contain multiple slab objects tagged with different values, and we can't know in advance which one of them is going to get accessed. For non slab pages however, we can recover the tag in page_address, since the whole page was marked with the same tag. This patch adds tagging to non slab memory allocated with pagealloc. To set the tag of the pointer returned from page_address, the tag gets stored to page->flags when the memory gets allocated. Link: http://lkml.kernel.org/r/d758ddcef46a5abc9970182b9137e2fbee202a2c.1544099024.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Bug: 128674696 Change-Id: I500bdf42462fee0ee14495a6be51815e7e44460f |
||
|
|
a5587d8fce |
UPSTREAM: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
Upstream commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1
and init_on_free=1 boot options").
Patch series "add init_on_alloc/init_on_free boot options", v10.
Provide init_on_alloc and init_on_free boot options.
These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.
Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AU]B is initialized with zeroes. SLOB allocator
isn't supported at the moment, as its emulation of kmem caches complicates
handling of SLAB_TYPESAFE_BY_RCU caches correctly.
Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.
As suggested by Michal Hocko, right now we don't let the heap users to
disable initialization for certain allocations. There's not enough
evidence that doing so can speed up real-life cases, and introducing ways
to opt-out may result in things going out of control.
This patch (of 2):
The new options are needed to prevent possible information leaks and make
control-flow bugs that depend on uninitialized values more deterministic.
This is expected to be on-by-default on Android and Chrome OS. And it
gives the opportunity for anyone else to use it under distros too via the
boot args. (The init_on_free feature is regularly requested by folks
where memory forensics is included in their threat models.)
init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes. Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.
init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion. This helps to ensure sensitive data
doesn't leak via use-after-free accesses.
Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory. The two exceptions are slab caches with
constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
zero-initialized to preserve their semantics.
Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.
If either SLUB poisoning or page poisoning is enabled, those options take
precedence over init_on_alloc and init_on_free: initialization is only
applied to unpoisoned allocations.
Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:
hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
is within the standard error.
The new features are also going to pave the way for hardware memory
tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
hooks to set the tags for heap objects. With MTE, tagging will have the
same cost as memory initialization.
Although init_on_free is rather costly, there are paranoid use-cases where
in-memory data lifetime is desired to be minimized. There are various
arguments for/against the realism of the associated threat models, but
given that we'll need the infrastructure for MTE anyway, and there are
people who want wipe-on-free behavior no matter what the performance cost,
it seems reasonable to include it in this series.
[glider@google.com: v8]
Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
[glider@google.com: v9]
Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
[glider@google.com: v10]
Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.cz> [page and dmapool parts
Acked-by: James Morris <jamorris@linux.microsoft.com>]
Cc: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If0620a6a8aed34c21e98458c965e94f5a9dfd297
Bug: 138435492
Test: Boot cuttlefish with and without
Test: CONFIG_INIT_ON_ALLOC_DEFAULT_ON/CONFIG_INIT_ON_FREE_DEFAULT_ON
Signed-off-by: Alexander Potapenko <glider@google.com>
|
||
|
|
d1f7f3be99 |
Merge 4.19.51 into android-4.19
Changes in 4.19.51
rapidio: fix a NULL pointer dereference when create_workqueue() fails
fs/fat/file.c: issue flush after the writeback of FAT
sysctl: return -EINVAL if val violates minmax
ipc: prevent lockup on alloc_msg and free_msg
drm/pl111: Initialize clock spinlock early
ARM: prevent tracing IPI_CPU_BACKTRACE
mm/hmm: select mmu notifier when selecting HMM
hugetlbfs: on restore reserve error path retain subpool reservation
mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE
mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
initramfs: free initrd memory if opening /initrd.image fails
mm/cma.c: fix the bitmap status to show failed allocation reason
mm: page_mkclean vs MADV_DONTNEED race
mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
mm/slab.c: fix an infinite loop in leaks_show()
kernel/sys.c: prctl: fix false positive in validate_prctl_map()
thermal: rcar_gen3_thermal: disable interrupt in .remove
drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER
mfd: tps65912-spi: Add missing of table registration
mfd: intel-lpss: Set the device in reset state when init
drm/nouveau/disp/dp: respect sink limits when selecting failsafe link configuration
mfd: twl6040: Fix device init errors for ACCCTL register
perf/x86/intel: Allow PEBS multi-entry in watermark mode
drm/nouveau/kms/gf119-gp10x: push HeadSetControlOutputResource() mthd when encoders change
drm/bridge: adv7511: Fix low refresh rate selection
objtool: Don't use ignore flag for fake jumps
drm/nouveau/kms/gv100-: fix spurious window immediate interlocks
bpf: fix undefined behavior in narrow load handling
EDAC/mpc85xx: Prevent building as a module
pwm: meson: Use the spin-lock only to protect register modifications
mailbox: stm32-ipcc: check invalid irq
ntp: Allow TAI-UTC offset to be set to zero
f2fs: fix to avoid panic in do_recover_data()
f2fs: fix to avoid panic in f2fs_inplace_write_data()
f2fs: fix to avoid panic in f2fs_remove_inode_page()
f2fs: fix to do sanity check on free nid
f2fs: fix to clear dirty inode in error path of f2fs_iget()
f2fs: fix to avoid panic in dec_valid_block_count()
f2fs: fix to use inline space only if inline_xattr is enable
f2fs: fix to do sanity check on valid block count of segment
f2fs: fix to do checksum even if inode page is uptodate
percpu: remove spurious lock dependency between percpu and sched
configfs: fix possible use-after-free in configfs_register_group
uml: fix a boot splat wrt use of cpu_all_mask
PCI: dwc: Free MSI in dw_pcie_host_init() error path
PCI: dwc: Free MSI IRQ page in dw_pcie_free_msi()
ovl: do not generate duplicate fsnotify events for "fake" path
mmc: mmci: Prevent polling for busy detection in IRQ context
netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast
netfilter: nf_conntrack_h323: restore boundary check correctness
mips: Make sure dt memory regions are valid
netfilter: nf_tables: fix base chain stat rcu_dereference usage
watchdog: imx2_wdt: Fix set_timeout for big timeout values
watchdog: fix compile time error of pretimeout governors
blk-mq: move cancel of requeue_work into blk_mq_release
iommu/vt-d: Set intel_iommu_gfx_mapped correctly
misc: pci_endpoint_test: Fix test_reg_bar to be updated in pci_endpoint_test
PCI: designware-ep: Use aligned ATU window for raising MSI interrupts
nvme-pci: unquiesce admin queue on shutdown
nvme-pci: shutdown on timeout during deletion
netfilter: nf_flow_table: check ttl value in flow offload data path
netfilter: nf_flow_table: fix netdev refcnt leak
ALSA: hda - Register irq handler after the chip initialization
nvmem: core: fix read buffer in place
nvmem: sunxi_sid: Support SID on A83T and H5
fuse: retrieve: cap requested size to negotiated max_write
nfsd: allow fh_want_write to be called twice
nfsd: avoid uninitialized variable warning
vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"
iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel
switchtec: Fix unintended mask of MRPC event
net: thunderbolt: Unregister ThunderboltIP protocol handler when suspending
x86/PCI: Fix PCI IRQ routing table memory leak
i40e: Queues are reserved despite "Invalid argument" error
platform/chrome: cros_ec_proto: check for NULL transfer function
PCI: keystone: Prevent ARM32 specific code to be compiled for ARM64
soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
soc: rockchip: Set the proper PWM for rk3288
ARM: dts: imx51: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx50: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx53: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx6sll: Specify IMX6SLL_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx7d: Specify IMX7D_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx6ul: Specify IMX6UL_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
PCI: rpadlpar: Fix leaked device_node references in add/remove paths
drm/amd/display: Use plane->color_space for dpp if specified
ARM: OMAP2+: pm33xx-core: Do not Turn OFF CEFUSE as PPA may be using it
platform/x86: intel_pmc_ipc: adding error handling
power: supply: max14656: fix potential use-before-alloc
net: hns3: return 0 and print warning when hit duplicate MAC
PCI: rcar: Fix a potential NULL pointer dereference
PCI: rcar: Fix 64bit MSI message address handling
scsi: qla2xxx: Reset the FCF_ASYNC_{SENT|ACTIVE} flags
video: hgafb: fix potential NULL pointer dereference
video: imsttfb: fix potential NULL pointer dereferences
block, bfq: increase idling for weight-raised queues
PCI: xilinx: Check for __get_free_pages() failure
gpio: gpio-omap: add check for off wake capable gpios
ice: Add missing case in print_link_msg for printing flow control
dmaengine: idma64: Use actual device for DMA transfers
pwm: tiehrpwm: Update shadow register for disabling PWMs
ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa
pwm: Fix deadlock warning when removing PWM device
ARM: exynos: Fix undefined instruction during Exynos5422 resume
usb: typec: fusb302: Check vconn is off when we start toggling
soc: renesas: Identify R-Car M3-W ES1.3
gpio: vf610: Do not share irq_chip
percpu: do not search past bitmap when allocating an area
Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections"
Revert "drm/nouveau: add kconfig option to turn off nouveau legacy contexts. (v3)"
ovl: check the capability before cred overridden
ovl: support stacked SEEK_HOLE/SEEK_DATA
drm/vc4: fix fb references in async update
ALSA: seq: Cover unsubscribe_port() in list_mutex
Linux 4.19.51
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
5094a85d6d |
mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE
[ Upstream commit 299c83dce9ea3a79bb4b5511d2cb996b6b8e5111 ] |
||
|
|
d885da678e |
Merge 4.19.34 into android-4.19
Changes in 4.19.34 arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals ext4: cleanup bh release code in ext4_ind_remove_space() tty/serial: atmel: Add is_half_duplex helper tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped CIFS: fix POSIX lock leak and invalid ptr deref h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- f2fs: fix to adapt small inline xattr space in __find_inline_xattr() f2fs: fix to avoid deadlock in f2fs_read_inline_dir() tracing: kdb: Fix ftdump to not sleep net/mlx5: Avoid panic when setting vport rate net/mlx5: Avoid panic when setting vport mac, getting vport config gpio: gpio-omap: fix level interrupt idling include/linux/relay.h: fix percpu annotation in struct rchan sysctl: handle overflow for file-max net: stmmac: Avoid sometimes uninitialized Clang warnings enic: fix build warning without CONFIG_CPUMASK_OFFSTACK libbpf: force fixdep compilation at the start of the build scsi: hisi_sas: Set PHY linkrate when disconnected scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver x86/hyperv: Fix kernel panic when kexec on HyperV perf c2c: Fix c2c report for empty numa node mm/sparse: fix a bad comparison mm/cma.c: cma_declare_contiguous: correct err handling mm/page_ext.c: fix an imbalance with kmemleak mm, swap: bounds check swap_info array accesses to avoid NULL derefs mm,oom: don't kill global init via memory.oom.group memcg: killed threads should not invoke memcg OOM killer mm, mempolicy: fix uninit memory access mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! mm/slab.c: kmemleak no scan alien caches ocfs2: fix a panic problem caused by o2cb_ctl f2fs: do not use mutex lock in atomic context fs/file.c: initialize init_files.resize_wait page_poison: play nicely with KASAN cifs: use correct format characters dm thin: add sanity checks to thin-pool and external snapshot creation f2fs: fix to check inline_xattr_size boundary correctly cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED cifs: Fix NULL pointer dereference of devname netfilter: nf_tables: check the result of dereferencing base_chain->stats netfilter: conntrack: tcp: only close if RST matches exact sequence jbd2: fix invalid descriptor block checksum fs: fix guard_bio_eod to check for real EOD errors tools lib traceevent: Fix buffer overflow in arg_eval PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove() wil6210: check null pointer in _wil_cfg80211_merge_extra_ies mt76: fix a leaked reference by adding a missing of_node_put crypto: crypto4xx - add missing of_node_put after of_device_is_available crypto: cavium/zip - fix collision with generic cra_driver_name usb: chipidea: Grab the (legacy) USB PHY by phandle first powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc coresight: etm4x: Add support to enable ETMv4.2 serial: 8250_pxa: honor the port number from devicetree ARM: 8840/1: use a raw_spinlock_t in unwind iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback btrfs: qgroup: Make qgroup async transaction commit more aggressive mmc: omap: fix the maximum timeout setting net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat e1000e: Fix -Wformat-truncation warnings mlxsw: spectrum: Avoid -Wformat-truncation warnings platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN platform/mellanox: mlxreg-hotplug: Fix KASAN warning loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() IB/mlx4: Increase the timeout for CM cache clk: fractional-divider: check parent rate only if flag is set perf annotate: Fix getting source line failure ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of() cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies efi: cper: Fix possible out-of-bounds access s390/ism: ignore some errors during deregistration scsi: megaraid_sas: return error when create DMA pool failed scsi: fcoe: make use of fip_mode enum complete drm/amd/display: Clear stream->mode_changed after commit perf test: Fix failure of 'evsel-tp-sched' test on s390 mwifiex: don't advertise IBSS features without FW support perf report: Don't shadow inlined symbol with different addr range SoC: imx-sgtl5000: add missing put_device() media: ov7740: fix runtime pm initialization media: sh_veu: Correct return type for mem2mem buffer helpers media: s5p-jpeg: Correct return type for mem2mem buffer helpers media: rockchip/rga: Correct return type for mem2mem buffer helpers media: s5p-g2d: Correct return type for mem2mem buffer helpers media: mx2_emmaprp: Correct return type for mem2mem buffer helpers media: mtk-jpeg: Correct return type for mem2mem buffer helpers mt76: usb: do not run mt76u_queues_deinit twice xen/gntdev: Do not destroy context while dma-bufs are in use vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1 HID: intel-ish-hid: avoid binding wrong ishtp_cl_device cgroup, rstat: Don't flush subtree root unless necessary jbd2: fix race when writing superblock leds: lp55xx: fix null deref on firmware load failure perf report: Add s390 diagnosic sampling descriptor size iwlwifi: pcie: fix emergency path ACPI / video: Refactor and fix dmi_is_desktop() selftests: skip seccomp get_metadata test if not real root kprobes: Prohibit probing on bsearch() kprobes: Prohibit probing on RCU debug routine netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm ARM: 8833/1: Ensure that NEON code always compiles with Clang ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins ALSA: PCM: check if ops are defined before suspending PCM ath10k: fix shadow register implementation for WCN3990 usb: f_fs: Avoid crash due to out-of-scope stack ptr access sched/topology: Fix percpu data types in struct sd_data & struct s_data bcache: fix input overflow to cache set sysfs file io_error_halflife bcache: fix input overflow to sequential_cutoff bcache: fix potential div-zero error of writeback_rate_i_term_inverse bcache: improve sysfs_strtoul_clamp() genirq: Avoid summation loops for /proc/stat net: marvell: mvpp2: fix stuck in-band SGMII negotiation iw_cxgb4: fix srqidx leak during connection abort net: phy: consider latched link-down status in polling mode fbdev: fbmem: fix memory access if logo is bigger than the screen cdrom: Fix race condition in cdrom_sysctl_register drm: rcar-du: add missing of_node_put drm/amd/display: Don't re-program planes for DPMS changes drm/amd/display: Disconnect mpcc when changing tg perf/aux: Make perf_event accessible to setup_aux() e1000e: fix cyclic resets at link up with active tx e1000e: Exclude device from suspend direct complete optimization platform/x86: intel_pmc_core: Fix PCH IP sts reading i2c: of: Try to find an I2C adapter matching the parent staging: spi: mt7621: Add return code check on device_reset() iwlwifi: mvm: fix RFH config command with >=10 CPUs ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK efi/memattr: Don't bail on zero VA if it equals the region's PA sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock() drm/vkms: Bugfix extra vblank frame ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted soc: qcom: gsbi: Fix error handling in gsbi_probe() mt7601u: bump supported EEPROM version ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of ARM: avoid Cortex-A9 livelock on tight dmb loops block, bfq: fix in-service-queue check for queue merging bpf: fix missing prototype warnings selftests/bpf: skip verifier tests for unsupported program types powerpc/64s: Clear on-stack exception marker upon exception return cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting backlight: pwm_bl: Use gpiod_get_value_cansleep() to get initial state tty: increase the default flip buffer limit to 2*640K powerpc/pseries: Perform full re-add of CPU for topology update post-migration drm/amd/display: Enable vblank interrupt during CRC capture ALSA: dice: add support for Solid State Logic Duende Classic/Mini usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded platform/x86: intel-hid: Missing power button release on some Dell models perf script python: Use PyBytes for attr in trace-event-python perf script python: Add trace_context extension module to sys.modules media: mt9m111: set initial frame size other than 0x0 hwrng: virtio - Avoid repeated init of completion soc/tegra: fuse: Fix illegal free of IO base address HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit f2fs: UBSAN: set boolean value iostat_enable correctly hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable cpu/hotplug: Mute hotplug lockdep during init dmaengine: imx-dma: fix warning comparison of distinct pointer types dmaengine: qcom_hidma: assign channel cookie correctly dmaengine: qcom_hidma: initialize tx flags in hidma_prep_dma_* netfilter: physdev: relax br_netfilter dependency media: rcar-vin: Allow independent VIN link enablement media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting pinctrl: meson: meson8b: add the eth_rxd2 and eth_rxd3 pins drm: Auto-set allow_fb_modifiers when given modifiers at plane init drm/nouveau: Stop using drm_crtc_force_disable x86/build: Specify elf_i386 linker emulation explicitly for i386 objects selinux: do not override context on context mounts brcmfmac: Use firmware_request_nowarn for the clm_blob wlcore: Fix memory leak in case wl12xx_fetch_firmware failure x86/build: Mark per-CPU symbols as absolute explicitly for LLD drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup clk: meson: clean-up clock registration clk: rockchip: fix frac settings of GPLL clock for rk3328 dmaengine: tegra: avoid overflow of byte tracking Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers net: stmmac: Avoid one more sometimes uninitialized Clang warning ACPI / video: Extend chassis-type detection with a "Lunch Box" check bcache: fix potential div-zero error of writeback_rate_p_term_inverse kprobes/x86: Blacklist non-attachable interrupt functions Linux 4.19.34 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
a6c56bf63e |
page_poison: play nicely with KASAN
[ Upstream commit 4117992df66a26fa33908b4969e04801534baab1 ] KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING). It triggers false positives in the allocation path: BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330 Read of size 8 at addr ffff88881f800000 by task swapper/0 CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54 Call Trace: dump_stack+0xe0/0x19a print_address_description.cold.2+0x9/0x28b kasan_report.cold.3+0x7a/0xb5 __asan_report_load8_noabort+0x19/0x20 memchr_inv+0x2ea/0x330 kernel_poison_pages+0x103/0x3d5 get_page_from_freelist+0x15e7/0x4d90 because KASAN has not yet unpoisoned the shadow page for allocation before it checks memchr_inv() but only found a stale poison pattern. Also, false positives in free path, BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5 Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1 CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55 Call Trace: dump_stack+0xe0/0x19a print_address_description.cold.2+0x9/0x28b kasan_report.cold.3+0x7a/0xb5 check_memory_region+0x22d/0x250 memset+0x28/0x40 kernel_poison_pages+0x29e/0x3d5 __free_pages_ok+0x75f/0x13e0 due to KASAN adds poisoned redzones around slab objects, but the page poisoning needs to poison the whole page. Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
bb418a146a |
Merge 4.19.31 into android-4.19
Changes in 4.19.31 media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused() 9p: use inode->i_lock to protect i_size_write() under 32-bit 9p/net: fix memory leak in p9_client_create ASoC: fsl_esai: fix register setting issue in RIGHT_J mode ASoC: codecs: pcm186x: fix wrong usage of DECLARE_TLV_DB_SCALE() ASoC: codecs: pcm186x: Fix energysense SLEEP bit iio: adc: exynos-adc: Fix NULL pointer exception on unbind mei: hbm: clean the feature flags on link reset mei: bus: move hw module get/put to probe/release stm class: Fix an endless loop in channel allocation crypto: caam - fix hash context DMA unmap size crypto: ccree - fix missing break in switch statement crypto: caam - fixed handling of sg list crypto: caam - fix DMA mapping of stack memory crypto: ccree - fix free of unallocated mlli buffer crypto: ccree - unmap buffer before copying IV crypto: ccree - don't copy zero size ciphertext crypto: cfb - add missing 'chunksize' property crypto: cfb - remove bogus memcpy() with src == dest crypto: ahash - fix another early termination in hash walk crypto: rockchip - fix scatterlist nents error crypto: rockchip - update new iv to device in multiple operations drm/imx: ignore plane updates on disabled crtcs gpu: ipu-v3: Fix i.MX51 CSI control registers offset drm/imx: imx-ldb: add missing of_node_puts gpu: ipu-v3: Fix CSI offsets for imx53 ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter clocksource: timer-ti-dm: Fix pwm dmtimer usage of fck reparenting KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock arm64: dts: rockchip: fix graph_port warning on rk3399 bob kevin and excavator s390/dasd: fix using offset into zero size array error Input: pwm-vibra - prevent unbalanced regulator Input: pwm-vibra - stop regulator after disabling pwm, not before ARM: dts: Configure clock parent for pwm vibra ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded arm/arm64: KVM: Allow a VCPU to fully reset itself arm/arm64: KVM: Don't panic on failure to properly reset system registers KVM: arm/arm64: vgic: Always initialize the group of private IRQs KVM: arm64: Forbid kprobing of the VHE world-switch code ASoC: samsung: Prevent clk_get_rate() calls in atomic context ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug Input: cap11xx - switch to using set_brightness_blocking() Input: ps2-gpio - flush TX work when closing port Input: matrix_keypad - use flush_delayed_work() mac80211: call drv_ibss_join() on restart mac80211: Fix Tx aggregation session tear down with ITXQs netfilter: compat: initialize all fields in xt_init blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue ipvs: fix dependency on nf_defrag_ipv6 floppy: check_events callback should not return a negative number xprtrdma: Make sure Send CQ is allocated on an existing compvec NFS: Don't use page_file_mapping after removing the page mm/gup: fix gup_pmd_range() for dax Revert "mm: use early_pfn_to_nid in page_ext_init" scsi: qla2xxx: Fix panic from use after free in qla2x00_async_tm_cmd net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend() x86/CPU: Add Icelake model number mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs net: hns: Fix object reference leaks in hns_dsaf_roce_reset() i2c: cadence: Fix the hold bit setting i2c: bcm2835: Clear current buffer pointers and counts after a transfer auxdisplay: ht16k33: fix potential user-after-free on module unload Input: st-keyscan - fix potential zalloc NULL dereference clk: sunxi-ng: v3s: Fix TCON reset de-assert bit kallsyms: Handle too long symbols in kallsyms.c clk: sunxi: A31: Fix wrong AHB gate number esp: Skip TX bytes accounting when sending from a request socket ARM: 8824/1: fix a migrating irq bug when hotplug cpu bpf: only adjust gso_size on bytestream protocols bpf: fix lockdep false positive in stackmap af_key: unconditionally clone on broadcast ARM: 8835/1: dma-mapping: Clear DMA ops on teardown assoc_array: Fix shortcut creation keys: Fix dependency loop between construction record and auth key scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task net: systemport: Fix reception of BPDUs net: dsa: bcm_sf2: Do not assume DSA master supports WoL pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins qmi_wwan: apply SET_DTR quirk to Sierra WP7607 net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe() xfrm: Fix inbound traffic via XFRM interfaces across network namespaces mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue ASoC: topology: free created components in tplg load error qed: Fix iWARP buffer size provided for syn packet processing. qed: Fix iWARP syn packet mac address validation. ARM: dts: armada-xp: fix Armada XP boards NAND description arm64: Relax GIC version check during early boot ARM: tegra: Restore DT ABI on Tegra124 Chromebooks net: marvell: mvneta: fix DMA debug warning mm: handle lru_add_drain_all for UP properly tmpfs: fix link accounting when a tmpfile is linked in ixgbe: fix older devices that do not support IXGBE_MRQC_L3L4TXSWEN ARCv2: lib: memcpy: fix doing prefetchw outside of buffer ARC: uacces: remove lp_start, lp_end from clobber list ARCv2: support manual regfile save on interrupts ARCv2: don't assume core 0x54 has dual issue phonet: fix building with clang mac80211_hwsim: propagate genlmsg_reply return code bpf, lpm: fix lookup bug in map_delete_elem net: thunderx: make CFG_DONE message to run through generic send-ack sequence net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_task nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K nfp: bpf: fix ALU32 high bits clearance bug bnxt_en: Fix typo in firmware message timeout logic. bnxt_en: Wait longer for the firmware message response to complete. net: set static variable an initial value in atl2_probe() selftests: fib_tests: sleep after changing carrier. again. tmpfs: fix uninitialized return value in shmem_link stm class: Prevent division by zero nfit: acpi_nfit_ctl(): Check out_obj->type in the right place acpi/nfit: Fix bus command validation nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot nfit/ars: Attempt short-ARS even in the no_init_ars case libnvdimm/label: Clear 'updating' flag after label-set update libnvdimm, pfn: Fix over-trim in trim_pfn_device() libnvdimm/pmem: Honor force_raw for legacy pmem regions libnvdimm: Fix altmap reservation size calculation fix cgroup_do_mount() handling of failure exits crypto: aead - set CRYPTO_TFM_NEED_KEY if ->setkey() fails crypto: aegis - fix handling chunked inputs crypto: arm/crct10dif - revert to C code for short inputs crypto: arm64/aes-neonbs - fix returning final keystream block crypto: arm64/crct10dif - revert to C code for short inputs crypto: hash - set CRYPTO_TFM_NEED_KEY if ->setkey() fails crypto: morus - fix handling chunked inputs crypto: pcbc - remove bogus memcpy()s with src == dest crypto: skcipher - set CRYPTO_TFM_NEED_KEY if ->setkey() fails crypto: testmgr - skip crc32c context test for ahash algorithms crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP crypto: x86/aesni-gcm - fix crash on empty plaintext crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling crypto: arm64/aes-ccm - fix bugs in non-NEON fallback routine CIFS: Do not reset lease state to NONE on lease break CIFS: Do not skip SMB2 message IDs on send failures CIFS: Fix read after write for files with read caching tracing: Use strncpy instead of memcpy for string keys in hist triggers tracing: Do not free iter->trace in fail path of tracing_open_pipe() tracing/perf: Use strndup_user() instead of buggy open-coded version xen: fix dom0 boot on huge systems ACPI / device_sysfs: Avoid OF modalias creation for removed device mmc: sdhci-esdhc-imx: fix HS400 timing issue mmc:fix a bug when max_discard is 0 netfilter: ipt_CLUSTERIP: fix warning unused variable cn spi: ti-qspi: Fix mmap read when more than one CS in use spi: pxa2xx: Setup maximum supported DMA transfer length regulator: s2mps11: Fix steps for buck7, buck8 and LDO35 regulator: max77620: Initialize values for DT properties regulator: s2mpa01: Fix step values for some LDOs clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability s390/setup: fix early warning messages s390/virtio: handle find on invalid queue gracefully scsi: virtio_scsi: don't send sc payload with tmfs scsi: aacraid: Fix performance issue on logical drives scsi: sd: Optimal I/O size should be a multiple of physical block size scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock scsi: qla2xxx: Fix LUN discovery if loop id is not assigned yet by firmware fs/devpts: always delete dcache dentry-s in dput() splice: don't merge into linked buffers ovl: During copy up, first copy up data and then xattrs ovl: Do not lose security.capability xattr over metadata file copy-up m68k: Add -ffreestanding to CFLAGS Btrfs: setup a nofs context for memory allocation at btrfs_create_tree() Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl btrfs: ensure that a DUP or RAID1 block group has exactly two stripes Btrfs: fix corruption reading shared and compressed extents after hole punching soc: qcom: rpmh: Avoid accessing freed memory from batch API libertas_tf: don't set URB_ZERO_PACKET on IN USB transfer irqchip/gic-v3-its: Avoid parsing _indirect_ twice for Device table irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code x86/kprobes: Prohibit probing on optprobe template code cpufreq: kryo: Release OPP tables on module removal cpufreq: tegra124: add missing of_node_put() cpufreq: pxa2xx: remove incorrect __init annotation ext4: fix check of inode in swap_inode_boot_loader ext4: cleanup pagecache before swap i_data ext4: update quota information while swapping boot loader inode ext4: add mask of ext4 flags to swap ext4: fix crash during online resizing PCI/ASPM: Use LTR if already enabled by platform PCI/DPC: Fix print AER status in DPC event handling PCI: dwc: skip MSI init if MSIs have been explicitly disabled IB/hfi1: Close race condition on user context disable and close cxl: Wrap iterations over afu slices inside 'afu_list_lock' ext2: Fix underflow in ext2_max_size() clk: uniphier: Fix update register for CPU-gear clk: clk-twl6040: Fix imprecise external abort for pdmclk clk: samsung: exynos5: Fix possible NULL pointer exception on platform_device_alloc() failure clk: samsung: exynos5: Fix kfree() of const memory on setting driver_override clk: ingenic: Fix round_rate misbehaving with non-integer dividers clk: ingenic: Fix doc of ingenic_cgu_div_info usb: chipidea: tegra: Fix missed ci_hdrc_remove_device() usb: typec: tps6598x: handle block writes separately with plain-I2C adapters dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit mm: hwpoison: fix thp split handing in soft_offline_in_use_page() mm/vmalloc: fix size check for remap_vmalloc_range_partial() mm/memory.c: do_fault: avoid usage of stale vm_area_struct kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv device property: Fix the length used in PROPERTY_ENTRY_STRING() intel_th: Don't reference unassigned outputs parport_pc: fix find_superio io compare code, should use equal test. i2c: tegra: fix maximum transfer size media: i2c: ov5640: Fix post-reset delay gpio: pca953x: Fix dereference of irq data in shutdown can: flexcan: FLEXCAN_IFLAG_MB: add () around macro argument drm/i915: Relax mmap VMA check bpf: only test gso type on gso packets serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart serial: 8250_pci: Fix number of ports for ACCES serial cards serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup() jbd2: clear dirty flag when revoking a buffer from an older transaction jbd2: fix compile warning when using JBUFFER_TRACE selinux: add the missing walk_size + len check in selinux_sctp_bind_connect security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock powerpc/32: Clear on-stack exception marker upon exception return powerpc/wii: properly disable use of BATs when requested. powerpc/powernv: Make opal log only readable by root powerpc/83xx: Also save/restore SPRG4-7 during suspend powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration powerpc/traps: fix recoverability of machine check handling on book3s/32 powerpc/traps: Fix the message printed when stack overflows ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify arm64: Fix HCR.TGE status for NMI contexts arm64: debug: Ensure debug handlers check triggering exception level arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 ipmi_si: fix use-after-free of resource->name dm: fix to_sector() for 32bit dm integrity: limit the rate of error messages mfd: sm501: Fix potential NULL pointer dereference cpcap-charger: generate events for userspace NFS: Fix I/O request leakages NFS: Fix an I/O request leakage in nfs_do_recoalesce NFS: Don't recoalesce on error in nfs_pageio_complete_mirror() nfsd: fix performance-limiting session calculation nfsd: fix memory corruption caused by readdir nfsd: fix wrong check in write_v4_end_grace() NFSv4.1: Reinitialise sequence results before retransmitting a request svcrpc: fix UDP on servers with lots of threads PM / wakeup: Rework wakeup source timer cancellation bcache: never writeback a discard operation stable-kernel-rules.rst: add link to networking patch queue vt: perform safe console erase in the right order x86/unwind/orc: Fix ORC unwind table alignment perf intel-pt: Fix CYC timestamp calculation after OVF perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols perf auxtrace: Define auxtrace record alignment perf intel-pt: Fix overlap calculation for padding perf/x86/intel/uncore: Fix client IMC events return huge result perf intel-pt: Fix divide by zero when TSC is not available md: Fix failed allocation of md_register_thread tpm/tpm_crb: Avoid unaligned reads in crb_recv() tpm: Unify the send callback behaviour rcu: Do RCU GP kthread self-wakeup from softirq and interrupt media: imx: prpencvf: Stop upstream before disabling IDMA channel media: lgdt330x: fix lock status reporting media: uvcvideo: Avoid NULL pointer dereference at the end of streaming media: vimc: Add vimc-streamer for stream control media: imx: csi: Disable CSI immediately after last EOF media: imx: csi: Stop upstream before disabling IDMA channel drm/fb-helper: generic: Fix drm_fbdev_client_restore() drm/radeon/evergreen_cs: fix missing break in switch statement drm/amd/powerplay: correct power reading on fiji drm/amd/display: don't call dm_pp_ function from an fpu block KVM: Call kvm_arch_memslots_updated() before updating memslots KVM: x86/mmu: Detect MMIO generation wrap in any address space KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux KVM: nVMX: Sign extend displacements of VMX instr's mem operands KVM: nVMX: Apply addr size mask to effective address for VMX instructions KVM: nVMX: Ignore limit checks on VMX instructions using flat segments bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata s390/setup: fix boot crash for machine without EDAT-1 Linux 4.19.31 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
33e83ea302 |
mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
[ Upstream commit 2c2ade81741c66082f8211f0b96cf509cc4c0218 ] The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum number of references that we might need to create in the fastpath later, the bump-allocation fastpath only has to modify the non-atomic bias value that tracks the number of extra references we hold instead of the atomic refcount. The maximum number of allocations we can serve (under the assumption that no allocation is made with size 0) is nc->size, so that's the bias used. However, even when all memory in the allocation has been given away, a reference to the page is still held; and in the `offset < 0` slowpath, the page may be reused if everyone else has dropped their references. This means that the necessary number of references is actually `nc->size+1`. Luckily, from a quick grep, it looks like the only path that can call page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which requires CAP_NET_ADMIN in the init namespace and is only intended to be used for kernel testing and fuzzing. To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the `offset < 0` path, below the virt_to_page() call, and then repeatedly call writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI, with a vector consisting of 15 elements containing 1 byte each. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
e550f94252 |
UPSTREAM: psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills. In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.
In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.
A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively. Stall states are aggregate versions of the per-task delay
accounting delays:
cpu: some tasks are runnable but not executing on a CPU
memory: tasks are reclaiming, or waiting for swapin or thrashing cache
io: tasks are waiting for io completions
These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit. They can also indicate when the system
is approaching lockup scenarios and OOMs.
To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states. Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime. A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).
[hannes@cmpxchg.org: doc fixlet, per Randy]
Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2)
Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
|
||
|
|
6e0411bdc2 |
Merge 4.19.21 into android-4.19
Changes in 4.19.21
devres: Align data[] to ARCH_KMALLOC_MINALIGN
drm/bufs: Fix Spectre v1 vulnerability
staging: iio: adc: ad7280a: handle error from __ad7280_read32()
drm/vgem: Fix vgem_init to get drm device available.
pinctrl: bcm2835: Use raw spinlock for RT compatibility
ASoC: Intel: mrfld: fix uninitialized variable access
gpiolib: Fix possible use after free on label
drm/sun4i: Initialize registers in tcon-top driver
genirq/affinity: Spread IRQs to all available NUMA nodes
gpu: ipu-v3: image-convert: Prevent race between run and unprepare
nds32: Fix gcc 8.0 compiler option incompatible.
wil6210: fix reset flow for Talyn-mb
wil6210: fix memory leak in wil_find_tx_bcast_2
ath10k: assign 'n_cipher_suites' for WCN3990
ath9k: dynack: use authentication messages for 'late' ack
scsi: lpfc: Correct LCB RJT handling
scsi: mpt3sas: Call sas_remove_host before removing the target devices
scsi: lpfc: Fix LOGO/PLOGI handling when triggerd by ABTS Timeout event
ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
clk: boston: fix possible memory leak in clk_boston_setup()
dlm: Don't swamp the CPU with callbacks queued during recovery
x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)
powerpc/pseries: add of_node_put() in dlpar_detach_node()
crypto: aes_ti - disable interrupts while accessing S-box
drm/vc4: ->x_scaling[1] should never be set to VC4_SCALING_NONE
serial: fsl_lpuart: clear parity enable bit when disable parity
ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
MIPS: Boston: Disable EG20T prefetch
dpaa2-ptp: defer probe when portal allocation failed
iwlwifi: fw: do not set sgi bits for HE connection
staging:iio:ad2s90: Make probe handle spi_setup failure
fpga: altera-cvp: Fix registration for CvP incapable devices
Tools: hv: kvp: Fix a warning of buffer overflow with gcc 8.0.1
fpga: altera-cvp: fix 'bad IO access' on x86_64
vbox: fix link error with 'gcc -Og'
platform/chrome: don't report EC_MKBP_EVENT_SENSOR_FIFO as wakeup
i40e: prevent overlapping tx_timeout recover
scsi: hisi_sas: change the time of SAS SSP connection
staging: iio: ad7780: update voltage on read
usbnet: smsc95xx: fix rx packet alignment
drm/rockchip: fix for mailbox read size
ARM: OMAP2+: hwmod: Fix some section annotations
drm/amd/display: fix gamma not being applied correctly
drm/amd/display: calculate stream->phy_pix_clk before clock mapping
bpf: libbpf: retry map creation without the name
net/mlx5: EQ, Use the right place to store/read IRQ affinity hint
modpost: validate symbol names also in find_elf_symbol
perf tools: Add Hygon Dhyana support
soc/tegra: Don't leak device tree node reference
media: rc: ensure close() is called on rc_unregister_device
media: video-i2c: avoid accessing released memory area when removing driver
media: mtk-vcodec: Release device nodes in mtk_vcodec_init_enc_pm()
staging: erofs: fix the definition of DBG_BUGON
clk: meson: meson8b: do not use cpu_div3 for cpu_scale_out_sel
clk: meson: meson8b: fix the width of the cpu_scale_div clock
clk: meson: meson8b: mark the CPU clock as CLK_IS_CRITICAL
ptp: Fix pass zero to ERR_PTR() in ptp_clock_register
dmaengine: xilinx_dma: Remove __aligned attribute on zynqmp_dma_desc_ll
powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* sections explicitly
iio: adc: meson-saradc: check for devm_kasprintf failure
iio: adc: meson-saradc: fix internal clock names
iio: accel: kxcjk1013: Add KIOX010A ACPI Hardware-ID
media: adv*/tc358743/ths8200: fill in min width/height/pixelclock
ACPI: SPCR: Consider baud rate 0 as preconfigured state
staging: pi433: fix potential null dereference
f2fs: move dir data flush to write checkpoint process
f2fs: fix race between write_checkpoint and write_begin
f2fs: fix wrong return value of f2fs_acl_create
i2c: sh_mobile: add support for r8a77990 (R-Car E3)
arm64: io: Ensure calls to delay routines are ordered against prior readX()
net: aquantia: return 'err' if set MPI_DEINIT state fails
sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN
soc: bcm: brcmstb: Don't leak device tree node reference
nfsd4: fix crash on writing v4_end_grace before nfsd startup
drm: Clear state->acquire_ctx before leaving drm_atomic_helper_commit_duplicated_state()
perf: arm_spe: handle devm_kasprintf() failure
arm64: io: Ensure value passed to __iormb() is held in a 64-bit register
Thermal: do not clear passive state during system sleep
thermal: Fix locking in cooling device sysfs update cur_state
firmware/efi: Add NULL pointer checks in efivars API functions
s390/zcrypt: improve special ap message cmd handling
mt76x0: dfs: fix IBI_R11 configuration on non-radar channels
arm64: ftrace: don't adjust the LR value
drm/v3d: Fix prime imports of buffers from other drivers.
ARM: dts: mmp2: fix TWSI2
ARM: dts: aspeed: add missing memory unit-address
x86/fpu: Add might_fault() to user_insn()
media: i2c: TDA1997x: select CONFIG_HDMI
media: DaVinci-VPBE: fix error handling in vpbe_initialize()
smack: fix access permissions for keyring
xtensa: xtfpga.dtsi: fix dtc warnings about SPI
usb: dwc3: Correct the logic for checking TRB full in __dwc3_prepare_one_trb()
usb: dwc2: Disable power down feature on Samsung SoCs
usb: hub: delay hub autosuspend if USB3 port is still link training
timekeeping: Use proper seqcount initializer
usb: mtu3: fix the issue about SetFeature(U1/U2_Enable)
clk: sunxi-ng: a33: Set CLK_SET_RATE_PARENT for all audio module clocks
media: imx274: select REGMAP_I2C
drm/amdgpu/powerplay: fix clock stretcher limits on polaris (v2)
tipc: fix node keep alive interval calculation
driver core: Move async_synchronize_full call
kobject: return error code if writing /sys/.../uevent fails
IB/hfi1: Unreserve a reserved request when it is completed
usb: dwc3: trace: add missing break statement to make compiler happy
gpio: mt7621: report failure of devm_kasprintf()
gpio: mt7621: pass mediatek_gpio_bank_probe() failure up the stack
pinctrl: sx150x: handle failure case of devm_kstrdup
iommu/amd: Fix amd_iommu=force_isolation
ARM: dts: Fix OMAP4430 SDP Ethernet startup
mips: bpf: fix encoding bug for mm_srlv32_op
media: coda: fix H.264 deblocking filter controls
ARM: dts: Fix up the D-Link DIR-685 MTD partition info
watchdog: renesas_wdt: don't set divider while watchdog is running
ARM: dts: imx51-zii-rdu1: Do not specify "power-gpio" for hpa1
usb: dwc3: gadget: Disable CSP for stream OUT ep
iommu/arm-smmu-v3: Avoid memory corruption from Hisilicon MSI payloads
iommu/arm-smmu: Add support for qcom,smmu-v2 variant
iommu/arm-smmu-v3: Use explicit mb() when moving cons pointer
sata_rcar: fix deferred probing
clk: imx6sl: ensure MMDC CH0 handshake is bypassed
platform/x86: mlx-platform: Fix tachometer registers
cpuidle: big.LITTLE: fix refcount leak
OPP: Use opp_table->regulators to verify no regulator case
tee: optee: avoid possible double list_del()
drm/msm/dsi: fix dsi clock names in DSI 10nm PLL driver
drm/msm: dpu: Only check flush register against pending flushes
lightnvm: pblk: fix resubmission of overwritten write err lbas
lightnvm: pblk: add lock protection to list operations
i2c-axxia: check for error conditions first
phy: sun4i-usb: add support for missing USB PHY index
mlxsw: spectrum_acl: Limit priority value
udf: Fix BUG on corrupted inode
switchtec: Fix SWITCHTEC_IOCTL_EVENT_IDX_ALL flags overwrite
selftests/bpf: use __bpf_constant_htons in test_prog.c
ARM: pxa: avoid section mismatch warning
ASoC: fsl: Fix SND_SOC_EUKREA_TLV320 build error on i.MX8M
KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
mmc: bcm2835: reset host on timeout
mmc: meson-mx-sdio: check devm_kasprintf for failure
memstick: Prevent memstick host from getting runtime suspended during card detection
mmc: sdhci-of-esdhc: Fix timeout checks
mmc: sdhci-omap: Fix timeout checks
mmc: sdhci-xenon: Fix timeout checks
mmc: jz4740: Get CD/WP GPIOs from descriptors
usb: renesas_usbhs: add support for RZ/G2E
btrfs: harden agaist duplicate fsid on scanned devices
serial: sh-sci: Fix locking in sci_submit_rx()
serial: sh-sci: Resume PIO in sci_rx_interrupt() on DMA failure
tty: serial: samsung: Properly set flags in autoCTS mode
perf test: Fix perf_event_attr test failure
perf dso: Fix unchecked usage of strncpy()
perf header: Fix unchecked usage of strncpy()
btrfs: use tagged writepage to mitigate livelock of snapshot
perf probe: Fix unchecked usage of strncpy()
i2c: sh_mobile: Add support for r8a774c0 (RZ/G2E)
bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.
tools/power/x86/intel_pstate_tracer: Fix non root execution for post processing a trace file
livepatch: check kzalloc return values
arm64: KVM: Skip MMIO insn after emulation
usb: musb: dsps: fix otg state machine
usb: musb: dsps: fix runtime pm for peripheral mode
perf header: Fix up argument to ctime()
perf tools: Cast off_t to s64 to avoid warning on bionic libc
percpu: convert spin_lock_irq to spin_lock_irqsave.
net: hns3: fix incomplete uninitialization of IRQ in the hns3_nic_uninit_vector_data()
drm/amd/display: Add retry to read ddc_clock pin
Bluetooth: hci_bcm: Handle deferred probing for the clock supply
drm/amd/display: fix YCbCr420 blank color
powerpc/uaccess: fix warning/error with access_ok()
mac80211: fix radiotap vendor presence bitmap handling
xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi
mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG
scsi: smartpqi: correct host serial num for ssa
scsi: smartpqi: correct volume status
scsi: smartpqi: increase fw status register read timeout
cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()
net: hns3: add max vector number check for pf
powerpc/perf: Fix thresholding counter data for unknown type
iwlwifi: mvm: fix setting HE ppe FW config
powerpc/powernv/ioda: Allocate indirect TCE levels of cached userspace addresses on demand
mlx5: update timecounter at least twice per counter overflow
drbd: narrow rcu_read_lock in drbd_sync_handshake
drbd: disconnect, if the wrong UUIDs are attached on a connected peer
drbd: skip spurious timeout (ping-timeo) when failing promote
drbd: Avoid Clang warning about pointless switch statment
drm/amd/display: validate extended dongle caps
video: clps711x-fb: release disp device node in probe()
md: fix raid10 hang issue caused by barrier
fbdev: fbmem: behave better with small rotated displays and many CPUs
i40e: define proper net_device::neigh_priv_len
ice: Do not enable NAPI on q_vectors that have no rings
igb: Fix an issue that PME is not enabled during runtime suspend
ACPI/APEI: Clear GHES block_status before panic()
fbdev: fbcon: Fix unregister crash when more than one framebuffer
powerpc/mm: Fix reporting of kernel execute faults on the 8xx
pinctrl: meson: meson8: fix the GPIO function for the GPIOAO pins
pinctrl: meson: meson8b: fix the GPIO function for the GPIOAO pins
KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported
powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
kvm: Change offset in kvm_write_guest_offset_cached to unsigned
NFS: nfs_compare_mount_options always compare auth flavors.
perf build: Don't unconditionally link the libbfd feature test to -liberty and -lz
hwmon: (lm80) fix a missing check of the status of SMBus read
hwmon: (lm80) fix a missing check of bus read in lm80 probe
seq_buf: Make seq_buf_puts() null-terminate the buffer
crypto: ux500 - Use proper enum in cryp_set_dma_transfer
crypto: ux500 - Use proper enum in hash_set_dma_transfer
MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8
cifs: check ntwrk_buf_start for NULL before dereferencing it
f2fs: fix use-after-free issue when accessing sbi->stat_info
um: Avoid marking pages with "changed protection"
niu: fix missing checks of niu_pci_eeprom_read
f2fs: fix sbi->extent_list corruption issue
cgroup: fix parsing empty mount option string
perf python: Do not force closing original perf descriptor in evlist.get_pollfd()
scripts/decode_stacktrace: only strip base path when a prefix of the path
arch/sh/boards/mach-kfr2r09/setup.c: fix struct mtd_oob_ops build warning
ocfs2: don't clear bh uptodate for block read
ocfs2: improve ocfs2 Makefile
mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
zram: fix lockdep warning of free block handling
isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw()
gdrom: fix a memory leak bug
fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address()
block/swim3: Fix -EBUSY error when re-opening device after unmount
thermal: bcm2835: enable hwmon explicitly
kdb: Don't back trace on a cpu that didn't round up
PCI: imx: Enable MSI from downstream components
thermal: generic-adc: Fix adc to temp interpolation
HID: lenovo: Add checks to fix of_led_classdev_register
arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
kernel/hung_task.c: break RCU locks based on jiffies
proc/sysctl: fix return error for proc_doulongvec_minmax()
kernel/hung_task.c: force console verbose before panic
fs/epoll: drop ovflist branch prediction
exec: load_script: don't blindly truncate shebang string
kernel/kcov.c: mark write_comp_data() as notrace
scripts/gdb: fix lx-version string output
xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
xfs: cancel COW blocks before swapext
xfs: Fix error code in 'xfs_ioc_getbmap()'
xfs: fix overflow in xfs_attr3_leaf_verify
xfs: fix shared extent data corruption due to missing cow reservation
xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
xfs: delalloc -> unwritten COW fork allocation can go wrong
fs/xfs: fix f_ffree value for statfs when project quota is set
xfs: fix PAGE_MASK usage in xfs_free_file_space
xfs: fix inverted return from xfs_btree_sblock_verify_crc
thermal: hwmon: inline helpers when CONFIG_THERMAL_HWMON is not set
dccp: fool proof ccid_hc_[rt]x_parse_options()
enic: fix checksum validation for IPv6
lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically
net: dp83640: expire old TX-skb
net: dsa: Fix lockdep false positive splat
net: dsa: Fix NULL checking in dsa_slave_set_eee()
net: dsa: mv88e6xxx: Fix counting of ATU violations
net: dsa: slave: Don't propagate flag changes on down slave interfaces
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
net: systemport: Fix WoL with password after deep sleep
rds: fix refcount bug in rds_sock_addref
Revert "net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x"
rxrpc: bad unlock balance in rxrpc_recvmsg
sctp: check and update stream->out_curr when allocating stream_out
sctp: walk the list of asoc safely
skge: potential memory corruption in skge_get_regs()
virtio_net: Account for tx bytes and packets on sending xdp_frames
net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance
xfs: eof trim writeback mapping as soon as it is cached
ALSA: compress: Fix stop handling on compressed capture streams
ALSA: usb-audio: Add support for new T+A USB DAC
ALSA: hda - Serialize codec registrations
ALSA: hda/realtek - Fix lose hp_pins for disable auto mute
ALSA: hda/realtek - Use a common helper for hp pin reference
ALSA: hda/realtek - Headset microphone support for System76 darp5
fuse: call pipe_buf_release() under pipe lock
fuse: decrement NR_WRITEBACK_TEMP on the right page
fuse: handle zero sized retrieve correctly
HID: debug: fix the ring buffer implementation
dmaengine: bcm2835: Fix interrupt race on RT
dmaengine: bcm2835: Fix abort of transactions
dmaengine: imx-dma: fix wrong callback invoke
futex: Handle early deadlock return correctly
irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
usb: phy: am335x: fix race condition in _probe
usb: dwc3: gadget: Handle 0 xfer length for OUT EP
usb: gadget: udc: net2272: Fix bitwise and boolean operations
usb: gadget: musb: fix short isoc packets with inventra dma
staging: speakup: fix tty-operation NULL derefs
scsi: cxlflash: Prevent deadlock when adapter probe fails
scsi: aic94xx: fix module loading
KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
perf/x86/intel/uncore: Add Node ID mask
x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
perf/core: Don't WARN() for impossible ring-buffer sizes
perf tests evsel-tp-sched: Fix bitwise operator
serial: fix race between flush_to_ldisc and tty_open
serial: 8250_pci: Make PCI class test non fatal
serial: sh-sci: Do not free irqs that have already been freed
cacheinfo: Keep the old value if of_property_read_u32 fails
IB/hfi1: Add limit test for RC/UC send via loopback
perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
ath9k: dynack: make ewma estimation faster
ath9k: dynack: check da->enabled first in sampling routines
Linux 4.19.21
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
f73c77535f |
mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
[ Upstream commit 3c0c12cc8f00ca5f81acb010023b8eb13e9a7004 ]
When CONFIG_KASAN is enabled on large memory SMP systems, the deferrred
pages initialization can take a long time. Below were the reported init
times on a 8-socket 96-core 4TB IvyBridge system.
1) Non-debug kernel without CONFIG_KASAN
[ 8.764222] node 1 initialised, 132086516 pages in 7027ms
2) Debug kernel with CONFIG_KASAN
[ 146.288115] node 1 initialised, 132075466 pages in 143052ms
So the page init time in a debug kernel was 20X of the non-debug kernel.
The long init time can be problematic as the page initialization is done
with interrupt disabled. In this particular case, it caused the
appearance of following warning messages as well as NMI backtraces of all
the cores that were doing the initialization.
[ 68.240049] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 68.241000] rcu: 25-...0: (100 ticks this GP) idle=b72/1/0x4000000000000000 softirq=915/915 fqs=16252
[ 68.241000] rcu: 44-...0: (95 ticks this GP) idle=49a/1/0x4000000000000000 softirq=788/788 fqs=16253
[ 68.241000] rcu: 54-...0: (104 ticks this GP) idle=03a/1/0x4000000000000000 softirq=721/825 fqs=16253
[ 68.241000] rcu: 60-...0: (103 ticks this GP) idle=cbe/1/0x4000000000000000 softirq=637/740 fqs=16253
[ 68.241000] rcu: 72-...0: (105 ticks this GP) idle=786/1/0x4000000000000000 softirq=536/641 fqs=16253
[ 68.241000] rcu: 84-...0: (99 ticks this GP) idle=292/1/0x4000000000000000 softirq=537/537 fqs=16253
[ 68.241000] rcu: 111-...0: (104 ticks this GP) idle=bde/1/0x4000000000000000 softirq=474/476 fqs=16253
[ 68.241000] rcu: (detected by 13, t=65018 jiffies, g=249, q=2)
The long init time was mainly caused by the call to kasan_free_pages() to
poison the newly initialized pages. On a 4TB system, we are talking about
almost 500GB of memory probably on the same node.
In reality, we may not need to poison the newly initialized pages before
they are ever allocated. So KASAN poisoning of freed pages before the
completion of deferred memory initialization is now disabled. Those pages
will be properly poisoned when they are allocated or freed after deferred
pages initialization is done.
With this change, the new page initialization time became:
[ 21.948010] node 1 initialised, 132075466 pages in 18702ms
This was still about double the non-debug kernel time, but was much
better than before.
Link: http://lkml.kernel.org/r/1544459388-8736-1-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
18ba00a34e |
Merge 4.19.19 into android-4.19
Changes in 4.19.19
amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
net: bridge: Fix ethernet header pointer before check skb forwardable
net: Fix usage of pskb_trim_rcsum
net: phy: marvell: Errata for mv88e6390 internal PHYs
net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
net/sched: act_tunnel_key: fix memory leak in case of action replace
net_sched: refetch skb protocol for each filter
openvswitch: Avoid OOB read when parsing flow nlattrs
vhost: log dirty page correctly
mlxsw: pci: Increase PCI SW reset timeout
net: ipv4: Fix memory leak in network namespace dismantle
mlxsw: spectrum_fid: Update dummy FID index
mlxsw: pci: Ring CQ's doorbell before RDQ's
net/sched: cls_flower: allocate mask dynamically in fl_change()
udp: with udp_segment release on error path
ip6_gre: fix tunnel list corruption for x-netns
erspan: build the header with the right proto according to erspan_ver
net: phy: marvell: Fix deadlock from wrong locking
ip6_gre: update version related info when changing link
tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state
mei: me: mark LBG devices as having dma support
mei: me: add denverton innovation engine device IDs
USB: leds: fix regression in usbport led trigger
USB: serial: simple: add Motorola Tetra TPG2200 device id
USB: serial: pl2303: add new PID to support PL2303TB
ceph: clear inode pointer when snap realm gets dropped by its inode
ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
ASoC: rt5514-spi: Fix potential NULL pointer dereference
ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
clk: socfpga: stratix10: fix rate calculation for pll clocks
clk: socfpga: stratix10: fix naming convention for the fixed-clocks
inotify: Fix fd refcount leak in inotify_add_watch().
ALSA: hda/realtek - Fix typo for ALC225 model
ALSA: hda - Add mute LED support for HP ProBook 470 G5
ARCv2: lib: memeset: fix doing prefetchw outside of buffer
ARC: adjust memblock_reserve of kernel memory
ARC: perf: map generic branches to correct hardware condition
s390/mm: always force a load of the primary ASCE on context switch
s390/early: improve machine detection
s390/smp: fix CPU hotplug deadlock with CPU rescan
misc: ibmvsm: Fix potential NULL pointer dereference
char/mwave: fix potential Spectre v1 vulnerability
mmc: dw_mmc-bluefield: : Fix the license information
mmc: meson-gx: Free irq in release() callback
staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
tty: Handle problem if line discipline does not have receive_buf
uart: Fix crash in uart_write and uart_put_char
tty/n_hdlc: fix __might_sleep warning
hv_balloon: avoid touching uninitialized struct page during tail onlining
Drivers: hv: vmbus: Check for ring when getting debug info
vgacon: unconfuse vc_origin when using soft scrollback
CIFS: Fix possible hang during async MTU reads and writes
CIFS: Fix credits calculations for reads with errors
CIFS: Fix credit calculation for encrypted reads with errors
CIFS: Do not reconnect TCP session in add_credits()
smb3: add credits we receive from oplock/break PDUs
Input: xpad - add support for SteelSeries Stratus Duo
Input: input_event - provide override for sparc64
Input: uinput - fix undefined behavior in uinput_validate_absinfo()
acpi/nfit: Block function zero DSMs
acpi/nfit: Fix command-supported detection
scsi: ufs: Use explicit access size in ufshcd_dump_regs
dm thin: fix passdown_double_checking_shared_status()
dm crypt: fix parsing of extended IV arguments
drm/amdgpu: Add APTX quirk for Lenovo laptop
KVM: x86: Fix single-step debugging
KVM: x86: Fix PV IPIs for 32-bit KVM host
KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
kvm: x86/vmx: Use kzalloc for cached_vmcs12
KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
x86/pkeys: Properly copy pkey state at fork()
x86/selftests/pkeys: Fork() to check for state being preserved
x86/kaslr: Fix incorrect i8254 outb() parameters
x86/entry/64/compat: Fix stack switching for XEN PV
posix-cpu-timers: Unbreak timer rearming
net: sun: cassini: Cleanup license conflict
irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
can: bcm: check timer values before ktime conversion
can: flexcan: fix NULL pointer exception during bringup
vt: make vt_console_print() compatible with the unicode screen buffer
vt: always call notifier with the console lock held
vt: invoke notifier on screen size change
drm/meson: Fix atomic mode switching regression
bpf: improve verifier branch analysis
bpf: add per-insn complexity limit
bpf: move {prev_,}insn_idx into verifier env
bpf: move tmp variable into ax register in interpreter
bpf: enable access to ax register also from verifier rewrite
bpf: restrict map value pointer arithmetic for unprivileged
bpf: restrict stack pointer arithmetic for unprivileged
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
bpf: fix check_map_access smin_value test when pointer contains offset
bpf: prevent out of bounds speculation on pointer arithmetic
bpf: fix sanitation of alu op with pointer / scalar type from different paths
bpf: fix inner map masking to prevent oob under speculation
s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
nvmet-rdma: Add unlikely for response allocated check
nvmet-rdma: fix null dereference under heavy load
Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
ide: fix a typo in the settings proc file name
Input: input_event - fix the CONFIG_SPARC64 mixup
Linux 4.19.19
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
6bab957396 |
Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
commit 4aa9fc2a435abe95a1e8d7f8c7b3d6356514b37a upstream.
This reverts commit 2830bf6f05fb3e05bc4743274b806c821807a684.
The underlying assumption that one sparse section belongs into a single
numa node doesn't hold really. Robert Shteynfeld has reported a boot
failure. The boot log was not captured but his memory layout is as
follows:
Early memory node ranges
node 1: [mem 0x0000000000001000-0x0000000000090fff]
node 1: [mem 0x0000000000100000-0x00000000dbdf8fff]
node 1: [mem 0x0000000100000000-0x0000001423ffffff]
node 0: [mem 0x0000001424000000-0x0000002023ffffff]
This means that node0 starts in the middle of a memory section which is
also in node1. memmap_init_zone tries to initialize padding of a
section even when it is outside of the given pfn range because there are
code paths (e.g. memory hotplug) which assume that the full worth of
memory section is always initialized.
In this particular case, though, such a range is already intialized and
most likely already managed by the page allocator. Scribbling over
those pages corrupts the internal state and likely blows up when any of
those pages gets used.
Reported-by: Robert Shteynfeld <robert.shteynfeld@gmail.com>
Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
Cc: stable@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
a872d2d074 |
Merge 4.19.13 into android-4.19
Changes in 4.19.13 iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" Revert "vfs: Allow userns root to call mknod on owned filesystems." USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only USB: xhci: fix 'broken_suspend' placement in struct xchi_hcd USB: serial: option: add GosunCn ZTE WeLink ME3630 USB: serial: option: add HP lt4132 USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode) USB: serial: option: add Fibocom NL668 series USB: serial: option: add Telit LN940 series ubifs: Handle re-linking of inodes correctly while recovery scsi: t10-pi: Return correct ref tag when queue has no integrity profile scsi: sd: use mempool for discard special page mmc: core: Reset HPI enabled state during re-init and in case of errors mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl mmc: omap_hsmmc: fix DMA API warning gpio: max7301: fix driver for use with CONFIG_VMAP_STACK gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlers posix-timers: Fix division by zero bug KVM: X86: Fix NULL deref in vcpu_scan_ioapic kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs KVM: Fix UAF in nested posted interrupt processing Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels futex: Cure exit race x86/mtrr: Don't copy uninitialized gentry fields back to userspace x86/mm: Fix decoy address handling vs 32-bit builds x86/vdso: Pass --eh-frame-hdr to the linker x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence panic: avoid deadlocks in re-entrant console drivers mm: add mm_pxd_folded checks to pgtable_bytes accounting functions mm: make the __PAGETABLE_PxD_FOLDED defines non-empty mm: introduce mm_[p4d|pud|pmd]_folded xfrm_user: fix freeing of xfrm states on acquire rtlwifi: Fix leak of skb when processing C2H_BT_INFO iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT to old firmwares Revert "mwifiex: restructure rx_reorder_tbl_lock usage" iwlwifi: add new cards for 9560, 9462, 9461 and killer series media: ov5640: Fix set format regression mm, memory_hotplug: initialize struct pages for the full memory section mm: thp: fix flags for pmd migration when split mm, page_alloc: fix has_unmovable_pages for HugePages mm: don't miss the last page because of round-off error Input: elantech - disable elan-i2c for P52 and P72 proc/sysctl: don't return ENOMEM on lookup when a table is unregistering drm/ioctl: Fix Spectre v1 vulnerabilities Linux 4.19.13 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
e27666dd8f |
mm, page_alloc: fix has_unmovable_pages for HugePages
commit 17e2e7d7e1b83fa324b3f099bfe426659aa3c2a4 upstream. While playing with gigantic hugepages and memory_hotplug, I triggered the following #PF when "cat memoryX/removable": BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1481 Comm: cat Tainted: G E 4.20.0-rc6-mm1-1-default+ #18 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:has_unmovable_pages+0x154/0x210 Call Trace: is_mem_section_removable+0x7d/0x100 removable_show+0x90/0xb0 dev_attr_show+0x1c/0x50 sysfs_kf_seq_show+0xca/0x1b0 seq_read+0x133/0x380 __vfs_read+0x26/0x180 vfs_read+0x89/0x140 ksys_read+0x42/0x90 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is we do not pass the Head to page_hstate(), and so, the call to compound_order() in page_hstate() returns 0, so we end up checking all hstates's size to match PAGE_SIZE. Obviously, we do not find any hstate matching that size, and we return NULL. Then, we dereference that NULL pointer in hugepage_migration_supported() and we got the #PF from above. Fix that by getting the head page before calling page_hstate(). Also, since gigantic pages span several pageblocks, re-adjust the logic for skipping pages. While are it, we can also get rid of the round_up(). [osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal] Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
7592dbfaf3 |
mm, memory_hotplug: initialize struct pages for the full memory section
commit 2830bf6f05fb3e05bc4743274b806c821807a684 upstream. If memory end is not aligned with the sparse memory section boundary, the mapping of such a section is only partly initialized. This may lead to VM_BUG_ON due to uninitialized struct page access from is_mem_section_removable() or test_pages_in_a_zone() function triggered by memory_hotplug sysfs handlers: Here are the the panic examples: CONFIG_DEBUG_VM=y CONFIG_DEBUG_VM_PGFLAGS=y kernel parameter mem=2050M -------------------------- page:000003d082008000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( test_pages_in_a_zone+0xde/0x160) show_valid_zones+0x5c/0x190 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: test_pages_in_a_zone+0xde/0x160 Kernel panic - not syncing: Fatal exception: panic_on_oops kernel parameter mem=3075M -------------------------- page:000003d08300c000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( is_mem_section_removable+0xb4/0x190) show_mem_removable+0x9a/0xd8 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: is_mem_section_removable+0xb4/0x190 Kernel panic - not syncing: Fatal exception: panic_on_oops Fix the problem by initializing the last memory section of each zone in memmap_init_zone() till the very end, even if it goes beyond the zone end. Michal said: : This has alwways been problem AFAIU. It just went unnoticed because we : have zeroed memmaps during allocation before |
||
|
|
67319b77a0 |
Merge 4.19.10 into android-4.19
Changes in 4.19.10 ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes ipv6: Check available headroom in ip6_xmit() even without options neighbour: Avoid writing before skb->head in neigh_hh_output() ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output net: 8139cp: fix a BUG triggered by changing mtu with network traffic net/mlx4_core: Correctly set PFC param if global pause is turned off. net/mlx4_en: Change min MTU size to ETH_MIN_MTU net: phy: don't allow __set_phy_supported to add unsupported modes net: Prevent invalid access to skb->prev in __qdisc_drop_all net: use skb_list_del_init() to remove from RX sublists Revert "net/ibm/emac: wrong bit is used for STA control" rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices sctp: kfree_rcu asoc tcp: Do not underestimate rwnd_limited tcp: fix NULL ref in tail loss probe tun: forbid iface creation with rtnl ops virtio-net: keep vnet header zeroed after processing XDP net: phy: sfp: correct store of detected link modes sctp: update frag_point when stream_interleave is set net: restore call to netdev_queue_numa_node_write when resetting XPS net: fix XPS static_key accounting ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup ASoC: rsnd: fixup clock start checker ASoC: qdsp6: q6afe: Fix wrong MI2S SD line mask ASoC: qdsp6: q6afe-dai: Fix the dai widgets staging: rtl8723bs: Fix the return value in case of error in 'rtw_wx_read32()' ARM: dts: am3517: Fix pinmuxing for CD on MMC1 ARM: dts: LogicPD Torpedo: Fix mmc3_dat1 interrupt ARM: dts: logicpd-somlv: Fix interrupt on mmc3_dat1 ARM: dts: am3517-som: Fix WL127x Wifi interrupt ARM: OMAP1: ams-delta: Fix possible use of uninitialized field tools: bpftool: prevent infinite loop in get_fdinfo() ASoC: sun8i-codec: fix crash on module removal arm64: dts: sdm845-mtp: Reserve reserved gpios sysv: return 'err' instead of 0 in __sysv_write_inode netfilter: nf_conncount: use spin_lock_bh instead of spin_lock netfilter: nf_conncount: fix list_del corruption in conn_free netfilter: nf_conncount: fix unexpected permanent node of list. netfilter: nf_tables: don't skip inactive chains during update selftests: add script to stress-test nft packet path vs. control plane perf tools: Fix crash on synthesizing the unit netfilter: xt_RATEEST: remove netns exit routine netfilter: nf_tables: fix use-after-free when deleting compat expressions s390/cio: Fix cleanup of pfn_array alloc failure s390/cio: Fix cleanup when unsupported IDA format is used hwmon (ina2xx) Fix NULL id pointer in probe() hwmon: (raspberrypi) Fix initial notify ASoC: rockchip: add missing slave_config setting for I2S ASoC: wm_adsp: Fix dma-unsafe read of scratch registers ASoC: Intel: Power down links before turning off display audio power ASoC: qcom: Set dai_link id to each dai_link s390/cpum_cf: Reject request for sampling in event initialization hwmon: (ina2xx) Fix current value calculation ASoC: omap-abe-twl6040: Fix missing audio card caused by deferred probing ASoC: dapm: Recalculate audio map forcely when card instantiated spi: omap2-mcspi: Add missing suspend and resume calls hwmon: (mlxreg-fan) Fix macros for tacho fault reading bpf: allocate local storage buffers using GFP_ATOMIC aio: fix failure to put the file pointer netfilter: xt_hashlimit: fix a possible memory leak in htable_create() hwmon: (w83795) temp4_type has writable permission perf tools: Restore proper cwd on return from mnt namespace PCI: imx6: Fix link training status detection in link up check ASoC: acpi: fix: continue searching when machine is ignored objtool: Fix double-free in .cold detection error path objtool: Fix segfault in .cold detection with -ffunction-sections phy: qcom-qusb2: Use HSTX_TRIM fused value as is phy: qcom-qusb2: Fix HSTX_TRIM tuning with fused value for SDM845 ARM: dts: at91: sama5d2: use the divided clock for SMC Btrfs: send, fix infinite loop due to directory rename dependencies RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR RDMA/core: Add GIDs while changing MAC addr only for registered ndev RDMA/bnxt_re: Fix system hang when registration with L2 driver fails RDMA/bnxt_re: Avoid accessing the device structure after it is freed RDMA/rdmavt: Fix rvt_create_ah function signature tools: bpftool: fix potential NULL pointer dereference in do_load ASoC: omap-mcbsp: Fix latency value calculation for pm_qos ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE exportfs: do not read dentry after free RDMA/hns: Bugfix pbl configuration for rereg mr bpf: fix check of allowed specifiers in bpf_trace_printk fsi: master-ast-cf: select GENERIC_ALLOCATOR ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notf USB: omap_udc: use devm_request_irq() USB: omap_udc: fix crashes on probe error and module removal USB: omap_udc: fix omap_udc_start() on 15xx machines USB: omap_udc: fix USB gadget functionality on Palm Tungsten E USB: omap_udc: fix rejection of out transfers when DMA is used thunderbolt: Prevent root port runtime suspend during NVM upgrade drm/meson: add support for 1080p25 mode netfilter: ipv6: Preserve link scope traffic original oif IB/mlx5: Fix page fault handling for MW netfilter: add missing error handling code for register functions netfilter: nat: fix double register in masquerade modules netfilter: nf_conncount: remove wrong condition check routine KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes KVM: x86: fix empty-body warnings x86/kvm/vmx: fix old-style function declaration net: thunderx: fix NULL pointer dereference in nic_remove usb: gadget: u_ether: fix unsafe list iteration netfilter: nf_tables: deactivate expressions in rule replecement routine ALSA: usb-audio: Add vendor and product name for Dell WD19 Dock cachefiles: Fix an assertion failure when trying to update a failed object fscache: Fix race in fscache_op_complete() due to split atomic_sub & read cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active igb: fix uninitialized variables ixgbe: recognize 1000BaseLX SFP modules as 1Gbps net: hisilicon: remove unexpected free_netdev drm/amdgpu: Add delay after enable RLC ucode drm/ast: fixed reading monitor EDID not stable issue xen: xlate_mmu: add missing header to fix 'W=1' warning Revert "xen/balloon: Mark unallocated host memory as UNUSABLE" pvcalls-front: fixes incorrect error handling pstore/ram: Correctly calculate usable PRZ bytes afs: Fix validation/callback interaction fscache: fix race between enablement and dropping of object cachefiles: Explicitly cast enumerated type in put_object fscache, cachefiles: remove redundant variable 'cache' nvme: warn when finding multi-port subsystems without multipathing enabled nvme: flush namespace scanning work just before removing namespaces nvme-rdma: fix double freeing of async event data ACPI/IORT: Fix iort_get_platform_device_domain() uninitialized pointer value ocfs2: fix deadlock caused by ocfs2_defrag_extent() mm/page_alloc.c: fix calculation of pgdat->nr_zones hfs: do not free node before using hfsplus: do not free node before using debugobjects: avoid recursive calls with kmemleak proc: fixup map_files test on arm kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace initramfs: clean old path before creating a hardlink ocfs2: fix potential use after free flexfiles: enforce per-mirror stateid only for v4 DSes dax: Check page->mapping isn't NULL ALSA: fireface: fix reference to wrong register for clock configuration ALSA: hda/realtek - Fixed headphone issue for ALC700 ALSA: hda/realtek: ALC294 mic and headset-mode fixups for ASUS X542UN ALSA: hda/realtek: Enable audio jacks of ASUS UX533FD with ALC294 ALSA: hda/realtek: Enable audio jacks of ASUS UX433FN/UX333FA with ALC294 ALSA: hda/realtek - Fix the mute LED regresion on Lenovo X1 Carbon IB/hfi1: Fix an out-of-bounds access in get_hw_stats bpf: fix off-by-one error in adjust_subprog_starts tcp: lack of available data can also cause TSO defer Linux 4.19.10 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
505bc9f389 |
mm/page_alloc.c: fix calculation of pgdat->nr_zones
[ Upstream commit 8f416836c0d50b198cad1225132e5abebf8980dc ]
init_currently_empty_zone() will adjust pgdat->nr_zones and set it to
'zone_idx(zone) + 1' unconditionally. This is correct in the normal
case, while not exact in hot-plug situation.
This function is used in two places:
* free_area_init_core()
* move_pfn_range_to_zone()
In the first case, we are sure zone index increase monotonically. While
in the second one, this is under users control.
One way to reproduce this is:
----------------------------
1. create a virtual machine with empty node1
-m 4G,slots=32,maxmem=32G \
-smp 4,maxcpus=8 \
-numa node,nodeid=0,mem=4G,cpus=0-3 \
-numa node,nodeid=1,mem=0G,cpus=4-7
2. hot-add cpu 3-7
cpu-add [3-7]
2. hot-add memory to nod1
object_add memory-backend-ram,id=ram0,size=1G
device_add pc-dimm,id=dimm0,memdev=ram0,node=1
3. online memory with following order
echo online_movable > memory47/state
echo online > memory40/state
After this, node1 will have its nr_zones equals to (ZONE_NORMAL + 1)
instead of (ZONE_MOVABLE + 1).
Michal said:
"Having an incorrect nr_zones might result in all sorts of problems
which would be quite hard to debug (e.g. reclaim not considering the
movable zone). I do not expect many users would suffer from this it
but still this is trivial and obviously right thing to do so
backporting to the stable tree shouldn't be harmful (last famous
words)"
Link: http://lkml.kernel.org/r/20181117022022.9956-1-richard.weiyang@gmail.com
Fixes:
|
||
|
|
635c56d224 |
Merge 4.19.6 into android-4.19
Changes in 4.19.6 HID: steam: remove input device when a hid client is running. efi/libstub: arm: support building with clang usb: core: Fix hub port connection events lost usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers usb: dwc3: gadget: Properly check last unaligned/zero chain TRB usb: dwc3: core: Clean up ULPI device usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove() xhci: Fix leaking USB3 shared_hcd at xhci removal xhci: handle port status events for removed USB3 hcd xhci: Add check for invalid byte size error when UAS devices are connected. usb: xhci: fix uninitialized completion when USB3 port got wrong status usb: xhci: fix timeout for transition from RExit to U0 xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc usb: xhci: Prevent bus suspend if a port connect change or polling state is detected ALSA: oss: Use kvzalloc() for local buffer allocations MAINTAINERS: Add Sasha as a stable branch maintainer Documentation/security-bugs: Clarify treatment of embargoed information Documentation/security-bugs: Postpone fix publication in exceptional cases mmc: sdhci-pci: Try "cd" for card-detect lookup before using NULL mmc: sdhci-pci: Workaround GLK firmware failing to restore the tuning value gpio: don't free unallocated ida on gpiochip_add_data_with_key() error path iwlwifi: fix wrong WGDS_WIFI_DATA_SIZE iwlwifi: mvm: support sta_statistics() even on older firmware iwlwifi: mvm: fix regulatory domain update when the firmware starts iwlwifi: mvm: don't use SAR Geo if basic SAR is not used brcmfmac: fix reporting support for 160 MHz channels opp: ti-opp-supply: Dynamically update u_volt_min opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call tools/power/cpupower: fix compilation with STATIC=true v9fs_dir_readdir: fix double-free on p9stat_read error selinux: Add __GFP_NOWARN to allocation at str_read() Input: synaptics - avoid using uninitialized variable when probing bfs: add sanity check at bfs_fill_super() sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer gfs2: Don't leave s_fs_info pointing to freed memory in init_sbd llc: do not use sk_eat_skb() mm: don't warn about large allocations for slab mm/memory.c: recheck page table entry with page table lock held tcp: do not release socket ownership in tcp_close() drm/fb-helper: Blacklist writeback when adding connectors to fbdev drm/amdgpu: Add missing firmware entry for HAINAN drm/vc4: Set ->legacy_cursor_update to false when doing non-async updates drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset drm/i915: Disable LP3 watermarks on all SNB machines drm/ast: change resolution may cause screen blurred drm/ast: fixed cursor may disappear sometimes drm/ast: Remove existing framebuffers before loading driver can: flexcan: Unlock the MB unconditionally can: dev: can_get_echo_skb(): factor out non sending code to __can_get_echo_skb() can: dev: __can_get_echo_skb(): replace struct can_frame by canfd_frame to access frame length can: dev: __can_get_echo_skb(): Don't crash the kernel if can_priv::echo_skb is accessed out of bounds can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb can: rx-offload: introduce can_rx_offload_get_echo_skb() and can_rx_offload_queue_sorted() functions can: rx-offload: rename can_rx_offload_irq_queue_err_skb() to can_rx_offload_queue_tail() can: flexcan: use can_rx_offload_queue_sorted() for flexcan_irq_bus_*() can: flexcan: handle tx-complete CAN frames via rx-offload infrastructure can: raw: check for CAN FD capable netdev in raw_sendmsg() can: hi311x: Use level-triggered interrupt can: flexcan: Always use last mailbox for TX can: flexcan: remove not needed struct flexcan_priv::tx_mb and struct flexcan_priv::tx_mb_idx ACPICA: AML interpreter: add region addresses in global list during initialization IB/hfi1: Eliminate races in the SDMA send error path fsnotify: generalize handling of extra event flags fanotify: fix handling of events on child sub-directory pinctrl: meson: fix pinconf bias disable pinctrl: meson: fix gxbb ao pull register bits pinctrl: meson: fix gxl ao pull register bits pinctrl: meson: fix meson8 ao pull register bits pinctrl: meson: fix meson8b ao pull register bits tools/testing/nvdimm: Fix the array size for dimm devices. scsi: lpfc: fix remoteport access scsi: hisi_sas: Remove set but not used variable 'dq_list' KVM: PPC: Move and undef TRACE_INCLUDE_PATH/FILE cpufreq: imx6q: add return value check for voltage scale rtc: cmos: Do not export alarm rtc_ops when we do not support alarms rtc: pcf2127: fix a kmemleak caused in pcf2127_i2c_gather_write crypto: simd - correctly take reqsize of wrapped skcipher into account floppy: fix race condition in __floppy_read_block_0() powerpc/io: Fix the IO workarounds code to work with Radix sched/fair: Fix cpu_util_wake() for 'execl' type workloads perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs block: copy ioprio in __bio_clone_fast() and bounce SUNRPC: Fix a bogus get/put in generic_key_to_expire() riscv: add missing vdso_install target RISC-V: Silence some module warnings on 32-bit drm/amdgpu: fix bug with IH ring setup kdb: Use strscpy with destination buffer size NFSv4: Fix an Oops during delegation callbacks powerpc/numa: Suppress "VPHN is not supported" messages efi/arm: Revert deferred unmap of early memmap mapping z3fold: fix possible reclaim races mm, memory_hotplug: check zone_movable in has_unmovable_pages tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset mm, page_alloc: check for max order in hot path dax: Avoid losing wakeup in dax_lock_mapping_entry include/linux/pfn_t.h: force '~' to be parsed as an unary operator tty: wipe buffer. tty: wipe buffer if not echoing data gfs2: Fix iomap buffer head reference counting bug rcu: Make need_resched() respond to urgent RCU-QS needs media: ov5640: Re-work MIPI startup sequence media: ov5640: Fix timings setup code media: ov5640: fix exposure regression media: ov5640: fix auto gain & exposure when changing mode media: ov5640: fix wrong binning value in exposure calculation media: ov5640: fix auto controls values when switching to manual mode Linux 4.19.6 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
fe0ea3084c |
ANDROID: add extra free kbytes tunable
Add a userspace visible knob to tell the VM to keep an extra amount of memory free, by increasing the gap between each zone's min and low watermarks. This is useful for realtime applications that call system calls and have a bound on the number of allocations that happen in any short time period. In this application, extra_free_kbytes would be left at an amount equal to or larger than than the maximum number of allocations that happen in any burst. It may also be useful to reduce the memory use of virtual machines (temporarily?), in a way that does not cause memory fragmentation like ballooning does. [ccross] Revived for use on old kernels where no other solution exists. The tunable will be removed on kernels that do better at avoiding direct reclaim. [surenb] Will be reverted as soon as Android framework is reworked to use upstream-supported watermark_scale_factor instead of extra_free_kbytes. Bug: 86445363 Bug: 109664768 Bug: 120445732 Change-Id: I765a42be8e964bfd3e2886d1ca85a29d60c3bb3e Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
9dec38554a |
mm, page_alloc: check for max order in hot path
[ Upstream commit c63ae43ba53bc432b414fd73dd5f4b01fcb1ab43 ] Konstantin has noticed that kvmalloc might trigger the following warning: WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60 [...] Call Trace: fragmentation_index+0x76/0x90 compaction_suitable+0x4f/0xf0 shrink_node+0x295/0x310 node_reclaim+0x205/0x250 get_page_from_freelist+0x649/0xad0 __alloc_pages_nodemask+0x12a/0x2a0 kmalloc_large_node+0x47/0x90 __kmalloc_node+0x22b/0x2e0 kvmalloc_node+0x3e/0x70 xt_alloc_table_info+0x3a/0x80 [x_tables] do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables] nf_setsockopt+0x44/0x60 SyS_setsockopt+0x6f/0xc0 do_syscall_64+0x67/0x120 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 the problem is that we only check for an out of bound order in the slow path and the node reclaim might happen from the fast path already. This is fixable by making sure that kvmalloc doesn't ever use kmalloc for requests that are larger than KMALLOC_MAX_SIZE but this also shows that the code is rather fragile. A recent UBSAN report just underlines that by the following report UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19 shift exponent 51 is too large for 32-bit type 'int' CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xd2/0x148 lib/dump_stack.c:113 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425 __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117 zone_watermark_fast mm/page_alloc.c:3216 [inline] get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300 __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370 alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093 alloc_pages include/linux/gfp.h:509 [inline] __get_free_pages+0x12/0x60 mm/page_alloc.c:4414 dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156 raw_cmd_copyin drivers/block/floppy.c:3159 [inline] raw_cmd_ioctl drivers/block/floppy.c:3206 [inline] fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544 fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571 __blkdev_driver_ioctl block/ioctl.c:303 [inline] blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601 block_ioctl+0x105/0x150 fs/block_dev.c:1883 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687 ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702 __do_sys_ioctl fs/ioctl.c:709 [inline] __se_sys_ioctl fs/ioctl.c:707 [inline] __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707 do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Note that this is not a kvmalloc path. It is just that the fast path really depends on having sanitzed order as well. Therefore move the order check to the fast path. Link: http://lkml.kernel.org/r/20181113094305.GM15120@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reported-by: Kyungtae Kim <kt0755@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Byoungyoung Lee <lifeasageek@gmail.com> Cc: "Dae R. Jeong" <threeearcat@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
b44fd1268b |
mm, memory_hotplug: check zone_movable in has_unmovable_pages
[ Upstream commit 9d7899999c62c1a81129b76d2a6ecbc4655e1597 ] Page state checks are racy. Under a heavy memory workload (e.g. stress -m 200 -t 2h) it is quite easy to hit a race window when the page is allocated but its state is not fully populated yet. A debugging patch to dump the struct page state shows has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0 page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1 flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked) Note that the state has been checked for both PageLRU and PageSwapBacked already. Closing this race completely would require some sort of retry logic. This can be tricky and error prone (think of potential endless or long taking loops). Workaround this problem for movable zones at least. Such a zone should only contain movable pages. Commit |
||
|
|
e054637597 |
mm, sched/numa: Remove remaining traces of NUMA rate-limiting
Remove the leftover pglist_data::numabalancing_migrate_lock and its
initialization, we stopped using this lock with:
|
||
|
|
efaffc5e40 |
mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration
Rate limiting of page migrations due to automatic NUMA balancing was
introduced to mitigate the worst-case scenario of migrating at high
frequency due to false sharing or slowly ping-ponging between nodes.
Since then, a lot of effort was spent on correctly identifying these
pages and avoiding unnecessary migrations and the safety net may no longer
be required.
Jirka Hladky reported a regression in 4.17 due to a scheduler patch that
avoids spreading STREAM tasks wide prematurely. However, once the task
was properly placed, it delayed migrating the memory due to rate limiting.
Increasing the limit fixed the problem for him.
Currently, the limit is hard-coded and does not account for the real
capabilities of the hardware. Even if an estimate was attempted, it would
not properly account for the number of memory controllers and it could
not account for the amount of bandwidth used for normal accesses. Rather
than fudging, this patch simply eliminates the rate limiting.
However, Jirka reports that a STREAM configuration using multiple
processes achieved similar performance to 4.16. In local tests, this patch
improved performance of STREAM relative to the baseline but it is somewhat
machine-dependent. Most workloads show little or not performance difference
implying that there is not a heavily reliance on the throttling mechanism
and it is safe to remove.
STREAM on 2-socket machine
4.19.0-rc5 4.19.0-rc5
numab-v1r1 noratelimit-v1r1
MB/sec copy 43298.52 ( 0.00%) 44673.38 ( 3.18%)
MB/sec scale 30115.06 ( 0.00%) 31293.06 ( 3.91%)
MB/sec add 32825.12 ( 0.00%) 34883.62 ( 6.27%)
MB/sec triad 32549.52 ( 0.00%) 34906.60 ( 7.24%
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
||
|
|
464c7ffbcb |
mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
When scanning for movable pages, filter out Hugetlb pages if hugepage
migration is not supported. Without this we hit infinte loop in
__offline_pages() where we do
pfn = scan_movable_pages(start_pfn, end_pfn);
if (pfn) { /* We have movable pages */
ret = do_migrate_range(pfn, end_pfn);
goto repeat;
}
Fix this by checking hugepage_migration_supported both in
has_unmovable_pages which is the primary backoff mechanism for page
offlining and for consistency reasons also into scan_movable_pages
because it doesn't make any sense to return a pfn to non-migrateable
huge page.
This issue was revealed by, but not caused by
|
||
|
|
13ba17bee1 |
notifier: Remove notifier header file wherever not used
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org |
||
|
|
d4ae9916ea |
mm: soft-offline: close the race against page allocation
A process can be killed with SIGBUS(BUS_MCEERR_AR) when it tries to allocate a page that was just freed on the way of soft-offline. This is undesirable because soft-offline (which is about corrected error) is less aggressive than hard-offline (which is about uncorrected error), and we can make soft-offline fail and keep using the page for good reason like "system is busy." Two main changes of this patch are: - setting migrate type of the target page to MIGRATE_ISOLATE. As done in free_unref_page_commit(), this makes kernel bypass pcplist when freeing the page. So we can assume that the page is in freelist just after put_page() returns, - setting PG_hwpoison on free page under zone->lock which protects freelists, so this allows us to avoid setting PG_hwpoison on a page that is decided to be allocated soon. [akpm@linux-foundation.org: tweak set_hwpoison_free_buddy_page() comment] Link: http://lkml.kernel.org/r/1531452366-11661-3-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <zy.zhengyi@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
03e85f9d5f |
mm/page_alloc: Introduce free_area_init_core_hotplug
Currently, whenever a new node is created/re-used from the memhotplug path, we call free_area_init_node()->free_area_init_core(). But there is some code that we do not really need to run when we are coming from such path. free_area_init_core() performs the following actions: 1) Initializes pgdat internals, such as spinlock, waitqueues and more. 2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on when creating hash tables. 3) Account number of managed_pages per zone, substracting dma_reserved and memmap pages. 4) Initializes some fields of the zone structure data 5) Calls init_currently_empty_zone to initialize all the freelists 6) Calls memmap_init to initialize all pages belonging to certain zone When called from memhotplug path, free_area_init_core() only performs actions #1 and #4. Action #2 is pointless as the zones do not have any pages since either the node was freed, or we are re-using it, eitherway all zones belonging to this node should have 0 pages. For the same reason, action #3 results always in manages_pages being 0. Action #5 and #6 are performed later on when onlining the pages: online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone() online_pages()->move_pfn_range_to_zone()->memmap_init_zone() This patch does two things: First, moves the node/zone initializtion to their own function, so it allows us to create a small version of free_area_init_core, where we only perform: 1) Initialization of pgdat internals, such as spinlock, waitqueues and more 4) Initialization of some fields of the zone structure data These two functions are: pgdat_init_internals() and zone_init_internals(). The second thing this patch does, is to introduce free_area_init_core_hotplug(), the memhotplug version of free_area_init_core(): Currently, we call free_area_init_node() from the memhotplug path. In there, we set some pgdat's fields, and call calculate_node_totalpages(). calculate_node_totalpages() calculates the # of pages the node has. Since the node is either new, or we are re-using it, the zones belonging to this node should not have any pages, so there is no point to calculate this now. Actually, we re-set these values to 0 later on with the calls to: reset_node_managed_pages() reset_node_present_pages() The # of pages per node and the # of pages per zone will be calculated when onlining the pages: online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range() online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range() Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace __paginginit with __init, so their code gets freed up. [osalvador@techadventures.net: fix section usage] Link: http://lkml.kernel.org/r/20180731101752.GA473@techadventures.net [osalvador@suse.de: v6] Link: http://lkml.kernel.org/r/20180801122348.21588-6-osalvador@techadventures.net Link: http://lkml.kernel.org/r/20180730101757.28058-5-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
0188dc98ad |
mm/page_alloc: inline function to handle CONFIG_DEFERRED_STRUCT_PAGE_INIT
Let us move the code between CONFIG_DEFERRED_STRUCT_PAGE_INIT to an inline function. Not having an ifdef in the function makes the code more readable. Link: http://lkml.kernel.org/r/20180730101757.28058-4-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7cc2a9596d |
mm: remove __paginginit
__paginginit is the same thing as __meminit except for platforms without sparsemem, there it is defined as __init. Remove __paginginit and use __meminit. Use __ref in one single function that merges __meminit and __init sections: setup_usemap(). Link: http://lkml.kernel.org/r/20180801122348.21588-4-osalvador@techadventures.net Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c1093b746c |
mm: access zone->node via zone_to_nid() and zone_set_nid()
zone->node is configured only when CONFIG_NUMA=y, so it is a good idea to have inline functions to access this field in order to avoid ifdef's in c files. Link: http://lkml.kernel.org/r/20180730101757.28058-3-osalvador@techadventures.net Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ace1db3976 |
mm/page_alloc.c: move ifdefery out of free_area_init_core
Patch series "Refactor free_area_init_core and add
free_area_init_core_hotplug", v6.
This patchset does three things:
1) Clean up/refactor free_area_init_core/free_area_init_node
by moving the ifdefery out of the functions.
2) Move the pgdat/zone initialization in free_area_init_core to its
own function.
3) Introduce free_area_init_core_hotplug, a small subset of
free_area_init_core, which is only called from memhotlug code path. In this
way, we have:
free_area_init_core: called during early initialization
free_area_init_core_hotplug: called whenever a new node is allocated/re-used (memhotplug path)
This patch (of 5):
Moving the #ifdefs out of the function makes it easier to follow.
Link: http://lkml.kernel.org/r/20180730101757.28058-2-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
d8a759b570 |
mm, page_alloc: double zone's batchsize
To improve page allocator's performance for order-0 pages, each CPU has
a Per-CPU-Pageset(PCP) per zone. Whenever an order-0 page is needed,
PCP will be checked first before asking pages from Buddy. When PCP is
used up, a batch of pages will be fetched from Buddy to improve
performance and the size of batch can affect performance.
zone's batch size gets doubled last time by commit ba56e91c9401("mm:
page_alloc: increase size of per-cpu-pages") over ten years ago. Since
then, CPU has envolved a lot and CPU's cache sizes also increased.
Dave Hansen is concerned the current batch size doesn't fit well with
modern hardware and suggested me to do two things: first, use a page
allocator intensive benchmark, e.g. will-it-scale/page_fault1 to find
out how performance changes with different batch sizes on various
machines and then choose a new default batch size; second, see how this
new batch size work with other workloads.
In the first test, we saw performance gains on high-core-count systems
and little to no effect on older systems with more modest core counts.
In this phase's test data, two candidates: 63 and 127 are chosen.
In the second step, ebizzy, oltp, kbuild, pigz, netperf, vm-scalability
and more will-it-scale sub-tests are tested to see how these two
candidates work with these workloads and decides a new default according
to their results.
Most test results are flat. will-it-scale/page_fault2 process mode has
10%-18% performance increase on 4-sockets Skylake and Broadwell.
vm-scalability/lru-file-mmap-read has 17%-47% performance increase for
4-sockets servers while for 2-sockets servers, it caused 3%-8% performance
drop. Further analysis showed that, with a larger pcp->batch and thus
larger pcp->high(the relationship of pcp->high=6 * pcp->batch is
maintained in this patch), zone lock contention shifted to LRU add side
lock contention and that caused performance drop. This performance drop
might be mitigated by others' work on optimizing LRU lock.
Another downside of increasing pcp->batch is, when PCP is used up and need
to fetch a batch of pages from Buddy, since batch is increased, that time
can be longer than before. My understanding is, this doesn't affect
slowpath where direct reclaim and compaction dominates. For fastpath,
throughput is a win(according to will-it-scale/page_fault1) but worst
latency can be larger now.
Overall, I think double the batch size from 31 to 63 is relatively safe
and provide good performance boost for high-core-count systems.
The two phase's test results are listed below(all tests are done with THP
disabled).
Phase one(will-it-scale/page_fault1) test results:
Skylake-EX: increased batch size has a good effect on zone->lock
contention, though LRU contention will rise at the same time and
limited the final performance increase.
batch score change zone_contention lru_contention total_contention
31 15345900 +0.00% 64% 8% 72%
53 17903847 +16.67% 32% 38% 70%
63 17992886 +17.25% 24% 45% 69%
73 18022825 +17.44% 10% 61% 71%
119 18023401 +17.45% 4% 66% 70%
127 18029012 +17.48% 3% 66% 69%
137 18036075 +17.53% 4% 66% 70%
165 18035964 +17.53% 2% 67% 69%
188 18101105 +17.95% 2% 67% 69%
223 18130951 +18.15% 2% 67% 69%
255 18118898 +18.07% 2% 67% 69%
267 18101559 +17.96% 2% 67% 69%
299 18160468 +18.34% 2% 68% 70%
320 18139845 +18.21% 2% 67% 69%
393 18160869 +18.34% 2% 68% 70%
424 18170999 +18.41% 2% 68% 70%
458 18144868 +18.24% 2% 68% 70%
467 18142366 +18.22% 2% 68% 70%
498 18154549 +18.30% 1% 68% 69%
511 18134525 +18.17% 1% 69% 70%
Broadwell-EX: similar pattern as Skylake-EX.
batch score change zone_contention lru_contention total_contention
31 16703983 +0.00% 67% 7% 74%
53 18195393 +8.93% 43% 28% 71%
63 18288885 +9.49% 38% 33% 71%
73 18344329 +9.82% 35% 37% 72%
119 18535529 +10.96% 24% 46% 70%
127 18513596 +10.83% 23% 48% 71%
137 18514327 +10.84% 23% 48% 71%
165 18511840 +10.82% 22% 49% 71%
188 18593478 +11.31% 17% 53% 70%
223 18601667 +11.36% 17% 52% 69%
255 18774825 +12.40% 12% 58% 70%
267 18754781 +12.28% 9% 60% 69%
299 18892265 +13.10% 7% 63% 70%
320 18873812 +12.99% 8% 62% 70%
393 18891174 +13.09% 6% 64% 70%
424 18975108 +13.60% 6% 64% 70%
458 18932364 +13.34% 8% 62% 70%
467 18960891 +13.51% 5% 65% 70%
498 18944526 +13.41% 5% 64% 69%
511 18960839 +13.51% 5% 64% 69%
Skylake-EP: although increased batch reduced zone->lock contention, but
the effect is not as good as EX: zone->lock contention is still as high as
20% with a very high batch value instead of 1% on Skylake-EX or 5% on
Broadwell-EX. Also, total_contention actually decreased with a higher
batch but that doesn't translate to performance increase.
batch score change zone_contention lru_contention total_contention
31 9554867 +0.00% 66% 3% 69%
53 9855486 +3.15% 63% 3% 66%
63 9980145 +4.45% 62% 4% 66%
73 10092774 +5.63% 62% 5% 67%
119 10310061 +7.90% 45% 19% 64%
127 10342019 +8.24% 42% 19% 61%
137 10358182 +8.41% 42% 21% 63%
165 10397060 +8.81% 37% 24% 61%
188 10341808 +8.24% 34% 26% 60%
223 10349135 +8.31% 31% 27% 58%
255 10327189 +8.08% 28% 29% 57%
267 10344204 +8.26% 27% 29% 56%
299 10325043 +8.06% 25% 30% 55%
320 10310325 +7.91% 25% 31% 56%
393 10293274 +7.73% 21% 31% 52%
424 10311099 +7.91% 21% 32% 53%
458 10321375 +8.02% 21% 32% 53%
467 10303881 +7.84% 21% 32% 53%
498 10332462 +8.14% 20% 33% 53%
511 10325016 +8.06% 20% 32% 52%
Broadwell-EP: zone->lock and lru lock had an agreement to make sure
performance doesn't increase and they successfully managed to keep total
contention at 70%.
batch score change zone_contention lru_contention total_contention
31 10121178 +0.00% 19% 50% 69%
53 10142366 +0.21% 6% 63% 69%
63 10117984 -0.03% 11% 58% 69%
73 10123330 +0.02% 7% 63% 70%
119 10108791 -0.12% 2% 67% 69%
127 10166074 +0.44% 3% 66% 69%
137 10141574 +0.20% 3% 66% 69%
165 10154499 +0.33% 2% 68% 70%
188 10124921 +0.04% 2% 67% 69%
223 10137399 +0.16% 2% 67% 69%
255 10143289 +0.22% 0% 68% 68%
267 10123535 +0.02% 1% 68% 69%
299 10140952 +0.20% 0% 68% 68%
320 10163170 +0.41% 0% 68% 68%
393 10000633 -1.19% 0% 69% 69%
424 10087998 -0.33% 0% 69% 69%
458 10187116 +0.65% 0% 69% 69%
467 10146790 +0.25% 0% 69% 69%
498 10197958 +0.76% 0% 69% 69%
511 10152326 +0.31% 0% 69% 69%
Haswell-EP: similar to Broadwell-EP.
batch score change zone_contention lru_contention total_contention
31 10442205 +0.00% 14% 48% 62%
53 10442255 +0.00% 5% 57% 62%
63 10452059 +0.09% 6% 57% 63%
73 10482349 +0.38% 5% 59% 64%
119 10454644 +0.12% 3% 60% 63%
127 10431514 -0.10% 3% 59% 62%
137 10423785 -0.18% 3% 60% 63%
165 10481216 +0.37% 2% 61% 63%
188 10448755 +0.06% 2% 61% 63%
223 10467144 +0.24% 2% 61% 63%
255 10480215 +0.36% 2% 61% 63%
267 10484279 +0.40% 2% 61% 63%
299 10466450 +0.23% 2% 61% 63%
320 10452578 +0.10% 2% 61% 63%
393 10499678 +0.55% 1% 62% 63%
424 10481454 +0.38% 1% 62% 63%
458 10473562 +0.30% 1% 62% 63%
467 10484269 +0.40% 0% 62% 62%
498 10505599 +0.61% 0% 62% 62%
511 10483395 +0.39% 0% 62% 62%
Westmere-EP: contention is pretty small so not interesting. Note too high
a batch value could hurt performance.
batch score change zone_contention lru_contention total_contention
31 4831523 +0.00% 2% 3% 5%
53 4834086 +0.05% 2% 4% 6%
63 4834262 +0.06% 2% 3% 5%
73
|
||
|
|
9ea9a68064 |
mm: drop VM_BUG_ON from __get_free_pages
There is no real reason to blow up just because the caller doesn't know that __get_free_pages cannot return highmem pages. Simply fix that up silently. Even if we have some confused users such a fixup will not be harmful. [akpm@linux-foundation.org: mask off __GFP_HIGHMEM] Link: http://lkml.kernel.org/r/20180622162841.25114-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jiankang Chen <chenjiankang1@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yisheng Xie <xieyisheng1@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
d6a24df006 |
mm, page_alloc: actually ignore mempolicies for high priority allocations
__alloc_pages_slowpath() has for a long time contained code to ignore node restrictions from memory policies for high priority allocations. The current code that resets the zonelist iterator however does effectively nothing after commit |
||
|
|
720e14ebec |
mm: skip invalid pages block at a time in zero_resv_unresv()
The role of zero_resv_unavail() is to make sure that every struct page that is allocated but is not backed by memory that is accessible by kernel is zeroed and not in some uninitialized state. Since struct pages are allocated in blocks (2M pages in x86 case), we can skip pageblock_nr_pages at a time, when the first one is found to be invalid. This optimization may help since now on x86 every hole in e820 maps is marked as reserved in memblock, and thus will go through this function. This function is called before sched_clock() is initialized, so I used my x86 early boot clock patches to measure the performance improvement. With 1T hole on i7-8700 currently we would take 0.606918s of boot time, but with this optimization 0.001103s. Link: http://lkml.kernel.org/r/20180615155733.1175-1-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b018fc9800 |
Merge tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add a new framework for CPU idle time injection, to be used by
all of the idle injection code in the kernel in the future, fix some
issues and add a number of relatively small extensions in multiple
places.
Specifics:
- Add a new framework for CPU idle time injection (Daniel Lezcano).
- Add AVS support to the armada-37xx cpufreq driver (Gregory
CLEMENT).
- Add support for current CPU frequency reporting to the ACPI CPPC
cpufreq driver (George Cherian).
- Rework the cooling device registration in the imx6q/thermal driver
(Bastian Stender).
- Make the pcc-cpufreq driver refuse to work with dynamic scaling
governors on systems with many CPUs to avoid scalability issues
with it (Rafael Wysocki).
- Fix the intel_pstate driver to report different maximum CPU
frequencies on systems where they really are different and to
ignore the turbo active ratio if hardware-managend P-states (HWP)
are in use; make it use the match_string() helper (Xie Yisheng,
Srinivas Pandruvada).
- Fix a minor deferred probe issue in the qcom-kryo cpufreq driver
(Niklas Cassel).
- Add a tracepoint for the tracking of frequency limits changes (from
Andriod) to the cpufreq core (Ruchi Kandoi).
- Fix a circular lock dependency between CPU hotplug and sysfs
locking in the cpufreq core reported by lockdep (Waiman Long).
- Avoid excessive error reports on driver registration failures in
the ARM cpuidle driver (Sudeep Holla).
- Add a new device links flag to the driver core to make links go
away automatically on supplier driver removal (Vivek Gautam).
- Eliminate potential race condition between system-wide power
management transitions and system shutdown (Pingfan Liu).
- Add a quirk to save NVS memory on system suspend for the ASUS 1025C
laptop (Willy Tarreau).
- Make more systems use suspend-to-idle (instead of ACPI S3) by
default (Tristian Celestin).
- Get rid of stack VLA usage in the low-level hibernation code on
64-bit x86 (Kees Cook).
- Fix error handling in the hibernation core and mark an expected
fall-through switch in it (Chengguang Xu, Gustavo Silva).
- Extend the generic power domains (genpd) framework to support
attaching a device to a power domain by name (Ulf Hansson).
- Fix device reference counting and user limits initialization in the
devfreq core (Arvind Yadav, Matthias Kaehlcke).
- Fix a few issues in the rk3399_dmc devfreq driver and improve its
documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner).
- Drop a redundant error message from the exynos-ppmu devfreq driver
(Markus Elfring)"
* tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
PM / reboot: Eliminate race between reboot and suspend
PM / hibernate: Mark expected switch fall-through
cpufreq: intel_pstate: Ignore turbo active ratio in HWP
cpufreq: Fix a circular lock dependency problem
cpu/hotplug: Add a cpus_read_trylock() function
x86/power/hibernate_64: Remove VLA usage
cpufreq: trace frequency limits change
cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP
cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER
cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
cpufreq: armada-37xx: Add AVS support
dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding
PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload.
PM / devfreq: Init user limits from OPP limits, not viceversa
PM / devfreq: rk3399_dmc: fix spelling mistakes.
PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer.
dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional.
PM / devfreq: rk3399_dmc: remove wait for dcf irq event.
dt-bindings: clock: add rk3399 DDR3 standard speed bins.
...
|
||
|
|
17bc3432e3 |
Merge branches 'pm-core', 'pm-domains', 'pm-sleep', 'acpi-pm' and 'pm-cpuidle'
Merge changes in the PM core, system-wide PM infrastructure, generic power domains (genpd) framework, ACPI PM infrastructure and cpuidle for 4.19. * pm-core: driver core: Add flag to autoremove device link on supplier unbind driver core: Rename flag AUTOREMOVE to AUTOREMOVE_CONSUMER * pm-domains: PM / Domains: Introduce dev_pm_domain_attach_by_name() PM / Domains: Introduce option to attach a device by name to genpd PM / Domains: dt: Add a power-domain-names property * pm-sleep: PM / reboot: Eliminate race between reboot and suspend PM / hibernate: Mark expected switch fall-through x86/power/hibernate_64: Remove VLA usage PM / hibernate: cast PAGE_SIZE to int when comparing with error code * acpi-pm: ACPI / PM: save NVS memory for ASUS 1025C laptop ACPI / PM: Default to s2idle in all machines supporting LP S0 * pm-cpuidle: ARM: cpuidle: silence error on driver registration failure |
||
|
|
55f2503c3b |
PM / reboot: Eliminate race between reboot and suspend
At present, "systemctl suspend" and "shutdown" can run in parrallel. A system can suspend after devices_shutdown(), and resume. Then the shutdown task goes on to power off. This causes many devices are not really shut off. Hence replacing reboot_mutex with system_transition_mutex (renamed from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as system_transition_mutex can be better to reflect the purpose of the mutex. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
|
|
0d83432811 |
mm: Allow non-direct-map arguments to free_reserved_area()
free_reserved_area() takes pointers as arguments to show which addresses should be freed. However, it does this in a somewhat ambiguous way. If it gets a kernel direct map address, it always works. However, if it gets an address that is part of the kernel image alias mapping, it can fail. It fails if all of the following happen: * The specified address is part of the kernel image alias * Poisoning is requested (forcing a memset()) * The address is in a read-only portion of the kernel image The memset() fails on the read-only mapping, of course. free_reserved_area() *is* called both on the direct map and on kernel image alias addresses. We've just lucked out thus far that the kernel image alias areas it gets used on are read-write. I'm fairly sure this has been just a happy accident. It is quite easy to make free_reserved_area() work for all cases: just convert the address to a direct map address before doing the memset(), and do this unconditionally. There is little chance of a regression here because we previously did a virt_to_page() on the address for the memset, so we know these are not highmem pages for which virt_to_page() would fail. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@google.com Cc: aarcange@redhat.com Cc: jgross@suse.com Cc: jpoimboe@redhat.com Cc: gregkh@linuxfoundation.org Cc: peterz@infradead.org Cc: hughd@google.com Cc: torvalds@linux-foundation.org Cc: bp@alien8.de Cc: luto@kernel.org Cc: ak@linux.intel.com Cc: Kees Cook <keescook@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20180802225826.1287AE3E@viggo.jf.intel.com |
||
|
|
d1b47a7c9e |
mm: don't do zero_resv_unavail if memmap is not allocated
Moving zero_resv_unavail before memmap_init_zone(), caused a regression on
x86-32.
The cause is that we access struct pages before they are allocated when
CONFIG_FLAT_NODE_MEM_MAP is used.
free_area_init_nodes()
zero_resv_unavail()
mm_zero_struct_page(pfn_to_page(pfn)); <- struct page is not alloced
free_area_init_node()
if CONFIG_FLAT_NODE_MEM_MAP
alloc_node_mem_map()
memblock_virt_alloc_node_nopanic() <- struct page alloced here
On the other hand memblock_virt_alloc_node_nopanic() zeroes all the memory
that it returns, so we do not need to do zero_resv_unavail() here.
Fixes:
|
||
|
|
e181ae0c5d |
mm: zero unavailable pages before memmap init
We must zero struct pages for memory that is not backed by physical memory, or kernel does not have access to. Recently, there was a change which zeroed all memmap for all holes in e820. Unfortunately, it introduced a bug that is discussed here: https://www.spinics.net/lists/linux-mm/msg156764.html Linus, also saw this bug on his machine, and confirmed that reverting commit |
||
|
|
0825a6f986 |
mm: use octal not symbolic permissions
mm/*.c files use symbolic and octal styles for permissions. Using octal and not symbolic permissions is preferred by many as more readable. https://lkml.org/lkml/2016/8/2/1945 Prefer the direct use of octal for permissions. Done using $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c and some typing. Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l 44 After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l 86 Miscellanea: o Whitespace neatening around these conversions. Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7810e6781e |
mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for allocations that can ignore memory policies. The zonelist is obtained from current CPU's node. This is a problem for __GFP_THISNODE allocations that want to allocate on a different node, e.g. because the allocating thread has been migrated to a different CPU. This has been observed to break SLAB in our 4.4-based kernel, because there it relies on __GFP_THISNODE working as intended. If a slab page is put on wrong node's list, then further list manipulations may corrupt the list because page_to_nid() is used to determine which node's list_lock should be locked and thus we may take a wrong lock and race. Current SLAB implementation seems to be immune by luck thanks to commit |