24f2bc34ebb95e5ced9a123fecf4ced16a4b54fe
277 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e8e6720b34 |
Merge tag 'ASB-2024-05-05_4.19-stable' of https://android.googlesource.com/kernel/common into android13-4.19-kona
https://source.android.com/docs/security/bulletin/2024-05-01 CVE-2023-4622 * tag 'ASB-2024-05-05_4.19-stable' of https://android.googlesource.com/kernel/common: Revert "timers: Rename del_timer_sync() to timer_delete_sync()" Revert "geneve: make sure to pull inner header in geneve_rx()" Linux 4.19.312 amdkfd: use calloc instead of kzalloc to avoid integer overflow initramfs: fix populate_initrd_image() section mismatch ip_gre: do not report erspan version on GRE interface erspan: Check IFLA_GRE_ERSPAN_VER is set. VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() Bluetooth: btintel: Fixe build regression x86/mm/pat: fix VM_PAT handling in COW mappings virtio: reenable config if freezing device failed drm/vkms: call drm_atomic_helper_shutdown before drm_dev_put() tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc fbmon: prevent division by zero in fb_videomode_from_videomode() fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2 usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined tools: iio: replace seekdir() in iio_generic_buffer ktest: force $buildonly = 1 for 'make_warnings_file' test type Input: allocate keycode for Display refresh rate toggle block: prevent division by zero in blk_rq_stat_sum() SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int drm/amd/display: Fix nanosec stat overflow media: sta2x11: fix irq handler cast isofs: handle CDs with bad root inode but good Joliet root directory scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc() sysv: don't call sb_bread() with pointers_lock held Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails Bluetooth: btintel: Fix null ptr deref in btintel_read_version btrfs: send: handle path ref underflow in header iterate_inode_ref() btrfs: export: handle invalid inode or root reference in btrfs_get_parent() btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num() arm64: dts: rockchip: fix rk3399 hdmi ports node VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host() wifi: ath9k: fix LNA selection in ath_ant_try_scan() ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone ata: sata_mv: Fix PCI device ID table declaration compilation warning ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw erspan: make sure erspan_base_hdr is present in skb->head erspan: Add type I version 0 support. init: open /initrd.image with O_LARGEFILE initramfs: switch initramfs unpacking to struct file based APIs fs: add a vfs_fchmod helper fs: add a vfs_fchown helper initramfs: factor out a helper to populate the initrd image staging: vc04_services: fix information leak in create_component() staging: vc04_services: changen strncpy() to strscpy_pad() staging: mmal-vchiq: Fix client_component for 64 bit kernel staging: mmal-vchiq: Allocate and free components as required staging: mmal-vchiq: Avoid use of bool in structures i40e: fix vf may be used uninitialized in this function warning ipv6: Fix infinite recursion in fib6_dump_done(). selftests: reuseaddr_conflict: add missing new line at the end of the output net: stmmac: fix rx queue priority assignment net/sched: act_skbmod: prevent kernel-infoleak netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get() mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped." net/rds: fix possible cp null dereference netfilter: nf_tables: disallow timeout for anonymous sets Bluetooth: Fix TOCTOU in HCI debugfs implementation Bluetooth: hci_event: set the conn encrypted before conn establishes r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d tcp: properly terminate timers for kernel sockets mptcp: add sk_stop_timer_sync helper nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet USB: core: Fix deadlock in usb_deauthorize_interface() scsi: lpfc: Correct size for wqe for memset() x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled scsi: qla2xxx: Fix command flush on cable pull usb: udc: remove warning when queue disabled ep usb: dwc2: gadget: LPM flow fix usb: dwc2: host: Fix ISOC flow in DDMA mode usb: dwc2: host: Fix hibernation flow usb: dwc2: host: Fix remote wakeup from hibernation loop: loop_set_status_from_info() check before assignment loop: Check for overflow while configuring loop loop: Factor out configuring loop from status powerpc: xor_vmx: Add '-mhard-float' to CFLAGS efivarfs: Request at most 512 bytes for variable names perf/core: Fix reentry problem in perf_output_read_group() loop: properly observe rotational flag of underlying device loop: Refactor loop_set_status() size calculation loop: Factor out setting loop device size loop: Remove sector_t truncation checks loop: Call loop_config_discard() only after new config is applied Revert "loop: Check for overflow while configuring loop" btrfs: allocate btrfs_ioctl_defrag_range_args on stack printk: Update @console_may_schedule in console_trylock_spinning() fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs usb: cdc-wdm: close race between read and workqueue exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack() wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes mm/migrate: set swap entry values of THP tail pages properly. mm/memory-failure: fix an incorrect use of tail pages vt: fix memory overlapping when deleting chars in the buffer vt: fix unicode buffer corruption when deleting characters tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled usb: port: Don't try to peer unused USB ports based on location usb: gadget: ncm: Fix handling of zero block length packets USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform xfrm: Avoid clang fortify warning in copy_to_user_tmpl() netfilter: nf_tables: reject constant set with timeout netfilter: nf_tables: disallow anonymous set with timeout flag comedi: comedi_test: Prevent timers rescheduling during deletion ahci: asm1064: asm1166: don't limit reported ports ahci: asm1064: correct count of reported ports x86/CPU/AMD: Update the Zenbleed microcode revisions nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: use a more common logging style nilfs2: fix failure to detect DAT corruption in btree and direct mappings memtest: use {READ,WRITE}_ONCE in memory scanning drm/vc4: hdmi: do not return negative values from .get_modes() drm/imx/ipuv3: do not return negative values from .get_modes() s390/zcrypt: fix reference counting on zcrypt card objects soc: fsl: qbman: Use raw spinlock for cgr_lock soc: fsl: qbman: Add CGR update function soc: fsl: qbman: Add helper for sanity checking cgr ops soc: fsl: qbman: Always disable interrupts when taking cgr_lock vfio/platform: Disable virqfds on cleanup kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 speakup: Fix 8bit characters from direct synth slimbus: core: Remove usage of the deprecated ida_simple_xx() API ext4: fix corruption during on-line resize hwmon: (amc6821) add of_match table mmc: core: Fix switch on gp3 partition dm-raid: fix lockdep waring in "pers->hot_add_disk" Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"" PCI/PM: Drain runtime-idle callbacks before driver removal PCI: Drop pci_device_remove() test of pci_dev->driver fuse: don't unhash root mmc: tmio: avoid concurrent runs of mmc_request_done() PM: sleep: wakeirq: fix wake irq warning in system suspend USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M USB: serial: option: add MeiG Smart SLM320 product USB: serial: cp210x: add ID for MGP Instruments PDS100 USB: serial: add device ID for VeriFone adapter USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB powerpc/fsl: Fix mfpmr build errors with newer binutils clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays PM: suspend: Set mem_sleep_current during kernel command line setup parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds parisc: Fix csum_ipv6_magic on 64-bit systems parisc: Fix csum_ipv6_magic on 32-bit systems parisc: Fix ip_fast_csum parisc: Do not hardcode registers in checksum functions ubi: correct the calculation of fastmap size ubi: Check for too small LEB size in VTBL code ubifs: Set page uptodate in the correct place fat: fix uninitialized field in nostale filehandles crypto: qat - resolve race condition during AER recovery crypto: qat - fix double free during reset sparc: vDSO: fix return value of __setup handler sparc64: NMI watchdog: fix return value of __setup handler KVM: Always flush async #PF workqueue when vCPU is being destroyed media: xc4000: Fix atomicity violation in xc4000_get_frequency arm: dts: marvell: Fix maxium->maxim typo in brownstone dts ARM: dts: mmp2-brownstone: Don't redeclare phandle references smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity() smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr() wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach timers: Rename del_timer_sync() to timer_delete_sync() timers: Use del_timer_sync() even on UP timers: Update kernel-doc for various functions timers: Prepare support for PREEMPT_RT timer/trace: Improve timer tracing timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps x86/bugs: Use sysfs_emit() x86/cpu: Support AMD Automatic IBRS Documentation/hw-vuln: Update spectre doc Linux 4.19.311 crypto: af_alg - Work around empty control messages without MSG_MORE crypto: af_alg - Fix regression on empty requests spi: spi-mt65xx: Fix NULL pointer access in interrupt handler net/bnx2x: Prevent access to a freed page in page_pool hsr: Handle failures in module init rds: introduce acquire/release ordering in acquire/release_in_xmit() hsr: Fix uninit-value access in hsr_get_node() net: hsr: fix placement of logical operator in a multi-line statement usb: gadget: net2272: Use irqflags in the call to net2272_probe_fin staging: greybus: fix get_channel_from_mode() failure path serial: 8250_exar: Don't remove GPIO device on suspend rtc: mt6397: select IRQ_DOMAIN instead of depending on it kconfig: fix infinite loop when expanding a macro at the end of file tty: serial: samsung: fix tx_empty() to return TIOCSER_TEMT serial: max310x: fix syntax error in IRQ error message clk: qcom: gdsc: Add support to update GDSC transition delay NFS: Fix an off by one in root_nfs_cat() net: sunrpc: Fix an off by one in rpc_sockaddr2uaddr() scsi: bfa: Fix function pointer type mismatch for hcb_qe->cbfn scsi: csiostor: Avoid function pointer casts ALSA: usb-audio: Stop parsing channels bits when all channels are found. sparc32: Fix section mismatch in leon_pci_grpci backlight: lp8788: Fully initialize backlight_properties during probe backlight: lm3639: Fully initialize backlight_properties during probe backlight: da9052: Fully initialize backlight_properties during probe backlight: lm3630a: Don't set bl->props.brightness in get_brightness backlight: lm3630a: Initialize backlight_properties on init powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc. powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks drm/mediatek: Fix a null pointer crash in mtk_drm_crtc_finish_page_flip media: go7007: fix a memleak in go7007_load_encoder media: dvb-frontends: avoid stack overflow warnings with clang media: pvrusb2: fix uaf in pvr2_context_set_notify drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int() ASoC: meson: axg-tdm-interface: fix mclk setup without mclk-fs mtd: rawnand: lpc32xx_mlc: fix irq handler prototype crypto: arm/sha - fix function cast warnings crypto: arm - Rename functions to avoid conflict with crypto/sha256.h mfd: syscon: Call of_node_put() only when of_parse_phandle() takes a ref drm/tegra: put drm_gem_object ref on error in tegra_fb_create clk: hisilicon: hi3519: Release the correct number of gates in hi3519_clk_unregister() PCI: Mark 3ware-9650SE Root Port Extended Tags as broken drm/mediatek: dsi: Fix DSI RGB666 formats and definitions clk: qcom: dispcc-sdm845: Adjust internal GDSC wait times firmware: qcom: scm: Add WLAN VMID for Qualcomm SCM interface media: pvrusb2: fix pvr2_stream_callback casts media: go7007: add check of return value of go7007_read_addr() ALSA: seq: fix function cast warnings drm/radeon/ni: Fix wrong firmware size logging in ni_init_microcode() perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str() quota: Fix rcu annotations of inode dquot pointers quota: Fix potential NULL pointer dereference quota: simplify drop_dquot_ref() quota: check time limit when back out space/inode change fs/quota: erase unused but set variable warning quota: code cleanup for __dquot_alloc_space() clk: qcom: reset: Ensure write completion on reset de/assertion clk: qcom: reset: Commonize the de/assert functions clk: qcom: reset: support resetting multiple bits clk: qcom: reset: Allow specifying custom reset delay media: edia: dvbdev: fix a use-after-free media: dvb-core: Fix use-after-free due to race at dvb_register_device() media: dvbdev: fix error logic at dvb_register_device() media: dvbdev: Fix memleak in dvb_register_device media: media/dvb: Use kmemdup rather than duplicating its implementation media: dvbdev: remove double-unlock media: v4l2-mem2mem: fix a memleak in v4l2_m2m_register_entity media: v4l2-tpg: fix some memleaks in tpg_alloc media: em28xx: annotate unchecked call to media_device_register() ABI: sysfs-bus-pci-devices-aer_stats uses an invalid tag perf evsel: Fix duplicate initialization of data->id in evsel__parse_sample() media: tc358743: register v4l2 async device only after successful setup drm/rockchip: lvds: do not print scary message when probing defer drm/rockchip: lvds: do not overwrite error code drm: Don't treat 0 as -1 in drm_fixp2int_ceil drm/rockchip: inno_hdmi: Fix video timing drm/tegra: dsi: Fix missing pm_runtime_disable() in the error handling path of tegra_dsi_probe() drm/tegra: dsi: Fix some error handling paths in tegra_dsi_probe() drm/tegra: dsi: Make use of the helper function dev_err_probe() gpu: host1x: mipi: Update tegra_mipi_request() to be node based drm/tegra: dsi: Add missing check for of_find_device_by_node dm: call the resume method on internal suspend dm raid: fix false positive for requeue needed during reshape nfp: flower: handle acti_netdevs allocation failure net/x25: fix incorrect parameter validation in the x25_getsockopt() function net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function udp: fix incorrect parameter validation in the udp_lib_getsockopt() function l2tp: fix incorrect parameter validation in the pppol2tp_getsockopt() function tcp: fix incorrect parameter validation in the do_tcp_getsockopt() function ipv6: fib6_rules: flush route cache when rule is changed bpf: Fix stackmap overflow check on 32-bit arches bpf: Fix hashtab overflow check on 32-bit arches sr9800: Add check for usbnet_get_endpoints Bluetooth: hci_core: Fix possible buffer overflow Bluetooth: Remove superfluous call to hci_conn_check_pending() igb: Fix missing time sync events igb: move PEROUT and EXTTS isr logic to separate functions mmc: wmt-sdmmc: remove an incorrect release_mem_region() call in the .remove function SUNRPC: fix some memleaks in gssx_dec_option_array x86, relocs: Ignore relocations in .notes section ACPI: scan: Fix device check notification handling ARM: dts: arm: realview: Fix development chip ROM compatible value wifi: brcmsmac: avoid function pointer casts iommu/amd: Mark interrupt as managed bus: tegra-aconnect: Update dependency to ARCH_TEGRA ACPI: processor_idle: Fix memory leak in acpi_processor_power_exit() wifi: libertas: fix some memleaks in lbs_allocate_cmd_buffer() af_unix: Annotate data-race of gc_in_progress in wait_for_unix_gc(). sock_diag: annotate data-races around sock_diag_handlers[family] wifi: mwifiex: debugfs: Drop unnecessary error check for debugfs_create_dir() wifi: b43: Disable QoS for bcm4331 wifi: b43: Stop correct queue in DMA worker when QoS is disabled b43: main: Fix use true/false for bool type wifi: b43: Stop/wake correct queue in PIO Tx path when QoS is disabled wifi: b43: Stop/wake correct queue in DMA Tx path when QoS is disabled b43: dma: Fix use true/false for bool type variable wifi: ath10k: fix NULL pointer dereference in ath10k_wmi_tlv_op_pull_mgmt_tx_compl_ev() timekeeping: Fix cross-timestamp interpolation for non-x86 timekeeping: Fix cross-timestamp interpolation corner case decision timekeeping: Fix cross-timestamp interpolation on counter wrap aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts md: Don't clear MD_CLOSING when the raid is about to stop md: implement ->set_read_only to hook into BLKROSET processing block: add a new set_read_only method md: switch to ->check_events for media change notifications fs/select: rework stack allocation hack for clang do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak crypto: algif_aead - Only wake up when ctx->more is zero crypto: af_alg - make some functions static crypto: algif_aead - fix uninitialized ctx->init ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC Input: gpio_keys_polled - suppress deferred probe error for gpio ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet firewire: core: use long bus reset on gap count error Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security scsi: mpt3sas: Prevent sending diag_reset when the controller is ready dm-verity, dm-crypt: align "struct bvec_iter" correctly block: sed-opal: handle empty atoms when parsing response net/iucv: fix the allocation size of iucv_path_table array MIPS: Clear Cause.BD in instruction_pointer_set x86/xen: Add some null pointer checking to smp.c ASoC: rt5645: Make LattePanda board DMI match more precise Linux 4.19.310 selftests/vm: fix map_hugetlb length used for testing read and write selftests/vm: fix display of page size in map_hugetlb getrusage: use sig->stats_lock rather than lock_task_sighand() getrusage: use __for_each_thread() getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() getrusage: add the "signal_struct *sig" local variable y2038: rusage: use __kernel_old_timeval hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed hv_netvsc: use netif_is_bond_master() instead of open code hv_netvsc: Make netvsc/VF binding check both MAC and serial number Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU um: allow not setting extra rpaths in the linux binary selftests: mm: fix map_hugetlb failure on 64K page size systems tools/selftest/vm: allow choosing mem size and page size in map_hugetlb btrfs: ref-verify: free ref cache before clearing mount opt netrom: Fix data-races around sysctl_net_busy_read netrom: Fix a data-race around sysctl_netrom_link_fails_count netrom: Fix a data-race around sysctl_netrom_routing_control netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size netrom: Fix a data-race around sysctl_netrom_transport_busy_delay netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries netrom: Fix a data-race around sysctl_netrom_transport_timeout netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser netrom: Fix a data-race around sysctl_netrom_default_path_quality netfilter: nf_conntrack_h323: Add protection for bmp length out of range net/rds: fix WARNING in rds_conn_connect_if_down net/ipv6: avoid possible UAF in ip6_route_mpath_notify() geneve: make sure to pull inner header in geneve_rx() net: move definition of pcpu_lstats to header file net: lan78xx: fix runtime PM count underflow on link stop lan78xx: Fix race conditions in suspend/resume handling lan78xx: Fix partial packet errors on suspend/resume lan78xx: Add missing return code checks lan78xx: Fix white space and style issues net: usb: lan78xx: Remove lots of set but unused 'ret' variables Linux 4.19.309 gpio: 74x164: Enable output pins after registers are reset cachefiles: fix memory leak in cachefiles_add_cache() mmc: core: Fix eMMC initialization with 1-bit bus connection btrfs: dev-replace: properly validate device names wifi: nl80211: reject iftype change with mesh ID change gtp: fix use-after-free and null-ptr-deref in gtp_newlink() ALSA: Drop leftover snd-rtctimer stuff from Makefile power: supply: bq27xxx-i2c: Do not free non existing IRQ efi/capsule-loader: fix incorrect allocation size Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: Avoid potential use-after-free in hci_error_reset net: usb: dm9601: fix wrong return value in dm9601_mdio_read lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected tun: Fix xdp_rxq_info's queue_index when detaching netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter Linux 4.19.308 scripts/bpf: Fix xdp_md forward declaration typo fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table() PCI/MSI: Prevent MSI hardware interrupt number truncation s390: use the correct count for __iowrite64_copy() packet: move from strlcpy with unused retval to strscpy ipv6: sr: fix possible use-after-free and null-ptr-deref nouveau: fix function cast warnings scsi: jazz_esp: Only build if SCSI core is builtin bpf, scripts: Correct GPL license name scripts/bpf: teach bpf_helpers_doc.py to dump BPF helper definitions RDMA/srpt: fix function pointer cast warnings RDMA/srpt: Make debug output more detailed RDMA/ulp: Use dev_name instead of ibdev->name RDMA/srpt: Support specifying the srpt_service_guid parameter RDMA/bnxt_re: Return error for SRQ resize IB/hfi1: Fix a memleak in init_credit_return usb: roles: don't get/set_role() when usb_role_switch is unregistered usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs ARM: ep93xx: Add terminator to gpiod_lookup_table l2tp: pass correct message length to ip6_append_data gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp() dm-crypt: don't modify the data when using authenticated encryption mm: memcontrol: switch to rcu protection in drain_all_stock() IB/hfi1: Fix sdma.h tx->num_descs off-by-one error pmdomain: renesas: r8a77980-sysc: CR7 must be always on s390/qeth: Fix potential loss of L3-IP@ in case of network issues virtio-blk: Ensure no requests in virtqueues before deleting vqs. firewire: core: send bus reset promptly on gap count error hwmon: (coretemp) Enlarge per package core count limit regulator: pwm-regulator: Add validity checks in continuous .get_voltage ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() ahci: asm1166: correct count of reported ports fbdev: sis: Error out if pixclock equals zero fbdev: savage: Error out if pixclock equals zero wifi: mac80211: fix race condition on enabling fast-xmit wifi: cfg80211: fix missing interfaces when dumping dmaengine: shdma: increase size of 'dev_id' scsi: target: core: Add TMF to tmr_list handling sched/rt: Disallow writing invalid values to sched_rt_period_us sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Fix sysctl_sched_rr_timeslice intial value userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb nilfs2: replace WARN_ONs for invalid DAT metadata block requests memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock() net: stmmac: fix notifier registration stmmac: no need to check return value of debugfs_create functions net/sched: Retire dsmark qdisc net/sched: Retire ATM qdisc net/sched: Retire CBQ qdisc Linux 4.19.307 netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval() lsm: new security_file_ioctl_compat() hook nilfs2: fix potential bug in end_buffer_async_write sched/membarrier: reduce the ability to hammer on sys_membarrier Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d" pmdomain: core: Move the unused cleanup to a _sync initcall irqchip/irq-brcmstb-l2: Add write memory barrier before exit nfp: use correct macro for LengthSelect in BAR config nilfs2: fix hang in nilfs_lookup_dirty_data_buffers() nilfs2: fix data corruption in dsync block recovery for small block sizes ALSA: hda/conexant: Add quirk for SWS JS201D x86/mm/ident_map: Use gbpages only where full GB page should be mapped. x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6 serial: max310x: improve crystal stable clock detection serial: max310x: set default value when reading clock ready bit ring-buffer: Clean ring_buffer_poll_wait() error return staging: iio: ad5933: fix type mismatch regression ext4: fix double-free of blocks due to wrong extents moved_len binder: signal epoll threads of self-work xen-netback: properly sync TX responses nfc: nci: free rx_data_reassembly skb on NCI device cleanup firewire: core: correct documentation of fw_csr_string() kernel API scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock" usb: f_mass_storage: forbid async queue when shutdown happen USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT HID: wacom: Do not register input devices until after hid_hw_start HID: wacom: generic: Avoid reporting a serial of '0' to userspace mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again tracing/trigger: Fix to return error if failed to alloc snapshot i40e: Fix waiting for queues of all VSIs to be disabled MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler net: sysfs: Fix /sys/class/net/<iface> path for statistics Documentation: net-sysfs: describe missing statistics ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work() spi: ppc4xx: Drop write-only variable btrfs: send: return EOPNOTSUPP on unknown flags btrfs: forbid creating subvol qgroups hrtimer: Report offline hrtimer enqueue vhost: use kzalloc() instead of kmalloc() followed by memset() Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID USB: serial: cp210x: add ID for IMST iM871A-USB USB: serial: option: add Fibocom FM101-GL variant USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e net/af_iucv: clean up a try_then_request_module() netfilter: nft_compat: restrict match/target protocol to u16 netfilter: nft_compat: reject unused compat flag ppp_async: limit MRU to 64K tipc: Check the bearer type before calling tipc_udp_nl_bearer_add() rxrpc: Fix response to PING RESPONSE ACKs to a dead call inet: read sk->sk_family once in inet_recv_error() hwmon: (coretemp) Fix bogus core_id to attr name mapping hwmon: (coretemp) Fix out-of-bounds memory access hwmon: (aspeed-pwm-tacho) mutex for tach reading atm: idt77252: fix a memleak in open_card_ubr0 phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV bonding: remove print in bond_verify_device_path HID: apple: Add 2021 magic keyboard FN key mapping HID: apple: Swap the Fn and Left Control keys on Apple keyboards HID: apple: Add support for the 2021 Magic Keyboard net: sysfs: Fix /sys/class/net/<iface> path af_unix: fix lockdep positive in sk_diag_dump_icons() net: ipv4: fix a memleak in ip_setup_cork netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger llc: call sock_orphan() at release time ipv6: Ensure natural alignment of const ipv6 loopback and router addresses ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550() ixgbe: Refactor overtemp event handling ixgbe: Refactor returning internal error codes ixgbe: Remove non-inclusive language net: remove unneeded break scsi: isci: Fix an error code problem in isci_io_request_build() wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()' ceph: fix deadlock or deadcode of misusing dget() blk-mq: fix IO hang from sbitmap wakeup race virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings libsubcmd: Fix memory leak in uniq() usb: hub: Replace hardcoded quirk value with BIT() macro PCI: Only override AMD USB controller if required mfd: ti_am335x_tscadc: Fix TI SoC dependencies um: net: Fix return type of uml_net_start_xmit() um: Don't use vfprintf() for os_info() um: Fix naming clash between UML and scheduler leds: trigger: panic: Don't register panic notifier if creating the trigger failed drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()' drm/amdgpu: Let KFD sync with VM fences clk: mmp: pxa168: Fix memory leak in pxa168_clk_init() clk: hi3620: Fix memory leak in hi3620_mmc_clk_init() drm/msm/dpu: Ratelimit framedone timeout msgs media: ddbridge: fix an error code problem in ddb_probe IB/ipoib: Fix mcast list locking drm/exynos: Call drm_atomic_helper_shutdown() at shutdown/unbind time ALSA: hda: Intel: add HDA_ARL PCI ID support PCI: add INTEL_HDA_ARL to pci_ids.h media: rockchip: rga: fix swizzling for RGB formats media: stk1160: Fixed high volume of stk1160_dbg messages drm/mipi-dsi: Fix detach call without attach drm/framebuffer: Fix use of uninitialized variable drm/drm_file: fix use of uninitialized variable RDMA/IPoIB: Fix error code return in ipoib_mcast_join fast_dput(): handle underflows gracefully ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument f2fs: fix to check return value of f2fs_reserve_new_block() wifi: cfg80211: free beacon_ies when overridden from hidden BSS wifi: rtlwifi: rtl8723{be,ae}: using calculate_bit_shift() wifi: rtl8xxxu: Add additional USB IDs for RTL8192EU devices md: Whenassemble the array, consult the superblock of the freshest device ARM: dts: imx23/28: Fix the DMA controller node name ARM: dts: imx23-sansa: Use preferred i2c-gpios properties ARM: dts: imx27-apf27dev: Fix LED name ARM: dts: imx1: Fix sram node ARM: dts: imx27: Fix sram node ARM: dts: imx: Use flash@0,0 pattern ARM: dts: imx25/27-eukrea: Fix RTC node name ARM: dts: rockchip: fix rk3036 hdmi ports node scsi: libfc: Fix up timeout error in fc_fcp_rec_error() scsi: libfc: Don't schedule abort twice bpf: Add map and need_defer parameters to .map_fd_put_ptr() wifi: ath9k: Fix potential array-index-out-of-bounds read in ath9k_htc_txstatus() ARM: dts: imx7s: Fix nand-controller #size-cells ARM: dts: imx7s: Fix lcdif compatible bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk PCI: Add no PM reset quirk for NVIDIA Spectrum devices scsi: lpfc: Fix possible file string name overflow when updating firmware ext4: avoid online resizing failures due to oversized flex bg ext4: remove unnecessary check from alloc_flex_gd() ext4: unify the type of flexbg_size to unsigned int ext4: fix inconsistent between segment fstrim and full fstrim SUNRPC: Fix a suspicious RCU usage warning KVM: s390: fix setting of fpc register s390/ptrace: handle setting of fpc register correctly jfs: fix array-index-out-of-bounds in diNewExt rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock() afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*() crypto: stm32/crc32 - fix parsing list of devices pstore/ram: Fix crash when setting number of cpus to an odd number jfs: fix uaf in jfs_evict_inode jfs: fix array-index-out-of-bounds in dbAdjTree jfs: fix slab-out-of-bounds Read in dtSearch UBSAN: array-index-out-of-bounds in dtSplitRoot FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree ACPI: extlog: fix NULL pointer dereference check PNP: ACPI: fix fortify warning ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop audit: Send netlink ACK before setting connection in auditd_set powerpc/lib: Validate size for vector operations powerpc/mm: Fix build failures due to arch_reserved_kernel_pages() powerpc: Fix build error due to is_valid_bugaddr() powerpc/mm: Fix null-pointer dereference in pgtable_cache_add net/sched: cbs: Fix not adding cbs instance to list x86/entry/ia32: Ensure s32 is sign extended to s64 tick/sched: Preserve number of idle sleeps across CPU hotplug events mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan gpio: eic-sprd: Clear interrupt after set the interrupt type drm/exynos: gsc: minor fix for loop iteration in gsc_runtime_resume drm/bridge: nxp-ptn3460: simplify some error checking drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking drm: Don't unref the same fb many times by mistake due to deadlock handling gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04 netfilter: nf_tables: reject QUEUE/DROP verdict parameters btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args btrfs: don't warn if discard range is not aligned to sector net: fec: fix the unhandled context fault from smmu fjes: fix memleaks in fjes_hw_setup netfilter: nf_tables: restrict anonymous set and map names to 16 bytes net/mlx5e: fix a double-free in arfs_create_groups net/mlx5: Use kfree(ft->g) in arfs_create_groups() netlink: fix potential sleeping issue in mqueue_flush_file Conflicts: include/linux/fs.h include/linux/timer.h init/initramfs.c kernel/time/timer.c mm/memory-failure.c mm/page_alloc.c net/core/sock.c scripts/Makefile.extrawarn Change-Id: I0ccfce4c1a43240cfb997b426ef9fc59e61e3c55 |
||
|
|
ec0ad95612 |
Merge 4.19.312 into android-4.19-stable
Changes in 4.19.312
Documentation/hw-vuln: Update spectre doc
x86/cpu: Support AMD Automatic IBRS
x86/bugs: Use sysfs_emit()
timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
timer/trace: Improve timer tracing
timers: Prepare support for PREEMPT_RT
timers: Update kernel-doc for various functions
timers: Use del_timer_sync() even on UP
timers: Rename del_timer_sync() to timer_delete_sync()
wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
ARM: dts: mmp2-brownstone: Don't redeclare phandle references
arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
media: xc4000: Fix atomicity violation in xc4000_get_frequency
KVM: Always flush async #PF workqueue when vCPU is being destroyed
sparc64: NMI watchdog: fix return value of __setup handler
sparc: vDSO: fix return value of __setup handler
crypto: qat - fix double free during reset
crypto: qat - resolve race condition during AER recovery
fat: fix uninitialized field in nostale filehandles
ubifs: Set page uptodate in the correct place
ubi: Check for too small LEB size in VTBL code
ubi: correct the calculation of fastmap size
parisc: Do not hardcode registers in checksum functions
parisc: Fix ip_fast_csum
parisc: Fix csum_ipv6_magic on 32-bit systems
parisc: Fix csum_ipv6_magic on 64-bit systems
parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
PM: suspend: Set mem_sleep_current during kernel command line setup
clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
powerpc/fsl: Fix mfpmr build errors with newer binutils
USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
USB: serial: add device ID for VeriFone adapter
USB: serial: cp210x: add ID for MGP Instruments PDS100
USB: serial: option: add MeiG Smart SLM320 product
USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
PM: sleep: wakeirq: fix wake irq warning in system suspend
mmc: tmio: avoid concurrent runs of mmc_request_done()
fuse: don't unhash root
PCI: Drop pci_device_remove() test of pci_dev->driver
PCI/PM: Drain runtime-idle callbacks before driver removal
Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""
dm-raid: fix lockdep waring in "pers->hot_add_disk"
mmc: core: Fix switch on gp3 partition
hwmon: (amc6821) add of_match table
ext4: fix corruption during on-line resize
slimbus: core: Remove usage of the deprecated ida_simple_xx() API
speakup: Fix 8bit characters from direct synth
kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
vfio/platform: Disable virqfds on cleanup
soc: fsl: qbman: Always disable interrupts when taking cgr_lock
soc: fsl: qbman: Add helper for sanity checking cgr ops
soc: fsl: qbman: Add CGR update function
soc: fsl: qbman: Use raw spinlock for cgr_lock
s390/zcrypt: fix reference counting on zcrypt card objects
drm/imx/ipuv3: do not return negative values from .get_modes()
drm/vc4: hdmi: do not return negative values from .get_modes()
memtest: use {READ,WRITE}_ONCE in memory scanning
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
nilfs2: use a more common logging style
nilfs2: prevent kernel bug at submit_bh_wbc()
x86/CPU/AMD: Update the Zenbleed microcode revisions
ahci: asm1064: correct count of reported ports
ahci: asm1064: asm1166: don't limit reported ports
comedi: comedi_test: Prevent timers rescheduling during deletion
netfilter: nf_tables: disallow anonymous set with timeout flag
netfilter: nf_tables: reject constant set with timeout
xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
usb: gadget: ncm: Fix handling of zero block length packets
usb: port: Don't try to peer unused USB ports based on location
tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
vt: fix unicode buffer corruption when deleting characters
vt: fix memory overlapping when deleting chars in the buffer
mm/memory-failure: fix an incorrect use of tail pages
mm/migrate: set swap entry values of THP tail pages properly.
wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
usb: cdc-wdm: close race between read and workqueue
ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
printk: Update @console_may_schedule in console_trylock_spinning()
btrfs: allocate btrfs_ioctl_defrag_range_args on stack
Revert "loop: Check for overflow while configuring loop"
loop: Call loop_config_discard() only after new config is applied
loop: Remove sector_t truncation checks
loop: Factor out setting loop device size
loop: Refactor loop_set_status() size calculation
loop: properly observe rotational flag of underlying device
perf/core: Fix reentry problem in perf_output_read_group()
efivarfs: Request at most 512 bytes for variable names
powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
loop: Factor out configuring loop from status
loop: Check for overflow while configuring loop
loop: loop_set_status_from_info() check before assignment
usb: dwc2: host: Fix remote wakeup from hibernation
usb: dwc2: host: Fix hibernation flow
usb: dwc2: host: Fix ISOC flow in DDMA mode
usb: dwc2: gadget: LPM flow fix
usb: udc: remove warning when queue disabled ep
scsi: qla2xxx: Fix command flush on cable pull
x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
scsi: lpfc: Correct size for wqe for memset()
USB: core: Fix deadlock in usb_deauthorize_interface()
nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
mptcp: add sk_stop_timer_sync helper
tcp: properly terminate timers for kernel sockets
r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
Bluetooth: hci_event: set the conn encrypted before conn establishes
Bluetooth: Fix TOCTOU in HCI debugfs implementation
netfilter: nf_tables: disallow timeout for anonymous sets
net/rds: fix possible cp null dereference
Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
net/sched: act_skbmod: prevent kernel-infoleak
net: stmmac: fix rx queue priority assignment
selftests: reuseaddr_conflict: add missing new line at the end of the output
ipv6: Fix infinite recursion in fib6_dump_done().
i40e: fix vf may be used uninitialized in this function warning
staging: mmal-vchiq: Avoid use of bool in structures
staging: mmal-vchiq: Allocate and free components as required
staging: mmal-vchiq: Fix client_component for 64 bit kernel
staging: vc04_services: changen strncpy() to strscpy_pad()
staging: vc04_services: fix information leak in create_component()
initramfs: factor out a helper to populate the initrd image
fs: add a vfs_fchown helper
fs: add a vfs_fchmod helper
initramfs: switch initramfs unpacking to struct file based APIs
init: open /initrd.image with O_LARGEFILE
erspan: Add type I version 0 support.
erspan: make sure erspan_base_hdr is present in skb->head
ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
ata: sata_mv: Fix PCI device ID table declaration compilation warning
ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
wifi: ath9k: fix LNA selection in ath_ant_try_scan()
VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host()
arm64: dts: rockchip: fix rk3399 hdmi ports node
tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num()
btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()
btrfs: export: handle invalid inode or root reference in btrfs_get_parent()
btrfs: send: handle path ref underflow in header iterate_inode_ref()
Bluetooth: btintel: Fix null ptr deref in btintel_read_version
Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails
sysv: don't call sb_bread() with pointers_lock held
scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc()
isofs: handle CDs with bad root inode but good Joliet root directory
media: sta2x11: fix irq handler cast
drm/amd/display: Fix nanosec stat overflow
SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int
block: prevent division by zero in blk_rq_stat_sum()
Input: allocate keycode for Display refresh rate toggle
ktest: force $buildonly = 1 for 'make_warnings_file' test type
tools: iio: replace seekdir() in iio_generic_buffer
usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined
fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
fbmon: prevent division by zero in fb_videomode_from_videomode()
tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc
drm/vkms: call drm_atomic_helper_shutdown before drm_dev_put()
virtio: reenable config if freezing device failed
x86/mm/pat: fix VM_PAT handling in COW mappings
Bluetooth: btintel: Fixe build regression
VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
erspan: Check IFLA_GRE_ERSPAN_VER is set.
ip_gre: do not report erspan version on GRE interface
initramfs: fix populate_initrd_image() section mismatch
amdkfd: use calloc instead of kzalloc to avoid integer overflow
Linux 4.19.312
Change-Id: Ic4c50de6afb4c88c8011be6cc93f960d2dc968e0
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
c82a659cc8 |
mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.
Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.
Quoting Sven:
1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
with __GFP_RETRY_MAYFAIL set.
2. page alloc's __alloc_pages_slowpath tries to get a page from the
freelist. This fails because there is nothing free of that costly
order.
3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
which bails out because a zone is ready to be compacted; it pretends
to have made a single page of progress.
4. page alloc tries to compact, but this always bails out early because
__GFP_IO is not set (it's not passed by the snd allocator, and even
if it were, we are suspending so the __GFP_IO flag would be cleared
anyway).
5. page alloc believes reclaim progress was made (because of the
pretense in item 3) and so it checks whether it should retry
compaction. The compaction retry logic thinks it should try again,
because:
a) reclaim is needed because of the early bail-out in item 4
b) a zonelist is suitable for compaction
6. goto 2. indefinite stall.
(end quote)
The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries. There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.
To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.
Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
there's at least one attempt to actually reclaim, even if chances are
small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
which is then used to avoid retrying reclaim/compaction for costly
allocations (step 5) if we can't compact and also to skip the early
compaction attempt that we do in some cases
Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes:
|
||
|
|
fbc355c16d |
Revert "BACKPORT: mm: move zone watermark accesses behind an accessor"
This reverts commit
|
||
|
|
776eba0172 |
Revert "BACKPORT: mm, compaction: be selective about what pageblocks to clear skip hints"
This reverts commit
|
||
|
|
fd9c71c06b |
BACKPORT: mm, compaction: be selective about what pageblocks to clear skip hints
Pageblock hints are cleared when compaction restarts or kswapd makes enough progress that it can sleep but it's over-eager in that the bit is cleared for migration sources with no LRU pages and migration targets with no free pages. As pageblock skip hint flushes are relatively rare and out-of-band with respect to kswapd, this patch makes a few more expensive checks to see if it's appropriate to even clear the bit. Every pageblock that is not cleared will avoid 512 pages being scanned unnecessarily on x86-64. The impact is variable with different workloads showing small differences in latency, success rates and scan rates. This is expected as clearing the hints is not that common but doing a small amount of work out-of-band to avoid a large amount of work in-band later is generally a good thing. Link: http://lkml.kernel.org/r/20190118175136.31341-22-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> [cai@lca.pw: no stuck in __reset_isolation_pfn()] Link: http://lkml.kernel.org/r/20190206034732.75687-1-cai@lca.pw Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I27b3d1bf8100d0281ec297ef5ce79d100d0cb37e Git-commit: e332f741a8dd1ec9a6dc8aa997296ecbfe64323e Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> (cherry picked from commit e332f741a8dd1ec9a6dc8aa997296ecbfe64323e) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 150378964 |
||
|
|
acfb1c608b |
BACKPORT: mm: move zone watermark accesses behind an accessor
This is a preparation patch only, no functional change. Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: Iccef5f02348ab59d75cbbe2b48f40017391b88b1 Git-commit: a921444382b49cc7fdeca3fba3e278bc09484a27 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git [vinmenon@codeaurora.org: trivial merge conflict fixes] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> (cherry picked from commit a921444382b49cc7fdeca3fba3e278bc09484a27) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 150378964 |
||
|
|
3225a4fef0 |
Merge "Merge android-4.19-q.78 (d9e388f) into msm-4.19"
|
||
|
|
2a239da9de |
mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
Florian and Dave reported [1] a NULL pointer dereference in __reset_isolation_pfn(). While the exact cause is unclear, staring at the code revealed two bugs, which might be related. One bug is that if zone starts in the middle of pageblock, block_page might correspond to different pfn than block_pfn, and then the pfn_valid_within() checks will check different pfn's than those accessed via struct page. This might result in acessing an unitialized page in CONFIG_HOLES_IN_ZONE configs. The other bug is that end_page refers to the first page of next pageblock and not last page of current pageblock. The online and valid check is then wrong and with sections, the while (page < end_page) loop might wander off actual struct page arrays. [1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/ Change-Id: I4d2352cba861245ed844943846d5c569ff2f3e24 Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz Fixes: 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Florian Weimer <fw@deneb.enyo.de> Reported-by: Dave Chinner <david@fromorbit.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-Commit: a2e9a5afce080226edbf1882d63d99bf32070e9e Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
2857d83eee |
Merge android-4.19-q.77 (cd1eb9f) into msm-4.19
* refs/heads/tmp-cd1eb9f:
Revert "net: qrtr: Stop rx_worker before freeing node"
Linux 4.19.77
drm/amd/display: Restore backlight brightness after system resume
mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
md/raid0: avoid RAID0 data corruption due to layout confusion.
CIFS: Fix oplock handling for SMB 2.1+ protocols
CIFS: fix max ea value size
i2c: riic: Clear NACK in tend isr
hwrng: core - don't wait on add_early_randomness()
quota: fix wrong condition in is_quota_modification()
ext4: fix punch hole for inline_data file systems
ext4: fix warning inside ext4_convert_unwritten_extents_endio
/dev/mem: Bail out upon SIGKILL.
cfg80211: Purge frame registrations on iftype change
md: only call set_in_sync() when it is expected to succeed.
md: don't report active array_state until after revalidate_disk() completes.
md/raid6: Set R5_ReadError when there is read failure on parity disk
Btrfs: fix race setting up and completing qgroup rescan workers
btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
btrfs: Relinquish CPUs in btrfs_compare_trees
Btrfs: fix use-after-free when using the tree modification log
btrfs: fix allocation of free space cache v1 bitmap pages
ovl: filter of trusted xattr results in audit
ovl: Fix dereferencing possible ERR_PTR()
smb3: allow disabling requesting leases
block: fix null pointer dereference in blk_mq_rq_timed_out()
i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
memcg, kmem: do not fail __GFP_NOFAIL charges
memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
efifb: BGRT: Improve efifb_bgrt_sanity_check
regulator: Defer init completion for a while after late_initcall
alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
arm64: tlb: Ensure we execute an ISB following walk cache invalidation
Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
ARM: samsung: Fix system restart on S3C6410
ASoC: Intel: Fix use of potentially uninitialized variable
ASoC: Intel: Skylake: Use correct function to access iomem space
ASoC: Intel: NHLT: Fix debug print format
binfmt_elf: Do not move brk for INTERP-less ET_EXEC
media: don't drop front-end reference count for ->detach
media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
KVM: x86: Manually calculate reserved bits when loading PDPTRS
KVM: x86: set ctxt->have_exception in x86_decode_insn()
KVM: x86: always stop emulation on page fault
parisc: Disable HP HSC-PCI Cards to prevent kernel crash
fuse: fix missing unlock_page in fuse_writepage()
powerpc/imc: Dont create debugfs files for cpu-less nodes
scsi: implement .cleanup_rq callback
blk-mq: add callback of .cleanup_rq
ALSA: hda/realtek - PCI quirk for Medion E4254
ceph: use ceph_evict_inode to cleanup inode's resource
Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
randstruct: Check member structs in is_pure_ops_struct()
IB/hfi1: Define variables as unsigned long to fix KASAN warning
IB/mlx5: Free mpi in mp_slave mode
printk: Do not lose last line in kmsg buffer dump
scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
scsi: scsi_dh_rdac: zero cdb in send_mode_select()
ALSA: firewire-tascam: check intermediate state of clock status and retry
ALSA: firewire-tascam: handle error code when getting current source of clock
iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
PM / devfreq: passive: fix compiler warning
media: omap3isp: Set device on omap3isp subdevs
btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
ALSA: hda - Drop unsol event handler for Intel HDMI codecs
e1000e: add workaround for possible stalled packet
libertas: Add missing sentinel at end of if_usb.c fw_table
raid5: don't increment read_errors on EILSEQ return
mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
mmc: core: Add helper function to indicate if SDIO IRQs is enabled
mmc: sdhci: Fix incorrect switch to HS mode
mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
platform/x86: intel_pmc_core: Do not ioremap RAM
x86/cpu: Add Tiger Lake to Intel family
s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
kprobes: Prohibit probing on BUG() and WARN() address
dmaengine: ti: edma: Do not reset reserved paRAM slots
md/raid1: fail run raid1 array when active disk less than one
hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
closures: fix a race on wakeup from closure_sync
ACPI / PCI: fix acpi_pci_irq_enable() memory leak
ACPI: custom_method: fix memory leaks
ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
libtraceevent: Change users plugin directory
iommu/iova: Avoid false sharing on fq_timer_on
libata/ahci: Drop PCS quirk for Denverton and beyond
iommu/amd: Silence warnings under memory pressure
ALSA: firewire-motu: add support for MOTU 4pre
nvme-multipath: fix ana log nsid lookup when nsid is not found
nvmet: fix data units read and written counters in SMART log
x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
ASoC: fsl_ssi: Fix clock control issue in master mode
x86/mm/pti: Do not invoke PTI functions when PTI is disabled
arm64: kpti: ensure patched kernel text is fetched from PoU
x86/apic/vector: Warn when vector space exhaustion breaks affinity
sched/cpufreq: Align trace event behavior of fast switching
ACPI / CPPC: do not require the _PSD method
ASoC: es8316: fix headphone mixer volume table
media: ov9650: add a sanity check
perf trace beauty ioctl: Fix off-by-one error in cmd->string table
media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
media: cpia2_usb: fix memory leaks
media: saa7146: add cleanup in hexium_attach()
media: cec-notifier: clear cec_adap in cec_notifier_unregister
PM / devfreq: exynos-bus: Correct clock enable sequence
PM / devfreq: passive: Use non-devm notifiers
EDAC/amd64: Decode syndrome before translating address
EDAC/amd64: Recognize DRAM device type ECC capability
libperf: Fix alignment trap with xyarray contents in 'perf stat'
media: dvb-core: fix a memory leak bug
posix-cpu-timers: Sanitize bogus WARNONS
media: dvb-frontends: use ida for pll number
media: mceusb: fix (eliminate) TX IR signal length limit
nbd: add missing config put
led: triggers: Fix a memory leak bug
ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
tools headers: Fixup bitsperlong per arch includes
ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
media: hdpvr: add terminating 0 at end of string
media: radio/si470x: kill urb on error
ARM: dts: imx7-colibri: disable HS400
ARM: dts: imx7d: cl-som-imx7: make ethernet work again
m68k: Prevent some compiler warnings in Coldfire builds
net: lpc-enet: fix printk format strings
media: imx: mipi csi-2: Don't fail if initial state times-out
media: omap3isp: Don't set streaming state on random subdevs
media: i2c: ov5645: Fix power sequence
media: vsp1: fix memory leak of dl on error return path
perf record: Support aarch64 random socket_id assignment
dmaengine: iop-adma: use correct printk format strings
media: rc: imon: Allow iMON RC protocol for ffdc 7e device
media: em28xx: modules workqueue not inited for 2nd device
media: fdp1: Reduce FCP not found message level to debug
media: mtk-mdp: fix reference count on old device tree
perf test vfs_getname: Disable ~/.perfconfig to get default output
perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
media: gspca: zero usb_buf on error
idle: Prevent late-arriving interrupts from disrupting offline
sched/fair: Use rq_lock/unlock in online_fair_sched_group
firmware: arm_scmi: Check if platform has released shmem before using
efi: cper: print AER info of PCIe fatal error
EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
loop: Add LOOP_SET_DIRECT_IO to compat ioctl
ACPI / processor: don't print errors for processorIDs == 0xff
media: media/platform: fsl-viu.c: fix build for MICROBLAZE
md: don't set In_sync if array is frozen
md: don't call spare_active in md_reap_sync_thread if all member devices can't work
md/raid1: end bio when the device faulty
arm64/prefetch: fix a -Wtype-limits warning
ASoC: rsnd: don't call clk_get_rate() under atomic context
EDAC/altera: Use the proper type for the IRQ status bits
ia64:unwind: fix double free for mod->arch.init_unw_table
ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
base: soc: Export soc_device_register/unregister APIs
media: iguanair: add sanity checks
EDAC/mc: Fix grain_bits calculation
ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
ALSA: hda - Show the fatal CORB/RIRB error more clearly
x86/apic: Soft disable APIC before initializing it
x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
sched/deadline: Fix bandwidth accounting at all levels after offline migration
x86/apic: Make apic_pending_intr_clear() more robust
sched/core: Fix CPU controller for !RT_GROUP_SCHED
sched/fair: Fix imbalance due to CPU affinity
time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
media: i2c: ov5640: Check for devm_gpiod_get_optional() error
media: hdpvr: Add device num check and handling
media: exynos4-is: fix leaked of_node references
media: mtk-cir: lower de-glitch counter for rc-mm protocol
media: dib0700: fix link error for dibx000_i2c_set_speed
leds: leds-lp5562 allow firmware files up to the maximum length
dmaengine: bcm2835: Print error in case setting DMA mask fails
firmware: qcom_scm: Use proper types for dma mappings
ASoC: sgtl5000: Fix charge pump source assignment
ASoC: sgtl5000: Fix of unmute outputs on probe
ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
ALSA: hda: Flush interrupts on disabling
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
nfc: enforce CAP_NET_RAW for raw sockets
ieee802154: enforce CAP_NET_RAW for raw sockets
ax25: enforce CAP_NET_RAW for raw sockets
appletalk: enforce CAP_NET_RAW for raw sockets
mISDN: enforce CAP_NET_RAW for raw sockets
net/mlx5: Add device ID of upcoming BlueField-2
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
net: sched: fix possible crash in tcf_action_destroy()
usbnet: sanity checking of packet sizes and device mtu
usbnet: ignore endpoints with invalid wMaxPacketSize
skge: fix checksum byte order
sch_netem: fix a divide by zero in tabledist()
ppp: Fix memory leak in ppp_write
openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
net_sched: add max len check for TCA_KIND
net/sched: act_sample: don't push mac header on ip6gre ingress
net: qrtr: Stop rx_worker before freeing node
net/phy: fix DP83865 10 Mbps HDX loopback disable function
macsec: drop skb sk before calling gro_cells_receive
cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
arcnet: provide a buffer big enough to actually receive packets
ANDROID: properly export new symbols with _GPL tag
Conflicts:
kernel/sched/cpufreq_schedutil.c
mm/compaction.c
Change-Id: I3f2cc4a1421480479b9040aa5eefe7e17437021d
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
|
||
|
|
cd1eb9f1b9 |
Merge 4.19.77 into android-4.19-q
Changes in 4.19.77
arcnet: provide a buffer big enough to actually receive packets
cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
macsec: drop skb sk before calling gro_cells_receive
net/phy: fix DP83865 10 Mbps HDX loopback disable function
net: qrtr: Stop rx_worker before freeing node
net/sched: act_sample: don't push mac header on ip6gre ingress
net_sched: add max len check for TCA_KIND
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
ppp: Fix memory leak in ppp_write
sch_netem: fix a divide by zero in tabledist()
skge: fix checksum byte order
usbnet: ignore endpoints with invalid wMaxPacketSize
usbnet: sanity checking of packet sizes and device mtu
net: sched: fix possible crash in tcf_action_destroy()
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
net/mlx5: Add device ID of upcoming BlueField-2
mISDN: enforce CAP_NET_RAW for raw sockets
appletalk: enforce CAP_NET_RAW for raw sockets
ax25: enforce CAP_NET_RAW for raw sockets
ieee802154: enforce CAP_NET_RAW for raw sockets
nfc: enforce CAP_NET_RAW for raw sockets
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
ALSA: hda: Flush interrupts on disabling
regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
ASoC: sgtl5000: Fix of unmute outputs on probe
ASoC: sgtl5000: Fix charge pump source assignment
firmware: qcom_scm: Use proper types for dma mappings
dmaengine: bcm2835: Print error in case setting DMA mask fails
leds: leds-lp5562 allow firmware files up to the maximum length
media: dib0700: fix link error for dibx000_i2c_set_speed
media: mtk-cir: lower de-glitch counter for rc-mm protocol
media: exynos4-is: fix leaked of_node references
media: hdpvr: Add device num check and handling
media: i2c: ov5640: Check for devm_gpiod_get_optional() error
time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
sched/fair: Fix imbalance due to CPU affinity
sched/core: Fix CPU controller for !RT_GROUP_SCHED
x86/apic: Make apic_pending_intr_clear() more robust
sched/deadline: Fix bandwidth accounting at all levels after offline migration
x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
x86/apic: Soft disable APIC before initializing it
ALSA: hda - Show the fatal CORB/RIRB error more clearly
ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
EDAC/mc: Fix grain_bits calculation
media: iguanair: add sanity checks
base: soc: Export soc_device_register/unregister APIs
ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
ia64:unwind: fix double free for mod->arch.init_unw_table
EDAC/altera: Use the proper type for the IRQ status bits
ASoC: rsnd: don't call clk_get_rate() under atomic context
arm64/prefetch: fix a -Wtype-limits warning
md/raid1: end bio when the device faulty
md: don't call spare_active in md_reap_sync_thread if all member devices can't work
md: don't set In_sync if array is frozen
media: media/platform: fsl-viu.c: fix build for MICROBLAZE
ACPI / processor: don't print errors for processorIDs == 0xff
loop: Add LOOP_SET_DIRECT_IO to compat ioctl
EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
efi: cper: print AER info of PCIe fatal error
firmware: arm_scmi: Check if platform has released shmem before using
sched/fair: Use rq_lock/unlock in online_fair_sched_group
idle: Prevent late-arriving interrupts from disrupting offline
media: gspca: zero usb_buf on error
perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
perf test vfs_getname: Disable ~/.perfconfig to get default output
media: mtk-mdp: fix reference count on old device tree
media: fdp1: Reduce FCP not found message level to debug
media: em28xx: modules workqueue not inited for 2nd device
media: rc: imon: Allow iMON RC protocol for ffdc 7e device
dmaengine: iop-adma: use correct printk format strings
perf record: Support aarch64 random socket_id assignment
media: vsp1: fix memory leak of dl on error return path
media: i2c: ov5645: Fix power sequence
media: omap3isp: Don't set streaming state on random subdevs
media: imx: mipi csi-2: Don't fail if initial state times-out
net: lpc-enet: fix printk format strings
m68k: Prevent some compiler warnings in Coldfire builds
ARM: dts: imx7d: cl-som-imx7: make ethernet work again
ARM: dts: imx7-colibri: disable HS400
media: radio/si470x: kill urb on error
media: hdpvr: add terminating 0 at end of string
ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
tools headers: Fixup bitsperlong per arch includes
ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
led: triggers: Fix a memory leak bug
nbd: add missing config put
media: mceusb: fix (eliminate) TX IR signal length limit
media: dvb-frontends: use ida for pll number
posix-cpu-timers: Sanitize bogus WARNONS
media: dvb-core: fix a memory leak bug
libperf: Fix alignment trap with xyarray contents in 'perf stat'
EDAC/amd64: Recognize DRAM device type ECC capability
EDAC/amd64: Decode syndrome before translating address
PM / devfreq: passive: Use non-devm notifiers
PM / devfreq: exynos-bus: Correct clock enable sequence
media: cec-notifier: clear cec_adap in cec_notifier_unregister
media: saa7146: add cleanup in hexium_attach()
media: cpia2_usb: fix memory leaks
media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
perf trace beauty ioctl: Fix off-by-one error in cmd->string table
media: ov9650: add a sanity check
ASoC: es8316: fix headphone mixer volume table
ACPI / CPPC: do not require the _PSD method
sched/cpufreq: Align trace event behavior of fast switching
x86/apic/vector: Warn when vector space exhaustion breaks affinity
arm64: kpti: ensure patched kernel text is fetched from PoU
x86/mm/pti: Do not invoke PTI functions when PTI is disabled
ASoC: fsl_ssi: Fix clock control issue in master mode
x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
nvmet: fix data units read and written counters in SMART log
nvme-multipath: fix ana log nsid lookup when nsid is not found
ALSA: firewire-motu: add support for MOTU 4pre
iommu/amd: Silence warnings under memory pressure
libata/ahci: Drop PCS quirk for Denverton and beyond
iommu/iova: Avoid false sharing on fq_timer_on
libtraceevent: Change users plugin directory
ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
ACPI: custom_method: fix memory leaks
ACPI / PCI: fix acpi_pci_irq_enable() memory leak
closures: fix a race on wakeup from closure_sync
hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
md/raid1: fail run raid1 array when active disk less than one
dmaengine: ti: edma: Do not reset reserved paRAM slots
kprobes: Prohibit probing on BUG() and WARN() address
s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
x86/cpu: Add Tiger Lake to Intel family
platform/x86: intel_pmc_core: Do not ioremap RAM
ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
mmc: sdhci: Fix incorrect switch to HS mode
mmc: core: Add helper function to indicate if SDIO IRQs is enabled
mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
raid5: don't increment read_errors on EILSEQ return
libertas: Add missing sentinel at end of if_usb.c fw_table
e1000e: add workaround for possible stalled packet
ALSA: hda - Drop unsol event handler for Intel HDMI codecs
drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
media: omap3isp: Set device on omap3isp subdevs
PM / devfreq: passive: fix compiler warning
iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
ALSA: firewire-tascam: handle error code when getting current source of clock
ALSA: firewire-tascam: check intermediate state of clock status and retry
scsi: scsi_dh_rdac: zero cdb in send_mode_select()
scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
printk: Do not lose last line in kmsg buffer dump
IB/mlx5: Free mpi in mp_slave mode
IB/hfi1: Define variables as unsigned long to fix KASAN warning
randstruct: Check member structs in is_pure_ops_struct()
Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
ceph: use ceph_evict_inode to cleanup inode's resource
ALSA: hda/realtek - PCI quirk for Medion E4254
blk-mq: add callback of .cleanup_rq
scsi: implement .cleanup_rq callback
powerpc/imc: Dont create debugfs files for cpu-less nodes
fuse: fix missing unlock_page in fuse_writepage()
parisc: Disable HP HSC-PCI Cards to prevent kernel crash
KVM: x86: always stop emulation on page fault
KVM: x86: set ctxt->have_exception in x86_decode_insn()
KVM: x86: Manually calculate reserved bits when loading PDPTRS
media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
media: don't drop front-end reference count for ->detach
binfmt_elf: Do not move brk for INTERP-less ET_EXEC
ASoC: Intel: NHLT: Fix debug print format
ASoC: Intel: Skylake: Use correct function to access iomem space
ASoC: Intel: Fix use of potentially uninitialized variable
ARM: samsung: Fix system restart on S3C6410
ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
arm64: tlb: Ensure we execute an ISB following walk cache invalidation
arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
regulator: Defer init completion for a while after late_initcall
efifb: BGRT: Improve efifb_bgrt_sanity_check
gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
memcg, kmem: do not fail __GFP_NOFAIL charges
i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
block: fix null pointer dereference in blk_mq_rq_timed_out()
smb3: allow disabling requesting leases
ovl: Fix dereferencing possible ERR_PTR()
ovl: filter of trusted xattr results in audit
btrfs: fix allocation of free space cache v1 bitmap pages
Btrfs: fix use-after-free when using the tree modification log
btrfs: Relinquish CPUs in btrfs_compare_trees
btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
Btrfs: fix race setting up and completing qgroup rescan workers
md/raid6: Set R5_ReadError when there is read failure on parity disk
md: don't report active array_state until after revalidate_disk() completes.
md: only call set_in_sync() when it is expected to succeed.
cfg80211: Purge frame registrations on iftype change
/dev/mem: Bail out upon SIGKILL.
ext4: fix warning inside ext4_convert_unwritten_extents_endio
ext4: fix punch hole for inline_data file systems
quota: fix wrong condition in is_quota_modification()
hwrng: core - don't wait on add_early_randomness()
i2c: riic: Clear NACK in tend isr
CIFS: fix max ea value size
CIFS: Fix oplock handling for SMB 2.1+ protocols
md/raid0: avoid RAID0 data corruption due to layout confusion.
fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
drm/amd/display: Restore backlight brightness after system resume
Linux 4.19.77
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I024d8f764cd5910ba1210299375accb2e79f0abc
|
||
|
|
7f1f24fed2 |
Merge 4.19.77 into android-4.19
Changes in 4.19.77
arcnet: provide a buffer big enough to actually receive packets
cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
macsec: drop skb sk before calling gro_cells_receive
net/phy: fix DP83865 10 Mbps HDX loopback disable function
net: qrtr: Stop rx_worker before freeing node
net/sched: act_sample: don't push mac header on ip6gre ingress
net_sched: add max len check for TCA_KIND
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
ppp: Fix memory leak in ppp_write
sch_netem: fix a divide by zero in tabledist()
skge: fix checksum byte order
usbnet: ignore endpoints with invalid wMaxPacketSize
usbnet: sanity checking of packet sizes and device mtu
net: sched: fix possible crash in tcf_action_destroy()
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
net/mlx5: Add device ID of upcoming BlueField-2
mISDN: enforce CAP_NET_RAW for raw sockets
appletalk: enforce CAP_NET_RAW for raw sockets
ax25: enforce CAP_NET_RAW for raw sockets
ieee802154: enforce CAP_NET_RAW for raw sockets
nfc: enforce CAP_NET_RAW for raw sockets
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
ALSA: hda: Flush interrupts on disabling
regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
ASoC: sgtl5000: Fix of unmute outputs on probe
ASoC: sgtl5000: Fix charge pump source assignment
firmware: qcom_scm: Use proper types for dma mappings
dmaengine: bcm2835: Print error in case setting DMA mask fails
leds: leds-lp5562 allow firmware files up to the maximum length
media: dib0700: fix link error for dibx000_i2c_set_speed
media: mtk-cir: lower de-glitch counter for rc-mm protocol
media: exynos4-is: fix leaked of_node references
media: hdpvr: Add device num check and handling
media: i2c: ov5640: Check for devm_gpiod_get_optional() error
time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
sched/fair: Fix imbalance due to CPU affinity
sched/core: Fix CPU controller for !RT_GROUP_SCHED
x86/apic: Make apic_pending_intr_clear() more robust
sched/deadline: Fix bandwidth accounting at all levels after offline migration
x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
x86/apic: Soft disable APIC before initializing it
ALSA: hda - Show the fatal CORB/RIRB error more clearly
ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
EDAC/mc: Fix grain_bits calculation
media: iguanair: add sanity checks
base: soc: Export soc_device_register/unregister APIs
ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
ia64:unwind: fix double free for mod->arch.init_unw_table
EDAC/altera: Use the proper type for the IRQ status bits
ASoC: rsnd: don't call clk_get_rate() under atomic context
arm64/prefetch: fix a -Wtype-limits warning
md/raid1: end bio when the device faulty
md: don't call spare_active in md_reap_sync_thread if all member devices can't work
md: don't set In_sync if array is frozen
media: media/platform: fsl-viu.c: fix build for MICROBLAZE
ACPI / processor: don't print errors for processorIDs == 0xff
loop: Add LOOP_SET_DIRECT_IO to compat ioctl
EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
efi: cper: print AER info of PCIe fatal error
firmware: arm_scmi: Check if platform has released shmem before using
sched/fair: Use rq_lock/unlock in online_fair_sched_group
idle: Prevent late-arriving interrupts from disrupting offline
media: gspca: zero usb_buf on error
perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
perf test vfs_getname: Disable ~/.perfconfig to get default output
media: mtk-mdp: fix reference count on old device tree
media: fdp1: Reduce FCP not found message level to debug
media: em28xx: modules workqueue not inited for 2nd device
media: rc: imon: Allow iMON RC protocol for ffdc 7e device
dmaengine: iop-adma: use correct printk format strings
perf record: Support aarch64 random socket_id assignment
media: vsp1: fix memory leak of dl on error return path
media: i2c: ov5645: Fix power sequence
media: omap3isp: Don't set streaming state on random subdevs
media: imx: mipi csi-2: Don't fail if initial state times-out
net: lpc-enet: fix printk format strings
m68k: Prevent some compiler warnings in Coldfire builds
ARM: dts: imx7d: cl-som-imx7: make ethernet work again
ARM: dts: imx7-colibri: disable HS400
media: radio/si470x: kill urb on error
media: hdpvr: add terminating 0 at end of string
ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
tools headers: Fixup bitsperlong per arch includes
ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
led: triggers: Fix a memory leak bug
nbd: add missing config put
media: mceusb: fix (eliminate) TX IR signal length limit
media: dvb-frontends: use ida for pll number
posix-cpu-timers: Sanitize bogus WARNONS
media: dvb-core: fix a memory leak bug
libperf: Fix alignment trap with xyarray contents in 'perf stat'
EDAC/amd64: Recognize DRAM device type ECC capability
EDAC/amd64: Decode syndrome before translating address
PM / devfreq: passive: Use non-devm notifiers
PM / devfreq: exynos-bus: Correct clock enable sequence
media: cec-notifier: clear cec_adap in cec_notifier_unregister
media: saa7146: add cleanup in hexium_attach()
media: cpia2_usb: fix memory leaks
media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
perf trace beauty ioctl: Fix off-by-one error in cmd->string table
media: ov9650: add a sanity check
ASoC: es8316: fix headphone mixer volume table
ACPI / CPPC: do not require the _PSD method
sched/cpufreq: Align trace event behavior of fast switching
x86/apic/vector: Warn when vector space exhaustion breaks affinity
arm64: kpti: ensure patched kernel text is fetched from PoU
x86/mm/pti: Do not invoke PTI functions when PTI is disabled
ASoC: fsl_ssi: Fix clock control issue in master mode
x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
nvmet: fix data units read and written counters in SMART log
nvme-multipath: fix ana log nsid lookup when nsid is not found
ALSA: firewire-motu: add support for MOTU 4pre
iommu/amd: Silence warnings under memory pressure
libata/ahci: Drop PCS quirk for Denverton and beyond
iommu/iova: Avoid false sharing on fq_timer_on
libtraceevent: Change users plugin directory
ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
ACPI: custom_method: fix memory leaks
ACPI / PCI: fix acpi_pci_irq_enable() memory leak
closures: fix a race on wakeup from closure_sync
hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
md/raid1: fail run raid1 array when active disk less than one
dmaengine: ti: edma: Do not reset reserved paRAM slots
kprobes: Prohibit probing on BUG() and WARN() address
s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
x86/cpu: Add Tiger Lake to Intel family
platform/x86: intel_pmc_core: Do not ioremap RAM
ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
mmc: sdhci: Fix incorrect switch to HS mode
mmc: core: Add helper function to indicate if SDIO IRQs is enabled
mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
raid5: don't increment read_errors on EILSEQ return
libertas: Add missing sentinel at end of if_usb.c fw_table
e1000e: add workaround for possible stalled packet
ALSA: hda - Drop unsol event handler for Intel HDMI codecs
drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
media: omap3isp: Set device on omap3isp subdevs
PM / devfreq: passive: fix compiler warning
iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
ALSA: firewire-tascam: handle error code when getting current source of clock
ALSA: firewire-tascam: check intermediate state of clock status and retry
scsi: scsi_dh_rdac: zero cdb in send_mode_select()
scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
printk: Do not lose last line in kmsg buffer dump
IB/mlx5: Free mpi in mp_slave mode
IB/hfi1: Define variables as unsigned long to fix KASAN warning
randstruct: Check member structs in is_pure_ops_struct()
Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
ceph: use ceph_evict_inode to cleanup inode's resource
ALSA: hda/realtek - PCI quirk for Medion E4254
blk-mq: add callback of .cleanup_rq
scsi: implement .cleanup_rq callback
powerpc/imc: Dont create debugfs files for cpu-less nodes
fuse: fix missing unlock_page in fuse_writepage()
parisc: Disable HP HSC-PCI Cards to prevent kernel crash
KVM: x86: always stop emulation on page fault
KVM: x86: set ctxt->have_exception in x86_decode_insn()
KVM: x86: Manually calculate reserved bits when loading PDPTRS
media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
media: don't drop front-end reference count for ->detach
binfmt_elf: Do not move brk for INTERP-less ET_EXEC
ASoC: Intel: NHLT: Fix debug print format
ASoC: Intel: Skylake: Use correct function to access iomem space
ASoC: Intel: Fix use of potentially uninitialized variable
ARM: samsung: Fix system restart on S3C6410
ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
arm64: tlb: Ensure we execute an ISB following walk cache invalidation
arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
regulator: Defer init completion for a while after late_initcall
efifb: BGRT: Improve efifb_bgrt_sanity_check
gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
memcg, kmem: do not fail __GFP_NOFAIL charges
i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
block: fix null pointer dereference in blk_mq_rq_timed_out()
smb3: allow disabling requesting leases
ovl: Fix dereferencing possible ERR_PTR()
ovl: filter of trusted xattr results in audit
btrfs: fix allocation of free space cache v1 bitmap pages
Btrfs: fix use-after-free when using the tree modification log
btrfs: Relinquish CPUs in btrfs_compare_trees
btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
Btrfs: fix race setting up and completing qgroup rescan workers
md/raid6: Set R5_ReadError when there is read failure on parity disk
md: don't report active array_state until after revalidate_disk() completes.
md: only call set_in_sync() when it is expected to succeed.
cfg80211: Purge frame registrations on iftype change
/dev/mem: Bail out upon SIGKILL.
ext4: fix warning inside ext4_convert_unwritten_extents_endio
ext4: fix punch hole for inline_data file systems
quota: fix wrong condition in is_quota_modification()
hwrng: core - don't wait on add_early_randomness()
i2c: riic: Clear NACK in tend isr
CIFS: fix max ea value size
CIFS: Fix oplock handling for SMB 2.1+ protocols
md/raid0: avoid RAID0 data corruption due to layout confusion.
fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
drm/amd/display: Restore backlight brightness after system resume
Linux 4.19.77
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2c74f09f497a4b45b244a7dd263c5116533dfccb
|
||
|
|
4d8bdf7f3a |
mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
[ Upstream commit a94b525241c0fff3598809131d7cfcfe1d572d8c ]
total_{migrate,free}_scanned will be added to COMPACTMIGRATE_SCANNED and
COMPACTFREE_SCANNED in compact_zone(). We should clear them before
scanning a new zone. In the proc triggered compaction, we forgot clearing
them.
[laoar.shao@gmail.com: introduce a helper compact_zone_counters_init()]
Link: http://lkml.kernel.org/r/1563869295-25748-1-git-send-email-laoar.shao@gmail.com
[akpm@linux-foundation.org: expand compact_zone_counters_init() into its single callsite, per mhocko]
[vbabka@suse.cz: squash compact_zone() list_head init as well]
Link: http://lkml.kernel.org/r/1fb6f7da-f776-9e42-22f8-bbb79b030b98@suse.cz
[akpm@linux-foundation.org: kcompactd_do_work(): avoid unnecessary initialization of cc.zone]
Link: http://lkml.kernel.org/r/1563789275-9639-1-git-send-email-laoar.shao@gmail.com
Fixes:
|
||
|
|
4375a566f0 |
mm: compaction: avoid 100% CPU usage during compaction when a task is killed
"howaboutsynergy" reported via kernel buzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.
stress -m 220 --vm-bytes 1000000000 --timeout 10
Tracing indicated a pattern as follows
stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting
while making no progress as a fatal signal is pending.
It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and
isolating a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
This was reproduced locally -- 16G single socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000 using Colin Kings stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task existed
normally instead of consuming CPU.
This problem has existed for some time but it was made worse by commit
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.
Change-Id: I67b546921390d17b393c1f3f2f195db9de499255
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: 670105a25608affe01cb0ccdc2a1f4bd2327172b
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
b3378ed4cf |
mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock
syzbot reported the following error from a tree with a head commit of
baf76f0c58ae ("slip: make slhc_free() silently accept an error pointer")
BUG: unable to handle kernel paging request at ffffea0003348000
#PF error: [normal kernel read fault]
PGD 12c3f9067 P4D 12c3f9067 PUD 12c3f8067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline]
RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline]
RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579
Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff
ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49
c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff
RSP: 0018:ffff88802b31eab8 EFLAGS: 00010246
RAX: 1ffffd4000669000 RBX: 00000000000cd200 RCX: ffffc9000a235000
RDX: 000000000001ca5e RSI: ffffffff81988cc7 RDI: 0000000000000001
RBP: ffff88802b31ebd8 R08: ffff88805af700c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003348000
R13: 0000000000000000 R14: ffff88802b31f030 R15: dffffc0000000000
FS: 00007f61648dc700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffea0003348000 CR3: 0000000037c64000 CR4: 00000000001426e0
Call Trace:
fast_isolate_around mm/compaction.c:1243 [inline]
fast_isolate_freepages mm/compaction.c:1418 [inline]
isolate_freepages mm/compaction.c:1438 [inline]
compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550
There is no reproducer and it is difficult to hit -- 1 crash every few
days. The issue is very similar to the fix in commit 6b0868c820ff
("mm/compaction.c: correct zone boundary handling when resetting pageblock
skip hints"). When isolating free pages around a target pageblock, the
boundary handling is off by one and can stray into the next pageblock.
Triggering the syzbot error requires that the end of pageblock is section
or zone aligned, and that the next section is unpopulated.
A more subtle consequence of the bug is that pageblocks were being
improperly used as migration targets which potentially hurts fragmentation
avoidance in the long-term one page at a time.
A debugging patch revealed that it's definitely possible to stray outside
of a pageblock which is not intended. While syzbot cannot be used to
verify this patch, it was confirmed that the debugging warning no longer
triggers with this patch applied. It has also been confirmed that the THP
allocation stress tests are not degraded by this patch.
Change-Id: I66ba899d30ee101dfbb3cd06bca790338a888682
Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net
Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: 60fce36afa9c77c7ccbf980c4f670f3be3651fce
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
211b97e45a |
mm, compaction: make sure we isolate a valid PFN
When we have holes in a normal memory zone, we could endup having cached_migrate_pfns which may not necessarily be valid, under heavy memory pressure with swapping enabled ( via __reset_isolation_suitable(), triggered by kswapd). Later if we fail to find a page via fast_isolate_freepages(), we may end up using the migrate_pfn we started the search with, as valid page. This could lead to accessing NULL pointer derefernces like below, due to an invalid mem_section pointer. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825] Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9 [0000000000000008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] SMP ... CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018 pstate: 60000005 (nZCv daif -PAN -UAO) pc : set_pfnblock_flags_mask+0x58/0xe8 lr : compaction_alloc+0x300/0x950 [...] Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5) Call trace: set_pfnblock_flags_mask+0x58/0xe8 compaction_alloc+0x300/0x950 migrate_pages+0x1a4/0xbb0 compact_zone+0x750/0xde8 compact_zone_order+0xd8/0x118 try_to_compact_pages+0xb4/0x290 __alloc_pages_direct_compact+0x84/0x1e0 __alloc_pages_nodemask+0x5e0/0xe18 alloc_pages_vma+0x1cc/0x210 do_huge_pmd_anonymous_page+0x108/0x7c8 __handle_mm_fault+0xdd4/0x1190 handle_mm_fault+0x114/0x1c0 __get_user_pages+0x198/0x3c0 get_user_pages_unlocked+0xb4/0x1d8 __gfn_to_pfn_memslot+0x12c/0x3b8 gfn_to_pfn_prot+0x4c/0x60 kvm_handle_guest_abort+0x4b0/0xcd8 handle_exit+0x140/0x1b8 kvm_arch_vcpu_ioctl_run+0x260/0x768 kvm_vcpu_ioctl+0x490/0x898 do_vfs_ioctl+0xc4/0x898 ksys_ioctl+0x8c/0xa0 __arm64_sys_ioctl+0x28/0x38 el0_svc_common+0x74/0x118 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc Code: f8607840 f100001f 8b011401 9a801020 (f9400400) ---[ end trace af6a35219325a9b6 ]--- The issue was reported on an arm64 server with 128GB with holes in the zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running 100 KVM guest instances. This patch fixes the issue by ensuring that the page belongs to a valid PFN when we fallback to using the lower limit of the scan range upon failure in fast_isolate_freepages(). Change-Id: I813c890772ba42d056e2ea2bb58c7cd2e4c969c7 Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reported-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 6b036c707d5b4476b6be9213fc5253c28324589d Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org> |
||
|
|
6829f2dfa5 |
mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
Mikhail Gavrilo reported the following bug being triggered in a Fedora kernel based on 5.1-rc1 but it is relevant to a vanilla kernel. kernel: page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) kernel: ------------[ cut here ]------------ kernel: kernel BUG at include/linux/mm.h:1021! kernel: invalid opcode: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 116 Comm: kswapd0 Tainted: G C 5.1.0-0.rc1.git1.3.fc31.x86_64 #1 kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1201 12/07/2018 kernel: RIP: 0010:__reset_isolation_pfn+0x244/0x2b0 kernel: Code: fe 06 e8 0f 8e fc ff 44 0f b6 4c 24 04 48 85 c0 0f 85 dc fe ff ff e9 68 fe ff ff 48 c7 c6 58 b7 2e 8c 4c 89 ff e8 0c 75 00 00 <0f> 0b 48 c7 c6 58 b7 2e 8c e8 fe 74 00 00 0f 0b 48 89 fa 41 b8 01 kernel: RSP: 0018:ffff9e2d03f0fde8 EFLAGS: 00010246 kernel: RAX: 0000000000000034 RBX: 000000000081f380 RCX: ffff8cffbddd6c20 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8cffbddd6c20 kernel: RBP: 0000000000000001 R08: 0000009898b94613 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000100000 kernel: R13: 0000000000100000 R14: 0000000000000001 R15: ffffca7de07ce000 kernel: FS: 0000000000000000(0000) GS:ffff8cffbdc00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007fc1670e9000 CR3: 00000007f5276000 CR4: 00000000003406e0 kernel: Call Trace: kernel: __reset_isolation_suitable+0x62/0x120 kernel: reset_isolation_suitable+0x3b/0x40 kernel: kswapd+0x147/0x540 kernel: ? finish_wait+0x90/0x90 kernel: kthread+0x108/0x140 kernel: ? balance_pgdat+0x560/0x560 kernel: ? kthread_park+0x90/0x90 kernel: ret_from_fork+0x27/0x50 He bisected it down to e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints"). The problem is that the patch in question was sloppy with respect to the handling of zone boundaries. In some instances, it was possible for PFNs outside of a zone to be examined and if those were not properly initialised or poisoned then it would trigger the VM_BUG_ON. This patch corrects the zone boundary issues when resetting pageblock skip hints and Mikhail reported that the bug did not trigger after 30 hours of testing. Change-Id: Ie7a0dddfa966725e35805a97b59b6d337723227a Link: http://lkml.kernel.org/r/20190327085424.GL3189@techsingularity.net Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints") Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Qian Cai <cai@lca.pw> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Git-commit: 6b0868c820ff7370d15d6ddfe71b1ce6bbe8a25d Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
4092087a75 |
mm/compaction.c: abort search if isolation fails
Running LTP oom01 in a tight loop or memory stress testing put the system in a low-memory situation could triggers random memory corruption like page flag corruption below due to in fast_isolate_freepages(), if isolation fails, next_search_order() does not abort the search immediately could lead to improper accesses. UBSAN: Undefined behaviour in ./include/linux/mm.h:1195:50 index 7 is out of range for type 'zone [5]' Call Trace: dump_stack+0x62/0x9a ubsan_epilogue+0xd/0x7f __ubsan_handle_out_of_bounds+0x14d/0x192 __isolate_free_page+0x52c/0x600 compaction_alloc+0x886/0x25f0 unmap_and_move+0x37/0x1e70 migrate_pages+0x2ca/0xb20 compact_zone+0x19cb/0x3620 kcompactd_do_work+0x2df/0x680 kcompactd+0x1d8/0x6c0 kthread+0x32c/0x3f0 ret_from_fork+0x35/0x40 ------------[ cut here ]------------ kernel BUG at mm/page_alloc.c:3124! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI RIP: 0010:__isolate_free_page+0x464/0x600 RSP: 0000:ffff888b9e1af848 EFLAGS: 00010007 RAX: 0000000030000000 RBX: ffff888c39fcf0f8 RCX: 0000000000000000 RDX: 1ffff111873f9e25 RSI: 0000000000000004 RDI: ffffed1173c35ef6 RBP: ffff888b9e1af898 R08: fffffbfff4fc2461 R09: fffffbfff4fc2460 R10: fffffbfff4fc2460 R11: ffffffffa7e12303 R12: 0000000000000008 R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff888ba8e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc7abc00000 CR3: 0000000752416004 CR4: 00000000001606a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: compaction_alloc+0x886/0x25f0 unmap_and_move+0x37/0x1e70 migrate_pages+0x2ca/0xb20 compact_zone+0x19cb/0x3620 kcompactd_do_work+0x2df/0x680 kcompactd+0x1d8/0x6c0 kthread+0x32c/0x3f0 ret_from_fork+0x35/0x40 Change-Id: I420de189e2f98d3bcb7b9407c3329eeac0379945 Link: http://lkml.kernel.org/r/20190320192648.52499-1-cai@lca.pw Fixes: dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free lists for a target") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Git-commit: 5b56d996dd50a9d2ca87c25ebb50c07b255b7e04 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
781f40c173 |
mm/compaction.c: fix an undefined behaviour
In a low-memory situation, cc->fast_search_fail can keep increasing as it is unable to find an available page to isolate in fast_isolate_freepages(). As the result, it could trigger an error below, so just compare with the maximum bits can be shifted first. UBSAN: Undefined behaviour in mm/compaction.c:1160:30 shift exponent 64 is too large for 64-bit type 'unsigned long' CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G W L 5.0.0+ #17 Call trace: dump_backtrace+0x0/0x450 show_stack+0x20/0x2c dump_stack+0xc8/0x14c __ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4 compaction_alloc+0x2344/0x2484 unmap_and_move+0xdc/0x1dbc migrate_pages+0x274/0x1310 compact_zone+0x26ec/0x43bc kcompactd+0x15b8/0x1a24 kthread+0x374/0x390 ret_from_fork+0x10/0x18 Change-Id: I3e4d978adff2be903ee85ae2668a557eadc71ad2 Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw Fixes: 70b44595eafe ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Git-commit: b6643a56f0e5958ee33affdcfc06eb7d88557a79 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
b3ef0156f7 |
mm, compaction: capture a page under direct compaction
Compaction is inherently race-prone as a suitable page freed during
compaction can be allocated by any parallel task. This patch uses a
capture_control structure to isolate a page immediately when it is freed
by a direct compactor in the slow path of the page allocator. The
intent is to avoid redundant scanning.
5.0.0-rc1 5.0.0-rc1
selective-v3r17 capture-v3r19
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 2582.11 ( 0.00%) 2563.68 ( 0.71%)
Amean fault-both-5 4500.26 ( 0.00%) 4233.52 ( 5.93%)
Amean fault-both-7 5819.53 ( 0.00%) 6333.65 ( -8.83%)
Amean fault-both-12 9321.18 ( 0.00%) 9759.38 ( -4.70%)
Amean fault-both-18 9782.76 ( 0.00%) 10338.76 ( -5.68%)
Amean fault-both-24 15272.81 ( 0.00%) 13379.55 * 12.40%*
Amean fault-both-30 15121.34 ( 0.00%) 16158.25 ( -6.86%)
Amean fault-both-32 18466.67 ( 0.00%) 18971.21 ( -2.73%)
Latency is only moderately affected but the devil is in the details. A
closer examination indicates that base page fault latency is reduced but
latency of huge pages is increased as it takes creater care to succeed.
Part of the "problem" is that allocation success rates are close to 100%
even when under pressure and compaction gets harder
5.0.0-rc1 5.0.0-rc1
selective-v3r17 capture-v3r19
Percentage huge-3 96.70 ( 0.00%) 98.23 ( 1.58%)
Percentage huge-5 96.99 ( 0.00%) 95.30 ( -1.75%)
Percentage huge-7 94.19 ( 0.00%) 97.24 ( 3.24%)
Percentage huge-12 94.95 ( 0.00%) 97.35 ( 2.53%)
Percentage huge-18 96.74 ( 0.00%) 97.30 ( 0.58%)
Percentage huge-24 97.07 ( 0.00%) 97.55 ( 0.50%)
Percentage huge-30 95.69 ( 0.00%) 98.50 ( 2.95%)
Percentage huge-32 96.70 ( 0.00%) 99.27 ( 2.65%)
And scan rates are reduced as expected by 6% for the migration scanner
and 29% for the free scanner indicating that there is less redundant
work.
Compaction migrate scanned 20815362 19573286
Compaction free scanned 16352612 11510663
[mgorman@techsingularity.net: remove redundant check]
Link: http://lkml.kernel.org/r/20190201143853.GH9565@techsingularity.net
Link: http://lkml.kernel.org/r/20190118175136.31341-23-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I4ee7abef8cfd6fa3f227e14eeb518f84027b224e
Git-commit: 5e1f0f098b4649fad53011246bcaeff011ffdf5d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
b065d58e1f |
mm, compaction: be selective about what pageblocks to clear skip hints
Pageblock hints are cleared when compaction restarts or kswapd makes enough progress that it can sleep but it's over-eager in that the bit is cleared for migration sources with no LRU pages and migration targets with no free pages. As pageblock skip hint flushes are relatively rare and out-of-band with respect to kswapd, this patch makes a few more expensive checks to see if it's appropriate to even clear the bit. Every pageblock that is not cleared will avoid 512 pages being scanned unnecessarily on x86-64. The impact is variable with different workloads showing small differences in latency, success rates and scan rates. This is expected as clearing the hints is not that common but doing a small amount of work out-of-band to avoid a large amount of work in-band later is generally a good thing. Link: http://lkml.kernel.org/r/20190118175136.31341-22-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> [cai@lca.pw: no stuck in __reset_isolation_pfn()] Link: http://lkml.kernel.org/r/20190206034732.75687-1-cai@lca.pw Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I27b3d1bf8100d0281ec297ef5ce79d100d0cb37e Git-commit: e332f741a8dd1ec9a6dc8aa997296ecbfe64323e Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
c04b3699a7 |
mm, compaction: sample pageblocks for free pages
Once fast searching finishes, there is a possibility that the linear
scanner is scanning full blocks found by the fast scanner earlier. This
patch uses an adaptive stride to sample pageblocks for free pages. The
more consecutive full pageblocks encountered, the larger the stride
until a pageblock with free pages is found. The scanners might meet
slightly sooner but it is an acceptable risk given that the search of
the free lists may still encounter the pages and adjust the cached PFN
of the free scanner accordingly.
5.0.0-rc1 5.0.0-rc1
roundrobin-v3r17 samplefree-v3r17
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 2752.37 ( 0.00%) 2729.95 ( 0.81%)
Amean fault-both-5 4341.69 ( 0.00%) 4397.80 ( -1.29%)
Amean fault-both-7 6308.75 ( 0.00%) 6097.61 ( 3.35%)
Amean fault-both-12 10241.81 ( 0.00%) 9407.15 ( 8.15%)
Amean fault-both-18 13736.09 ( 0.00%) 10857.63 * 20.96%*
Amean fault-both-24 16853.95 ( 0.00%) 13323.24 * 20.95%*
Amean fault-both-30 15862.61 ( 0.00%) 17345.44 ( -9.35%)
Amean fault-both-32 18450.85 ( 0.00%) 16892.00 ( 8.45%)
The latency is mildly improved offseting some overhead from earlier
patches that are prerequisites for the rest of the series. However, a
major impact is on the free scan rate with an 82% reduction.
5.0.0-rc1 5.0.0-rc1
roundrobin-v3r17 samplefree-v3r17
Compaction migrate scanned 21607271 20116887
Compaction free scanned 95336406 16668703
It's also the first time in the series where the number of pages scanned
by the migration scanner is greater than the free scanner due to the
increased search efficiency.
Change-Id: I402919a00925ad98bbc386f0711ac759c7547576
Link: http://lkml.kernel.org/r/20190118175136.31341-21-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 4fca9730c51d51f643f2a3f8f10ebd718349c80f
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
f50303ed0f |
mm, compaction: round-robin the order while searching the free lists for a target
As compaction proceeds and creates high-order blocks, the free list search gets less efficient as the larger blocks are used as compaction targets. Eventually, the larger blocks will be behind the migration scanner for partially migrated pageblocks and the search fails. This patch round-robins what orders are searched so that larger blocks can be ignored and find smaller blocks that can be used as migration targets. The overall impact was small on 1-socket but it avoids corner cases where the migration/free scanners meet prematurely or situations where many of the pageblocks encountered by the free scanner are almost full instead of being properly packed. Previous testing had indicated that without this patch there were occasional large spikes in the free scanner without this patch. [dan.carpenter@oracle.com: fix static checker warning] Link: http://lkml.kernel.org/r/20190118175136.31341-20-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I97df21cf1b0ed2966426235abbf3d35ef49c6195 Git-commit: dbe2d4e4f12e07c6a2215e3603a5f77056323081 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
3049a660e2 |
mm, compaction: reduce premature advancement of the migration target scanner
The fast isolation of free pages allows the cached PFN of the free
scanner to advance faster than necessary depending on the contents of
the free list. The key is that fast_isolate_freepages() can update
zone->compact_cached_free_pfn via isolate_freepages_block(). When the
fast search fails, the linear scan can start from a point that has
skipped valid migration targets, particularly pageblocks with just
low-order free pages. This can cause the migration source/target
scanners to meet prematurely causing a reset.
This patch starts by avoiding an update of the pageblock skip
information and cached PFN from isolate_freepages_block() and puts the
responsibility of updating that information in the callers. The fast
scanner will update the cached PFN if and only if it finds a block that
is higher than the existing cached PFN and sets the skip if the
pageblock is full or nearly full. The linear scanner will update
skipped information and the cached PFN only when a block is completely
scanned. The total impact is that the free scanner advances more slowly
as it is primarily driven by the linear scanner instead of the fast
search.
5.0.0-rc1 5.0.0-rc1
noresched-v3r17 slowfree-v3r17
Amean fault-both-3 2965.68 ( 0.00%) 3036.75 ( -2.40%)
Amean fault-both-5 3995.90 ( 0.00%) 4522.24 * -13.17%*
Amean fault-both-7 5842.12 ( 0.00%) 6365.35 ( -8.96%)
Amean fault-both-12 9550.87 ( 0.00%) 10340.93 ( -8.27%)
Amean fault-both-18 13304.72 ( 0.00%) 14732.46 ( -10.73%)
Amean fault-both-24 14618.59 ( 0.00%) 16288.96 ( -11.43%)
Amean fault-both-30 16650.96 ( 0.00%) 16346.21 ( 1.83%)
Amean fault-both-32 17145.15 ( 0.00%) 19317.49 ( -12.67%)
The impact to latency is higher than the last version but it appears to
be due to a slight increase in the free scan rates which is a potential
side-effect of the patch. However, this is necessary for later patches
that are more careful about how pageblocks are treated as earlier
iterations of those patches hit corner cases where the restarts were
punishing and very visible.
Change-Id: I12f3f18f286fa2564d8150923fa09de42420773f
Link: http://lkml.kernel.org/r/20190118175136.31341-19-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: d097a6f63522547dfc7c75c7084a05b6a7f9e838
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
7c50c7815a |
mm, compaction: do not consider a need to reschedule as contention
Scanning on large machines can take a considerable length of time and
eventually need to be rescheduled. This is treated as an abort event
but that's not appropriate as the attempt is likely to be retried after
making numerous checks and taking another cycle through the page
allocator. This patch will check the need to reschedule if necessary
but continue the scanning.
The main benefit is reduced scanning when compaction is taking a long
time or the machine is over-saturated. It also avoids an unnecessary
exit of compaction that ends up being retried by the page allocator in
the outer loop.
5.0.0-rc1 5.0.0-rc1
synccached-v3r16 noresched-v3r17
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 2958.27 ( 0.00%) 2965.68 ( -0.25%)
Amean fault-both-5 4091.90 ( 0.00%) 3995.90 ( 2.35%)
Amean fault-both-7 5803.05 ( 0.00%) 5842.12 ( -0.67%)
Amean fault-both-12 9481.06 ( 0.00%) 9550.87 ( -0.74%)
Amean fault-both-18 14141.51 ( 0.00%) 13304.72 ( 5.92%)
Amean fault-both-24 16438.00 ( 0.00%) 14618.59 ( 11.07%)
Amean fault-both-30 17531.72 ( 0.00%) 16650.96 ( 5.02%)
Amean fault-both-32 17101.96 ( 0.00%) 17145.15 ( -0.25%)
Change-Id: I002a0fb5cfa1898e4a1ea5aa7c2111a4c9ced1a3
Link: http://lkml.kernel.org/r/20190118175136.31341-18-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: cf66f0700c8f1d7c7c1c1d7e5e846a1836814601
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
ef120f1793 |
mm, compaction: rework compact_should_abort as compact_check_resched
With incremental changes, compact_should_abort no longer makes any documented sense. Rename to compact_check_resched and update the associated comments. There is no benefit other than reducing redundant code and making the intent slightly clearer. It could potentially be merged with earlier patches but it just makes the review slightly harder. Change-Id: I1cc005bcde88921503966a2428c72564c897a9dd Link: http://lkml.kernel.org/r/20190118175136.31341-17-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: cb810ad294d3c3a454e51b12fbb483bbb7096b98 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
acda146d71 |
mm, compaction: keep cached migration PFNs synced for unusable pageblocks
Migrate has separate cached PFNs for ASYNC and SYNC* migration on the basis that some migrations will fail in ASYNC mode. However, if the cached PFNs match at the start of scanning and pageblocks are skipped due to having no isolation candidates, then the sync state does not matter. This patch keeps matching cached PFNs in sync until a pageblock with isolation candidates is found. The actual benefit is marginal given that the sync scanner following the async scanner will often skip a number of pageblocks but it's useless work. Any benefit depends heavily on whether the scanners restarted recently. Change-Id: Ib2e156848785f6783276add839b3f2639f1c9de0 Link: http://lkml.kernel.org/r/20190118175136.31341-16-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 8854c55f54bcc104e3adae42abe16948286ec75c Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
27162b405a |
mm, compaction: check early for huge pages encountered by the migration scanner
When scanning for sources or targets, PageCompound is checked for huge pages as they can be skipped quickly but it happens relatively late after a lot of setup and checking. This patch short-cuts the check to make it earlier. It might still change when the lock is acquired but this has less overhead overall. The free scanner advances but the migration scanner does not. Typically the free scanner encounters more movable blocks that change state over the lifetime of the system and also tends to scan more aggressively as it's actively filling its portion of the physical address space with data. This could change in the future but for the moment, this worked better in practice and incurred fewer scan restarts. The impact on latency and allocation success rates is marginal but the free scan rates are reduced by 15% and system CPU usage is reduced by 3.3%. The 2-socket results are not materially different. Change-Id: Id53eb162686c09caff271fac4c2b0accb5ab3404 Link: http://lkml.kernel.org/r/20190118175136.31341-15-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 9bebefd59084af7c75b66eeee241bf0777f39b88 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
b6daf86e23 |
mm, compaction: finish pageblock scanning on contention
Async migration aborts on spinlock contention but contention can be high
when there are multiple compaction attempts and kswapd is active. The
consequence is that the migration scanners move forward uselessly while
still contending on locks for longer while leaving suitable migration
sources behind.
This patch will acquire the lock but track when contention occurs. When
it does, the current pageblock will finish as compaction may succeed for
that block and then abort. This will have a variable impact on latency
as in some cases useless scanning is avoided (reduces latency) but a
lock will be contended (increase latency) or a single contended
pageblock is scanned that would otherwise have been skipped (increase
latency).
5.0.0-rc1 5.0.0-rc1
norescan-v3r16 finishcontend-v3r16
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 3002.07 ( 0.00%) 3153.17 ( -5.03%)
Amean fault-both-5 4684.47 ( 0.00%) 4280.52 ( 8.62%)
Amean fault-both-7 6815.54 ( 0.00%) 5811.50 * 14.73%*
Amean fault-both-12 10864.02 ( 0.00%) 9276.85 ( 14.61%)
Amean fault-both-18 12247.52 ( 0.00%) 11032.67 ( 9.92%)
Amean fault-both-24 15683.99 ( 0.00%) 14285.70 ( 8.92%)
Amean fault-both-30 18620.02 ( 0.00%) 16293.76 * 12.49%*
Amean fault-both-32 19250.28 ( 0.00%) 16721.02 * 13.14%*
5.0.0-rc1 5.0.0-rc1
norescan-v3r16 finishcontend-v3r16
Percentage huge-1 0.00 ( 0.00%) 0.00 ( 0.00%)
Percentage huge-3 95.00 ( 0.00%) 96.82 ( 1.92%)
Percentage huge-5 94.22 ( 0.00%) 95.40 ( 1.26%)
Percentage huge-7 92.35 ( 0.00%) 95.92 ( 3.86%)
Percentage huge-12 91.90 ( 0.00%) 96.73 ( 5.25%)
Percentage huge-18 89.58 ( 0.00%) 96.77 ( 8.03%)
Percentage huge-24 90.03 ( 0.00%) 96.05 ( 6.69%)
Percentage huge-30 89.14 ( 0.00%) 96.81 ( 8.60%)
Percentage huge-32 90.58 ( 0.00%) 97.41 ( 7.54%)
There is a variable impact that is mostly good on latency while allocation
success rates are slightly higher. System CPU usage is reduced by about
10% but scan rate impact is mixed
Compaction migrate scanned 27997659.00 20148867
Compaction free scanned 120782791.00 118324914
Migration scan rates are reduced 28% which is expected as a pageblock is
used by the async scanner instead of skipped. The impact on the free
scanner is known to be variable. Overall the primary justification for
this patch is that completing scanning of a pageblock is very important
for later patches.
[yuehaibing@huawei.com: fix unused variable warning]
Link: http://lkml.kernel.org/r/20190118175136.31341-14-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Id1fe132cb48c49bfe47c3b17a3e07fe50462a5fa
Git-commit: cb2dcaf023c2cf12d45289c82d4030d33f7df73e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
6ab5cc9614 |
mm, compaction: avoid rescanning the same pageblock multiple times
Pageblocks are marked for skip when no pages are isolated after a scan.
However, it's possible to hit corner cases where the migration scanner
gets stuck near the boundary between the source and target scanner. Due
to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that are
migrated can be reallocated before the pageblock is complete. The
pageblock is not necessarily skipped so it can be rescanned multiple
times. Similarly, a pageblock with some dirty/writeback pages may fail
to migrate and be rescanned until writeback completes which is wasteful.
This patch tracks if a pageblock is being rescanned. If so, then the
entire pageblock will be migrated as one operation. This narrows the
race window during which pages can be reallocated during migration.
Secondly, if there are pages that cannot be isolated then the pageblock
will still be fully scanned and marked for skipping. On the second
rescan, the pageblock skip is set and the migration scanner makes
progress.
5.0.0-rc1 5.0.0-rc1
findfree-v3r16 norescan-v3r16
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 3200.68 ( 0.00%) 3002.07 ( 6.21%)
Amean fault-both-5 4847.75 ( 0.00%) 4684.47 ( 3.37%)
Amean fault-both-7 6658.92 ( 0.00%) 6815.54 ( -2.35%)
Amean fault-both-12 11077.62 ( 0.00%) 10864.02 ( 1.93%)
Amean fault-both-18 12403.97 ( 0.00%) 12247.52 ( 1.26%)
Amean fault-both-24 15607.10 ( 0.00%) 15683.99 ( -0.49%)
Amean fault-both-30 18752.27 ( 0.00%) 18620.02 ( 0.71%)
Amean fault-both-32 21207.54 ( 0.00%) 19250.28 * 9.23%*
5.0.0-rc1 5.0.0-rc1
findfree-v3r16 norescan-v3r16
Percentage huge-3 96.86 ( 0.00%) 95.00 ( -1.91%)
Percentage huge-5 93.72 ( 0.00%) 94.22 ( 0.53%)
Percentage huge-7 94.31 ( 0.00%) 92.35 ( -2.08%)
Percentage huge-12 92.66 ( 0.00%) 91.90 ( -0.82%)
Percentage huge-18 91.51 ( 0.00%) 89.58 ( -2.11%)
Percentage huge-24 90.50 ( 0.00%) 90.03 ( -0.52%)
Percentage huge-30 91.57 ( 0.00%) 89.14 ( -2.65%)
Percentage huge-32 91.00 ( 0.00%) 90.58 ( -0.46%)
Negligible difference but this was likely a case when the specific
corner case was not hit. A previous run of the same patch based on an
earlier iteration of the series showed large differences where migration
rates could be halved when the corner case was hit.
The specific corner case where migration scan rates go through the roof
was due to a dirty/writeback pageblock located at the boundary of the
migration/free scanner did not happen in this case. When it does
happen, the scan rates multipled by massive margins.
Change-Id: I2c520b3b13d96a179b28b1138cbd1c8eb86b083e
Link: http://lkml.kernel.org/r/20190118175136.31341-13-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 804d3121ba5f03af0ab225e2f688ee3ee669c0d2
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
aa32e50664 |
mm, compaction: use free lists to quickly locate a migration target
Similar to the migration scanner, this patch uses the free lists to
quickly locate a migration target. The search is different in that
lower orders will be searched for a suitable high PFN if necessary but
the search is still bound. This is justified on the grounds that the
free scanner typically scans linearly much more than the migration
scanner.
If a free page is found, it is isolated and compaction continues if
enough pages were isolated. For SYNC* scanning, the full pageblock is
scanned for any remaining free pages so that is can be marked for
skipping in the near future.
1-socket thpfioscale
5.0.0-rc1 5.0.0-rc1
isolmig-v3r15 findfree-v3r16
Amean fault-both-3 3024.41 ( 0.00%) 3200.68 ( -5.83%)
Amean fault-both-5 4749.30 ( 0.00%) 4847.75 ( -2.07%)
Amean fault-both-7 6454.95 ( 0.00%) 6658.92 ( -3.16%)
Amean fault-both-12 10324.83 ( 0.00%) 11077.62 ( -7.29%)
Amean fault-both-18 12896.82 ( 0.00%) 12403.97 ( 3.82%)
Amean fault-both-24 13470.60 ( 0.00%) 15607.10 * -15.86%*
Amean fault-both-30 17143.99 ( 0.00%) 18752.27 ( -9.38%)
Amean fault-both-32 17743.91 ( 0.00%) 21207.54 * -19.52%*
The impact on latency is variable but the search is optimistic and
sensitive to the exact system state. Success rates are similar but the
major impact is to the rate of scanning
5.0.0-rc1 5.0.0-rc1
isolmig-v3r15 findfree-v3r16
Compaction migrate scanned 25646769 29507205
Compaction free scanned 201558184 100359571
The free scan rates are reduced by 50%. The 2-socket reductions for the
free scanner are more dramatic which is a likely reflection that the
machine has more memory.
[dan.carpenter@oracle.com: fix static checker warning]
[vbabka@suse.cz: correct number of pages scanned for lower orders]
Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If5a453f546138631a88acbf89a70f238623df491
Git-commit: 5a811889de10f1ebb8e03a2744be006e909c405c
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
15d9664b00 |
mm, compaction: keep migration source private to a single compaction instance
Due to either a fast search of the free list or a linear scan, it is
possible for multiple compaction instances to pick the same pageblock
for migration. This is lucky for one scanner and increased scanning for
all the others. It also allows a race between requests on which first
allocates the resulting free block.
This patch tests and updates the pageblock skip for the migration
scanner carefully. When isolating a block, it will check and skip if
the block is already in use. Once the zone lock is acquired, it will be
rechecked so that only one scanner can set the pageblock skip for
exclusive use. Any scanner contending will continue with a linear scan.
The skip bit is still set if no pages can be isolated in a range. While
this may result in redundant scanning, it avoids unnecessarily acquiring
the zone lock when there are no suitable migration sources.
1-socket thpscale
Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%*
Amean fault-both-3 3390.40 ( 0.00%) 3024.41 ( 10.80%)
Amean fault-both-5 5082.28 ( 0.00%) 4749.30 ( 6.55%)
Amean fault-both-7 7012.51 ( 0.00%) 6454.95 ( 7.95%)
Amean fault-both-12 11346.63 ( 0.00%) 10324.83 ( 9.01%)
Amean fault-both-18 15324.19 ( 0.00%) 12896.82 * 15.84%*
Amean fault-both-24 16088.50 ( 0.00%) 13470.60 * 16.27%*
Amean fault-both-30 18723.42 ( 0.00%) 17143.99 ( 8.44%)
Amean fault-both-32 18612.01 ( 0.00%) 17743.91 ( 4.66%)
5.0.0-rc1 5.0.0-rc1
findmig-v3r15 isolmig-v3r15
Percentage huge-3 89.83 ( 0.00%) 92.96 ( 3.48%)
Percentage huge-5 91.96 ( 0.00%) 93.26 ( 1.41%)
Percentage huge-7 92.85 ( 0.00%) 93.63 ( 0.84%)
Percentage huge-12 92.74 ( 0.00%) 92.80 ( 0.07%)
Percentage huge-18 91.71 ( 0.00%) 91.62 ( -0.10%)
Percentage huge-24 92.13 ( 0.00%) 91.50 ( -0.69%)
Percentage huge-30 93.79 ( 0.00%) 92.73 ( -1.13%)
Percentage huge-32 91.27 ( 0.00%) 91.94 ( 0.74%)
This shows a reasonable reduction in latency as multiple compaction
scanners do not operate on the same blocks with a similar allocation
success rate.
Compaction migrate scanned 41093126 25646769
Migration scan rates are reduced by 38%.
Link: http://lkml.kernel.org/r/20190118175136.31341-11-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I059fe2e0a3dfa5a5329f41aadee7a548ca749d37
Git-commit: e380bebe4771548df9bece8b7ad9dab07d9158a6
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
78e48f6a7f |
mm, compaction: use free lists to quickly locate a migration source
The migration scanner is a linear scan of a zone with a potentiall large
search space. Furthermore, many pageblocks are unusable such as those
filled with reserved pages or partially filled with pages that cannot
migrate. These still get scanned in the common case of allocating a THP
and the cost accumulates.
The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP. It
prefers picking a block with a larger number of free pages already on
the basis that there are fewer pages to migrate to free the entire
block. The lowest PFN found during searches is tracked as the basis of
the start for the linear search after the first search of the free list
fails. After the search, the free list is shuffled so that the next
search will not encounter the same page. If the search fails then the
subsequent searches will be shorter and the linear scanner is used.
If this search fails, or if the request is for a small or
unmovable/reclaimable allocation then the linear scanner is still used.
It is somewhat pointless to use the list search in those cases. Small
free pages must be used for the search and there is no guarantee that
movable pages are located within that block that are contiguous.
5.0.0-rc1 5.0.0-rc1
noboost-v3r10 findmig-v3r15
Amean fault-both-3 3771.41 ( 0.00%) 3390.40 ( 10.10%)
Amean fault-both-5 5409.05 ( 0.00%) 5082.28 ( 6.04%)
Amean fault-both-7 7040.74 ( 0.00%) 7012.51 ( 0.40%)
Amean fault-both-12 11887.35 ( 0.00%) 11346.63 ( 4.55%)
Amean fault-both-18 16718.19 ( 0.00%) 15324.19 ( 8.34%)
Amean fault-both-24 21157.19 ( 0.00%) 16088.50 * 23.96%*
Amean fault-both-30 21175.92 ( 0.00%) 18723.42 * 11.58%*
Amean fault-both-32 21339.03 ( 0.00%) 18612.01 * 12.78%*
5.0.0-rc1 5.0.0-rc1
noboost-v3r10 findmig-v3r15
Percentage huge-3 86.50 ( 0.00%) 89.83 ( 3.85%)
Percentage huge-5 92.52 ( 0.00%) 91.96 ( -0.61%)
Percentage huge-7 92.44 ( 0.00%) 92.85 ( 0.44%)
Percentage huge-12 92.98 ( 0.00%) 92.74 ( -0.25%)
Percentage huge-18 91.70 ( 0.00%) 91.71 ( 0.02%)
Percentage huge-24 91.59 ( 0.00%) 92.13 ( 0.60%)
Percentage huge-30 90.14 ( 0.00%) 93.79 ( 4.04%)
Percentage huge-32 90.03 ( 0.00%) 91.27 ( 1.37%)
This shows an improvement in allocation latencies with similar
allocation success rates. While not presented, there was a 31%
reduction in migration scanning and a 8% reduction on system CPU usage.
A 2-socket machine showed similar benefits.
[mgorman@techsingularity.net: several fixes]
Link: http://lkml.kernel.org/r/20190204120111.GL9565@techsingularity.net
[vbabka@suse.cz: migrate block that was found-fast, some optimisations]
Link: http://lkml.kernel.org/r/20190118175136.31341-10-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <Vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic16515e4ca883de62d0acdfb24b15b72d695c861
Git-commit: 70b44595eafe9c7c235f076d653a268ca1ab9fdb
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
|
||
|
|
ed33da8dd5 |
mm, compaction: always finish scanning of a full pageblock
When compaction is finishing, it uses a flag to ensure the pageblock is complete but it makes sense to always complete migration of a pageblock. Minimally, skip information is based on a pageblock and partially scanned pageblocks may incur more scanning in the future. The pageblock skip handling also becomes more strict later in the series and the hint is more useful if a complete pageblock was always scanned. The potentially impacts latency as more scanning is done but it's not a consistent win or loss as the scanning is not always a high percentage of the pageblock and sometimes it is offset by future reductions in scanning. Hence, the results are not presented this time due to a misleading mix of gains/losses without any clear pattern. However, full scanning of the pageblock is important for later patches. Change-Id: I459a28f61d538903d0b672adbb30a87597693487 Link: http://lkml.kernel.org/r/20190118175136.31341-8-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: efe771c7603bc524425070d651e70e9c56c57f28 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
427fc8feda |
mm, compaction: rename map_pages to split_map_pages
It's non-obvious that high-order free pages are split into order-0 pages from the function name. Fix it. Change-Id: I53fc91c5fc093e3fbd7e1c396fc70c061f9c21e0 Link: http://lkml.kernel.org/r/20190118175136.31341-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 4469ab98477b290f6728b79f8d225d9d88ce16e3 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
09a394d1c1 |
mm, compaction: remove unnecessary zone parameter in some instances
A zone parameter is passed into a number of top-level compaction functions despite the fact that it's already in compact_control. This is harmless but it did need an audit to check if zone actually ever changes meaningfully. This patches removes the parameter in a number of top-level functions. The change could be much deeper but this was enough to briefly clarify the flow. No functional change. Change-Id: I6555d4c731319842ebfc489a890de7e287362ff8 Link: http://lkml.kernel.org/r/20190118175136.31341-5-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 40cacbcb324036233a927418441323459d28d19b Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
2ff32e84b0 |
mm, compaction: remove last_migrated_pfn from compact_control
The last_migrated_pfn field is a bit dubious as to whether it really helps but either way, the information from it can be inferred without increasing the size of compact_control so remove the field. Change-Id: I10d6e13ae9989f9799ff9a824fb4fe35c4856027 Link: http://lkml.kernel.org/r/20190118175136.31341-4-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 566e54e113eb2b669f9300db2c2df400cbb06646 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
83eb0c67a3 |
mm: move zone watermark accesses behind an accessor
This is a preparation patch only, no functional change. Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: Iccef5f02348ab59d75cbbe2b48f40017391b88b1 Git-commit: a921444382b49cc7fdeca3fba3e278bc09484a27 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git [vinmenon@codeaurora.org: trivial merge conflict fixes] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
e550f94252 |
UPSTREAM: psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills. In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.
In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.
A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively. Stall states are aggregate versions of the per-task delay
accounting delays:
cpu: some tasks are runnable but not executing on a CPU
memory: tasks are reclaiming, or waiting for swapin or thrashing cache
io: tasks are waiting for io completions
These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit. They can also indicate when the system
is approaching lockup scenarios and OOMs.
To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states. Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime. A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).
[hannes@cmpxchg.org: doc fixlet, per Randy]
Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2)
Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
|
||
|
|
25f0c86db1 |
psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills. In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.
In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.
A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively. Stall states are aggregate versions of the per-task delay
accounting delays:
cpu: some tasks are runnable but not executing on a CPU
memory: tasks are reclaiming, or waiting for swapin or thrashing cache
io: tasks are waiting for io completions
These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit. They can also indicate when the system
is approaching lockup scenarios and OOMs.
To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states. Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime. A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).
[hannes@cmpxchg.org: doc fixlet, per Randy]
Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic4b6fddb7013719ffcba5d68abfda87ada90715a
Git-commit: eb414681d5a07d28d2ff90dc05f69ec6b232ebd2
Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.19
[pdaly@codeaurora.org: Move PF_MEMSTALL flag]
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
|
||
|
|
0825a6f986 |
mm: use octal not symbolic permissions
mm/*.c files use symbolic and octal styles for permissions. Using octal and not symbolic permissions is preferred by many as more readable. https://lkml.org/lkml/2016/8/2/1945 Prefer the direct use of octal for permissions. Done using $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c and some typing. Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l 44 After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l 86 Miscellanea: o Whitespace neatening around these conversions. Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
d883c6cf3b |
Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"
This reverts the following commits that change CMA design in MM. |
||
|
|
1d47a3ec09 |
mm/cma: remove ALLOC_CMA
Now, all reserved pages for CMA region are belong to the ZONE_MOVABLE and it only serves for a request with GFP_HIGHMEM && GFP_MOVABLE. Therefore, we don't need to maintain ALLOC_CMA at all. Link: http://lkml.kernel.org/r/1512114786-5085-3-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Tony Lindgren <tony@atomide.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
666feb21a0 |
mm, migrate: remove reason argument from new_page_t
No allocation callback is using this argument anymore. new_page_node used to use this parameter to convey node_id resp. migration error up to move_pages code (do_move_page_to_node_array). The error status never made it into the final status field and we have a better way to communicate node id to the status field now. All other allocation callbacks simply ignored the argument so we can drop it finally. [mhocko@suse.com: fix migration callback] Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz [akpm@linux-foundation.org: fix alloc_misplaced_dst_page()] [mhocko@kernel.org: fix build] Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e8b098fc57 |
mm: kernel-doc: add missing parameter descriptions
Link: http://lkml.kernel.org/r/1519585191-10180-4-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
bc3106b26c |
mm, compaction: drain pcps for zone when kcompactd fails
It's possible for free pages to become stranded on per-cpu pagesets
(pcps) that, if drained, could be merged with buddy pages on the zone's
free area to form large order pages, including up to MAX_ORDER.
Consider a verbose example using the tools/vm/page-types tool at the
beginning of a ZONE_NORMAL ('B' indicates a buddy page and 'S' indicates
a slab page). Pages on pcps do not have any page flags set.
109954 1 _______S________________________________________________________
109955 2 __________B_____________________________________________________
109957 1 ________________________________________________________________
109958 1 __________B_____________________________________________________
109959 7 ________________________________________________________________
109960 1 __________B_____________________________________________________
109961 9 ________________________________________________________________
10996a 1 __________B_____________________________________________________
10996b 3 ________________________________________________________________
10996e 1 __________B_____________________________________________________
10996f 1 ________________________________________________________________
...
109f8c 1 __________B_____________________________________________________
109f8d 2 ________________________________________________________________
109f8f 2 __________B_____________________________________________________
109f91 f ________________________________________________________________
109fa0 1 __________B_____________________________________________________
109fa1 7 ________________________________________________________________
109fa8 1 __________B_____________________________________________________
109fa9 1 ________________________________________________________________
109faa 1 __________B_____________________________________________________
109fab 1 _______S________________________________________________________
The compaction migration scanner is attempting to defragment this memory
since it is at the beginning of the zone. It has done so quite well,
all movable pages have been migrated. From pfn [0x109955, 0x109fab),
there are only buddy pages and pages without flags set.
These pages may be stranded on pcps that could otherwise allow this
memory to be coalesced if freed back to the zone free area. It is
possible that some of these pages may not be on pcps and that something
has called alloc_pages() and used the memory directly, but we rely on
the absence of __GFP_MOVABLE in these cases to allocate from
MIGATE_UNMOVABLE pageblocks to try to keep these MIGRATE_MOVABLE
pageblocks as free as possible.
These buddy and pcp pages, spanning 1,621 pages, could be coalesced and
allow for three transparent hugepages to be dynamically allocated.
Running the numbers for all such spans on the system, it was found that
there were over 400 such spans of only buddy pages and pages without
flags set at the time this /proc/kpageflags sample was collected.
Without this support, there were _no_ order-9 or order-10 pages free.
When kcompactd fails to defragment memory such that a cc.order page can
be allocated, drain all pcps for the zone back to the buddy allocator so
this stranding cannot occur. Compaction for that order will
subsequently be deferred, which acts as a ratelimit on this drain.
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803010340100.88270@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
112d2d29fc |
mm/compaction.c: fix comment for try_to_compact_pages()
"mode" argument is not used by try_to_compact_pages() and sub functions anymore, it has been replaced by "prio". Fix the comment to explain the use of "prio" argument. Link: http://lkml.kernel.org/r/1515801336-20611-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
d3c85bad89 |
mm, compaction: remove unneeded pageblock_skip_persistent() checks
Commit f3c931633a59 ("mm, compaction: persistently skip hugetlbfs
pageblocks") has introduced pageblock_skip_persistent() checks into
migration and free scanners, to make sure pageblocks that should be
persistently skipped are marked as such, regardless of the
ignore_skip_hint flag.
Since the previous patch introduced a new no_set_skip_hint flag, the
ignore flag no longer prevents marking pageblocks as skipped. Therefore
we can remove the special cases. The relevant pageblocks will be marked
as skipped by the common logic which marks each pageblock where no page
could be isolated. This makes the code simpler.
Link: http://lkml.kernel.org/r/20171102121706.21504-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
2583d67132 |
mm, compaction: split off flag for not updating skip hints
Pageblock skip hints were added as a heuristic for compaction, which
shares core code with CMA. Since CMA reliability would suffer from the
heuristics, compact_control flag ignore_skip_hint was added for the CMA
use case. Since
|
||
|
|
b527cfe5bc |
mm, compaction: extend pageblock_skip_persistent() to all compound pages
pageblock_skip_persistent() checks for HugeTLB pages of pageblock order. When clearing pageblock skip bits for compaction, the bits are not cleared for such pageblocks, because they cannot contain base pages suitable for migration, nor free pages to use as migration targets. This optimization can be simply extended to all compound pages of order equal or larger than pageblock order, because migrating such pages (if they support it) cannot help sub-pageblock fragmentation. This includes THP's and also gigantic HugeTLB pages, which the current implementation doesn't persistently skip due to a strict pageblock_order equality check and not recognizing tail pages. While THP pages are generally less "persistent" than HugeTLB, we can still expect that if a THP exists at the point of __reset_isolation_suitable(), it will exist also during the subsequent compaction run. The time difference here could be actually smaller than between a compaction run that sets a (non-persistent) skip bit on a THP, and the next compaction run that observes it. Link: http://lkml.kernel.org/r/20171102121706.21504-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |