Commit Graph

277 Commits

Author SHA1 Message Date
Michael Bestas
e8e6720b34 Merge tag 'ASB-2024-05-05_4.19-stable' of https://android.googlesource.com/kernel/common into android13-4.19-kona
https://source.android.com/docs/security/bulletin/2024-05-01
CVE-2023-4622

* tag 'ASB-2024-05-05_4.19-stable' of https://android.googlesource.com/kernel/common:
  Revert "timers: Rename del_timer_sync() to timer_delete_sync()"
  Revert "geneve: make sure to pull inner header in geneve_rx()"
  Linux 4.19.312
  amdkfd: use calloc instead of kzalloc to avoid integer overflow
  initramfs: fix populate_initrd_image() section mismatch
  ip_gre: do not report erspan version on GRE interface
  erspan: Check IFLA_GRE_ERSPAN_VER is set.
  VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
  Bluetooth: btintel: Fixe build regression
  x86/mm/pat: fix VM_PAT handling in COW mappings
  virtio: reenable config if freezing device failed
  drm/vkms: call drm_atomic_helper_shutdown before drm_dev_put()
  tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc
  fbmon: prevent division by zero in fb_videomode_from_videomode()
  fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
  usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined
  tools: iio: replace seekdir() in iio_generic_buffer
  ktest: force $buildonly = 1 for 'make_warnings_file' test type
  Input: allocate keycode for Display refresh rate toggle
  block: prevent division by zero in blk_rq_stat_sum()
  SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int
  drm/amd/display: Fix nanosec stat overflow
  media: sta2x11: fix irq handler cast
  isofs: handle CDs with bad root inode but good Joliet root directory
  scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc()
  sysv: don't call sb_bread() with pointers_lock held
  Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails
  Bluetooth: btintel: Fix null ptr deref in btintel_read_version
  btrfs: send: handle path ref underflow in header iterate_inode_ref()
  btrfs: export: handle invalid inode or root reference in btrfs_get_parent()
  btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()
  tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num()
  arm64: dts: rockchip: fix rk3399 hdmi ports node
  VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host()
  wifi: ath9k: fix LNA selection in ath_ant_try_scan()
  ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
  ata: sata_mv: Fix PCI device ID table declaration compilation warning
  ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
  ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
  erspan: make sure erspan_base_hdr is present in skb->head
  erspan: Add type I version 0 support.
  init: open /initrd.image with O_LARGEFILE
  initramfs: switch initramfs unpacking to struct file based APIs
  fs: add a vfs_fchmod helper
  fs: add a vfs_fchown helper
  initramfs: factor out a helper to populate the initrd image
  staging: vc04_services: fix information leak in create_component()
  staging: vc04_services: changen strncpy() to strscpy_pad()
  staging: mmal-vchiq: Fix client_component for 64 bit kernel
  staging: mmal-vchiq: Allocate and free components as required
  staging: mmal-vchiq: Avoid use of bool in structures
  i40e: fix vf may be used uninitialized in this function warning
  ipv6: Fix infinite recursion in fib6_dump_done().
  selftests: reuseaddr_conflict: add missing new line at the end of the output
  net: stmmac: fix rx queue priority assignment
  net/sched: act_skbmod: prevent kernel-infoleak
  netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
  mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
  Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
  net/rds: fix possible cp null dereference
  netfilter: nf_tables: disallow timeout for anonymous sets
  Bluetooth: Fix TOCTOU in HCI debugfs implementation
  Bluetooth: hci_event: set the conn encrypted before conn establishes
  r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
  tcp: properly terminate timers for kernel sockets
  mptcp: add sk_stop_timer_sync helper
  nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
  USB: core: Fix deadlock in usb_deauthorize_interface()
  scsi: lpfc: Correct size for wqe for memset()
  x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
  scsi: qla2xxx: Fix command flush on cable pull
  usb: udc: remove warning when queue disabled ep
  usb: dwc2: gadget: LPM flow fix
  usb: dwc2: host: Fix ISOC flow in DDMA mode
  usb: dwc2: host: Fix hibernation flow
  usb: dwc2: host: Fix remote wakeup from hibernation
  loop: loop_set_status_from_info() check before assignment
  loop: Check for overflow while configuring loop
  loop: Factor out configuring loop from status
  powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
  efivarfs: Request at most 512 bytes for variable names
  perf/core: Fix reentry problem in perf_output_read_group()
  loop: properly observe rotational flag of underlying device
  loop: Refactor loop_set_status() size calculation
  loop: Factor out setting loop device size
  loop: Remove sector_t truncation checks
  loop: Call loop_config_discard() only after new config is applied
  Revert "loop: Check for overflow while configuring loop"
  btrfs: allocate btrfs_ioctl_defrag_range_args on stack
  printk: Update @console_may_schedule in console_trylock_spinning()
  fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
  ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
  usb: cdc-wdm: close race between read and workqueue
  exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
  wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
  mm/migrate: set swap entry values of THP tail pages properly.
  mm/memory-failure: fix an incorrect use of tail pages
  vt: fix memory overlapping when deleting chars in the buffer
  vt: fix unicode buffer corruption when deleting characters
  tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
  usb: port: Don't try to peer unused USB ports based on location
  usb: gadget: ncm: Fix handling of zero block length packets
  USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
  ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
  xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
  comedi: comedi_test: Prevent timers rescheduling during deletion
  ahci: asm1064: asm1166: don't limit reported ports
  ahci: asm1064: correct count of reported ports
  x86/CPU/AMD: Update the Zenbleed microcode revisions
  nilfs2: prevent kernel bug at submit_bh_wbc()
  nilfs2: use a more common logging style
  nilfs2: fix failure to detect DAT corruption in btree and direct mappings
  memtest: use {READ,WRITE}_ONCE in memory scanning
  drm/vc4: hdmi: do not return negative values from .get_modes()
  drm/imx/ipuv3: do not return negative values from .get_modes()
  s390/zcrypt: fix reference counting on zcrypt card objects
  soc: fsl: qbman: Use raw spinlock for cgr_lock
  soc: fsl: qbman: Add CGR update function
  soc: fsl: qbman: Add helper for sanity checking cgr ops
  soc: fsl: qbman: Always disable interrupts when taking cgr_lock
  vfio/platform: Disable virqfds on cleanup
  kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
  speakup: Fix 8bit characters from direct synth
  slimbus: core: Remove usage of the deprecated ida_simple_xx() API
  ext4: fix corruption during on-line resize
  hwmon: (amc6821) add of_match table
  mmc: core: Fix switch on gp3 partition
  dm-raid: fix lockdep waring in "pers->hot_add_disk"
  Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""
  PCI/PM: Drain runtime-idle callbacks before driver removal
  PCI: Drop pci_device_remove() test of pci_dev->driver
  fuse: don't unhash root
  mmc: tmio: avoid concurrent runs of mmc_request_done()
  PM: sleep: wakeirq: fix wake irq warning in system suspend
  USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
  USB: serial: option: add MeiG Smart SLM320 product
  USB: serial: cp210x: add ID for MGP Instruments PDS100
  USB: serial: add device ID for VeriFone adapter
  USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
  powerpc/fsl: Fix mfpmr build errors with newer binutils
  clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
  clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
  clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
  PM: suspend: Set mem_sleep_current during kernel command line setup
  parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
  parisc: Fix csum_ipv6_magic on 64-bit systems
  parisc: Fix csum_ipv6_magic on 32-bit systems
  parisc: Fix ip_fast_csum
  parisc: Do not hardcode registers in checksum functions
  ubi: correct the calculation of fastmap size
  ubi: Check for too small LEB size in VTBL code
  ubifs: Set page uptodate in the correct place
  fat: fix uninitialized field in nostale filehandles
  crypto: qat - resolve race condition during AER recovery
  crypto: qat - fix double free during reset
  sparc: vDSO: fix return value of __setup handler
  sparc64: NMI watchdog: fix return value of __setup handler
  KVM: Always flush async #PF workqueue when vCPU is being destroyed
  media: xc4000: Fix atomicity violation in xc4000_get_frequency
  arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
  ARM: dts: mmp2-brownstone: Don't redeclare phandle references
  smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
  smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
  wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
  timers: Rename del_timer_sync() to timer_delete_sync()
  timers: Use del_timer_sync() even on UP
  timers: Update kernel-doc for various functions
  timers: Prepare support for PREEMPT_RT
  timer/trace: Improve timer tracing
  timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
  x86/bugs: Use sysfs_emit()
  x86/cpu: Support AMD Automatic IBRS
  Documentation/hw-vuln: Update spectre doc
  Linux 4.19.311
  crypto: af_alg - Work around empty control messages without MSG_MORE
  crypto: af_alg - Fix regression on empty requests
  spi: spi-mt65xx: Fix NULL pointer access in interrupt handler
  net/bnx2x: Prevent access to a freed page in page_pool
  hsr: Handle failures in module init
  rds: introduce acquire/release ordering in acquire/release_in_xmit()
  hsr: Fix uninit-value access in hsr_get_node()
  net: hsr: fix placement of logical operator in a multi-line statement
  usb: gadget: net2272: Use irqflags in the call to net2272_probe_fin
  staging: greybus: fix get_channel_from_mode() failure path
  serial: 8250_exar: Don't remove GPIO device on suspend
  rtc: mt6397: select IRQ_DOMAIN instead of depending on it
  kconfig: fix infinite loop when expanding a macro at the end of file
  tty: serial: samsung: fix tx_empty() to return TIOCSER_TEMT
  serial: max310x: fix syntax error in IRQ error message
  clk: qcom: gdsc: Add support to update GDSC transition delay
  NFS: Fix an off by one in root_nfs_cat()
  net: sunrpc: Fix an off by one in rpc_sockaddr2uaddr()
  scsi: bfa: Fix function pointer type mismatch for hcb_qe->cbfn
  scsi: csiostor: Avoid function pointer casts
  ALSA: usb-audio: Stop parsing channels bits when all channels are found.
  sparc32: Fix section mismatch in leon_pci_grpci
  backlight: lp8788: Fully initialize backlight_properties during probe
  backlight: lm3639: Fully initialize backlight_properties during probe
  backlight: da9052: Fully initialize backlight_properties during probe
  backlight: lm3630a: Don't set bl->props.brightness in get_brightness
  backlight: lm3630a: Initialize backlight_properties on init
  powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc.
  powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks
  drm/mediatek: Fix a null pointer crash in mtk_drm_crtc_finish_page_flip
  media: go7007: fix a memleak in go7007_load_encoder
  media: dvb-frontends: avoid stack overflow warnings with clang
  media: pvrusb2: fix uaf in pvr2_context_set_notify
  drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int()
  ASoC: meson: axg-tdm-interface: fix mclk setup without mclk-fs
  mtd: rawnand: lpc32xx_mlc: fix irq handler prototype
  crypto: arm/sha - fix function cast warnings
  crypto: arm - Rename functions to avoid conflict with crypto/sha256.h
  mfd: syscon: Call of_node_put() only when of_parse_phandle() takes a ref
  drm/tegra: put drm_gem_object ref on error in tegra_fb_create
  clk: hisilicon: hi3519: Release the correct number of gates in hi3519_clk_unregister()
  PCI: Mark 3ware-9650SE Root Port Extended Tags as broken
  drm/mediatek: dsi: Fix DSI RGB666 formats and definitions
  clk: qcom: dispcc-sdm845: Adjust internal GDSC wait times
  firmware: qcom: scm: Add WLAN VMID for Qualcomm SCM interface
  media: pvrusb2: fix pvr2_stream_callback casts
  media: go7007: add check of return value of go7007_read_addr()
  ALSA: seq: fix function cast warnings
  drm/radeon/ni: Fix wrong firmware size logging in ni_init_microcode()
  perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()
  quota: Fix rcu annotations of inode dquot pointers
  quota: Fix potential NULL pointer dereference
  quota: simplify drop_dquot_ref()
  quota: check time limit when back out space/inode change
  fs/quota: erase unused but set variable warning
  quota: code cleanup for __dquot_alloc_space()
  clk: qcom: reset: Ensure write completion on reset de/assertion
  clk: qcom: reset: Commonize the de/assert functions
  clk: qcom: reset: support resetting multiple bits
  clk: qcom: reset: Allow specifying custom reset delay
  media: edia: dvbdev: fix a use-after-free
  media: dvb-core: Fix use-after-free due to race at dvb_register_device()
  media: dvbdev: fix error logic at dvb_register_device()
  media: dvbdev: Fix memleak in dvb_register_device
  media: media/dvb: Use kmemdup rather than duplicating its implementation
  media: dvbdev: remove double-unlock
  media: v4l2-mem2mem: fix a memleak in v4l2_m2m_register_entity
  media: v4l2-tpg: fix some memleaks in tpg_alloc
  media: em28xx: annotate unchecked call to media_device_register()
  ABI: sysfs-bus-pci-devices-aer_stats uses an invalid tag
  perf evsel: Fix duplicate initialization of data->id in evsel__parse_sample()
  media: tc358743: register v4l2 async device only after successful setup
  drm/rockchip: lvds: do not print scary message when probing defer
  drm/rockchip: lvds: do not overwrite error code
  drm: Don't treat 0 as -1 in drm_fixp2int_ceil
  drm/rockchip: inno_hdmi: Fix video timing
  drm/tegra: dsi: Fix missing pm_runtime_disable() in the error handling path of tegra_dsi_probe()
  drm/tegra: dsi: Fix some error handling paths in tegra_dsi_probe()
  drm/tegra: dsi: Make use of the helper function dev_err_probe()
  gpu: host1x: mipi: Update tegra_mipi_request() to be node based
  drm/tegra: dsi: Add missing check for of_find_device_by_node
  dm: call the resume method on internal suspend
  dm raid: fix false positive for requeue needed during reshape
  nfp: flower: handle acti_netdevs allocation failure
  net/x25: fix incorrect parameter validation in the x25_getsockopt() function
  net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function
  udp: fix incorrect parameter validation in the udp_lib_getsockopt() function
  l2tp: fix incorrect parameter validation in the pppol2tp_getsockopt() function
  tcp: fix incorrect parameter validation in the do_tcp_getsockopt() function
  ipv6: fib6_rules: flush route cache when rule is changed
  bpf: Fix stackmap overflow check on 32-bit arches
  bpf: Fix hashtab overflow check on 32-bit arches
  sr9800: Add check for usbnet_get_endpoints
  Bluetooth: hci_core: Fix possible buffer overflow
  Bluetooth: Remove superfluous call to hci_conn_check_pending()
  igb: Fix missing time sync events
  igb: move PEROUT and EXTTS isr logic to separate functions
  mmc: wmt-sdmmc: remove an incorrect release_mem_region() call in the .remove function
  SUNRPC: fix some memleaks in gssx_dec_option_array
  x86, relocs: Ignore relocations in .notes section
  ACPI: scan: Fix device check notification handling
  ARM: dts: arm: realview: Fix development chip ROM compatible value
  wifi: brcmsmac: avoid function pointer casts
  iommu/amd: Mark interrupt as managed
  bus: tegra-aconnect: Update dependency to ARCH_TEGRA
  ACPI: processor_idle: Fix memory leak in acpi_processor_power_exit()
  wifi: libertas: fix some memleaks in lbs_allocate_cmd_buffer()
  af_unix: Annotate data-race of gc_in_progress in wait_for_unix_gc().
  sock_diag: annotate data-races around sock_diag_handlers[family]
  wifi: mwifiex: debugfs: Drop unnecessary error check for debugfs_create_dir()
  wifi: b43: Disable QoS for bcm4331
  wifi: b43: Stop correct queue in DMA worker when QoS is disabled
  b43: main: Fix use true/false for bool type
  wifi: b43: Stop/wake correct queue in PIO Tx path when QoS is disabled
  wifi: b43: Stop/wake correct queue in DMA Tx path when QoS is disabled
  b43: dma: Fix use true/false for bool type variable
  wifi: ath10k: fix NULL pointer dereference in ath10k_wmi_tlv_op_pull_mgmt_tx_compl_ev()
  timekeeping: Fix cross-timestamp interpolation for non-x86
  timekeeping: Fix cross-timestamp interpolation corner case decision
  timekeeping: Fix cross-timestamp interpolation on counter wrap
  aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts
  md: Don't clear MD_CLOSING when the raid is about to stop
  md: implement ->set_read_only to hook into BLKROSET processing
  block: add a new set_read_only method
  md: switch to ->check_events for media change notifications
  fs/select: rework stack allocation hack for clang
  do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak
  crypto: algif_aead - Only wake up when ctx->more is zero
  crypto: af_alg - make some functions static
  crypto: algif_aead - fix uninitialized ctx->init
  ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll
  ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode
  ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC
  Input: gpio_keys_polled - suppress deferred probe error for gpio
  ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet
  firewire: core: use long bus reset on gap count error
  Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
  scsi: mpt3sas: Prevent sending diag_reset when the controller is ready
  dm-verity, dm-crypt: align "struct bvec_iter" correctly
  block: sed-opal: handle empty atoms when parsing response
  net/iucv: fix the allocation size of iucv_path_table array
  MIPS: Clear Cause.BD in instruction_pointer_set
  x86/xen: Add some null pointer checking to smp.c
  ASoC: rt5645: Make LattePanda board DMI match more precise
  Linux 4.19.310
  selftests/vm: fix map_hugetlb length used for testing read and write
  selftests/vm: fix display of page size in map_hugetlb
  getrusage: use sig->stats_lock rather than lock_task_sighand()
  getrusage: use __for_each_thread()
  getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
  getrusage: add the "signal_struct *sig" local variable
  y2038: rusage: use __kernel_old_timeval
  hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
  hv_netvsc: use netif_is_bond_master() instead of open code
  hv_netvsc: Make netvsc/VF binding check both MAC and serial number
  Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
  um: allow not setting extra rpaths in the linux binary
  selftests: mm: fix map_hugetlb failure on 64K page size systems
  tools/selftest/vm: allow choosing mem size and page size in map_hugetlb
  btrfs: ref-verify: free ref cache before clearing mount opt
  netrom: Fix data-races around sysctl_net_busy_read
  netrom: Fix a data-race around sysctl_netrom_link_fails_count
  netrom: Fix a data-race around sysctl_netrom_routing_control
  netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
  netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
  netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
  netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
  netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
  netrom: Fix a data-race around sysctl_netrom_transport_timeout
  netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
  netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
  netrom: Fix a data-race around sysctl_netrom_default_path_quality
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  net/rds: fix WARNING in rds_conn_connect_if_down
  net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
  geneve: make sure to pull inner header in geneve_rx()
  net: move definition of pcpu_lstats to header file
  net: lan78xx: fix runtime PM count underflow on link stop
  lan78xx: Fix race conditions in suspend/resume handling
  lan78xx: Fix partial packet errors on suspend/resume
  lan78xx: Add missing return code checks
  lan78xx: Fix white space and style issues
  net: usb: lan78xx: Remove lots of set but unused 'ret' variables
  Linux 4.19.309
  gpio: 74x164: Enable output pins after registers are reset
  cachefiles: fix memory leak in cachefiles_add_cache()
  mmc: core: Fix eMMC initialization with 1-bit bus connection
  btrfs: dev-replace: properly validate device names
  wifi: nl80211: reject iftype change with mesh ID change
  gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
  ALSA: Drop leftover snd-rtctimer stuff from Makefile
  power: supply: bq27xxx-i2c: Do not free non existing IRQ
  efi/capsule-loader: fix incorrect allocation size
  Bluetooth: Enforce validation on max value of connection interval
  Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
  Bluetooth: Avoid potential use-after-free in hci_error_reset
  net: usb: dm9601: fix wrong return value in dm9601_mdio_read
  lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected
  tun: Fix xdp_rxq_info's queue_index when detaching
  netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
  Linux 4.19.308
  scripts/bpf: Fix xdp_md forward declaration typo
  fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
  KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler
  KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
  PCI/MSI: Prevent MSI hardware interrupt number truncation
  s390: use the correct count for __iowrite64_copy()
  packet: move from strlcpy with unused retval to strscpy
  ipv6: sr: fix possible use-after-free and null-ptr-deref
  nouveau: fix function cast warnings
  scsi: jazz_esp: Only build if SCSI core is builtin
  bpf, scripts: Correct GPL license name
  scripts/bpf: teach bpf_helpers_doc.py to dump BPF helper definitions
  RDMA/srpt: fix function pointer cast warnings
  RDMA/srpt: Make debug output more detailed
  RDMA/ulp: Use dev_name instead of ibdev->name
  RDMA/srpt: Support specifying the srpt_service_guid parameter
  RDMA/bnxt_re: Return error for SRQ resize
  IB/hfi1: Fix a memleak in init_credit_return
  usb: roles: don't get/set_role() when usb_role_switch is unregistered
  usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
  ARM: ep93xx: Add terminator to gpiod_lookup_table
  l2tp: pass correct message length to ip6_append_data
  gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
  dm-crypt: don't modify the data when using authenticated encryption
  mm: memcontrol: switch to rcu protection in drain_all_stock()
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
  pmdomain: renesas: r8a77980-sysc: CR7 must be always on
  s390/qeth: Fix potential loss of L3-IP@ in case of network issues
  virtio-blk: Ensure no requests in virtqueues before deleting vqs.
  firewire: core: send bus reset promptly on gap count error
  hwmon: (coretemp) Enlarge per package core count limit
  regulator: pwm-regulator: Add validity checks in continuous .get_voltage
  ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
  ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
  ahci: asm1166: correct count of reported ports
  fbdev: sis: Error out if pixclock equals zero
  fbdev: savage: Error out if pixclock equals zero
  wifi: mac80211: fix race condition on enabling fast-xmit
  wifi: cfg80211: fix missing interfaces when dumping
  dmaengine: shdma: increase size of 'dev_id'
  scsi: target: core: Add TMF to tmr_list handling
  sched/rt: Disallow writing invalid values to sched_rt_period_us
  sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
  sched/rt: Fix sysctl_sched_rr_timeslice intial value
  userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
  nilfs2: replace WARN_ONs for invalid DAT metadata block requests
  memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
  net: stmmac: fix notifier registration
  stmmac: no need to check return value of debugfs_create functions
  net/sched: Retire dsmark qdisc
  net/sched: Retire ATM qdisc
  net/sched: Retire CBQ qdisc
  Linux 4.19.307
  netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
  lsm: new security_file_ioctl_compat() hook
  nilfs2: fix potential bug in end_buffer_async_write
  sched/membarrier: reduce the ability to hammer on sys_membarrier
  Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
  pmdomain: core: Move the unused cleanup to a _sync initcall
  irqchip/irq-brcmstb-l2: Add write memory barrier before exit
  nfp: use correct macro for LengthSelect in BAR config
  nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
  nilfs2: fix data corruption in dsync block recovery for small block sizes
  ALSA: hda/conexant: Add quirk for SWS JS201D
  x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
  x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
  serial: max310x: improve crystal stable clock detection
  serial: max310x: set default value when reading clock ready bit
  ring-buffer: Clean ring_buffer_poll_wait() error return
  staging: iio: ad5933: fix type mismatch regression
  ext4: fix double-free of blocks due to wrong extents moved_len
  binder: signal epoll threads of self-work
  xen-netback: properly sync TX responses
  nfc: nci: free rx_data_reassembly skb on NCI device cleanup
  firewire: core: correct documentation of fw_csr_string() kernel API
  scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
  usb: f_mass_storage: forbid async queue when shutdown happen
  USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
  HID: wacom: Do not register input devices until after hid_hw_start
  HID: wacom: generic: Avoid reporting a serial of '0' to userspace
  mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
  tracing/trigger: Fix to return error if failed to alloc snapshot
  i40e: Fix waiting for queues of all VSIs to be disabled
  MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
  net: sysfs: Fix /sys/class/net/<iface> path for statistics
  Documentation: net-sysfs: describe missing statistics
  ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
  spi: ppc4xx: Drop write-only variable
  btrfs: send: return EOPNOTSUPP on unknown flags
  btrfs: forbid creating subvol qgroups
  hrtimer: Report offline hrtimer enqueue
  vhost: use kzalloc() instead of kmalloc() followed by memset()
  Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
  USB: serial: cp210x: add ID for IMST iM871A-USB
  USB: serial: option: add Fibocom FM101-GL variant
  USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
  net/af_iucv: clean up a try_then_request_module()
  netfilter: nft_compat: restrict match/target protocol to u16
  netfilter: nft_compat: reject unused compat flag
  ppp_async: limit MRU to 64K
  tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
  rxrpc: Fix response to PING RESPONSE ACKs to a dead call
  inet: read sk->sk_family once in inet_recv_error()
  hwmon: (coretemp) Fix bogus core_id to attr name mapping
  hwmon: (coretemp) Fix out-of-bounds memory access
  hwmon: (aspeed-pwm-tacho) mutex for tach reading
  atm: idt77252: fix a memleak in open_card_ubr0
  phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
  dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
  bonding: remove print in bond_verify_device_path
  HID: apple: Add 2021 magic keyboard FN key mapping
  HID: apple: Swap the Fn and Left Control keys on Apple keyboards
  HID: apple: Add support for the 2021 Magic Keyboard
  net: sysfs: Fix /sys/class/net/<iface> path
  af_unix: fix lockdep positive in sk_diag_dump_icons()
  net: ipv4: fix a memleak in ip_setup_cork
  netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
  llc: call sock_orphan() at release time
  ipv6: Ensure natural alignment of const ipv6 loopback and router addresses
  ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
  ixgbe: Refactor overtemp event handling
  ixgbe: Refactor returning internal error codes
  ixgbe: Remove non-inclusive language
  net: remove unneeded break
  scsi: isci: Fix an error code problem in isci_io_request_build()
  wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
  drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
  ceph: fix deadlock or deadcode of misusing dget()
  blk-mq: fix IO hang from sbitmap wakeup race
  virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings
  libsubcmd: Fix memory leak in uniq()
  usb: hub: Replace hardcoded quirk value with BIT() macro
  PCI: Only override AMD USB controller if required
  mfd: ti_am335x_tscadc: Fix TI SoC dependencies
  um: net: Fix return type of uml_net_start_xmit()
  um: Don't use vfprintf() for os_info()
  um: Fix naming clash between UML and scheduler
  leds: trigger: panic: Don't register panic notifier if creating the trigger failed
  drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()'
  drm/amdgpu: Let KFD sync with VM fences
  clk: mmp: pxa168: Fix memory leak in pxa168_clk_init()
  clk: hi3620: Fix memory leak in hi3620_mmc_clk_init()
  drm/msm/dpu: Ratelimit framedone timeout msgs
  media: ddbridge: fix an error code problem in ddb_probe
  IB/ipoib: Fix mcast list locking
  drm/exynos: Call drm_atomic_helper_shutdown() at shutdown/unbind time
  ALSA: hda: Intel: add HDA_ARL PCI ID support
  PCI: add INTEL_HDA_ARL to pci_ids.h
  media: rockchip: rga: fix swizzling for RGB formats
  media: stk1160: Fixed high volume of stk1160_dbg messages
  drm/mipi-dsi: Fix detach call without attach
  drm/framebuffer: Fix use of uninitialized variable
  drm/drm_file: fix use of uninitialized variable
  RDMA/IPoIB: Fix error code return in ipoib_mcast_join
  fast_dput(): handle underflows gracefully
  ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument
  f2fs: fix to check return value of f2fs_reserve_new_block()
  wifi: cfg80211: free beacon_ies when overridden from hidden BSS
  wifi: rtlwifi: rtl8723{be,ae}: using calculate_bit_shift()
  wifi: rtl8xxxu: Add additional USB IDs for RTL8192EU devices
  md: Whenassemble the array, consult the superblock of the freshest device
  ARM: dts: imx23/28: Fix the DMA controller node name
  ARM: dts: imx23-sansa: Use preferred i2c-gpios properties
  ARM: dts: imx27-apf27dev: Fix LED name
  ARM: dts: imx1: Fix sram node
  ARM: dts: imx27: Fix sram node
  ARM: dts: imx: Use flash@0,0 pattern
  ARM: dts: imx25/27-eukrea: Fix RTC node name
  ARM: dts: rockchip: fix rk3036 hdmi ports node
  scsi: libfc: Fix up timeout error in fc_fcp_rec_error()
  scsi: libfc: Don't schedule abort twice
  bpf: Add map and need_defer parameters to .map_fd_put_ptr()
  wifi: ath9k: Fix potential array-index-out-of-bounds read in ath9k_htc_txstatus()
  ARM: dts: imx7s: Fix nand-controller #size-cells
  ARM: dts: imx7s: Fix lcdif compatible
  bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk
  PCI: Add no PM reset quirk for NVIDIA Spectrum devices
  scsi: lpfc: Fix possible file string name overflow when updating firmware
  ext4: avoid online resizing failures due to oversized flex bg
  ext4: remove unnecessary check from alloc_flex_gd()
  ext4: unify the type of flexbg_size to unsigned int
  ext4: fix inconsistent between segment fstrim and full fstrim
  SUNRPC: Fix a suspicious RCU usage warning
  KVM: s390: fix setting of fpc register
  s390/ptrace: handle setting of fpc register correctly
  jfs: fix array-index-out-of-bounds in diNewExt
  rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock()
  afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
  crypto: stm32/crc32 - fix parsing list of devices
  pstore/ram: Fix crash when setting number of cpus to an odd number
  jfs: fix uaf in jfs_evict_inode
  jfs: fix array-index-out-of-bounds in dbAdjTree
  jfs: fix slab-out-of-bounds Read in dtSearch
  UBSAN: array-index-out-of-bounds in dtSplitRoot
  FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
  ACPI: extlog: fix NULL pointer dereference check
  PNP: ACPI: fix fortify warning
  ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop
  audit: Send netlink ACK before setting connection in auditd_set
  powerpc/lib: Validate size for vector operations
  powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
  powerpc: Fix build error due to is_valid_bugaddr()
  powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
  net/sched: cbs: Fix not adding cbs instance to list
  x86/entry/ia32: Ensure s32 is sign extended to s64
  tick/sched: Preserve number of idle sleeps across CPU hotplug events
  mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan
  gpio: eic-sprd: Clear interrupt after set the interrupt type
  drm/exynos: gsc: minor fix for loop iteration in gsc_runtime_resume
  drm/bridge: nxp-ptn3460: simplify some error checking
  drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking
  drm: Don't unref the same fb many times by mistake due to deadlock handling
  gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
  netfilter: nf_tables: reject QUEUE/DROP verdict parameters
  btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
  btrfs: don't warn if discard range is not aligned to sector
  net: fec: fix the unhandled context fault from smmu
  fjes: fix memleaks in fjes_hw_setup
  netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
  net/mlx5e: fix a double-free in arfs_create_groups
  net/mlx5: Use kfree(ft->g) in arfs_create_groups()
  netlink: fix potential sleeping issue in mqueue_flush_file

 Conflicts:
	include/linux/fs.h
	include/linux/timer.h
	init/initramfs.c
	kernel/time/timer.c
	mm/memory-failure.c
	mm/page_alloc.c
	net/core/sock.c
	scripts/Makefile.extrawarn

Change-Id: I0ccfce4c1a43240cfb997b426ef9fc59e61e3c55
2024-05-07 22:02:57 +03:00
Greg Kroah-Hartman
ec0ad95612 Merge 4.19.312 into android-4.19-stable
Changes in 4.19.312
	Documentation/hw-vuln: Update spectre doc
	x86/cpu: Support AMD Automatic IBRS
	x86/bugs: Use sysfs_emit()
	timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
	timer/trace: Improve timer tracing
	timers: Prepare support for PREEMPT_RT
	timers: Update kernel-doc for various functions
	timers: Use del_timer_sync() even on UP
	timers: Rename del_timer_sync() to timer_delete_sync()
	wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
	smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
	smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
	ARM: dts: mmp2-brownstone: Don't redeclare phandle references
	arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
	media: xc4000: Fix atomicity violation in xc4000_get_frequency
	KVM: Always flush async #PF workqueue when vCPU is being destroyed
	sparc64: NMI watchdog: fix return value of __setup handler
	sparc: vDSO: fix return value of __setup handler
	crypto: qat - fix double free during reset
	crypto: qat - resolve race condition during AER recovery
	fat: fix uninitialized field in nostale filehandles
	ubifs: Set page uptodate in the correct place
	ubi: Check for too small LEB size in VTBL code
	ubi: correct the calculation of fastmap size
	parisc: Do not hardcode registers in checksum functions
	parisc: Fix ip_fast_csum
	parisc: Fix csum_ipv6_magic on 32-bit systems
	parisc: Fix csum_ipv6_magic on 64-bit systems
	parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
	PM: suspend: Set mem_sleep_current during kernel command line setup
	clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
	clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
	clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
	powerpc/fsl: Fix mfpmr build errors with newer binutils
	USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
	USB: serial: add device ID for VeriFone adapter
	USB: serial: cp210x: add ID for MGP Instruments PDS100
	USB: serial: option: add MeiG Smart SLM320 product
	USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
	PM: sleep: wakeirq: fix wake irq warning in system suspend
	mmc: tmio: avoid concurrent runs of mmc_request_done()
	fuse: don't unhash root
	PCI: Drop pci_device_remove() test of pci_dev->driver
	PCI/PM: Drain runtime-idle callbacks before driver removal
	Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""
	dm-raid: fix lockdep waring in "pers->hot_add_disk"
	mmc: core: Fix switch on gp3 partition
	hwmon: (amc6821) add of_match table
	ext4: fix corruption during on-line resize
	slimbus: core: Remove usage of the deprecated ida_simple_xx() API
	speakup: Fix 8bit characters from direct synth
	kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
	vfio/platform: Disable virqfds on cleanup
	soc: fsl: qbman: Always disable interrupts when taking cgr_lock
	soc: fsl: qbman: Add helper for sanity checking cgr ops
	soc: fsl: qbman: Add CGR update function
	soc: fsl: qbman: Use raw spinlock for cgr_lock
	s390/zcrypt: fix reference counting on zcrypt card objects
	drm/imx/ipuv3: do not return negative values from .get_modes()
	drm/vc4: hdmi: do not return negative values from .get_modes()
	memtest: use {READ,WRITE}_ONCE in memory scanning
	nilfs2: fix failure to detect DAT corruption in btree and direct mappings
	nilfs2: use a more common logging style
	nilfs2: prevent kernel bug at submit_bh_wbc()
	x86/CPU/AMD: Update the Zenbleed microcode revisions
	ahci: asm1064: correct count of reported ports
	ahci: asm1064: asm1166: don't limit reported ports
	comedi: comedi_test: Prevent timers rescheduling during deletion
	netfilter: nf_tables: disallow anonymous set with timeout flag
	netfilter: nf_tables: reject constant set with timeout
	xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
	ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
	USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
	usb: gadget: ncm: Fix handling of zero block length packets
	usb: port: Don't try to peer unused USB ports based on location
	tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
	vt: fix unicode buffer corruption when deleting characters
	vt: fix memory overlapping when deleting chars in the buffer
	mm/memory-failure: fix an incorrect use of tail pages
	mm/migrate: set swap entry values of THP tail pages properly.
	wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
	exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
	usb: cdc-wdm: close race between read and workqueue
	ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
	fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
	printk: Update @console_may_schedule in console_trylock_spinning()
	btrfs: allocate btrfs_ioctl_defrag_range_args on stack
	Revert "loop: Check for overflow while configuring loop"
	loop: Call loop_config_discard() only after new config is applied
	loop: Remove sector_t truncation checks
	loop: Factor out setting loop device size
	loop: Refactor loop_set_status() size calculation
	loop: properly observe rotational flag of underlying device
	perf/core: Fix reentry problem in perf_output_read_group()
	efivarfs: Request at most 512 bytes for variable names
	powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
	loop: Factor out configuring loop from status
	loop: Check for overflow while configuring loop
	loop: loop_set_status_from_info() check before assignment
	usb: dwc2: host: Fix remote wakeup from hibernation
	usb: dwc2: host: Fix hibernation flow
	usb: dwc2: host: Fix ISOC flow in DDMA mode
	usb: dwc2: gadget: LPM flow fix
	usb: udc: remove warning when queue disabled ep
	scsi: qla2xxx: Fix command flush on cable pull
	x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
	scsi: lpfc: Correct size for wqe for memset()
	USB: core: Fix deadlock in usb_deauthorize_interface()
	nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
	mptcp: add sk_stop_timer_sync helper
	tcp: properly terminate timers for kernel sockets
	r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
	Bluetooth: hci_event: set the conn encrypted before conn establishes
	Bluetooth: Fix TOCTOU in HCI debugfs implementation
	netfilter: nf_tables: disallow timeout for anonymous sets
	net/rds: fix possible cp null dereference
	Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
	mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
	netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
	net/sched: act_skbmod: prevent kernel-infoleak
	net: stmmac: fix rx queue priority assignment
	selftests: reuseaddr_conflict: add missing new line at the end of the output
	ipv6: Fix infinite recursion in fib6_dump_done().
	i40e: fix vf may be used uninitialized in this function warning
	staging: mmal-vchiq: Avoid use of bool in structures
	staging: mmal-vchiq: Allocate and free components as required
	staging: mmal-vchiq: Fix client_component for 64 bit kernel
	staging: vc04_services: changen strncpy() to strscpy_pad()
	staging: vc04_services: fix information leak in create_component()
	initramfs: factor out a helper to populate the initrd image
	fs: add a vfs_fchown helper
	fs: add a vfs_fchmod helper
	initramfs: switch initramfs unpacking to struct file based APIs
	init: open /initrd.image with O_LARGEFILE
	erspan: Add type I version 0 support.
	erspan: make sure erspan_base_hdr is present in skb->head
	ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
	ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
	ata: sata_mv: Fix PCI device ID table declaration compilation warning
	ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
	wifi: ath9k: fix LNA selection in ath_ant_try_scan()
	VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host()
	arm64: dts: rockchip: fix rk3399 hdmi ports node
	tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num()
	btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()
	btrfs: export: handle invalid inode or root reference in btrfs_get_parent()
	btrfs: send: handle path ref underflow in header iterate_inode_ref()
	Bluetooth: btintel: Fix null ptr deref in btintel_read_version
	Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails
	sysv: don't call sb_bread() with pointers_lock held
	scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc()
	isofs: handle CDs with bad root inode but good Joliet root directory
	media: sta2x11: fix irq handler cast
	drm/amd/display: Fix nanosec stat overflow
	SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int
	block: prevent division by zero in blk_rq_stat_sum()
	Input: allocate keycode for Display refresh rate toggle
	ktest: force $buildonly = 1 for 'make_warnings_file' test type
	tools: iio: replace seekdir() in iio_generic_buffer
	usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined
	fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
	fbmon: prevent division by zero in fb_videomode_from_videomode()
	tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc
	drm/vkms: call drm_atomic_helper_shutdown before drm_dev_put()
	virtio: reenable config if freezing device failed
	x86/mm/pat: fix VM_PAT handling in COW mappings
	Bluetooth: btintel: Fixe build regression
	VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
	erspan: Check IFLA_GRE_ERSPAN_VER is set.
	ip_gre: do not report erspan version on GRE interface
	initramfs: fix populate_initrd_image() section mismatch
	amdkfd: use calloc instead of kzalloc to avoid integer overflow
	Linux 4.19.312

Change-Id: Ic4c50de6afb4c88c8011be6cc93f960d2dc968e0
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2024-04-16 10:08:42 +00:00
Vlastimil Babka
c82a659cc8 mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.  There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d05 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13 12:50:12 +02:00
Mark Salyzyn
fbc355c16d Revert "BACKPORT: mm: move zone watermark accesses behind an accessor"
This reverts commit acfb1c608b.

Reason for revert: revert customized code
Bug: 140544941
Test: boot
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I4988ddf1fc24579d4fc478de1de6e45b870f7bcd
2020-04-14 19:35:44 +08:00
Mark Salyzyn
776eba0172 Revert "BACKPORT: mm, compaction: be selective about what pageblocks to clear skip hints"
This reverts commit fd9c71c06b.

Reason for revert: revert customized code
Bug: 140544941
Test: boot
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I58d978c6f74f8a1a04b17fa6d3890ddc4f1988c0
2020-04-14 19:35:13 +08:00
Mark Salyzyn
fd9c71c06b BACKPORT: mm, compaction: be selective about what pageblocks to clear skip hints
Pageblock hints are cleared when compaction restarts or kswapd makes
enough progress that it can sleep but it's over-eager in that the bit is
cleared for migration sources with no LRU pages and migration targets
with no free pages.  As pageblock skip hint flushes are relatively rare
and out-of-band with respect to kswapd, this patch makes a few more
expensive checks to see if it's appropriate to even clear the bit.
Every pageblock that is not cleared will avoid 512 pages being scanned
unnecessarily on x86-64.

The impact is variable with different workloads showing small
differences in latency, success rates and scan rates.  This is expected
as clearing the hints is not that common but doing a small amount of
work out-of-band to avoid a large amount of work in-band later is
generally a good thing.

Link: http://lkml.kernel.org/r/20190118175136.31341-22-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
[cai@lca.pw: no stuck in __reset_isolation_pfn()]
  Link: http://lkml.kernel.org/r/20190206034732.75687-1-cai@lca.pw
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I27b3d1bf8100d0281ec297ef5ce79d100d0cb37e
Git-commit: e332f741a8dd1ec9a6dc8aa997296ecbfe64323e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
(cherry picked from commit e332f741a8dd1ec9a6dc8aa997296ecbfe64323e)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 150378964
2020-03-04 12:01:17 -08:00
Mark Salyzyn
acfb1c608b BACKPORT: mm: move zone watermark accesses behind an accessor
This is a preparation patch only, no functional change.

Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Iccef5f02348ab59d75cbbe2b48f40017391b88b1
Git-commit: a921444382b49cc7fdeca3fba3e278bc09484a27
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
(cherry picked from commit a921444382b49cc7fdeca3fba3e278bc09484a27)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 150378964
2020-03-04 12:01:16 -08:00
qctecmdr
3225a4fef0 Merge "Merge android-4.19-q.78 (d9e388f) into msm-4.19" 2019-10-24 19:01:01 -07:00
Vlastimil Babka
2a239da9de mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
Florian and Dave reported [1] a NULL pointer dereference in
__reset_isolation_pfn().  While the exact cause is unclear, staring at
the code revealed two bugs, which might be related.

One bug is that if zone starts in the middle of pageblock, block_page
might correspond to different pfn than block_pfn, and then the
pfn_valid_within() checks will check different pfn's than those accessed
via struct page.  This might result in acessing an unitialized page in
CONFIG_HOLES_IN_ZONE configs.

The other bug is that end_page refers to the first page of next
pageblock and not last page of current pageblock.  The online and valid
check is then wrong and with sections, the while (page < end_page) loop
might wander off actual struct page arrays.

[1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/

Change-Id: I4d2352cba861245ed844943846d5c569ff2f3e24
Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
Fixes: 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: a2e9a5afce080226edbf1882d63d99bf32070e9e
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-10-17 16:14:28 +05:30
Ivaylo Georgiev
2857d83eee Merge android-4.19-q.77 (cd1eb9f) into msm-4.19
* refs/heads/tmp-cd1eb9f:
  Revert "net: qrtr: Stop rx_worker before freeing node"
  Linux 4.19.77
  drm/amd/display: Restore backlight brightness after system resume
  mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
  fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
  md/raid0: avoid RAID0 data corruption due to layout confusion.
  CIFS: Fix oplock handling for SMB 2.1+ protocols
  CIFS: fix max ea value size
  i2c: riic: Clear NACK in tend isr
  hwrng: core - don't wait on add_early_randomness()
  quota: fix wrong condition in is_quota_modification()
  ext4: fix punch hole for inline_data file systems
  ext4: fix warning inside ext4_convert_unwritten_extents_endio
  /dev/mem: Bail out upon SIGKILL.
  cfg80211: Purge frame registrations on iftype change
  md: only call set_in_sync() when it is expected to succeed.
  md: don't report active array_state until after revalidate_disk() completes.
  md/raid6: Set R5_ReadError when there is read failure on parity disk
  Btrfs: fix race setting up and completing qgroup rescan workers
  btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
  btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
  btrfs: Relinquish CPUs in btrfs_compare_trees
  Btrfs: fix use-after-free when using the tree modification log
  btrfs: fix allocation of free space cache v1 bitmap pages
  ovl: filter of trusted xattr results in audit
  ovl: Fix dereferencing possible ERR_PTR()
  smb3: allow disabling requesting leases
  block: fix null pointer dereference in blk_mq_rq_timed_out()
  i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
  memcg, kmem: do not fail __GFP_NOFAIL charges
  memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
  gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
  efifb: BGRT: Improve efifb_bgrt_sanity_check
  regulator: Defer init completion for a while after late_initcall
  alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
  arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
  arm64: tlb: Ensure we execute an ISB following walk cache invalidation
  Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
  ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
  ARM: samsung: Fix system restart on S3C6410
  ASoC: Intel: Fix use of potentially uninitialized variable
  ASoC: Intel: Skylake: Use correct function to access iomem space
  ASoC: Intel: NHLT: Fix debug print format
  binfmt_elf: Do not move brk for INTERP-less ET_EXEC
  media: don't drop front-end reference count for ->detach
  media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
  KVM: x86: Manually calculate reserved bits when loading PDPTRS
  KVM: x86: set ctxt->have_exception in x86_decode_insn()
  KVM: x86: always stop emulation on page fault
  parisc: Disable HP HSC-PCI Cards to prevent kernel crash
  fuse: fix missing unlock_page in fuse_writepage()
  powerpc/imc: Dont create debugfs files for cpu-less nodes
  scsi: implement .cleanup_rq callback
  blk-mq: add callback of .cleanup_rq
  ALSA: hda/realtek - PCI quirk for Medion E4254
  ceph: use ceph_evict_inode to cleanup inode's resource
  Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
  randstruct: Check member structs in is_pure_ops_struct()
  IB/hfi1: Define variables as unsigned long to fix KASAN warning
  IB/mlx5: Free mpi in mp_slave mode
  printk: Do not lose last line in kmsg buffer dump
  scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
  scsi: scsi_dh_rdac: zero cdb in send_mode_select()
  ALSA: firewire-tascam: check intermediate state of clock status and retry
  ALSA: firewire-tascam: handle error code when getting current source of clock
  iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
  PM / devfreq: passive: fix compiler warning
  media: omap3isp: Set device on omap3isp subdevs
  btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
  iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
  ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
  media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
  drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
  ALSA: hda - Drop unsol event handler for Intel HDMI codecs
  e1000e: add workaround for possible stalled packet
  libertas: Add missing sentinel at end of if_usb.c fw_table
  raid5: don't increment read_errors on EILSEQ return
  mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
  mmc: core: Add helper function to indicate if SDIO IRQs is enabled
  mmc: sdhci: Fix incorrect switch to HS mode
  mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
  raid5: don't set STRIPE_HANDLE to stripe which is in batch list
  ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
  platform/x86: intel_pmc_core: Do not ioremap RAM
  x86/cpu: Add Tiger Lake to Intel family
  s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
  kprobes: Prohibit probing on BUG() and WARN() address
  dmaengine: ti: edma: Do not reset reserved paRAM slots
  md/raid1: fail run raid1 array when active disk less than one
  hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
  closures: fix a race on wakeup from closure_sync
  ACPI / PCI: fix acpi_pci_irq_enable() memory leak
  ACPI: custom_method: fix memory leaks
  ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
  libtraceevent: Change users plugin directory
  iommu/iova: Avoid false sharing on fq_timer_on
  libata/ahci: Drop PCS quirk for Denverton and beyond
  iommu/amd: Silence warnings under memory pressure
  ALSA: firewire-motu: add support for MOTU 4pre
  nvme-multipath: fix ana log nsid lookup when nsid is not found
  nvmet: fix data units read and written counters in SMART log
  x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
  ASoC: fsl_ssi: Fix clock control issue in master mode
  x86/mm/pti: Do not invoke PTI functions when PTI is disabled
  arm64: kpti: ensure patched kernel text is fetched from PoU
  x86/apic/vector: Warn when vector space exhaustion breaks affinity
  sched/cpufreq: Align trace event behavior of fast switching
  ACPI / CPPC: do not require the _PSD method
  ASoC: es8316: fix headphone mixer volume table
  media: ov9650: add a sanity check
  perf trace beauty ioctl: Fix off-by-one error in cmd->string table
  media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
  media: cpia2_usb: fix memory leaks
  media: saa7146: add cleanup in hexium_attach()
  media: cec-notifier: clear cec_adap in cec_notifier_unregister
  PM / devfreq: exynos-bus: Correct clock enable sequence
  PM / devfreq: passive: Use non-devm notifiers
  EDAC/amd64: Decode syndrome before translating address
  EDAC/amd64: Recognize DRAM device type ECC capability
  libperf: Fix alignment trap with xyarray contents in 'perf stat'
  media: dvb-core: fix a memory leak bug
  posix-cpu-timers: Sanitize bogus WARNONS
  media: dvb-frontends: use ida for pll number
  media: mceusb: fix (eliminate) TX IR signal length limit
  nbd: add missing config put
  led: triggers: Fix a memory leak bug
  ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
  tools headers: Fixup bitsperlong per arch includes
  ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
  media: hdpvr: add terminating 0 at end of string
  media: radio/si470x: kill urb on error
  ARM: dts: imx7-colibri: disable HS400
  ARM: dts: imx7d: cl-som-imx7: make ethernet work again
  m68k: Prevent some compiler warnings in Coldfire builds
  net: lpc-enet: fix printk format strings
  media: imx: mipi csi-2: Don't fail if initial state times-out
  media: omap3isp: Don't set streaming state on random subdevs
  media: i2c: ov5645: Fix power sequence
  media: vsp1: fix memory leak of dl on error return path
  perf record: Support aarch64 random socket_id assignment
  dmaengine: iop-adma: use correct printk format strings
  media: rc: imon: Allow iMON RC protocol for ffdc 7e device
  media: em28xx: modules workqueue not inited for 2nd device
  media: fdp1: Reduce FCP not found message level to debug
  media: mtk-mdp: fix reference count on old device tree
  perf test vfs_getname: Disable ~/.perfconfig to get default output
  perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
  media: gspca: zero usb_buf on error
  idle: Prevent late-arriving interrupts from disrupting offline
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  firmware: arm_scmi: Check if platform has released shmem before using
  efi: cper: print AER info of PCIe fatal error
  EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
  loop: Add LOOP_SET_DIRECT_IO to compat ioctl
  ACPI / processor: don't print errors for processorIDs == 0xff
  media: media/platform: fsl-viu.c: fix build for MICROBLAZE
  md: don't set In_sync if array is frozen
  md: don't call spare_active in md_reap_sync_thread if all member devices can't work
  md/raid1: end bio when the device faulty
  arm64/prefetch: fix a -Wtype-limits warning
  ASoC: rsnd: don't call clk_get_rate() under atomic context
  EDAC/altera: Use the proper type for the IRQ status bits
  ia64:unwind: fix double free for mod->arch.init_unw_table
  ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
  base: soc: Export soc_device_register/unregister APIs
  media: iguanair: add sanity checks
  EDAC/mc: Fix grain_bits calculation
  ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
  ALSA: hda - Show the fatal CORB/RIRB error more clearly
  x86/apic: Soft disable APIC before initializing it
  x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
  sched/deadline: Fix bandwidth accounting at all levels after offline migration
  x86/apic: Make apic_pending_intr_clear() more robust
  sched/core: Fix CPU controller for !RT_GROUP_SCHED
  sched/fair: Fix imbalance due to CPU affinity
  time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
  media: i2c: ov5640: Check for devm_gpiod_get_optional() error
  media: hdpvr: Add device num check and handling
  media: exynos4-is: fix leaked of_node references
  media: mtk-cir: lower de-glitch counter for rc-mm protocol
  media: dib0700: fix link error for dibx000_i2c_set_speed
  leds: leds-lp5562 allow firmware files up to the maximum length
  dmaengine: bcm2835: Print error in case setting DMA mask fails
  firmware: qcom_scm: Use proper types for dma mappings
  ASoC: sgtl5000: Fix charge pump source assignment
  ASoC: sgtl5000: Fix of unmute outputs on probe
  ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
  regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
  ALSA: hda: Flush interrupts on disabling
  nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
  nfc: enforce CAP_NET_RAW for raw sockets
  ieee802154: enforce CAP_NET_RAW for raw sockets
  ax25: enforce CAP_NET_RAW for raw sockets
  appletalk: enforce CAP_NET_RAW for raw sockets
  mISDN: enforce CAP_NET_RAW for raw sockets
  net/mlx5: Add device ID of upcoming BlueField-2
  tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
  net: sched: fix possible crash in tcf_action_destroy()
  usbnet: sanity checking of packet sizes and device mtu
  usbnet: ignore endpoints with invalid wMaxPacketSize
  skge: fix checksum byte order
  sch_netem: fix a divide by zero in tabledist()
  ppp: Fix memory leak in ppp_write
  openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
  nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
  net_sched: add max len check for TCA_KIND
  net/sched: act_sample: don't push mac header on ip6gre ingress
  net: qrtr: Stop rx_worker before freeing node
  net/phy: fix DP83865 10 Mbps HDX loopback disable function
  macsec: drop skb sk before calling gro_cells_receive
  cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
  arcnet: provide a buffer big enough to actually receive packets
  ANDROID: properly export new symbols with _GPL tag

Conflicts:
	kernel/sched/cpufreq_schedutil.c
	mm/compaction.c

Change-Id: I3f2cc4a1421480479b9040aa5eefe7e17437021d
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-10-14 00:52:41 -07:00
Greg Kroah-Hartman
cd1eb9f1b9 Merge 4.19.77 into android-4.19-q
Changes in 4.19.77
	arcnet: provide a buffer big enough to actually receive packets
	cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
	macsec: drop skb sk before calling gro_cells_receive
	net/phy: fix DP83865 10 Mbps HDX loopback disable function
	net: qrtr: Stop rx_worker before freeing node
	net/sched: act_sample: don't push mac header on ip6gre ingress
	net_sched: add max len check for TCA_KIND
	nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
	openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
	ppp: Fix memory leak in ppp_write
	sch_netem: fix a divide by zero in tabledist()
	skge: fix checksum byte order
	usbnet: ignore endpoints with invalid wMaxPacketSize
	usbnet: sanity checking of packet sizes and device mtu
	net: sched: fix possible crash in tcf_action_destroy()
	tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
	net/mlx5: Add device ID of upcoming BlueField-2
	mISDN: enforce CAP_NET_RAW for raw sockets
	appletalk: enforce CAP_NET_RAW for raw sockets
	ax25: enforce CAP_NET_RAW for raw sockets
	ieee802154: enforce CAP_NET_RAW for raw sockets
	nfc: enforce CAP_NET_RAW for raw sockets
	nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
	ALSA: hda: Flush interrupts on disabling
	regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
	ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
	ASoC: sgtl5000: Fix of unmute outputs on probe
	ASoC: sgtl5000: Fix charge pump source assignment
	firmware: qcom_scm: Use proper types for dma mappings
	dmaengine: bcm2835: Print error in case setting DMA mask fails
	leds: leds-lp5562 allow firmware files up to the maximum length
	media: dib0700: fix link error for dibx000_i2c_set_speed
	media: mtk-cir: lower de-glitch counter for rc-mm protocol
	media: exynos4-is: fix leaked of_node references
	media: hdpvr: Add device num check and handling
	media: i2c: ov5640: Check for devm_gpiod_get_optional() error
	time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
	sched/fair: Fix imbalance due to CPU affinity
	sched/core: Fix CPU controller for !RT_GROUP_SCHED
	x86/apic: Make apic_pending_intr_clear() more robust
	sched/deadline: Fix bandwidth accounting at all levels after offline migration
	x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
	x86/apic: Soft disable APIC before initializing it
	ALSA: hda - Show the fatal CORB/RIRB error more clearly
	ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
	EDAC/mc: Fix grain_bits calculation
	media: iguanair: add sanity checks
	base: soc: Export soc_device_register/unregister APIs
	ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
	ia64:unwind: fix double free for mod->arch.init_unw_table
	EDAC/altera: Use the proper type for the IRQ status bits
	ASoC: rsnd: don't call clk_get_rate() under atomic context
	arm64/prefetch: fix a -Wtype-limits warning
	md/raid1: end bio when the device faulty
	md: don't call spare_active in md_reap_sync_thread if all member devices can't work
	md: don't set In_sync if array is frozen
	media: media/platform: fsl-viu.c: fix build for MICROBLAZE
	ACPI / processor: don't print errors for processorIDs == 0xff
	loop: Add LOOP_SET_DIRECT_IO to compat ioctl
	EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
	efi: cper: print AER info of PCIe fatal error
	firmware: arm_scmi: Check if platform has released shmem before using
	sched/fair: Use rq_lock/unlock in online_fair_sched_group
	idle: Prevent late-arriving interrupts from disrupting offline
	media: gspca: zero usb_buf on error
	perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
	perf test vfs_getname: Disable ~/.perfconfig to get default output
	media: mtk-mdp: fix reference count on old device tree
	media: fdp1: Reduce FCP not found message level to debug
	media: em28xx: modules workqueue not inited for 2nd device
	media: rc: imon: Allow iMON RC protocol for ffdc 7e device
	dmaengine: iop-adma: use correct printk format strings
	perf record: Support aarch64 random socket_id assignment
	media: vsp1: fix memory leak of dl on error return path
	media: i2c: ov5645: Fix power sequence
	media: omap3isp: Don't set streaming state on random subdevs
	media: imx: mipi csi-2: Don't fail if initial state times-out
	net: lpc-enet: fix printk format strings
	m68k: Prevent some compiler warnings in Coldfire builds
	ARM: dts: imx7d: cl-som-imx7: make ethernet work again
	ARM: dts: imx7-colibri: disable HS400
	media: radio/si470x: kill urb on error
	media: hdpvr: add terminating 0 at end of string
	ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
	tools headers: Fixup bitsperlong per arch includes
	ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
	led: triggers: Fix a memory leak bug
	nbd: add missing config put
	media: mceusb: fix (eliminate) TX IR signal length limit
	media: dvb-frontends: use ida for pll number
	posix-cpu-timers: Sanitize bogus WARNONS
	media: dvb-core: fix a memory leak bug
	libperf: Fix alignment trap with xyarray contents in 'perf stat'
	EDAC/amd64: Recognize DRAM device type ECC capability
	EDAC/amd64: Decode syndrome before translating address
	PM / devfreq: passive: Use non-devm notifiers
	PM / devfreq: exynos-bus: Correct clock enable sequence
	media: cec-notifier: clear cec_adap in cec_notifier_unregister
	media: saa7146: add cleanup in hexium_attach()
	media: cpia2_usb: fix memory leaks
	media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
	perf trace beauty ioctl: Fix off-by-one error in cmd->string table
	media: ov9650: add a sanity check
	ASoC: es8316: fix headphone mixer volume table
	ACPI / CPPC: do not require the _PSD method
	sched/cpufreq: Align trace event behavior of fast switching
	x86/apic/vector: Warn when vector space exhaustion breaks affinity
	arm64: kpti: ensure patched kernel text is fetched from PoU
	x86/mm/pti: Do not invoke PTI functions when PTI is disabled
	ASoC: fsl_ssi: Fix clock control issue in master mode
	x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
	nvmet: fix data units read and written counters in SMART log
	nvme-multipath: fix ana log nsid lookup when nsid is not found
	ALSA: firewire-motu: add support for MOTU 4pre
	iommu/amd: Silence warnings under memory pressure
	libata/ahci: Drop PCS quirk for Denverton and beyond
	iommu/iova: Avoid false sharing on fq_timer_on
	libtraceevent: Change users plugin directory
	ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
	ACPI: custom_method: fix memory leaks
	ACPI / PCI: fix acpi_pci_irq_enable() memory leak
	closures: fix a race on wakeup from closure_sync
	hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
	md/raid1: fail run raid1 array when active disk less than one
	dmaengine: ti: edma: Do not reset reserved paRAM slots
	kprobes: Prohibit probing on BUG() and WARN() address
	s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
	x86/cpu: Add Tiger Lake to Intel family
	platform/x86: intel_pmc_core: Do not ioremap RAM
	ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
	raid5: don't set STRIPE_HANDLE to stripe which is in batch list
	mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
	mmc: sdhci: Fix incorrect switch to HS mode
	mmc: core: Add helper function to indicate if SDIO IRQs is enabled
	mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
	raid5: don't increment read_errors on EILSEQ return
	libertas: Add missing sentinel at end of if_usb.c fw_table
	e1000e: add workaround for possible stalled packet
	ALSA: hda - Drop unsol event handler for Intel HDMI codecs
	drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
	media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
	ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
	iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
	btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
	media: omap3isp: Set device on omap3isp subdevs
	PM / devfreq: passive: fix compiler warning
	iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
	ALSA: firewire-tascam: handle error code when getting current source of clock
	ALSA: firewire-tascam: check intermediate state of clock status and retry
	scsi: scsi_dh_rdac: zero cdb in send_mode_select()
	scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
	printk: Do not lose last line in kmsg buffer dump
	IB/mlx5: Free mpi in mp_slave mode
	IB/hfi1: Define variables as unsigned long to fix KASAN warning
	randstruct: Check member structs in is_pure_ops_struct()
	Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
	ceph: use ceph_evict_inode to cleanup inode's resource
	ALSA: hda/realtek - PCI quirk for Medion E4254
	blk-mq: add callback of .cleanup_rq
	scsi: implement .cleanup_rq callback
	powerpc/imc: Dont create debugfs files for cpu-less nodes
	fuse: fix missing unlock_page in fuse_writepage()
	parisc: Disable HP HSC-PCI Cards to prevent kernel crash
	KVM: x86: always stop emulation on page fault
	KVM: x86: set ctxt->have_exception in x86_decode_insn()
	KVM: x86: Manually calculate reserved bits when loading PDPTRS
	media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
	media: don't drop front-end reference count for ->detach
	binfmt_elf: Do not move brk for INTERP-less ET_EXEC
	ASoC: Intel: NHLT: Fix debug print format
	ASoC: Intel: Skylake: Use correct function to access iomem space
	ASoC: Intel: Fix use of potentially uninitialized variable
	ARM: samsung: Fix system restart on S3C6410
	ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
	Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
	arm64: tlb: Ensure we execute an ISB following walk cache invalidation
	arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
	alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
	regulator: Defer init completion for a while after late_initcall
	efifb: BGRT: Improve efifb_bgrt_sanity_check
	gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
	memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
	memcg, kmem: do not fail __GFP_NOFAIL charges
	i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
	block: fix null pointer dereference in blk_mq_rq_timed_out()
	smb3: allow disabling requesting leases
	ovl: Fix dereferencing possible ERR_PTR()
	ovl: filter of trusted xattr results in audit
	btrfs: fix allocation of free space cache v1 bitmap pages
	Btrfs: fix use-after-free when using the tree modification log
	btrfs: Relinquish CPUs in btrfs_compare_trees
	btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
	btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
	Btrfs: fix race setting up and completing qgroup rescan workers
	md/raid6: Set R5_ReadError when there is read failure on parity disk
	md: don't report active array_state until after revalidate_disk() completes.
	md: only call set_in_sync() when it is expected to succeed.
	cfg80211: Purge frame registrations on iftype change
	/dev/mem: Bail out upon SIGKILL.
	ext4: fix warning inside ext4_convert_unwritten_extents_endio
	ext4: fix punch hole for inline_data file systems
	quota: fix wrong condition in is_quota_modification()
	hwrng: core - don't wait on add_early_randomness()
	i2c: riic: Clear NACK in tend isr
	CIFS: fix max ea value size
	CIFS: Fix oplock handling for SMB 2.1+ protocols
	md/raid0: avoid RAID0 data corruption due to layout confusion.
	fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
	mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
	drm/amd/display: Restore backlight brightness after system resume
	Linux 4.19.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I024d8f764cd5910ba1210299375accb2e79f0abc
2019-10-06 11:28:58 +02:00
Greg Kroah-Hartman
7f1f24fed2 Merge 4.19.77 into android-4.19
Changes in 4.19.77
	arcnet: provide a buffer big enough to actually receive packets
	cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
	macsec: drop skb sk before calling gro_cells_receive
	net/phy: fix DP83865 10 Mbps HDX loopback disable function
	net: qrtr: Stop rx_worker before freeing node
	net/sched: act_sample: don't push mac header on ip6gre ingress
	net_sched: add max len check for TCA_KIND
	nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
	openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
	ppp: Fix memory leak in ppp_write
	sch_netem: fix a divide by zero in tabledist()
	skge: fix checksum byte order
	usbnet: ignore endpoints with invalid wMaxPacketSize
	usbnet: sanity checking of packet sizes and device mtu
	net: sched: fix possible crash in tcf_action_destroy()
	tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
	net/mlx5: Add device ID of upcoming BlueField-2
	mISDN: enforce CAP_NET_RAW for raw sockets
	appletalk: enforce CAP_NET_RAW for raw sockets
	ax25: enforce CAP_NET_RAW for raw sockets
	ieee802154: enforce CAP_NET_RAW for raw sockets
	nfc: enforce CAP_NET_RAW for raw sockets
	nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
	ALSA: hda: Flush interrupts on disabling
	regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
	ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
	ASoC: sgtl5000: Fix of unmute outputs on probe
	ASoC: sgtl5000: Fix charge pump source assignment
	firmware: qcom_scm: Use proper types for dma mappings
	dmaengine: bcm2835: Print error in case setting DMA mask fails
	leds: leds-lp5562 allow firmware files up to the maximum length
	media: dib0700: fix link error for dibx000_i2c_set_speed
	media: mtk-cir: lower de-glitch counter for rc-mm protocol
	media: exynos4-is: fix leaked of_node references
	media: hdpvr: Add device num check and handling
	media: i2c: ov5640: Check for devm_gpiod_get_optional() error
	time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
	sched/fair: Fix imbalance due to CPU affinity
	sched/core: Fix CPU controller for !RT_GROUP_SCHED
	x86/apic: Make apic_pending_intr_clear() more robust
	sched/deadline: Fix bandwidth accounting at all levels after offline migration
	x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
	x86/apic: Soft disable APIC before initializing it
	ALSA: hda - Show the fatal CORB/RIRB error more clearly
	ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
	EDAC/mc: Fix grain_bits calculation
	media: iguanair: add sanity checks
	base: soc: Export soc_device_register/unregister APIs
	ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
	ia64:unwind: fix double free for mod->arch.init_unw_table
	EDAC/altera: Use the proper type for the IRQ status bits
	ASoC: rsnd: don't call clk_get_rate() under atomic context
	arm64/prefetch: fix a -Wtype-limits warning
	md/raid1: end bio when the device faulty
	md: don't call spare_active in md_reap_sync_thread if all member devices can't work
	md: don't set In_sync if array is frozen
	media: media/platform: fsl-viu.c: fix build for MICROBLAZE
	ACPI / processor: don't print errors for processorIDs == 0xff
	loop: Add LOOP_SET_DIRECT_IO to compat ioctl
	EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
	efi: cper: print AER info of PCIe fatal error
	firmware: arm_scmi: Check if platform has released shmem before using
	sched/fair: Use rq_lock/unlock in online_fair_sched_group
	idle: Prevent late-arriving interrupts from disrupting offline
	media: gspca: zero usb_buf on error
	perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
	perf test vfs_getname: Disable ~/.perfconfig to get default output
	media: mtk-mdp: fix reference count on old device tree
	media: fdp1: Reduce FCP not found message level to debug
	media: em28xx: modules workqueue not inited for 2nd device
	media: rc: imon: Allow iMON RC protocol for ffdc 7e device
	dmaengine: iop-adma: use correct printk format strings
	perf record: Support aarch64 random socket_id assignment
	media: vsp1: fix memory leak of dl on error return path
	media: i2c: ov5645: Fix power sequence
	media: omap3isp: Don't set streaming state on random subdevs
	media: imx: mipi csi-2: Don't fail if initial state times-out
	net: lpc-enet: fix printk format strings
	m68k: Prevent some compiler warnings in Coldfire builds
	ARM: dts: imx7d: cl-som-imx7: make ethernet work again
	ARM: dts: imx7-colibri: disable HS400
	media: radio/si470x: kill urb on error
	media: hdpvr: add terminating 0 at end of string
	ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
	tools headers: Fixup bitsperlong per arch includes
	ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
	led: triggers: Fix a memory leak bug
	nbd: add missing config put
	media: mceusb: fix (eliminate) TX IR signal length limit
	media: dvb-frontends: use ida for pll number
	posix-cpu-timers: Sanitize bogus WARNONS
	media: dvb-core: fix a memory leak bug
	libperf: Fix alignment trap with xyarray contents in 'perf stat'
	EDAC/amd64: Recognize DRAM device type ECC capability
	EDAC/amd64: Decode syndrome before translating address
	PM / devfreq: passive: Use non-devm notifiers
	PM / devfreq: exynos-bus: Correct clock enable sequence
	media: cec-notifier: clear cec_adap in cec_notifier_unregister
	media: saa7146: add cleanup in hexium_attach()
	media: cpia2_usb: fix memory leaks
	media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
	perf trace beauty ioctl: Fix off-by-one error in cmd->string table
	media: ov9650: add a sanity check
	ASoC: es8316: fix headphone mixer volume table
	ACPI / CPPC: do not require the _PSD method
	sched/cpufreq: Align trace event behavior of fast switching
	x86/apic/vector: Warn when vector space exhaustion breaks affinity
	arm64: kpti: ensure patched kernel text is fetched from PoU
	x86/mm/pti: Do not invoke PTI functions when PTI is disabled
	ASoC: fsl_ssi: Fix clock control issue in master mode
	x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
	nvmet: fix data units read and written counters in SMART log
	nvme-multipath: fix ana log nsid lookup when nsid is not found
	ALSA: firewire-motu: add support for MOTU 4pre
	iommu/amd: Silence warnings under memory pressure
	libata/ahci: Drop PCS quirk for Denverton and beyond
	iommu/iova: Avoid false sharing on fq_timer_on
	libtraceevent: Change users plugin directory
	ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
	ACPI: custom_method: fix memory leaks
	ACPI / PCI: fix acpi_pci_irq_enable() memory leak
	closures: fix a race on wakeup from closure_sync
	hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
	md/raid1: fail run raid1 array when active disk less than one
	dmaengine: ti: edma: Do not reset reserved paRAM slots
	kprobes: Prohibit probing on BUG() and WARN() address
	s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
	x86/cpu: Add Tiger Lake to Intel family
	platform/x86: intel_pmc_core: Do not ioremap RAM
	ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
	raid5: don't set STRIPE_HANDLE to stripe which is in batch list
	mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
	mmc: sdhci: Fix incorrect switch to HS mode
	mmc: core: Add helper function to indicate if SDIO IRQs is enabled
	mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
	raid5: don't increment read_errors on EILSEQ return
	libertas: Add missing sentinel at end of if_usb.c fw_table
	e1000e: add workaround for possible stalled packet
	ALSA: hda - Drop unsol event handler for Intel HDMI codecs
	drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
	media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
	ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
	iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
	btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
	media: omap3isp: Set device on omap3isp subdevs
	PM / devfreq: passive: fix compiler warning
	iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
	ALSA: firewire-tascam: handle error code when getting current source of clock
	ALSA: firewire-tascam: check intermediate state of clock status and retry
	scsi: scsi_dh_rdac: zero cdb in send_mode_select()
	scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
	printk: Do not lose last line in kmsg buffer dump
	IB/mlx5: Free mpi in mp_slave mode
	IB/hfi1: Define variables as unsigned long to fix KASAN warning
	randstruct: Check member structs in is_pure_ops_struct()
	Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
	ceph: use ceph_evict_inode to cleanup inode's resource
	ALSA: hda/realtek - PCI quirk for Medion E4254
	blk-mq: add callback of .cleanup_rq
	scsi: implement .cleanup_rq callback
	powerpc/imc: Dont create debugfs files for cpu-less nodes
	fuse: fix missing unlock_page in fuse_writepage()
	parisc: Disable HP HSC-PCI Cards to prevent kernel crash
	KVM: x86: always stop emulation on page fault
	KVM: x86: set ctxt->have_exception in x86_decode_insn()
	KVM: x86: Manually calculate reserved bits when loading PDPTRS
	media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
	media: don't drop front-end reference count for ->detach
	binfmt_elf: Do not move brk for INTERP-less ET_EXEC
	ASoC: Intel: NHLT: Fix debug print format
	ASoC: Intel: Skylake: Use correct function to access iomem space
	ASoC: Intel: Fix use of potentially uninitialized variable
	ARM: samsung: Fix system restart on S3C6410
	ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
	Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
	arm64: tlb: Ensure we execute an ISB following walk cache invalidation
	arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
	alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
	regulator: Defer init completion for a while after late_initcall
	efifb: BGRT: Improve efifb_bgrt_sanity_check
	gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
	memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
	memcg, kmem: do not fail __GFP_NOFAIL charges
	i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
	block: fix null pointer dereference in blk_mq_rq_timed_out()
	smb3: allow disabling requesting leases
	ovl: Fix dereferencing possible ERR_PTR()
	ovl: filter of trusted xattr results in audit
	btrfs: fix allocation of free space cache v1 bitmap pages
	Btrfs: fix use-after-free when using the tree modification log
	btrfs: Relinquish CPUs in btrfs_compare_trees
	btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
	btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
	Btrfs: fix race setting up and completing qgroup rescan workers
	md/raid6: Set R5_ReadError when there is read failure on parity disk
	md: don't report active array_state until after revalidate_disk() completes.
	md: only call set_in_sync() when it is expected to succeed.
	cfg80211: Purge frame registrations on iftype change
	/dev/mem: Bail out upon SIGKILL.
	ext4: fix warning inside ext4_convert_unwritten_extents_endio
	ext4: fix punch hole for inline_data file systems
	quota: fix wrong condition in is_quota_modification()
	hwrng: core - don't wait on add_early_randomness()
	i2c: riic: Clear NACK in tend isr
	CIFS: fix max ea value size
	CIFS: Fix oplock handling for SMB 2.1+ protocols
	md/raid0: avoid RAID0 data corruption due to layout confusion.
	fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
	mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
	drm/amd/display: Restore backlight brightness after system resume
	Linux 4.19.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2c74f09f497a4b45b244a7dd263c5116533dfccb
2019-10-06 11:27:45 +02:00
Yafang Shao
4d8bdf7f3a mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
[ Upstream commit a94b525241c0fff3598809131d7cfcfe1d572d8c ]

total_{migrate,free}_scanned will be added to COMPACTMIGRATE_SCANNED and
COMPACTFREE_SCANNED in compact_zone().  We should clear them before
scanning a new zone.  In the proc triggered compaction, we forgot clearing
them.

[laoar.shao@gmail.com: introduce a helper compact_zone_counters_init()]
  Link: http://lkml.kernel.org/r/1563869295-25748-1-git-send-email-laoar.shao@gmail.com
[akpm@linux-foundation.org: expand compact_zone_counters_init() into its single callsite, per mhocko]
[vbabka@suse.cz: squash compact_zone() list_head init as well]
  Link: http://lkml.kernel.org/r/1fb6f7da-f776-9e42-22f8-bbb79b030b98@suse.cz
[akpm@linux-foundation.org: kcompactd_do_work(): avoid unnecessary initialization of cc.zone]
Link: http://lkml.kernel.org/r/1563789275-9639-1-git-send-email-laoar.shao@gmail.com
Fixes: 7f354a548d ("mm, compaction: add vmstats for kcompactd work")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:10:13 +02:00
Mel Gorman
4375a566f0 mm: compaction: avoid 100% CPU usage during compaction when a task is killed
"howaboutsynergy" reported via kernel buzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time.  Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.

  stress -m 220 --vm-bytes 1000000000 --timeout 10

Tracing indicated a pattern as follows

          stress-3923  [007]   519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
          stress-3923  [007]   519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0

Note that compaction is entered in rapid succession while scanning and
isolating nothing.  The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting
while making no progress as a fatal signal is pending.

It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered.  A very small window has to be hit
for the problem to occur (signal delivered while compacting and
isolating a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).

This was reproduced locally -- 16G single socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000 using Colin Kings stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window.  With this patch, the problem hit 4 times but the task existed
normally instead of consuming CPU.

This problem has existed for some time but it was made worse by commit
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
contention").  Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.

Change-Id: I67b546921390d17b393c1f3f2f195db9de499255
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>	[5.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: 670105a25608affe01cb0ccdc2a1f4bd2327172b
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-10-03 10:15:01 +05:30
Mel Gorman
b3378ed4cf mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock
syzbot reported the following error from a tree with a head commit of
baf76f0c58ae ("slip: make slhc_free() silently accept an error pointer")

  BUG: unable to handle kernel paging request at ffffea0003348000
  #PF error: [normal kernel read fault]
  PGD 12c3f9067 P4D 12c3f9067 PUD 12c3f8067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline]
  RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline]
  RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579
  Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff
  ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49
  c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff
  RSP: 0018:ffff88802b31eab8 EFLAGS: 00010246
  RAX: 1ffffd4000669000 RBX: 00000000000cd200 RCX: ffffc9000a235000
  RDX: 000000000001ca5e RSI: ffffffff81988cc7 RDI: 0000000000000001
  RBP: ffff88802b31ebd8 R08: ffff88805af700c0 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003348000
  R13: 0000000000000000 R14: ffff88802b31f030 R15: dffffc0000000000
  FS:  00007f61648dc700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffea0003348000 CR3: 0000000037c64000 CR4: 00000000001426e0
  Call Trace:
   fast_isolate_around mm/compaction.c:1243 [inline]
   fast_isolate_freepages mm/compaction.c:1418 [inline]
   isolate_freepages mm/compaction.c:1438 [inline]
   compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550

There is no reproducer and it is difficult to hit -- 1 crash every few
days.  The issue is very similar to the fix in commit 6b0868c820ff
("mm/compaction.c: correct zone boundary handling when resetting pageblock
skip hints").  When isolating free pages around a target pageblock, the
boundary handling is off by one and can stray into the next pageblock.
Triggering the syzbot error requires that the end of pageblock is section
or zone aligned, and that the next section is unpopulated.

A more subtle consequence of the bug is that pageblocks were being
improperly used as migration targets which potentially hurts fragmentation
avoidance in the long-term one page at a time.

A debugging patch revealed that it's definitely possible to stray outside
of a pageblock which is not intended.  While syzbot cannot be used to
verify this patch, it was confirmed that the debugging warning no longer
triggers with this patch applied.  It has also been confirmed that the THP
allocation stress tests are not degraded by this patch.

Change-Id: I66ba899d30ee101dfbb3cd06bca790338a888682
Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net
Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: 60fce36afa9c77c7ccbf980c4f670f3be3651fce
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-10-03 10:10:21 +05:30
Suzuki K Poulose
211b97e45a mm, compaction: make sure we isolate a valid PFN
When we have holes in a normal memory zone, we could endup having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled ( via __reset_isolation_suitable(),
triggered by kswapd).

Later if we fail to find a page via fast_isolate_freepages(), we may end
up using the migrate_pfn we started the search with, as valid page.  This
could lead to accessing NULL pointer derefernces like below, due to an
invalid mem_section pointer.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
 [0000000000000008] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 ...
 CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
 pstate: 60000005 (nZCv daif -PAN -UAO)
 pc : set_pfnblock_flags_mask+0x58/0xe8
 lr : compaction_alloc+0x300/0x950
 [...]
 Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
 Call trace:
  set_pfnblock_flags_mask+0x58/0xe8
  compaction_alloc+0x300/0x950
  migrate_pages+0x1a4/0xbb0
  compact_zone+0x750/0xde8
  compact_zone_order+0xd8/0x118
  try_to_compact_pages+0xb4/0x290
  __alloc_pages_direct_compact+0x84/0x1e0
  __alloc_pages_nodemask+0x5e0/0xe18
  alloc_pages_vma+0x1cc/0x210
  do_huge_pmd_anonymous_page+0x108/0x7c8
  __handle_mm_fault+0xdd4/0x1190
  handle_mm_fault+0x114/0x1c0
  __get_user_pages+0x198/0x3c0
  get_user_pages_unlocked+0xb4/0x1d8
  __gfn_to_pfn_memslot+0x12c/0x3b8
  gfn_to_pfn_prot+0x4c/0x60
  kvm_handle_guest_abort+0x4b0/0xcd8
  handle_exit+0x140/0x1b8
  kvm_arch_vcpu_ioctl_run+0x260/0x768
  kvm_vcpu_ioctl+0x490/0x898
  do_vfs_ioctl+0xc4/0x898
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0x74/0x118
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
 ---[ end trace af6a35219325a9b6 ]---

The issue was reported on an arm64 server with 128GB with holes in the
zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.

This patch fixes the issue by ensuring that the page belongs to a valid
PFN when we fallback to using the lower limit of the scan range upon
failure in fast_isolate_freepages().

Change-Id: I813c890772ba42d056e2ea2bb58c7cd2e4c969c7
Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 6b036c707d5b4476b6be9213fc5253c28324589d
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
2019-07-01 13:12:54 -07:00
Mel Gorman
6829f2dfa5 mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
Mikhail Gavrilo reported the following bug being triggered in a Fedora
kernel based on 5.1-rc1 but it is relevant to a vanilla kernel.

 kernel: page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 kernel: ------------[ cut here ]------------
 kernel: kernel BUG at include/linux/mm.h:1021!
 kernel: invalid opcode: 0000 [#1] SMP NOPTI
 kernel: CPU: 6 PID: 116 Comm: kswapd0 Tainted: G         C        5.1.0-0.rc1.git1.3.fc31.x86_64 #1
 kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1201 12/07/2018
 kernel: RIP: 0010:__reset_isolation_pfn+0x244/0x2b0
 kernel: Code: fe 06 e8 0f 8e fc ff 44 0f b6 4c 24 04 48 85 c0 0f 85 dc fe ff ff e9 68 fe ff ff 48 c7 c6 58 b7 2e 8c 4c 89 ff e8 0c 75 00 00 <0f> 0b 48 c7 c6 58 b7 2e 8c e8 fe 74 00 00 0f 0b 48 89 fa 41 b8 01
 kernel: RSP: 0018:ffff9e2d03f0fde8 EFLAGS: 00010246
 kernel: RAX: 0000000000000034 RBX: 000000000081f380 RCX: ffff8cffbddd6c20
 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8cffbddd6c20
 kernel: RBP: 0000000000000001 R08: 0000009898b94613 R09: 0000000000000000
 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000100000
 kernel: R13: 0000000000100000 R14: 0000000000000001 R15: ffffca7de07ce000
 kernel: FS:  0000000000000000(0000) GS:ffff8cffbdc00000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 00007fc1670e9000 CR3: 00000007f5276000 CR4: 00000000003406e0
 kernel: Call Trace:
 kernel:  __reset_isolation_suitable+0x62/0x120
 kernel:  reset_isolation_suitable+0x3b/0x40
 kernel:  kswapd+0x147/0x540
 kernel:  ? finish_wait+0x90/0x90
 kernel:  kthread+0x108/0x140
 kernel:  ? balance_pgdat+0x560/0x560
 kernel:  ? kthread_park+0x90/0x90
 kernel:  ret_from_fork+0x27/0x50

He bisected it down to e332f741a8dd ("mm, compaction: be selective about
what pageblocks to clear skip hints").  The problem is that the patch in
question was sloppy with respect to the handling of zone boundaries.  In
some instances, it was possible for PFNs outside of a zone to be examined
and if those were not properly initialised or poisoned then it would
trigger the VM_BUG_ON.  This patch corrects the zone boundary issues when
resetting pageblock skip hints and Mikhail reported that the bug did not
trigger after 30 hours of testing.

Change-Id: Ie7a0dddfa966725e35805a97b59b6d337723227a
Link: http://lkml.kernel.org/r/20190327085424.GL3189@techsingularity.net
Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Git-commit: 6b0868c820ff7370d15d6ddfe71b1ce6bbe8a25d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:55:59 -07:00
Qian Cai
4092087a75 mm/compaction.c: abort search if isolation fails
Running LTP oom01 in a tight loop or memory stress testing put the system
in a low-memory situation could triggers random memory corruption like
page flag corruption below due to in fast_isolate_freepages(), if
isolation fails, next_search_order() does not abort the search immediately
could lead to improper accesses.

UBSAN: Undefined behaviour in ./include/linux/mm.h:1195:50
index 7 is out of range for type 'zone [5]'
Call Trace:
 dump_stack+0x62/0x9a
 ubsan_epilogue+0xd/0x7f
 __ubsan_handle_out_of_bounds+0x14d/0x192
 __isolate_free_page+0x52c/0x600
 compaction_alloc+0x886/0x25f0
 unmap_and_move+0x37/0x1e70
 migrate_pages+0x2ca/0xb20
 compact_zone+0x19cb/0x3620
 kcompactd_do_work+0x2df/0x680
 kcompactd+0x1d8/0x6c0
 kthread+0x32c/0x3f0
 ret_from_fork+0x35/0x40
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:3124!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
RIP: 0010:__isolate_free_page+0x464/0x600
RSP: 0000:ffff888b9e1af848 EFLAGS: 00010007
RAX: 0000000030000000 RBX: ffff888c39fcf0f8 RCX: 0000000000000000
RDX: 1ffff111873f9e25 RSI: 0000000000000004 RDI: ffffed1173c35ef6
RBP: ffff888b9e1af898 R08: fffffbfff4fc2461 R09: fffffbfff4fc2460
R10: fffffbfff4fc2460 R11: ffffffffa7e12303 R12: 0000000000000008
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000007
FS:  0000000000000000(0000) GS:ffff888ba8e80000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc7abc00000 CR3: 0000000752416004 CR4: 00000000001606a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 compaction_alloc+0x886/0x25f0
 unmap_and_move+0x37/0x1e70
 migrate_pages+0x2ca/0xb20
 compact_zone+0x19cb/0x3620
 kcompactd_do_work+0x2df/0x680
 kcompactd+0x1d8/0x6c0
 kthread+0x32c/0x3f0
 ret_from_fork+0x35/0x40

Change-Id: I420de189e2f98d3bcb7b9407c3329eeac0379945
Link: http://lkml.kernel.org/r/20190320192648.52499-1-cai@lca.pw
Fixes: dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free lists for a target")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Git-commit: 5b56d996dd50a9d2ca87c25ebb50c07b255b7e04
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:55:54 -07:00
Qian Cai
781f40c173 mm/compaction.c: fix an undefined behaviour
In a low-memory situation, cc->fast_search_fail can keep increasing as it
is unable to find an available page to isolate in
fast_isolate_freepages().  As the result, it could trigger an error below,
so just compare with the maximum bits can be shifted first.

UBSAN: Undefined behaviour in mm/compaction.c:1160:30
shift exponent 64 is too large for 64-bit type 'unsigned long'
CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G
W    L    5.0.0+ #17
Call trace:
 dump_backtrace+0x0/0x450
 show_stack+0x20/0x2c
 dump_stack+0xc8/0x14c
 __ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4
 compaction_alloc+0x2344/0x2484
 unmap_and_move+0xdc/0x1dbc
 migrate_pages+0x274/0x1310
 compact_zone+0x26ec/0x43bc
 kcompactd+0x15b8/0x1a24
 kthread+0x374/0x390
 ret_from_fork+0x10/0x18

Change-Id: I3e4d978adff2be903ee85ae2668a557eadc71ad2
Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw
Fixes: 70b44595eafe ("mm, compaction: use free lists to quickly locate a migration source")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Git-commit: b6643a56f0e5958ee33affdcfc06eb7d88557a79
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:55:38 -07:00
Mel Gorman
b3ef0156f7 mm, compaction: capture a page under direct compaction
Compaction is inherently race-prone as a suitable page freed during
compaction can be allocated by any parallel task.  This patch uses a
capture_control structure to isolate a page immediately when it is freed
by a direct compactor in the slow path of the page allocator.  The
intent is to avoid redundant scanning.

                                     5.0.0-rc1              5.0.0-rc1
                               selective-v3r17          capture-v3r19
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      2582.11 (   0.00%)     2563.68 (   0.71%)
Amean     fault-both-5      4500.26 (   0.00%)     4233.52 (   5.93%)
Amean     fault-both-7      5819.53 (   0.00%)     6333.65 (  -8.83%)
Amean     fault-both-12     9321.18 (   0.00%)     9759.38 (  -4.70%)
Amean     fault-both-18     9782.76 (   0.00%)    10338.76 (  -5.68%)
Amean     fault-both-24    15272.81 (   0.00%)    13379.55 *  12.40%*
Amean     fault-both-30    15121.34 (   0.00%)    16158.25 (  -6.86%)
Amean     fault-both-32    18466.67 (   0.00%)    18971.21 (  -2.73%)

Latency is only moderately affected but the devil is in the details.  A
closer examination indicates that base page fault latency is reduced but
latency of huge pages is increased as it takes creater care to succeed.
Part of the "problem" is that allocation success rates are close to 100%
even when under pressure and compaction gets harder

                                5.0.0-rc1              5.0.0-rc1
                          selective-v3r17          capture-v3r19
Percentage huge-3        96.70 (   0.00%)       98.23 (   1.58%)
Percentage huge-5        96.99 (   0.00%)       95.30 (  -1.75%)
Percentage huge-7        94.19 (   0.00%)       97.24 (   3.24%)
Percentage huge-12       94.95 (   0.00%)       97.35 (   2.53%)
Percentage huge-18       96.74 (   0.00%)       97.30 (   0.58%)
Percentage huge-24       97.07 (   0.00%)       97.55 (   0.50%)
Percentage huge-30       95.69 (   0.00%)       98.50 (   2.95%)
Percentage huge-32       96.70 (   0.00%)       99.27 (   2.65%)

And scan rates are reduced as expected by 6% for the migration scanner
and 29% for the free scanner indicating that there is less redundant
work.

Compaction migrate scanned    20815362    19573286
Compaction free scanned       16352612    11510663

[mgorman@techsingularity.net: remove redundant check]
  Link: http://lkml.kernel.org/r/20190201143853.GH9565@techsingularity.net
Link: http://lkml.kernel.org/r/20190118175136.31341-23-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I4ee7abef8cfd6fa3f227e14eeb518f84027b224e
Git-commit: 5e1f0f098b4649fad53011246bcaeff011ffdf5d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:56 -07:00
Mel Gorman
b065d58e1f mm, compaction: be selective about what pageblocks to clear skip hints
Pageblock hints are cleared when compaction restarts or kswapd makes
enough progress that it can sleep but it's over-eager in that the bit is
cleared for migration sources with no LRU pages and migration targets
with no free pages.  As pageblock skip hint flushes are relatively rare
and out-of-band with respect to kswapd, this patch makes a few more
expensive checks to see if it's appropriate to even clear the bit.
Every pageblock that is not cleared will avoid 512 pages being scanned
unnecessarily on x86-64.

The impact is variable with different workloads showing small
differences in latency, success rates and scan rates.  This is expected
as clearing the hints is not that common but doing a small amount of
work out-of-band to avoid a large amount of work in-band later is
generally a good thing.

Link: http://lkml.kernel.org/r/20190118175136.31341-22-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
[cai@lca.pw: no stuck in __reset_isolation_pfn()]
  Link: http://lkml.kernel.org/r/20190206034732.75687-1-cai@lca.pw
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I27b3d1bf8100d0281ec297ef5ce79d100d0cb37e
Git-commit: e332f741a8dd1ec9a6dc8aa997296ecbfe64323e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:48 -07:00
Mel Gorman
c04b3699a7 mm, compaction: sample pageblocks for free pages
Once fast searching finishes, there is a possibility that the linear
scanner is scanning full blocks found by the fast scanner earlier.  This
patch uses an adaptive stride to sample pageblocks for free pages.  The
more consecutive full pageblocks encountered, the larger the stride
until a pageblock with free pages is found.  The scanners might meet
slightly sooner but it is an acceptable risk given that the search of
the free lists may still encounter the pages and adjust the cached PFN
of the free scanner accordingly.

                                     5.0.0-rc1              5.0.0-rc1
                              roundrobin-v3r17       samplefree-v3r17
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      2752.37 (   0.00%)     2729.95 (   0.81%)
Amean     fault-both-5      4341.69 (   0.00%)     4397.80 (  -1.29%)
Amean     fault-both-7      6308.75 (   0.00%)     6097.61 (   3.35%)
Amean     fault-both-12    10241.81 (   0.00%)     9407.15 (   8.15%)
Amean     fault-both-18    13736.09 (   0.00%)    10857.63 *  20.96%*
Amean     fault-both-24    16853.95 (   0.00%)    13323.24 *  20.95%*
Amean     fault-both-30    15862.61 (   0.00%)    17345.44 (  -9.35%)
Amean     fault-both-32    18450.85 (   0.00%)    16892.00 (   8.45%)

The latency is mildly improved offseting some overhead from earlier
patches that are prerequisites for the rest of the series.  However, a
major impact is on the free scan rate with an 82% reduction.

                                5.0.0-rc1      5.0.0-rc1
                         roundrobin-v3r17 samplefree-v3r17
Compaction migrate scanned    21607271            20116887
Compaction free scanned       95336406            16668703

It's also the first time in the series where the number of pages scanned
by the migration scanner is greater than the free scanner due to the
increased search efficiency.

Change-Id: I402919a00925ad98bbc386f0711ac759c7547576
Link: http://lkml.kernel.org/r/20190118175136.31341-21-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 4fca9730c51d51f643f2a3f8f10ebd718349c80f
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:43 -07:00
Mel Gorman
f50303ed0f mm, compaction: round-robin the order while searching the free lists for a target
As compaction proceeds and creates high-order blocks, the free list
search gets less efficient as the larger blocks are used as compaction
targets.  Eventually, the larger blocks will be behind the migration
scanner for partially migrated pageblocks and the search fails.  This
patch round-robins what orders are searched so that larger blocks can be
ignored and find smaller blocks that can be used as migration targets.

The overall impact was small on 1-socket but it avoids corner cases
where the migration/free scanners meet prematurely or situations where
many of the pageblocks encountered by the free scanner are almost full
instead of being properly packed.  Previous testing had indicated that
without this patch there were occasional large spikes in the free
scanner without this patch.

[dan.carpenter@oracle.com: fix static checker warning]
Link: http://lkml.kernel.org/r/20190118175136.31341-20-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I97df21cf1b0ed2966426235abbf3d35ef49c6195
Git-commit: dbe2d4e4f12e07c6a2215e3603a5f77056323081
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:35 -07:00
Mel Gorman
3049a660e2 mm, compaction: reduce premature advancement of the migration target scanner
The fast isolation of free pages allows the cached PFN of the free
scanner to advance faster than necessary depending on the contents of
the free list.  The key is that fast_isolate_freepages() can update
zone->compact_cached_free_pfn via isolate_freepages_block().  When the
fast search fails, the linear scan can start from a point that has
skipped valid migration targets, particularly pageblocks with just
low-order free pages.  This can cause the migration source/target
scanners to meet prematurely causing a reset.

This patch starts by avoiding an update of the pageblock skip
information and cached PFN from isolate_freepages_block() and puts the
responsibility of updating that information in the callers.  The fast
scanner will update the cached PFN if and only if it finds a block that
is higher than the existing cached PFN and sets the skip if the
pageblock is full or nearly full.  The linear scanner will update
skipped information and the cached PFN only when a block is completely
scanned.  The total impact is that the free scanner advances more slowly
as it is primarily driven by the linear scanner instead of the fast
search.

                                     5.0.0-rc1              5.0.0-rc1
                               noresched-v3r17         slowfree-v3r17
Amean     fault-both-3      2965.68 (   0.00%)     3036.75 (  -2.40%)
Amean     fault-both-5      3995.90 (   0.00%)     4522.24 * -13.17%*
Amean     fault-both-7      5842.12 (   0.00%)     6365.35 (  -8.96%)
Amean     fault-both-12     9550.87 (   0.00%)    10340.93 (  -8.27%)
Amean     fault-both-18    13304.72 (   0.00%)    14732.46 ( -10.73%)
Amean     fault-both-24    14618.59 (   0.00%)    16288.96 ( -11.43%)
Amean     fault-both-30    16650.96 (   0.00%)    16346.21 (   1.83%)
Amean     fault-both-32    17145.15 (   0.00%)    19317.49 ( -12.67%)

The impact to latency is higher than the last version but it appears to
be due to a slight increase in the free scan rates which is a potential
side-effect of the patch.  However, this is necessary for later patches
that are more careful about how pageblocks are treated as earlier
iterations of those patches hit corner cases where the restarts were
punishing and very visible.

Change-Id: I12f3f18f286fa2564d8150923fa09de42420773f
Link: http://lkml.kernel.org/r/20190118175136.31341-19-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: d097a6f63522547dfc7c75c7084a05b6a7f9e838
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:28 -07:00
Mel Gorman
7c50c7815a mm, compaction: do not consider a need to reschedule as contention
Scanning on large machines can take a considerable length of time and
eventually need to be rescheduled.  This is treated as an abort event
but that's not appropriate as the attempt is likely to be retried after
making numerous checks and taking another cycle through the page
allocator.  This patch will check the need to reschedule if necessary
but continue the scanning.

The main benefit is reduced scanning when compaction is taking a long
time or the machine is over-saturated.  It also avoids an unnecessary
exit of compaction that ends up being retried by the page allocator in
the outer loop.

                                     5.0.0-rc1              5.0.0-rc1
                              synccached-v3r16        noresched-v3r17
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      2958.27 (   0.00%)     2965.68 (  -0.25%)
Amean     fault-both-5      4091.90 (   0.00%)     3995.90 (   2.35%)
Amean     fault-both-7      5803.05 (   0.00%)     5842.12 (  -0.67%)
Amean     fault-both-12     9481.06 (   0.00%)     9550.87 (  -0.74%)
Amean     fault-both-18    14141.51 (   0.00%)    13304.72 (   5.92%)
Amean     fault-both-24    16438.00 (   0.00%)    14618.59 (  11.07%)
Amean     fault-both-30    17531.72 (   0.00%)    16650.96 (   5.02%)
Amean     fault-both-32    17101.96 (   0.00%)    17145.15 (  -0.25%)

Change-Id: I002a0fb5cfa1898e4a1ea5aa7c2111a4c9ced1a3
Link: http://lkml.kernel.org/r/20190118175136.31341-18-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: cf66f0700c8f1d7c7c1c1d7e5e846a1836814601
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:21 -07:00
Mel Gorman
ef120f1793 mm, compaction: rework compact_should_abort as compact_check_resched
With incremental changes, compact_should_abort no longer makes any
documented sense.  Rename to compact_check_resched and update the
associated comments.  There is no benefit other than reducing redundant
code and making the intent slightly clearer.  It could potentially be
merged with earlier patches but it just makes the review slightly
harder.

Change-Id: I1cc005bcde88921503966a2428c72564c897a9dd
Link: http://lkml.kernel.org/r/20190118175136.31341-17-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: cb810ad294d3c3a454e51b12fbb483bbb7096b98
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:16 -07:00
Mel Gorman
acda146d71 mm, compaction: keep cached migration PFNs synced for unusable pageblocks
Migrate has separate cached PFNs for ASYNC and SYNC* migration on the
basis that some migrations will fail in ASYNC mode.  However, if the
cached PFNs match at the start of scanning and pageblocks are skipped
due to having no isolation candidates, then the sync state does not
matter.  This patch keeps matching cached PFNs in sync until a pageblock
with isolation candidates is found.

The actual benefit is marginal given that the sync scanner following the
async scanner will often skip a number of pageblocks but it's useless
work.  Any benefit depends heavily on whether the scanners restarted
recently.

Change-Id: Ib2e156848785f6783276add839b3f2639f1c9de0
Link: http://lkml.kernel.org/r/20190118175136.31341-16-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 8854c55f54bcc104e3adae42abe16948286ec75c
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:12 -07:00
Mel Gorman
27162b405a mm, compaction: check early for huge pages encountered by the migration scanner
When scanning for sources or targets, PageCompound is checked for huge
pages as they can be skipped quickly but it happens relatively late
after a lot of setup and checking.  This patch short-cuts the check to
make it earlier.  It might still change when the lock is acquired but
this has less overhead overall.  The free scanner advances but the
migration scanner does not.  Typically the free scanner encounters more
movable blocks that change state over the lifetime of the system and
also tends to scan more aggressively as it's actively filling its
portion of the physical address space with data.  This could change in
the future but for the moment, this worked better in practice and
incurred fewer scan restarts.

The impact on latency and allocation success rates is marginal but the
free scan rates are reduced by 15% and system CPU usage is reduced by
3.3%.  The 2-socket results are not materially different.

Change-Id: Id53eb162686c09caff271fac4c2b0accb5ab3404
Link: http://lkml.kernel.org/r/20190118175136.31341-15-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 9bebefd59084af7c75b66eeee241bf0777f39b88
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:06 -07:00
Mel Gorman
b6daf86e23 mm, compaction: finish pageblock scanning on contention
Async migration aborts on spinlock contention but contention can be high
when there are multiple compaction attempts and kswapd is active.  The
consequence is that the migration scanners move forward uselessly while
still contending on locks for longer while leaving suitable migration
sources behind.

This patch will acquire the lock but track when contention occurs.  When
it does, the current pageblock will finish as compaction may succeed for
that block and then abort.  This will have a variable impact on latency
as in some cases useless scanning is avoided (reduces latency) but a
lock will be contended (increase latency) or a single contended
pageblock is scanned that would otherwise have been skipped (increase
latency).

                                     5.0.0-rc1              5.0.0-rc1
                                norescan-v3r16    finishcontend-v3r16
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3002.07 (   0.00%)     3153.17 (  -5.03%)
Amean     fault-both-5      4684.47 (   0.00%)     4280.52 (   8.62%)
Amean     fault-both-7      6815.54 (   0.00%)     5811.50 *  14.73%*
Amean     fault-both-12    10864.02 (   0.00%)     9276.85 (  14.61%)
Amean     fault-both-18    12247.52 (   0.00%)    11032.67 (   9.92%)
Amean     fault-both-24    15683.99 (   0.00%)    14285.70 (   8.92%)
Amean     fault-both-30    18620.02 (   0.00%)    16293.76 *  12.49%*
Amean     fault-both-32    19250.28 (   0.00%)    16721.02 *  13.14%*

                                5.0.0-rc1              5.0.0-rc1
                           norescan-v3r16    finishcontend-v3r16
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        95.00 (   0.00%)       96.82 (   1.92%)
Percentage huge-5        94.22 (   0.00%)       95.40 (   1.26%)
Percentage huge-7        92.35 (   0.00%)       95.92 (   3.86%)
Percentage huge-12       91.90 (   0.00%)       96.73 (   5.25%)
Percentage huge-18       89.58 (   0.00%)       96.77 (   8.03%)
Percentage huge-24       90.03 (   0.00%)       96.05 (   6.69%)
Percentage huge-30       89.14 (   0.00%)       96.81 (   8.60%)
Percentage huge-32       90.58 (   0.00%)       97.41 (   7.54%)

There is a variable impact that is mostly good on latency while allocation
success rates are slightly higher.  System CPU usage is reduced by about
10% but scan rate impact is mixed

Compaction migrate scanned    27997659.00    20148867
Compaction free scanned      120782791.00   118324914

Migration scan rates are reduced 28% which is expected as a pageblock is
used by the async scanner instead of skipped.  The impact on the free
scanner is known to be variable.  Overall the primary justification for
this patch is that completing scanning of a pageblock is very important
for later patches.

[yuehaibing@huawei.com: fix unused variable warning]
Link: http://lkml.kernel.org/r/20190118175136.31341-14-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Id1fe132cb48c49bfe47c3b17a3e07fe50462a5fa
Git-commit: cb2dcaf023c2cf12d45289c82d4030d33f7df73e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:54:00 -07:00
Mel Gorman
6ab5cc9614 mm, compaction: avoid rescanning the same pageblock multiple times
Pageblocks are marked for skip when no pages are isolated after a scan.
However, it's possible to hit corner cases where the migration scanner
gets stuck near the boundary between the source and target scanner.  Due
to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that are
migrated can be reallocated before the pageblock is complete.  The
pageblock is not necessarily skipped so it can be rescanned multiple
times.  Similarly, a pageblock with some dirty/writeback pages may fail
to migrate and be rescanned until writeback completes which is wasteful.

This patch tracks if a pageblock is being rescanned.  If so, then the
entire pageblock will be migrated as one operation.  This narrows the
race window during which pages can be reallocated during migration.
Secondly, if there are pages that cannot be isolated then the pageblock
will still be fully scanned and marked for skipping.  On the second
rescan, the pageblock skip is set and the migration scanner makes
progress.

                                     5.0.0-rc1              5.0.0-rc1
                                findfree-v3r16         norescan-v3r16
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3200.68 (   0.00%)     3002.07 (   6.21%)
Amean     fault-both-5      4847.75 (   0.00%)     4684.47 (   3.37%)
Amean     fault-both-7      6658.92 (   0.00%)     6815.54 (  -2.35%)
Amean     fault-both-12    11077.62 (   0.00%)    10864.02 (   1.93%)
Amean     fault-both-18    12403.97 (   0.00%)    12247.52 (   1.26%)
Amean     fault-both-24    15607.10 (   0.00%)    15683.99 (  -0.49%)
Amean     fault-both-30    18752.27 (   0.00%)    18620.02 (   0.71%)
Amean     fault-both-32    21207.54 (   0.00%)    19250.28 *   9.23%*

                                5.0.0-rc1              5.0.0-rc1
                           findfree-v3r16         norescan-v3r16
Percentage huge-3        96.86 (   0.00%)       95.00 (  -1.91%)
Percentage huge-5        93.72 (   0.00%)       94.22 (   0.53%)
Percentage huge-7        94.31 (   0.00%)       92.35 (  -2.08%)
Percentage huge-12       92.66 (   0.00%)       91.90 (  -0.82%)
Percentage huge-18       91.51 (   0.00%)       89.58 (  -2.11%)
Percentage huge-24       90.50 (   0.00%)       90.03 (  -0.52%)
Percentage huge-30       91.57 (   0.00%)       89.14 (  -2.65%)
Percentage huge-32       91.00 (   0.00%)       90.58 (  -0.46%)

Negligible difference but this was likely a case when the specific
corner case was not hit.  A previous run of the same patch based on an
earlier iteration of the series showed large differences where migration
rates could be halved when the corner case was hit.

The specific corner case where migration scan rates go through the roof
was due to a dirty/writeback pageblock located at the boundary of the
migration/free scanner did not happen in this case.  When it does
happen, the scan rates multipled by massive margins.

Change-Id: I2c520b3b13d96a179b28b1138cbd1c8eb86b083e
Link: http://lkml.kernel.org/r/20190118175136.31341-13-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 804d3121ba5f03af0ab225e2f688ee3ee669c0d2
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:32:58 -07:00
Mel Gorman
aa32e50664 mm, compaction: use free lists to quickly locate a migration target
Similar to the migration scanner, this patch uses the free lists to
quickly locate a migration target.  The search is different in that
lower orders will be searched for a suitable high PFN if necessary but
the search is still bound.  This is justified on the grounds that the
free scanner typically scans linearly much more than the migration
scanner.

If a free page is found, it is isolated and compaction continues if
enough pages were isolated.  For SYNC* scanning, the full pageblock is
scanned for any remaining free pages so that is can be marked for
skipping in the near future.

1-socket thpfioscale
                                     5.0.0-rc1              5.0.0-rc1
                                 isolmig-v3r15         findfree-v3r16
Amean     fault-both-3      3024.41 (   0.00%)     3200.68 (  -5.83%)
Amean     fault-both-5      4749.30 (   0.00%)     4847.75 (  -2.07%)
Amean     fault-both-7      6454.95 (   0.00%)     6658.92 (  -3.16%)
Amean     fault-both-12    10324.83 (   0.00%)    11077.62 (  -7.29%)
Amean     fault-both-18    12896.82 (   0.00%)    12403.97 (   3.82%)
Amean     fault-both-24    13470.60 (   0.00%)    15607.10 * -15.86%*
Amean     fault-both-30    17143.99 (   0.00%)    18752.27 (  -9.38%)
Amean     fault-both-32    17743.91 (   0.00%)    21207.54 * -19.52%*

The impact on latency is variable but the search is optimistic and
sensitive to the exact system state.  Success rates are similar but the
major impact is to the rate of scanning

                                5.0.0-rc1      5.0.0-rc1
                            isolmig-v3r15 findfree-v3r16
Compaction migrate scanned    25646769          29507205
Compaction free scanned      201558184         100359571

The free scan rates are reduced by 50%.  The 2-socket reductions for the
free scanner are more dramatic which is a likely reflection that the
machine has more memory.

[dan.carpenter@oracle.com: fix static checker warning]
[vbabka@suse.cz: correct number of pages scanned for lower orders]
Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: If5a453f546138631a88acbf89a70f238623df491
Git-commit: 5a811889de10f1ebb8e03a2744be006e909c405c
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:30:53 -07:00
Mel Gorman
15d9664b00 mm, compaction: keep migration source private to a single compaction instance
Due to either a fast search of the free list or a linear scan, it is
possible for multiple compaction instances to pick the same pageblock
for migration.  This is lucky for one scanner and increased scanning for
all the others.  It also allows a race between requests on which first
allocates the resulting free block.

This patch tests and updates the pageblock skip for the migration
scanner carefully.  When isolating a block, it will check and skip if
the block is already in use.  Once the zone lock is acquired, it will be
rechecked so that only one scanner can set the pageblock skip for
exclusive use.  Any scanner contending will continue with a linear scan.
The skip bit is still set if no pages can be isolated in a range.  While
this may result in redundant scanning, it avoids unnecessarily acquiring
the zone lock when there are no suitable migration sources.

1-socket thpscale
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3390.40 (   0.00%)     3024.41 (  10.80%)
Amean     fault-both-5      5082.28 (   0.00%)     4749.30 (   6.55%)
Amean     fault-both-7      7012.51 (   0.00%)     6454.95 (   7.95%)
Amean     fault-both-12    11346.63 (   0.00%)    10324.83 (   9.01%)
Amean     fault-both-18    15324.19 (   0.00%)    12896.82 *  15.84%*
Amean     fault-both-24    16088.50 (   0.00%)    13470.60 *  16.27%*
Amean     fault-both-30    18723.42 (   0.00%)    17143.99 (   8.44%)
Amean     fault-both-32    18612.01 (   0.00%)    17743.91 (   4.66%)

                                5.0.0-rc1              5.0.0-rc1
                            findmig-v3r15          isolmig-v3r15
Percentage huge-3        89.83 (   0.00%)       92.96 (   3.48%)
Percentage huge-5        91.96 (   0.00%)       93.26 (   1.41%)
Percentage huge-7        92.85 (   0.00%)       93.63 (   0.84%)
Percentage huge-12       92.74 (   0.00%)       92.80 (   0.07%)
Percentage huge-18       91.71 (   0.00%)       91.62 (  -0.10%)
Percentage huge-24       92.13 (   0.00%)       91.50 (  -0.69%)
Percentage huge-30       93.79 (   0.00%)       92.73 (  -1.13%)
Percentage huge-32       91.27 (   0.00%)       91.94 (   0.74%)

This shows a reasonable reduction in latency as multiple compaction
scanners do not operate on the same blocks with a similar allocation
success rate.

Compaction migrate scanned    41093126    25646769

Migration scan rates are reduced by 38%.

Link: http://lkml.kernel.org/r/20190118175136.31341-11-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I059fe2e0a3dfa5a5329f41aadee7a548ca749d37
Git-commit: e380bebe4771548df9bece8b7ad9dab07d9158a6
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:30:49 -07:00
Mel Gorman
78e48f6a7f mm, compaction: use free lists to quickly locate a migration source
The migration scanner is a linear scan of a zone with a potentiall large
search space.  Furthermore, many pageblocks are unusable such as those
filled with reserved pages or partially filled with pages that cannot
migrate.  These still get scanned in the common case of allocating a THP
and the cost accumulates.

The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP.  It
prefers picking a block with a larger number of free pages already on
the basis that there are fewer pages to migrate to free the entire
block.  The lowest PFN found during searches is tracked as the basis of
the start for the linear search after the first search of the free list
fails.  After the search, the free list is shuffled so that the next
search will not encounter the same page.  If the search fails then the
subsequent searches will be shorter and the linear scanner is used.

If this search fails, or if the request is for a small or
unmovable/reclaimable allocation then the linear scanner is still used.
It is somewhat pointless to use the list search in those cases.  Small
free pages must be used for the search and there is no guarantee that
movable pages are located within that block that are contiguous.

                                     5.0.0-rc1              5.0.0-rc1
                                 noboost-v3r10          findmig-v3r15
Amean     fault-both-3      3771.41 (   0.00%)     3390.40 (  10.10%)
Amean     fault-both-5      5409.05 (   0.00%)     5082.28 (   6.04%)
Amean     fault-both-7      7040.74 (   0.00%)     7012.51 (   0.40%)
Amean     fault-both-12    11887.35 (   0.00%)    11346.63 (   4.55%)
Amean     fault-both-18    16718.19 (   0.00%)    15324.19 (   8.34%)
Amean     fault-both-24    21157.19 (   0.00%)    16088.50 *  23.96%*
Amean     fault-both-30    21175.92 (   0.00%)    18723.42 *  11.58%*
Amean     fault-both-32    21339.03 (   0.00%)    18612.01 *  12.78%*

                                5.0.0-rc1              5.0.0-rc1
                            noboost-v3r10          findmig-v3r15
Percentage huge-3        86.50 (   0.00%)       89.83 (   3.85%)
Percentage huge-5        92.52 (   0.00%)       91.96 (  -0.61%)
Percentage huge-7        92.44 (   0.00%)       92.85 (   0.44%)
Percentage huge-12       92.98 (   0.00%)       92.74 (  -0.25%)
Percentage huge-18       91.70 (   0.00%)       91.71 (   0.02%)
Percentage huge-24       91.59 (   0.00%)       92.13 (   0.60%)
Percentage huge-30       90.14 (   0.00%)       93.79 (   4.04%)
Percentage huge-32       90.03 (   0.00%)       91.27 (   1.37%)

This shows an improvement in allocation latencies with similar
allocation success rates.  While not presented, there was a 31%
reduction in migration scanning and a 8% reduction on system CPU usage.
A 2-socket machine showed similar benefits.

[mgorman@techsingularity.net: several fixes]
  Link: http://lkml.kernel.org/r/20190204120111.GL9565@techsingularity.net
[vbabka@suse.cz: migrate block that was found-fast, some optimisations]
Link: http://lkml.kernel.org/r/20190118175136.31341-10-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <Vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic16515e4ca883de62d0acdfb24b15b72d695c861
Git-commit: 70b44595eafe9c7c235f076d653a268ca1ab9fdb
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:30:43 -07:00
Mel Gorman
ed33da8dd5 mm, compaction: always finish scanning of a full pageblock
When compaction is finishing, it uses a flag to ensure the pageblock is
complete but it makes sense to always complete migration of a pageblock.
Minimally, skip information is based on a pageblock and partially
scanned pageblocks may incur more scanning in the future.  The pageblock
skip handling also becomes more strict later in the series and the hint
is more useful if a complete pageblock was always scanned.

The potentially impacts latency as more scanning is done but it's not a
consistent win or loss as the scanning is not always a high percentage
of the pageblock and sometimes it is offset by future reductions in
scanning.  Hence, the results are not presented this time due to a
misleading mix of gains/losses without any clear pattern.  However, full
scanning of the pageblock is important for later patches.

Change-Id: I459a28f61d538903d0b672adbb30a87597693487
Link: http://lkml.kernel.org/r/20190118175136.31341-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: efe771c7603bc524425070d651e70e9c56c57f28
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:30:30 -07:00
Mel Gorman
427fc8feda mm, compaction: rename map_pages to split_map_pages
It's non-obvious that high-order free pages are split into order-0 pages
from the function name.  Fix it.

Change-Id: I53fc91c5fc093e3fbd7e1c396fc70c061f9c21e0
Link: http://lkml.kernel.org/r/20190118175136.31341-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 4469ab98477b290f6728b79f8d225d9d88ce16e3
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:30:15 -07:00
Mel Gorman
09a394d1c1 mm, compaction: remove unnecessary zone parameter in some instances
A zone parameter is passed into a number of top-level compaction
functions despite the fact that it's already in compact_control.  This
is harmless but it did need an audit to check if zone actually ever
changes meaningfully.  This patches removes the parameter in a number of
top-level functions.  The change could be much deeper but this was
enough to briefly clarify the flow.

No functional change.

Change-Id: I6555d4c731319842ebfc489a890de7e287362ff8
Link: http://lkml.kernel.org/r/20190118175136.31341-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 40cacbcb324036233a927418441323459d28d19b
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:30:09 -07:00
Mel Gorman
2ff32e84b0 mm, compaction: remove last_migrated_pfn from compact_control
The last_migrated_pfn field is a bit dubious as to whether it really
helps but either way, the information from it can be inferred without
increasing the size of compact_control so remove the field.

Change-Id: I10d6e13ae9989f9799ff9a824fb4fe35c4856027
Link: http://lkml.kernel.org/r/20190118175136.31341-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 566e54e113eb2b669f9300db2c2df400cbb06646
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-24 03:30:02 -07:00
Mel Gorman
83eb0c67a3 mm: move zone watermark accesses behind an accessor
This is a preparation patch only, no functional change.

Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Iccef5f02348ab59d75cbbe2b48f40017391b88b1
Git-commit: a921444382b49cc7fdeca3fba3e278bc09484a27
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2019-05-17 14:58:16 +05:30
Johannes Weiner
e550f94252 UPSTREAM: psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills.  In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.

In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.

A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively.  Stall states are aggregate versions of the per-task delay
accounting delays:

       cpu: some tasks are runnable but not executing on a CPU
       memory: tasks are reclaiming, or waiting for swapin or thrashing cache
       io: tasks are waiting for io completions

These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit.  They can also indicate when the system
is approaching lockup scenarios and OOMs.

To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states.  Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime.  A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).

[hannes@cmpxchg.org: doc fixlet, per Randy]
  Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
  Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
  Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
  Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:27 -07:00
Johannes Weiner
25f0c86db1 psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills.  In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.

In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.

A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively.  Stall states are aggregate versions of the per-task delay
accounting delays:

       cpu: some tasks are runnable but not executing on a CPU
       memory: tasks are reclaiming, or waiting for swapin or thrashing cache
       io: tasks are waiting for io completions

These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit.  They can also indicate when the system
is approaching lockup scenarios and OOMs.

To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states.  Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime.  A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).

[hannes@cmpxchg.org: doc fixlet, per Randy]
  Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
  Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
  Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
  Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic4b6fddb7013719ffcba5d68abfda87ada90715a
Git-commit: eb414681d5a07d28d2ff90dc05f69ec6b232ebd2
Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.19
[pdaly@codeaurora.org: Move PF_MEMSTALL flag]
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2019-03-08 19:33:14 -08:00
Joe Perches
0825a6f986 mm: use octal not symbolic permissions
mm/*.c files use symbolic and octal styles for permissions.

Using octal and not symbolic permissions is preferred by many as more
readable.

https://lkml.org/lkml/2016/8/2/1945

Prefer the direct use of octal for permissions.

Done using
$ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
and some typing.

Before:	 $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
44
After:	 $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
86

Miscellanea:

o Whitespace neatening around these conversions.

Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:25 +09:00
Joonsoo Kim
d883c6cf3b Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"
This reverts the following commits that change CMA design in MM.

 3d2054ad8c ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")

 1d47a3ec09 ("mm/cma: remove ALLOC_CMA")

 bad8c6c0b1 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")

Ville reported a following error on i386.

  Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
  microcode: microcode updated early to revision 0x4, date = 2013-06-28
  Initializing CPU#0
  Initializing HighMem for node 0 (000377fe:00118000)
  Initializing Movable for node 0 (00000001:00118000)
  BUG: Bad page state in process swapper  pfn:377fe
  page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
  flags: 0x80000000()
  raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
  page dumped because: nonzero mapcount
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
  Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
  Call Trace:
   dump_stack+0x60/0x96
   bad_page+0x9a/0x100
   free_pages_check_bad+0x3f/0x60
   free_pcppages_bulk+0x29d/0x5b0
   free_unref_page_commit+0x84/0xb0
   free_unref_page+0x3e/0x70
   __free_pages+0x1d/0x20
   free_highmem_page+0x19/0x40
   add_highpages_with_active_regions+0xab/0xeb
   set_highmem_pages_init+0x66/0x73
   mem_init+0x1b/0x1d7
   start_kernel+0x17a/0x363
   i386_start_kernel+0x95/0x99
   startup_32_smp+0x164/0x168

The reason for this error is that the span of MOVABLE_ZONE is extended
to whole node span for future CMA initialization, and, normal memory is
wrongly freed here.  I submitted the fix and it seems to work, but,
another problem happened.

It's so late time to fix the later problem so I decide to reverting the
series.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-24 10:07:50 -07:00
Joonsoo Kim
1d47a3ec09 mm/cma: remove ALLOC_CMA
Now, all reserved pages for CMA region are belong to the ZONE_MOVABLE
and it only serves for a request with GFP_HIGHMEM && GFP_MOVABLE.

Therefore, we don't need to maintain ALLOC_CMA at all.

Link: http://lkml.kernel.org/r/1512114786-5085-3-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:32 -07:00
Michal Hocko
666feb21a0 mm, migrate: remove reason argument from new_page_t
No allocation callback is using this argument anymore.  new_page_node
used to use this parameter to convey node_id resp.  migration error up
to move_pages code (do_move_page_to_node_array).  The error status never
made it into the final status field and we have a better way to
communicate node id to the status field now.  All other allocation
callbacks simply ignored the argument so we can drop it finally.

[mhocko@suse.com: fix migration callback]
  Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz
[akpm@linux-foundation.org: fix alloc_misplaced_dst_page()]
[mhocko@kernel.org: fix build]
  Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:32 -07:00
Mike Rapoport
e8b098fc57 mm: kernel-doc: add missing parameter descriptions
Link: http://lkml.kernel.org/r/1519585191-10180-4-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:27 -07:00
David Rientjes
bc3106b26c mm, compaction: drain pcps for zone when kcompactd fails
It's possible for free pages to become stranded on per-cpu pagesets
(pcps) that, if drained, could be merged with buddy pages on the zone's
free area to form large order pages, including up to MAX_ORDER.

Consider a verbose example using the tools/vm/page-types tool at the
beginning of a ZONE_NORMAL ('B' indicates a buddy page and 'S' indicates
a slab page).  Pages on pcps do not have any page flags set.

  109954  1       _______S________________________________________________________
  109955  2       __________B_____________________________________________________
  109957  1       ________________________________________________________________
  109958  1       __________B_____________________________________________________
  109959  7       ________________________________________________________________
  109960  1       __________B_____________________________________________________
  109961  9       ________________________________________________________________
  10996a  1       __________B_____________________________________________________
  10996b  3       ________________________________________________________________
  10996e  1       __________B_____________________________________________________
  10996f  1       ________________________________________________________________
  ...
  109f8c  1       __________B_____________________________________________________
  109f8d  2       ________________________________________________________________
  109f8f  2       __________B_____________________________________________________
  109f91  f       ________________________________________________________________
  109fa0  1       __________B_____________________________________________________
  109fa1  7       ________________________________________________________________
  109fa8  1       __________B_____________________________________________________
  109fa9  1       ________________________________________________________________
  109faa  1       __________B_____________________________________________________
  109fab  1       _______S________________________________________________________

The compaction migration scanner is attempting to defragment this memory
since it is at the beginning of the zone.  It has done so quite well,
all movable pages have been migrated.  From pfn [0x109955, 0x109fab),
there are only buddy pages and pages without flags set.

These pages may be stranded on pcps that could otherwise allow this
memory to be coalesced if freed back to the zone free area.  It is
possible that some of these pages may not be on pcps and that something
has called alloc_pages() and used the memory directly, but we rely on
the absence of __GFP_MOVABLE in these cases to allocate from
MIGATE_UNMOVABLE pageblocks to try to keep these MIGRATE_MOVABLE
pageblocks as free as possible.

These buddy and pcp pages, spanning 1,621 pages, could be coalesced and
allow for three transparent hugepages to be dynamically allocated.
Running the numbers for all such spans on the system, it was found that
there were over 400 such spans of only buddy pages and pages without
flags set at the time this /proc/kpageflags sample was collected.
Without this support, there were _no_ order-9 or order-10 pages free.

When kcompactd fails to defragment memory such that a cc.order page can
be allocated, drain all pcps for the zone back to the buddy allocator so
this stranding cannot occur.  Compaction for that order will
subsequently be deferred, which acts as a ratelimit on this drain.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803010340100.88270@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:26 -07:00
Yang Shi
112d2d29fc mm/compaction.c: fix comment for try_to_compact_pages()
"mode" argument is not used by try_to_compact_pages() and sub functions
anymore, it has been replaced by "prio".  Fix the comment to explain the
use of "prio" argument.

Link: http://lkml.kernel.org/r/1515801336-20611-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:39 -08:00
Vlastimil Babka
d3c85bad89 mm, compaction: remove unneeded pageblock_skip_persistent() checks
Commit f3c931633a59 ("mm, compaction: persistently skip hugetlbfs
pageblocks") has introduced pageblock_skip_persistent() checks into
migration and free scanners, to make sure pageblocks that should be
persistently skipped are marked as such, regardless of the
ignore_skip_hint flag.

Since the previous patch introduced a new no_set_skip_hint flag, the
ignore flag no longer prevents marking pageblocks as skipped.  Therefore
we can remove the special cases.  The relevant pageblocks will be marked
as skipped by the common logic which marks each pageblock where no page
could be isolated.  This makes the code simpler.

Link: http://lkml.kernel.org/r/20171102121706.21504-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:00 -08:00
Vlastimil Babka
2583d67132 mm, compaction: split off flag for not updating skip hints
Pageblock skip hints were added as a heuristic for compaction, which
shares core code with CMA.  Since CMA reliability would suffer from the
heuristics, compact_control flag ignore_skip_hint was added for the CMA
use case.  Since 6815bf3f23 ("mm/compaction: respect ignore_skip_hint
in update_pageblock_skip") the flag also means that CMA won't *update*
the skip hints in addition to ignoring them.

Today, direct compaction can also ignore the skip hints in the last
resort attempt, but there's no reason not to set them when isolation
fails in such case.  Thus, this patch splits off a new no_set_skip_hint
flag to avoid the updating, which only CMA sets.  This should improve
the heuristics a bit, and allow us to simplify the persistent skip bit
handling as the next step.

Link: http://lkml.kernel.org/r/20171102121706.21504-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:00 -08:00
Vlastimil Babka
b527cfe5bc mm, compaction: extend pageblock_skip_persistent() to all compound pages
pageblock_skip_persistent() checks for HugeTLB pages of pageblock order.
When clearing pageblock skip bits for compaction, the bits are not
cleared for such pageblocks, because they cannot contain base pages
suitable for migration, nor free pages to use as migration targets.

This optimization can be simply extended to all compound pages of order
equal or larger than pageblock order, because migrating such pages (if
they support it) cannot help sub-pageblock fragmentation.  This includes
THP's and also gigantic HugeTLB pages, which the current implementation
doesn't persistently skip due to a strict pageblock_order equality check
and not recognizing tail pages.

While THP pages are generally less "persistent" than HugeTLB, we can
still expect that if a THP exists at the point of
__reset_isolation_suitable(), it will exist also during the subsequent
compaction run.  The time difference here could be actually smaller than
between a compaction run that sets a (non-persistent) skip bit on a THP,
and the next compaction run that observes it.

Link: http://lkml.kernel.org/r/20171102121706.21504-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:00 -08:00