lineage-22.2
126 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4c30d46517 |
Merge android-4.19.95 (5da1114) into msm-4.19
* refs/heads/tmp-5da1114:
Revert crypto changes from android-4.19.79-95
Revert "UPSTREAM: PM / wakeup updates"
Revert "ANDROID: of: property: Enable of_devlink by default"
Revert "UPSTREAM: dt-bindings: arm: coresight: Add support for coresight-loses-context-with-cpu"
UPSTREAM: net: usbnet: Fix -Wcast-function-type
UPSTREAM: USB: dummy-hcd: use usb_urb_dir_in instead of usb_pipein
UPSTREAM: USB: dummy-hcd: increase max number of devices to 32
ANDROID: tty: serdev: Fix broken serial console input
ANDROID: update kernel ABI (perf_event changes)
BACKPORT: perf_event: Add support for LSM and SELinux checks
UPSTREAM: iommu: Allow io-pgtable to be used outside of drivers/iommu/
ANDROID: update abi for 4.19.94 release
ANDROID: update abi due to revert
Revert "BACKPORT: perf_event: Add support for LSM and SELinux checks"
UPSTREAM: selinux: sidtab reverse lookup hash table
UPSTREAM: selinux: avoid atomic_t usage in sidtab
UPSTREAM: selinux: check sidtab limit before adding a new entry
UPSTREAM: selinux: fix context string corruption in convert_context()
UPSTREAM: selinux: overhaul sidtab to fix bug and improve performance
UPSTREAM: selinux: refactor mls_context_to_sid() and make it stricter
UPSTREAM: selinux: use separate table for initial SID lookup
UPSTREAM: selinux: make "selinux_policycap_names[]" const char *
UPSTREAM: selinux: refactor sidtab conversion
ANDROID: Update ABI representation
ANDROID: GKI: clk: Don't disable unused clocks with sync state support
ANDROID: GKI: clk: Add support for clock providers with sync state
ANDROID: GKI: driver core: Add dev_has_sync_state()
ANDROID: update kernel ABI representation
BACKPORT: perf_event: Add support for LSM and SELinux checks
ANDROID: update ABI representation
UPSTREAM: exit: panic before exit_mm() on global init exit
ANDROID: serdev: Fix platform device support
ANDROID: Kconfig.gki: Add Hidden SPRD DRM configs
ANDROID: gki_defconfig: Disable TRANSPARENT_HUGEPAGE
ANDROID: gki_defconfig: Enable CONFIG_GNSS_CMDLINE_SERIAL
ANDROID: gnss: Add command line test driver
ANDROID: serdev: add platform device support
ANDROID: gki_defconfig: enable ARM64_SW_TTBR0_PAN
ANDROID: gki_defconfig: Set BINFMT_MISC as =m
UPSTREAM: binder: fix incorrect calculation for num_valid
ABI: Update ABI after f2fs merge
ANDROID: add initial ABI whitelist for android-4.19
ANDROID: staging: android: ion: Fix build when CONFIG_ION_SYSTEM_HEAP=n
ANDROID: staging: android: ion: Expose total heap and pool sizes via sysfs
ANDROID: Update ABI representation due to vmstat counter changes
UPSTREAM: include/linux/slab.h: fix sparse warning in kmalloc_type()
UPSTREAM: mm, slab: shorten kmalloc cache names for large sizes
UPSTREAM: mm, proc: add KReclaimable to /proc/meminfo
UPSTREAM: mm: rename and change semantics of nr_indirectly_reclaimable_bytes
UPSTREAM: dcache: allocate external names from reclaimable kmalloc caches
UPSTREAM: mm, slab/slub: introduce kmalloc-reclaimable caches
UPSTREAM: mm, slab: combine kmalloc_caches and kmalloc_dma_caches
ANDROID: abi update for 4.19.89
ANDROID: update abi_gki_aarch64.xml for LTO, CFI, and SCS
ANDROID: gki_defconfig: enable LTO, CFI, and SCS
ANDROID: update abi_gki_aarch64.xml for CONFIG_GNSS
ANDROID: cuttlefish_defconfig: Enable CONFIG_GNSS
UPSTREAM: arm64: Validate tagged addresses in access_ok() called from kernel threads
ANDROID: mm: Throttle rss_stat tracepoint
UPSTREAM: mm: slub: really fix slab walking for init_on_free
ANDROID: update abi_gki_aarch64.xml for nf change
ANDROID: kbuild: limit LTO inlining
ANDROID: kbuild: merge module sections with LTO
ANDROID: netfilter: nf_nat: remove static from nf_nat_ipv4_fn
UPSTREAM: drm/client: remove the exporting of drm_client_close
ANDROID: f2fs: fix possible merge of unencrypted with encrypted I/O
UPSTREAM: binder: Add binder_proc logging to binderfs
UPSTREAM: binder: Make transaction_log available in binderfs
UPSTREAM: binder: Add stats, state and transactions files
UPSTREAM: binder: add a mount option to show global stats
UPSTREAM: binder: Validate the default binderfs device names.
UPSTREAM: binder: Add default binder devices through binderfs when configured
UPSTREAM: binder: fix CONFIG_ANDROID_BINDER_DEVICES
UPSTREAM: android: binder: use kstrdup instead of open-coding it
UPSTREAM: binderfs: remove separate device_initcall()
UPSTREAM: binderfs: respect limit on binder control creation
UPSTREAM: binderfs: switch from d_add() to d_instantiate()
UPSTREAM: binderfs: drop lock in binderfs_binder_ctl_create
UPSTREAM: binderfs: kill_litter_super() before cleanup
UPSTREAM: binderfs: rework binderfs_binder_device_create()
UPSTREAM: binderfs: rework binderfs_fill_super()
UPSTREAM: binderfs: prevent renaming the control dentry
UPSTREAM: binderfs: remove outdated comment
UPSTREAM: binderfs: fix error return code in binderfs_fill_super()
UPSTREAM: binderfs: handle !CONFIG_IPC_NS builds
UPSTREAM: binderfs: reserve devices for initial mount
UPSTREAM: binderfs: rename header to binderfs.h
UPSTREAM: binderfs: implement "max" mount option
UPSTREAM: binderfs: make each binderfs mount a new instance
UPSTREAM: binderfs: remove wrong kern_mount() call
UPSTREAM: binder: implement binderfs
UPSTREAM: binder: remove BINDER_DEBUG_ENTRY()
ANDROID: Don't base allmodconfig on gki_defconfig
ANDROID: Disable UNWINDER_ORC for allmodconfig
ANDROID: update abi_gki_aarch64.xml for 4.19.87
BACKPORT: ARM: 8905/1: Emit __gnu_mcount_nc when using Clang 10.0.0 or newer
ANDROID: update abi_gki_aarch64.xml
ANDROID: gki_defconfig: =m's applied for virtio configs in arm64
UPSTREAM: of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
UPSTREAM: of: property: Add device link support for "iommu-map"
UPSTREAM: of: property: Fix the semantics of of_is_ancestor_of()
UPSTREAM: i2c: of: Populate fwnode in of_i2c_get_board_info()
UPSTREAM: driver core: Clarify documentation for fwnode_operations.add_links()
UPSTREAM: dt-bindings: arm: coresight: Add support for coresight-loses-context-with-cpu
BACKPORT: coresight: etm4x: Save/restore state across CPU low power states
ANDROID: Update ABI representation
ANDROID: gki_defconfig: IIO=y
f2fs: stop GC when the victim becomes fully valid
f2fs: expose main_blkaddr in sysfs
f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
f2fs: show f2fs instance in printk_ratelimited
f2fs: fix potential overflow
f2fs: fix to update dir's i_pino during cross_rename
f2fs: support aligned pinned file
f2fs: avoid kernel panic on corruption test
f2fs: fix wrong description in document
f2fs: cache global IPU bio
f2fs: fix to avoid memory leakage in f2fs_listxattr
f2fs: check total_segments from devices in raw_super
f2fs: update multi-dev metadata in resize_fs
f2fs: mark recovery flag correctly in read_raw_super_block()
f2fs: fix to update time in lazytime mode
vfs: don't allow writes to swap files
mm: set S_SWAPFILE on blockdev swap devices
BACKPORT: ARM: 8900/1: UNWINDER_FRAME_POINTER implementation for Clang
ANDROID: update abi_gki_aarch64.xml for 4.19.87
ANDROID: gki_defconfig: FW_CACHE to no
FROMGIT: firmware_class: make firmware caching configurable
FROMLIST: arm64: implement Shadow Call Stack
FROMLIST: arm64: disable SCS for hypervisor code
BACKPORT: FROMLIST: arm64: vdso: disable Shadow Call Stack
FROMLIST: arm64: efi: restore x18 if it was corrupted
FROMLIST: arm64: preserve x18 when CPU is suspended
FROMLIST: arm64: reserve x18 from general allocation with SCS
FROMLIST: arm64: disable function graph tracing with SCS
FROMLIST: scs: add support for stack usage debugging
FROMLIST: scs: add accounting
FROMLIST: add support for Clang's Shadow Call Stack (SCS)
FROMLIST: arm64: kernel: avoid x18 in __cpu_soft_restart
FROMLIST: arm64: kvm: stop treating register x18 as caller save
FROMLIST: arm64/lib: copy_page: avoid x18 register in assembler code
FROMLIST: arm64: mm: avoid x18 in idmap_kpti_install_ng_mappings
ANDROID: use non-canonical CFI jump tables
ANDROID: arm64: add __nocfi to __apply_alternatives
ANDROID: arm64: add __pa_function
ANDROID: arm64: allow ThinLTO to be selected
ANDROID: soc/tegra: disable ARCH_TEGRA_210_SOC with LTO
FROMLIST: arm64: fix alternatives with LLVM's integrated assembler
ANDROID: irqchip/gic-v3: rename gic_of_init to work around a ThinLTO+CFI bug
ANDROID: init: ensure initcall ordering with LTO
Revert "ANDROID: init: ensure initcall ordering with LTO"
ANDROID: add support for ThinLTO
ANDROID: clang: update to 10.0.1
ANDROID: gki_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE
ANDROID: gki_defconfig: removed CONFIG_PM_WAKELOCKS
ANDROID: gki_defconfig: enable CONFIG_IKHEADERS as m
FROMGIT: pinctrl: devicetree: Avoid taking direct reference to device name string
ANDROID: update abi_gki_aarch64.xml for 4.19.86 update
ANDROID: Update ABI representation
ANDROID: gki_defconfig: disable FUNCTION_TRACER
ANDROID: Update the ABI representation
ANDROID: update ABI representation
ANDROID: add unstripped modules to the distribution
FROMLIST: vsprintf: Inline call to ptr_to_hashval
UPSTREAM: rss_stat: Add support to detect RSS updates of external mm
UPSTREAM: mm: emit tracepoint when RSS changes
FROMGIT: driver core: Allow device link operations inside sync_state()
ANDROID: uid_sys_stats: avoid double accounting of dying threads
ANDROID: scsi: ufs-qcom: Enable BROKEN_CRYPTO quirk flag
ANDROID: scsi: ufs-hisi: Enable BROKEN_CRYPTO quirk flag
ANDROID: scsi: ufs: Add quirk bit for controllers that don't play well with inline crypto
ANDROID: scsi: ufs: UFS init should not require inline crypto
ANDROID: scsi: ufs: UFS crypto variant operations API
ANDROID: gki_defconfig: enable inline encryption
BACKPORT: FROMLIST: ext4: add inline encryption support
BACKPORT: FROMLIST: f2fs: add inline encryption support
BACKPORT: FROMLIST: fscrypt: add inline encryption support
BACKPORT: FROMLIST: scsi: ufs: Add inline encryption support to UFS
BACKPORT: FROMLIST: scsi: ufs: UFS crypto API
BACKPORT: FROMLIST: scsi: ufs: UFS driver v2.1 spec crypto additions
BACKPORT: FROMLIST: block: blk-crypto for Inline Encryption
ANDROID: block: Fix bio_crypt_should_process WARN_ON
BACKPORT: FROMLIST: block: Add encryption context to struct bio
BACKPORT: FROMLIST: block: Keyslot Manager for Inline Encryption
FROMLIST: f2fs: add support for IV_INO_LBLK_64 encryption policies
FROMLIST: ext4: add support for IV_INO_LBLK_64 encryption policies
BACKPORT: FROMLIST: fscrypt: add support for IV_INO_LBLK_64 policies
FROMLIST: fscrypt: zeroize fscrypt_info before freeing
FROMLIST: fscrypt: remove struct fscrypt_ctx
BACKPORT: FROMLIST: fscrypt: invoke crypto API for ESSIV handling
ANDROID: build kernels with llvm-nm and llvm-objcopy
ANDROID: Fix allmodconfig build with CC=clang
UPSTREAM: mm/page_poison: expose page_poisoning_enabled to kernel modules
FROMGIT: of: property: Add device link support for iommus, mboxes and io-channels
FROMGIT: of: property: Make it easy to add device links from DT properties
FROMGIT: of: property: Minor style clean up of of_link_to_phandle()
Revert "ANDROID: of/property: Add device link support for iommus"
ANDROID: Add allmodconfig build.configs for x86_64 and aarch64
ANDROID: fix allmodconfig build
ANDROID: nf: IDLETIMER: Fix possible use before initialization in idletimer_resume
BACKPORT: coresight: funnel: Support static funnel
BACKPORT:FROMGIT: coresight: replicator: Fix missing spin_lock_init()
BACKPORT:FROMGIT: coresight: funnel: Fix missing spin_lock_init()
BACKPORT:FROMGIT: coresight: Serialize enabling/disabling a link device.
UPSTREAM: coresight: tmc-etr: Add barrier packets when moving offset forward
UPSTREAM: coresight: tmc-etr: Decouple buffer sync and barrier packet insertion
UPSTREAM: coresight: tmc: Make memory width mask computation into a function
UPSTREAM: coresight: tmc-etr: Fix perf_data check
UPSTREAM: coresight: tmc-etr: Fix updating buffer in not-snapshot mode.
UPSTREAM: coresight: tmc-etr: Check if non-secure access is enabled
UPSTREAM: coresight: tmc-etr: Handle memory errors
BACKPORT: coresight: etr_buf: Consolidate refcount initialization
UPSTREAM: coresight: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
UPSTREAM: coresight: Use coresight device names for sinks in PMU attribute
UPSTREAM: coresight: tmc-etr: alloc_perf_buf: Do not call smp_processor_id from preemptible
UPSTREAM: coresight: tmc-etr: Do not call smp_processor_id() from preemptible
UPSTREAM: coresight: perf: Don't set the truncated flag in snapshot mode
UPSTREAM: coresight: tmc-etf: Fix snapshot mode update function
UPSTREAM: coresight: tmc-etr: Properly set AUX buffer head in snapshot mode
UPSTREAM: coresight: tmc-etr: Add support for CPU-wide trace scenarios
UPSTREAM: coresight: tmc-etr: Allocate and free ETR memory buffers for CPU-wide scenarios
UPSTREAM: coresight: tmc-etr: Introduce the notion of IDR to ETR devices
UPSTREAM: coresight: tmc-etr: Introduce the notion of reference counting to ETR devices
UPSTREAM: coresight: tmc-etr: Introduce the notion of process ID to ETR devices
UPSTREAM: coresight: tmc-etr: Create per-thread buffer allocation function
UPSTREAM: coresight: tmc-etr: Refactor function tmc_etr_setup_perf_buf()
UPSTREAM: coresight: Communicate perf event to sink buffer allocation functions
UPSTREAM: coresight: perf: Refactor function free_event_data()
UPSTREAM: coresight: perf: Clean up function etm_setup_aux()
UPSTREAM: coresight: Properly address concurrency in sink::update() functions
UPSTREAM: coresight: Properly address errors in sink::disable() functions
UPSTREAM: coresight: Move reference counting inside sink drivers
UPSTREAM: coresight: Adding return code to sink::disable() operation
UPSTREAM: coresight: etm4x: Configure tracers to emit timestamps
UPSTREAM: coresight: etm4x: Skip selector pair 0
UPSTREAM: coresight: etm4x: Add kernel configuration for CONTEXTID
UPSTREAM: coresight: pmu: Adding ITRACE property to cs_etm PMU
UPSTREAM: coresight: tmc: Cleanup power management
UPSTREAM: coresight: Fix freeing up the coresight connections
UPSTREAM: coresight: tmc: Report DMA setup failures
UPSTREAM: coresight: catu: fix clang build warning
UPSTREAM: perf/core: Fix the address filtering fix
UPSTREAM: perf, pt, coresight: Fix address filters for vmas with non-zero offset
UPSTREAM: perf: Copy parent's address filter offsets on clone
UPSTREAM: coresight: Use event attributes for sink selection
UPSTREAM: coresight: perf: Add "sinks" group to PMU directory
UPSTREAM: coresight: etb10: Add support for CLAIM tag
UPSTREAM: coreisght: tmc: Claim device before use
UPSTREAM: coresight: dynamic-replicator: Claim device for use
UPSTREAM: coresight: funnel: Claim devices before use
UPSTREAM: coresight: etmx: Claim devices before use
UPSTREAM: coresight: Add support for CLAIM tag protocol
UPSTREAM: coresight: dynamic-replicator: Handle multiple connections
UPSTREAM: coresight: etb10: Handle errors enabling the device
UPSTREAM: coresight: etm3: Add support for handling errors
UPSTREAM: coresight: etm4x: Add support for handling errors
UPSTREAM: coresight: tmc-etb/etf: Prepare to handle errors enabling
UPSTREAM: coresight: tmc-etr: Handle errors enabling CATU
UPSTREAM: coresight: tmc-etr: Refactor for handling errors
UPSTREAM: coresight: Handle failures in enabling a trace path
UPSTREAM: coresight: tmc: Fix byte-address alignment for RRP
UPSTREAM: coresight: etm4x: Configure EL2 exception level when kernel is running in HYP
UPSTREAM: coresight: etb10: Splitting function etb_enable()
UPSTREAM: coresight: etb10: Refactor etb_drvdata::mode handling
UPSTREAM: coresight: etm-perf: Add support for ETR backend
UPSTREAM: coresight: perf: Remove set_buffer call back
UPSTREAM: coresight: perf: Add helper to retrieve sink configuration
UPSTREAM: coresight: perf: Remove reset_buffer call back for sinks
UPSTREAM: coresight: Convert driver messages to dev_dbg
UPSTREAM: coresight: tmc-etr: Relax collection of trace from sysfs mode
UPSTREAM: coresight: tmc-etr: Handle driver mode specific ETR buffers
UPSTREAM: coresight: perf: Disable trace path upon source error
UPSTREAM: coresight: perf: Allow tracing on hotplugged CPUs
UPSTREAM: coresight: perf: Avoid unncessary CPU hotplug read lock
UPSTREAM: coresight: perf: Fix per cpu path management
UPSTREAM: coresight: Fix handling of sinks
UPSTREAM: coresight: Use ERR_CAST instead of ERR_PTR
UPSTREAM: coresight: Fix remote endpoint parsing
UPSTREAM: coresight: platform: Fix leaking device reference
UPSTREAM: coresight: platform: Fix refcounting for graph nodes
UPSTREAM: coresight: platform: Refactor graph endpoint parsing
UPSTREAM: coresight: Document error handling in coresight_register
ANDROID: regression introduced override_creds=off
ANDROID: overlayfs: internal getxattr operations without sepolicy checking
ANDROID: overlayfs: add __get xattr method
ANDROID: Add optional __get xattr method paired to __vfs_getxattr
UPSTREAM: scsi: ufs: override auto suspend tunables for ufs
UPSTREAM: scsi: core: allow auto suspend override by low-level driver
FROMGIT: of: property: Skip adding device links to suppliers that aren't devices
ANDROID: gki_defconfig: enable CONFIG_KEYBOARD_GPIO
UPSTREAM: dm bufio: introduce a global cache replacement
UPSTREAM: dm bufio: remove old-style buffer cleanup
UPSTREAM: dm bufio: introduce a global queue
UPSTREAM: dm bufio: refactor adjust_total_allocated
UPSTREAM: dm bufio: call adjust_total_allocated from __link_buffer and __unlink_buffer
ANDROID: dummy_cpufreq: Implement get()
ANDROID: gki_defconfig: enable CONFIG_CPUSETS
ANDROID: virtio: virtio_input: Set the amount of multitouch slots in virtio input
rtlwifi: Fix potential overflow on P2P code
ANDROID: cpufreq: create dummy cpufreq driver
ANDROID: Allow DRM_IOCTL_MODE_*_DUMB for render clients.
Cuttlefish Wifi: Add data ops in virt_wifi driver for scan data simulation
ANDROID: of: property: Enable of_devlink by default
ANDROID: of: property: Make sure child dependencies don't block probing of parent
ANDROID: driver core: Allow fwnode_operations.add_links to differentiate errors
ANDROID: driver core: Allow a device to wait on optional suppliers
ANDROID: driver core: Add device link support for SYNC_STATE_ONLY flag
FROMGIT: docs: driver-model: Add documentation for sync_state
FROMGIT: driver: core: Improve documentation for fwnode_operations.add_links()
FROMGIT: of: property: Minor code formatting/style clean ups
ANDROID: of/property: Add device link support for iommus
ANDROID: move up spin_unlock_bh() ahead of remove_proc_entry()
BACKPORT: arm64: tags: Preserve tags for addresses translated via TTBR1
UPSTREAM: arm64: memory: Implement __tag_set() as common function
UPSTREAM: arm64/mm: fix variable 'tag' set but not used
UPSTREAM: arm64: avoid clang warning about self-assignment
ANDROID: sdcardfs: evict dentries on fscrypt key removal
ANDROID: fscrypt: add key removal notifier chain
ANDROID: refactor build.config files to remove duplication
ANDROID: Move from clang r353983c to r365631c
ANDROID: gki_defconfig: remove PWRSEQ_EMMC and PWRSEQ_SIMPLE
ANDROID: unconditionally compile sig_ok in struct module
ANDROID: gki_defconfig: enable fs-verity
UPSTREAM: mm: vmalloc: show number of vmalloc pages in /proc/meminfo
BACKPORT: PM/sleep: Expose suspend stats in sysfs
UPSTREAM: power: supply: Init device wakeup after device_add()
UPSTREAM: PM / wakeup: Unexport wakeup_source_sysfs_{add,remove}()
UPSTREAM: PM / wakeup: Register wakeup class kobj after device is added
UPSTREAM: PM / wakeup: Fix sysfs registration error path
UPSTREAM: PM / wakeup: Show wakeup sources stats in sysfs
UPSTREAM: PM / wakeup: Use wakeup_source_register() in wakelock.c
UPSTREAM: PM / wakeup: Drop wakeup_source_init(), wakeup_source_prepare()
UPSTREAM: PM / wakeup: Drop wakeup_source_drop()
UPSTREAM: PM / core: Add support to skip power management in device/driver model
gki_defconfig: Enable CONFIG_DM_SNAPSHOT
ANDROID: gki_defconfig: enable accelerated AES and SHA-256
ANDROID: fix overflow in /proc/uid_cputime/remove_uid_range
ANDROID: kasan: fix has_attribute check on older GCC versions
ANDROID: gki_defconfig: enable CONFIG_PARAVIRT and CONFIG_HYPERVISOR_GUEST
ANDROID: gki_defconfig: enable CONFIG_NLS_*
ANDROID: gki_defconfig: Enable BPF_JIT and BPF_JIT_ALWAYS_ON
FROMGIT: of: property: Create device links for all child-supplier depencencies
FROMGIT: of/platform: Pause/resume sync state during init and of_platform_populate()
BACKPORT: FROMGIT: driver core: Add sync_state driver/bus callback
BACKPORT: FROMGIT: of: property: Add functional dependency link from DT bindings
FROMGIT: driver core: Add support for linking devices during device addition
FROMGIT: driver core: Add fwnode_to_dev() to look up device from fwnode
UPSTREAM: mm: untag user pointers in mmap/munmap/mremap/brk
UPSTREAM: vfio/type1: untag user pointers in vaddr_get_pfn
UPSTREAM: tee/shm: untag user pointers in tee_shm_register
UPSTREAM: media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
UPSTREAM: drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
BACKPORT: drm/amdgpu: untag user pointers
UPSTREAM: userfaultfd: untag user pointers
UPSTREAM: fs/namespace: untag user pointers in copy_mount_options
UPSTREAM: mm: untag user pointers in get_vaddr_frames
UPSTREAM: mm: untag user pointers in mm/gup.c
UPSTREAM: mm: untag user pointers passed to memory syscalls
BACKPORT: lib: untag user pointers in strn*_user
UPSTREAM: arm64: Fix reference to docs for ARM64_TAGGED_ADDR_ABI
UPSTREAM: selftests, arm64: add kernel headers path for tags_test
BACKPORT: arm64: Relax Documentation/arm64/tagged-pointers.rst
UPSTREAM: arm64: Define Documentation/arm64/tagged-address-abi.rst
UPSTREAM: arm64: Change the tagged_addr sysctl control semantics to only prevent the opt-in
UPSTREAM: arm64: Tighten the PR_{SET, GET}_TAGGED_ADDR_CTRL prctl() unused arguments
UPSTREAM: selftests, arm64: fix uninitialized symbol in tags_test.c
UPSTREAM: arm64: mm: Really fix sparse warning in untagged_addr()
UPSTREAM: selftests, arm64: add a selftest for passing tagged pointers to kernel
BACKPORT: arm64: Introduce prctl() options to control the tagged user addresses ABI
UPSTREAM: arm64: untag user pointers in access_ok and __uaccess_mask_ptr
UPSTREAM: uaccess: add noop untagged_addr definition
BACKPORT: block: annotate refault stalls from IO submission
f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
f2fs: fix to add missing F2FS_IO_ALIGNED() condition
f2fs: fix to fallback to buffered IO in IO aligned mode
f2fs: fix to handle error path correctly in f2fs_map_blocks
f2fs: fix extent corrupotion during directIO in LFS mode
f2fs: check all the data segments against all node ones
f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
f2fs: fix inode rwsem regression
f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
f2fs: avoid infinite GC loop due to stale atomic files
f2fs: Fix indefinite loop in f2fs_gc()
f2fs: convert inline_data in prior to i_size_write
f2fs: fix error path of f2fs_convert_inline_page()
f2fs: add missing documents of reserve_root/resuid/resgid
f2fs: fix flushing node pages when checkpoint is disabled
f2fs: enhance f2fs_is_checkpoint_ready()'s readability
f2fs: clean up __bio_alloc()'s parameter
f2fs: fix wrong error injection path in inc_valid_block_count()
f2fs: fix to writeout dirty inode during node flush
f2fs: optimize case-insensitive lookups
f2fs: introduce f2fs_match_name() for cleanup
f2fs: Fix indefinite loop in f2fs_gc()
f2fs: allocate memory in batch in build_sit_info()
f2fs: support FS_IOC_{GET,SET}FSLABEL
f2fs: fix to avoid data corruption by forbidding SSR overwrite
f2fs: Fix build error while CONFIG_NLS=m
Revert "f2fs: avoid out-of-range memory access"
f2fs: cleanup the code in build_sit_entries.
f2fs: fix wrong available node count calculation
f2fs: remove duplicate code in f2fs_file_write_iter
f2fs: fix to migrate blocks correctly during defragment
f2fs: use wrapped f2fs_cp_error()
f2fs: fix to use more generic EOPNOTSUPP
f2fs: use wrapped IS_SWAPFILE()
f2fs: Support case-insensitive file name lookups
f2fs: include charset encoding information in the superblock
fs: Reserve flag for casefolding
f2fs: fix to avoid call kvfree under spinlock
fs: f2fs: Remove unnecessary checks of SM_I(sbi) in update_general_status()
f2fs: disallow direct IO in atomic write
f2fs: fix to handle quota_{on,off} correctly
f2fs: fix to detect cp error in f2fs_setxattr()
f2fs: fix to spread f2fs_is_checkpoint_ready()
f2fs: support fiemap() for directory inode
f2fs: fix to avoid discard command leak
f2fs: fix to avoid tagging SBI_QUOTA_NEED_REPAIR incorrectly
f2fs: fix to drop meta/node pages during umount
f2fs: disallow switching io_bits option during remount
f2fs: fix panic of IO alignment feature
f2fs: introduce {page,io}_is_mergeable() for readability
f2fs: fix livelock in swapfile writes
f2fs: add fs-verity support
ext4: update on-disk format documentation for fs-verity
ext4: add fs-verity read support
ext4: add basic fs-verity support
fs-verity: support builtin file signatures
fs-verity: add SHA-512 support
fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
fs-verity: add data verification hooks for ->readpages()
fs-verity: add the hook for file ->setattr()
fs-verity: add the hook for file ->open()
fs-verity: add inode and superblock fields
fs-verity: add Kconfig and the helper functions for hashing
fs: uapi: define verity bit for FS_IOC_GETFLAGS
fs-verity: add UAPI header
fs-verity: add MAINTAINERS file entry
fs-verity: add a documentation file
ext4: fix kernel oops caused by spurious casefold flag
ext4: fix coverity warning on error path of filename setup
ext4: optimize case-insensitive lookups
ext4: fix dcache lookup of !casefolded directories
unicode: update to Unicode 12.1.0 final
unicode: add missing check for an error return from utf8lookup()
ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
unicode: refactor the rule for regenerating utf8data.h
ext4: Support case-insensitive file name lookups
ext4: include charset encoding information in the superblock
unicode: update unicode database unicode version 12.1.0
unicode: introduce test module for normalized utf8 implementation
unicode: implement higher level API for string handling
unicode: reduce the size of utf8data[]
unicode: introduce code for UTF-8 normalization
unicode: introduce UTF-8 character database
ext4 crypto: fix to check feature status before get policy
fscrypt: document the new ioctls and policy version
ubifs: wire up new fscrypt ioctls
f2fs: wire up new fscrypt ioctls
ext4: wire up new fscrypt ioctls
fscrypt: require that key be added when setting a v2 encryption policy
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl
fscrypt: allow unprivileged users to add/remove keys for v2 policies
fscrypt: v2 encryption policy support
fscrypt: add an HKDF-SHA512 implementation
fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
fscrypt: rename keyinfo.c to keysetup.c
fscrypt: move v1 policy key setup to keysetup_v1.c
fscrypt: refactor key setup code in preparation for v2 policies
fscrypt: rename fscrypt_master_key to fscrypt_direct_key
fscrypt: add ->ci_inode to fscrypt_info
fscrypt: use FSCRYPT_* definitions, not FS_*
fscrypt: use FSCRYPT_ prefix for uapi constants
fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>
fscrypt: use ENOPKG when crypto API support missing
fscrypt: improve warnings for missing crypto API support
fscrypt: improve warning messages for unsupported encryption contexts
fscrypt: make fscrypt_msg() take inode instead of super_block
fscrypt: clean up base64 encoding/decoding
fscrypt: remove loadable module related code
Updated following files to fix build errors:
drivers/gpu/msm/kgsl_pool.c
drivers/hwtracing/coresight/coresight-dummy.c
drivers/iommu/dma-mapping-fast.c
drivers/iommu/io-pgtable-fast.c
drivers/iommu/io-pgtable-msm-secure.c
kernel/taskstats.c
mm/vmalloc.c
security/selinux/ss/sidtab.h
Conflicts:
arch/arm/Makefile
arch/arm64/Kconfig
arch/x86/include/asm/syscall_wrapper.h
build.config.common
drivers/clk/clk.c
drivers/hwtracing/coresight/coresight-etm-perf.c
drivers/hwtracing/coresight/coresight-funnel.c
drivers/hwtracing/coresight/coresight-tmc-etf.c
drivers/hwtracing/coresight/coresight-tmc-etr.c
drivers/hwtracing/coresight/coresight-tmc.c
drivers/hwtracing/coresight/coresight-tmc.h
drivers/hwtracing/coresight/coresight.c
drivers/hwtracing/coresight/of_coresight.c
drivers/iommu/arm-smmu.c
drivers/iommu/io-pgtable-arm.c
drivers/iommu/io-pgtable.c
drivers/scsi/scsi_sysfs.c
drivers/scsi/sd.c
drivers/scsi/ufs/ufshcd.c
drivers/scsi/ufs/ufshcd.h
drivers/staging/android/ion/ion.c
drivers/staging/android/ion/ion.h
drivers/staging/android/ion/ion_page_pool.c
fs/ext4/readpage.c
fs/f2fs/data.c
fs/f2fs/f2fs.h
fs/f2fs/file.c
fs/f2fs/segment.c
fs/f2fs/super.c
include/linux/clk-provider.h
include/linux/compiler_types.h
include/linux/coresight.h
include/linux/mmzone.h
include/scsi/scsi_device.h
include/trace/events/kmem.h
kernel/events/core.c
kernel/sched/core.c
mm/vmstat.c
Change-Id: I2eca52b08b484f2b5c30437671cab8cb0195b8d6
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
|
||
|
|
690c4ca8a5 |
UPSTREAM: mm: untag user pointers passed to memory syscalls
(Upstream commit 057d3389108eda8a20c7f496f011846932680d88).
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments.
This patch allows tagged pointers to be passed to the following memory syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect, mremap, msync, munlock, move_pages.
The mmap and mremap syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma.
Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I1a2d89eedb45e618e85ca515f4c9121460711efb |
||
|
|
053efa6d4c |
Merge android-4.19.58 (5ad6eeb) into msm-4.19
* refs/heads/tmp-5ad6eeb:
Linux 4.19.58
dmaengine: imx-sdma: remove BD_INTR for channel0
dmaengine: qcom: bam_dma: Fix completed descriptors count
MIPS: have "plain" make calls build dtbs for selected platforms
MIPS: Add missing EHB in mtc0 -> mfc0 sequence.
MIPS: Fix bounds check virt_addr_valid
svcrdma: Ignore source port when computing DRC hash
nfsd: Fix overflow causing non-working mounts on 1 TB machines
KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC
KVM: x86: degrade WARN to pr_warn_ratelimited
netfilter: ipv6: nf_defrag: accept duplicate fragments again
bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
net: hns: fix unsigned comparison to less than zero
sc16is7xx: move label 'err_spi' to correct section
netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments
ip6: fix skb leak in ip6frag_expire_frag_queue()
rds: Fix warning.
ALSA: hda: Initialize power_state field properly
net: hns: Fixes the missing put_device in positive leg for roce reset
x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting
selftests: fib_rule_tests: Fix icmp proto with ipv6
scsi: tcmu: fix use after free
mac80211: mesh: fix missing unlock on error in table_path_del()
f2fs: don't access node/meta inode mapping after iput
drm/fb-helper: generic: Don't take module ref for fbcon
media: s5p-mfc: fix incorrect bus assignment in virtual child device
net/smc: move unhash before release of clcsock
mlxsw: spectrum: Handle VLAN device unlinking
tty: rocket: fix incorrect forward declaration of 'rp_init()'
btrfs: Ensure replaced device doesn't have pending chunk allocation
mm/vmscan.c: prevent useless kswapd loops
ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
drm/imx: only send event on crtc disable if kept disabled
drm/imx: notify drm core before sending event during crtc disable
drm/etnaviv: add missing failure path to destroy suballoc
drm/amdgpu/gfx9: use reset default for PA_SC_FIFO_SIZE
drm/amd/powerplay: use hardware fan control if no powerplay fan table
arm64: kaslr: keep modules inside module region when KASAN is enabled
ARM: dts: armada-xp-98dx3236: Switch to armada-38x-uart serial node
tracing/snapshot: Resize spare buffer if size changed
fs/userfaultfd.c: disable irqs for fault_pending and event locks
lib/mpi: Fix karactx leak in mpi_powm
ALSA: hda/realtek - Change front mic location for Lenovo M710q
ALSA: hda/realtek: Add quirks for several Clevo notebook barebones
ALSA: usb-audio: fix sign unintended sign extension on left shifts
ALSA: line6: Fix write on zero-sized buffer
ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages
ALSA: seq: fix incorrect order of dest_client/dest_ports arguments
crypto: cryptd - Fix skcipher instance memory leak
crypto: user - prevent operating on larval algorithms
ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME
drm/i915/dmc: protect against reading random memory
ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
module: Fix livepatch/ftrace module text permissions race
tracing: avoid build warning with HAVE_NOP_MCOUNT
mm/mlock.c: change count_mm_mlocked_page_nr return type
scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE
cpuset: restore sanity to cpuset_cpus_allowed_fallback()
i2c: pca-platform: Fix GPIO lookup code
platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow
platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration
platform/x86: intel-vbtn: Report switch events when event wakes device
platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
drm: panel-orientation-quirks: Add quirk for GPD MicroPC
drm: panel-orientation-quirks: Add quirk for GPD pocket2
scsi: hpsa: correct ioaccel2 chaining
SoC: rt274: Fix internal jack assignment in set_jack callback
ALSA: hdac: fix memory release for SST and SOF drivers
usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
x86/CPU: Add more Icelake model numbers
ASoC: sun4i-i2s: Add offset to RX channel select
ASoC: sun4i-i2s: Fix sun8i tx channel offset mask
ASoC: max98090: remove 24-bit format support if RJ is 0
drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable()
drm/mediatek: clear num_pipes when unbind driver
drm/mediatek: call drm_atomic_helper_shutdown() when unbinding driver
drm/mediatek: unbind components in mtk_drm_unbind()
drm/mediatek: fix unbind functions
spi: bitbang: Fix NULL pointer dereference in spi_unregister_master
ASoC: ak4458: rstn_control - return a non-zero on error only
ASoC: soc-pcm: BE dai needs prepare when pause release after resume
ASoC: ak4458: add return value for ak4458_probe
ASoC : cs4265 : readable register too low
netfilter: nft_flow_offload: IPCB is only valid for ipv4 family
netfilter: nft_flow_offload: don't offload when sequence numbers need adjustment
netfilter: nft_flow_offload: set liberal tracking mode for tcp
netfilter: nf_flow_table: ignore DF bit setting
md/raid0: Do not bypass blocking queue entered for raid0 bios
block: Fix a NULL pointer dereference in generic_make_request()
Bluetooth: Fix faulty expression for minimum encryption key size check
Change-Id: I42f3ee04258a2392be42269d05d99dc0b02a9feb
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org> |
||
|
|
5ad6eeba58 |
Merge 4.19.58 into android-4.19
Changes in 4.19.58 Bluetooth: Fix faulty expression for minimum encryption key size check block: Fix a NULL pointer dereference in generic_make_request() md/raid0: Do not bypass blocking queue entered for raid0 bios netfilter: nf_flow_table: ignore DF bit setting netfilter: nft_flow_offload: set liberal tracking mode for tcp netfilter: nft_flow_offload: don't offload when sequence numbers need adjustment netfilter: nft_flow_offload: IPCB is only valid for ipv4 family ASoC : cs4265 : readable register too low ASoC: ak4458: add return value for ak4458_probe ASoC: soc-pcm: BE dai needs prepare when pause release after resume ASoC: ak4458: rstn_control - return a non-zero on error only spi: bitbang: Fix NULL pointer dereference in spi_unregister_master drm/mediatek: fix unbind functions drm/mediatek: unbind components in mtk_drm_unbind() drm/mediatek: call drm_atomic_helper_shutdown() when unbinding driver drm/mediatek: clear num_pipes when unbind driver drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable() ASoC: max98090: remove 24-bit format support if RJ is 0 ASoC: sun4i-i2s: Fix sun8i tx channel offset mask ASoC: sun4i-i2s: Add offset to RX channel select x86/CPU: Add more Icelake model numbers usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i] usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC ALSA: hdac: fix memory release for SST and SOF drivers SoC: rt274: Fix internal jack assignment in set_jack callback scsi: hpsa: correct ioaccel2 chaining drm: panel-orientation-quirks: Add quirk for GPD pocket2 drm: panel-orientation-quirks: Add quirk for GPD MicroPC platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi platform/x86: intel-vbtn: Report switch events when event wakes device platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow i2c: pca-platform: Fix GPIO lookup code cpuset: restore sanity 
to cpuset_cpus_allowed_fallback() scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE mm/mlock.c: change count_mm_mlocked_page_nr return type tracing: avoid build warning with HAVE_NOP_MCOUNT module: Fix livepatch/ftrace module text permissions race ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper() drm/i915/dmc: protect against reading random memory ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME crypto: user - prevent operating on larval algorithms crypto: cryptd - Fix skcipher instance memory leak ALSA: seq: fix incorrect order of dest_client/dest_ports arguments ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages ALSA: line6: Fix write on zero-sized buffer ALSA: usb-audio: fix sign unintended sign extension on left shifts ALSA: hda/realtek: Add quirks for several Clevo notebook barebones ALSA: hda/realtek - Change front mic location for Lenovo M710q lib/mpi: Fix karactx leak in mpi_powm fs/userfaultfd.c: disable irqs for fault_pending and event locks tracing/snapshot: Resize spare buffer if size changed ARM: dts: armada-xp-98dx3236: Switch to armada-38x-uart serial node arm64: kaslr: keep modules inside module region when KASAN is enabled drm/amd/powerplay: use hardware fan control if no powerplay fan table drm/amdgpu/gfx9: use reset default for PA_SC_FIFO_SIZE drm/etnaviv: add missing failure path to destroy suballoc drm/imx: notify drm core before sending event during crtc disable drm/imx: only send event on crtc disable if kept disabled ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code() mm/vmscan.c: prevent useless kswapd loops btrfs: Ensure replaced device doesn't have pending chunk allocation tty: rocket: fix incorrect forward declaration of 'rp_init()' mlxsw: spectrum: Handle VLAN device unlinking net/smc: move unhash before release of clcsock media: s5p-mfc: fix incorrect bus assignment in virtual child device drm/fb-helper: generic: Don't take module ref 
for fbcon f2fs: don't access node/meta inode mapping after iput mac80211: mesh: fix missing unlock on error in table_path_del() scsi: tcmu: fix use after free selftests: fib_rule_tests: Fix icmp proto with ipv6 x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting net: hns: Fixes the missing put_device in positive leg for roce reset ALSA: hda: Initialize power_state field properly rds: Fix warning. ip6: fix skb leak in ip6frag_expire_frag_queue() netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments sc16is7xx: move label 'err_spi' to correct section net: hns: fix unsigned comparison to less than zero bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K netfilter: ipv6: nf_defrag: accept duplicate fragments again KVM: x86: degrade WARN to pr_warn_ratelimited KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC nfsd: Fix overflow causing non-working mounts on 1 TB machines svcrdma: Ignore source port when computing DRC hash MIPS: Fix bounds check virt_addr_valid MIPS: Add missing EHB in mtc0 -> mfc0 sequence. MIPS: have "plain" make calls build dtbs for selected platforms dmaengine: qcom: bam_dma: Fix completed descriptors count dmaengine: imx-sdma: remove BD_INTR for channel0 Linux 4.19.58 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
79fccb9815 |
mm/mlock.c: change count_mm_mlocked_page_nr return type
[ Upstream commit 0874bb49bb21bf24deda853e8bf61b8325e24bcb ]
On a 64-bit machine the value of "vma->vm_end - vma->vm_start" may be
negative when using 32 bit ints and the "count >> PAGE_SHIFT"'s result
will be wrong. So change the local variable and return value to
unsigned long to fix the problem.
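To make the truncation concrete, here is a minimal userspace sketch; it is not the kernel code. PAGE_SHIFT is assumed to be 12, both helper names are hypothetical, and the sketch assumes a 64-bit unsigned long and the usual wrap-around when an out-of-range value is converted to int.

```c
#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Old shape: the byte count is accumulated in an int and can wrap. */
static int pages_with_int(unsigned long vm_start, unsigned long vm_end)
{
    int count = 0;

    count += vm_end - vm_start; /* the difference may not fit in an int */
    return count >> PAGE_SHIFT;
}

/* Fixed shape: the byte count and the result stay unsigned long. */
static unsigned long pages_with_ulong(unsigned long vm_start,
                                      unsigned long vm_end)
{
    unsigned long count = 0;

    count += vm_end - vm_start;
    return count >> PAGE_SHIFT;
}
```

For a 2 GiB range (vm_end - vm_start == 1UL << 31), pages_with_ulong() returns 524288 pages, while pages_with_int() typically yields a negative value on common compilers.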
Link: http://lkml.kernel.org/r/20190513023701.83056-1-swkhack@gmail.com
Fixes:
|
||
|
|
3cfc37dc2b |
mm: protect VMA modifications using VMA sequence count
The VMA sequence count has been introduced to allow fast detection of VMA modification when running a page fault handler without holding the mmap_sem.
This patch provides protection against the VMA modification done in:
- madvise()
- mpol_rebind_policy()
- vma_replace_policy()
- change_prot_numa()
- mlock(), munlock()
- mprotect()
- mmap_region()
- collapse_huge_page()
- userfaultfd registering services
In addition, VMA fields which will be read during the speculative fault path need to be written using WRITE_ONCE to prevent writes from being split and intermediate values from being pushed to other CPUs.
Change-Id: Ic36046b7254e538b6baf7144c50ae577ee7f2074
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Patch-mainline: linux-mm @ Tue, 17 Apr 2018 16:33:15
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
[charante@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
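The sequence-count idea described in this commit can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions, not the kernel's implementation: struct region and the region_* helpers are hypothetical names, a single writer is assumed, and the atomic stores stand in for the kernel's WRITE_ONCE.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VMA fields a speculative reader inspects. */
struct region {
    atomic_uint seq;      /* even: stable, odd: update in progress */
    atomic_ulong start;
    atomic_ulong end;
};

static void region_init(struct region *r, unsigned long start,
                        unsigned long end)
{
    atomic_init(&r->seq, 0);
    atomic_init(&r->start, start);
    atomic_init(&r->end, end);
}

/* Writer side (single writer assumed): bump seq around the update. */
static void region_update(struct region *r, unsigned long start,
                          unsigned long end)
{
    atomic_fetch_add(&r->seq, 1);  /* seq becomes odd */
    atomic_store(&r->start, start); /* whole-word store, never split */
    atomic_store(&r->end, end);
    atomic_fetch_add(&r->seq, 1);  /* seq becomes even again */
}

/* Reader side: returns false if an update raced with the read. */
static bool region_read(struct region *r, unsigned long *start,
                        unsigned long *end)
{
    unsigned int before = atomic_load(&r->seq);

    if (before & 1)
        return false; /* update in progress */
    *start = atomic_load(&r->start);
    *end = atomic_load(&r->end);
    return atomic_load(&r->seq) == before;
}
```

A reader that gets false would fall back to the slow path, which in the kernel's case means taking the mmap_sem.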
||
|
|
533e4ed309 |
ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
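As a rough illustration of the call above, the following sketch maps one anonymous region and tries to name it. The helper name map_named_region is hypothetical, and the fallback values for PR_SET_VMA and PR_SET_VMA_ANON_NAME are the ones used by Android kernels, defined here only in case the installed headers lack them. Because this patch stores the userspace pointer rather than a copy, the name string is assumed to stay valid for the lifetime of the mapping (for example a string literal).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict standards modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Map len bytes of private anonymous memory and try to attach a name.
 * Returns the mapping, or NULL if mmap() fails.
 */
static void *map_named_region(size_t len, const char *name)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    /* Best effort: kernels without this feature return an error here. */
    (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                (unsigned long)p, len, (unsigned long)name);
    return p;
}
```

On a kernel carrying this patch, a call such as map_named_region(4096, "demo_heap") should make the region appear as [anon:demo_heap] in /proc/pid/maps.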
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fields that are only used on file-backed
mappings, so it does not increase memory usage.
Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
|
||
|
|
62dd4637d6 |
ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
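One way to check the effect from userspace is to scan a maps listing for the [anon:<name>] marker described above. The sketch below is an assumption-laden illustration: count_named_anon_vmas is a hypothetical helper, and it simply counts lines containing "[anon:" in an already opened stream.

```c
#include <stdio.h>
#include <string.h>

/* Count the lines of a /proc/<pid>/maps style stream naming an anon vma. */
static int count_named_anon_vmas(FILE *maps)
{
    char line[8192];
    int named = 0;

    while (fgets(line, sizeof(line), maps)) {
        if (strstr(line, "[anon:"))
            named++;
    }
    return named;
}
```

Typical use would be to pass the result of fopen("/proc/self/maps", "r") and close the stream afterwards.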
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fields that are only used on file-backed
mappings, so it does not increase memory usage.
Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
|
||
|
|
e1fb4a0864 |
dax: remove VM_MIXEDMAP for fsdax and device dax
This patch is reworked from an earlier patch that Dan has posted: https://patchwork.kernel.org/patch/10131727/ VM_MIXEDMAP is used by dax to direct mm paths like vm_normal_page() that the memory page it is dealing with is not typical memory from the linear map. The get_user_pages_fast() path, since it does not resolve the vma, is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so we use that as a VM_MIXEDMAP replacement in some locations. In the cases where there is no pte to consult we fallback to using vma_is_dax() to detect the VM_MIXEDMAP special case. Now that we have explicit driver pfn_t-flag opt-in/opt-out for get_user_pages() support for DAX we can stop setting VM_MIXEDMAP. This also means we no longer need to worry about safely manipulating vm_flags in a future where we support dynamically changing the dax mode of a file. DAX should also now be supported with madvise_behavior(), vma_merge(), and copy_page_range(). This patch has been tested against ndctl unit test. It has also been tested against xfstests commit: 625515d using fake pmem created by memmap and no additional issues have been observed. Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9c4e6b1a70 |
mm, mlock, vmscan: no more skipping pagevecs
When a thread mlocks an address space backed either by file pages which are currently not present in memory or swapped out anon pages (not in swapcache), a new page is allocated and added to the local pagevec (lru_add_pvec), I/O is triggered and the thread then sleeps on the page. On I/O completion, the thread can wake on a different CPU, the mlock syscall will then set the PageMlocked() bit of the page but will not be able to put that page in unevictable LRU as the page is on the pagevec of a different CPU. Even on drain, that page will go to evictable LRU because the PageMlocked() bit is not checked on pagevec drain.
The page will eventually go to right LRU on reclaim but the LRU stats will remain skewed for a long time.
This patch puts all the pages, even unevictable, to the pagevecs and on the drain, the pages will be added on their LRUs correctly by checking their evictability. This resolves the mlocked pages on pagevec of other CPUs issue because when those pagevecs will be drained, the mlocked file pages will go to unevictable LRU. Also this makes the race with munlock easier to resolve because the pagevec drains happen in LRU lock.
However there is still one place which makes a page evictable and does PageLRU check on that page without LRU lock and needs special attention. TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().
    #0: __pagevec_lru_add_fn        #1: clear_page_mlock
    SetPageLRU()                    if (!TestClearPageMlocked())
                                      return
    smp_mb() // <--required
                                    // inside does PageLRU
    if (!PageMlocked())             if (isolate_lru_page())
      move to evictable LRU           putback_lru_page()
    else
      move to unevictable LRU
In '#1', TestClearPageMlocked() provides full memory barrier semantics and thus the PageLRU check (inside isolate_lru_page) can not be reordered before it.
In '#0', without explicit memory barrier, the PageMlocked() check can be reordered before SetPageLRU(). If that happens, '#0' can put a page in unevictable LRU and '#1' might have just cleared the Mlocked bit of that page but fails to isolate as PageLRU fails as '#0' still hasn't set PageLRU bit of that page. That page will be stranded on the unevictable LRU.
There is one (good) side effect though. Without this patch, the pages allocated for System V shared memory segment are added to evictable LRUs even after shmctl(SHM_LOCK) on that segment. This patch will correctly put such pages to unevictable LRU.
Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b7701a5f2e |
mm: docs: fixup punctuation
so that kernel-doc will properly recognize the parameter and function descriptions. Link: http://lkml.kernel.org/r/1516700871-22279-2-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
50d4fb7812 |
mm: Eliminate cond_resched_rcu_qs() in favor of cond_resched()
Now that cond_resched() also provides RCU quiescent states when needed, it can be used in place of cond_resched_rcu_qs(). This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> |
||
|
|
72b03fcd5d |
mm: mlock: remove lru_add_drain_all()
lru_add_drain_all() is not required by mlock() and it will drain everything that has been cached at the time mlock is called. And that is not really related to the memory which will be faulted in (and cached) and mlocked by the syscall itself.
If anything lru_add_drain_all() should be called _after_ pages have been mlocked and faulted in but even that is not strictly needed because those pages would get to the appropriate LRUs lazily during the reclaim path. Moreover follow_page_pte (gup) will drain the local pcp LRU cache.
On larger machines the overhead of lru_add_drain_all() in mlock() can be significant when mlocking data already in memory. We have observed high latency in mlock() due to lru_add_drain_all() when the users were mlocking in-memory tmpfs files.
[mhocko@suse.com: changelog fix]
Link: http://lkml.kernel.org/r/20171019222507.2894-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8667982014 |
mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b24413180f |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. 
- Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). 
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. 
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
9472f23c9e |
mm/mlock.c: use page_zone() instead of page_zone_id()
page_zone_id() is a specialized function to compare the zone for the pages that are within the section range. If the sections of the pages are different, page_zone_id() can be different even if their zone is the same. This wrong usage doesn't cause any actual problem since __munlock_pagevec_fill() would be called again with the failed index. However, it's better to use a more appropriate function here.
Link: http://lkml.kernel.org/r/1503559211-10259-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
70feee0e1e |
mlock: fix mlock count can not decrease in race condition
Kefeng reported that when running the following test, the mlock count in
meminfo will increase permanently:
[1] testcase
linux:~ # cat test_mlockal
grep Mlocked /proc/meminfo
for j in `seq 0 10`
do
    for i in `seq 4 15`
    do
        ./p_mlockall >> log &
    done
    sleep 0.2
done
# wait some time to let the mlock counter decrease; 5s may not be enough
sleep 5
grep Mlocked /proc/meminfo
linux:~ # cat p_mlockall.c
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#define SPACE_LEN 4096
int main(int argc, char ** argv)
{
    int ret;
    void *adr = malloc(SPACE_LEN);
    if (!adr)
        return -1;
    ret = mlockall(MCL_CURRENT | MCL_FUTURE);
    printf("mlockall ret = %d\n", ret);
    ret = munlockall();
    printf("munlockall ret = %d\n", ret);
    free(adr);
    return 0;
}
In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag. Commit
|
||
|
|
192d723256 |
mm: make try_to_munlock() return void
try_to_munlock returns SWAP_MLOCK if one of the VMAs mapping the page has the VM_LOCKED flag. At that time, the VM sets PG_mlocked on the page, provided the page is not a pte-mapped THP, which cannot be mlocked either.
With that, __munlock_isolated_page can use PageMlocked to check whether try_to_munlock is successful or not without relying on try_to_munlock's retval. It helps to make try_to_unmap/try_to_unmap_one simple with upcoming patches.
[minchan@kernel.org: remove PG_Mlocked VM_BUG_ON check]
Link: http://lkml.kernel.org/r/20170411025615.GA6545@bbox
Link: http://lkml.kernel.org/r/1489555493-14659-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
baeedc7158 |
Merge branch 'prep-for-5level'
Merge 5-level page table prep from Kirill Shutemov:
"Here's relatively low-risk part of 5-level paging patchset. Merging it
now will make x86 5-level paging enabling in v4.12 easier.
The first patch is actually x86-specific: detect 5-level paging
support. It boils down to single define.
The rest of patchset converts Linux MMU abstraction from 4- to 5-level
paging.
Enabling of new abstraction in most cases requires adding single line
of code in arch-specific code. The rest is taken care by asm-generic/.
Changes to mm/ code are mostly mechanical: add support for new page
table level -- p4d_t -- where we deal with pud_t now.
v2:
- fix build on microblaze (Michal);
- comment for __ARCH_HAS_5LEVEL_HACK in kasan_populate_zero_shadow();
- acks from Michal"
* emailed patches from Kirill A Shutemov <kirill.shutemov@linux.intel.com>:
mm: introduce __p4d_alloc()
mm: convert generic code to 5-level paging
asm-generic: introduce <asm-generic/pgtable-nop4d.h>
arch, mm: convert all architectures to use 5level-fixup.h
asm-generic: introduce __ARCH_USE_5LEVEL_HACK
asm-generic: introduce 5level-fixup.h
x86/cpufeature: Add 5-level paging detection |
||
|
|
6ebb4a1b84 |
thp: fix another corner case of munlock() vs. THPs
The following test case triggers BUG() in munlock_vma_pages_range():
int main(int argc, char *argv[])
{
int fd;
system("mount -t tmpfs -o huge=always none /mnt");
fd = open("/mnt/test", O_CREAT | O_RDWR);
ftruncate(fd, 4UL << 20);
mmap(NULL, 4UL << 20, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED | MAP_LOCKED, fd, 0);
mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_LOCKED, fd, 0);
munlockall();
return 0;
}
The second mmap() creates a PTE mapping of the first huge page in the file.
It makes the kernel munlock the page, as we never keep a PTE-mapped page
mlocked.
On munlockall(), when we handle the vma created by the first mmap(),
munlock_vma_page() returns page_mask == 0, as the page is not mlocked
anymore. On the next iteration follow_page_mask() returns a tail page, but
page_mask is HPAGE_NR_PAGES - 1. It makes us skip to the first tail
page of the next huge page and step on
VM_BUG_ON_PAGE(PageMlocked(page)).
The fix is to not use the page_mask from follow_page_mask() at all. It has
no use for us.
Link: http://lkml.kernel.org/r/20170302150252.34120-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
c2febafc67 |
mm: convert generic code to 5-level paging
Convert all non-architecture-specific code to 5-level paging. It's mostly mechanical adding handling one more page table level in places where we deal with pud_t. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8703e8a465 |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h>
We are going to split <linux/sched/user.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/user.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
655548bf62 |
thp: fix corner case of munlock() of PTE-mapped THPs
The following program triggers BUG() in munlock_vma_pages_range():
// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <sys/mman.h>
int main()
{
mmap((void*)0x20105000ul, 0xc00000ul, 0x2ul, 0x2172ul, -1, 0);
mremap((void*)0x201fd000ul, 0x4000ul, 0xc00000ul, 0x3ul, 0x203f0000ul);
return 0;
}
The test case constructs a situation where munlock_vma_pages_range()
finds a PTE-mapped THP head in the middle of a page table and, by mistake,
skips HPAGE_PMD_NR pages after that.
As a result, on the next iteration it hits the middle of a PMD-mapped THP
and gets upset seeing an mlocked tail page.
The solution is to skip HPAGE_PMD_NR pages only if the THP was mlocked
during munlock_vma_page(). This guarantees that the page is
PMD-mapped, as we never mlock PTE-mapped THPs.
Fixes:
|
||
|
|
b155b4fde5 |
mm: mlock: avoid increase mm->locked_vm on mlock() when already mlock2(,MLOCK_ONFAULT)
When a vma has the flags VM_LOCKED|VM_LOCKONFAULT (set by invoking
mlock2(,MLOCK_ONFAULT)), it can be populated again by mlock() with the
VM_LOCKED flag only.
There is a hole in mlock_fixup() which increases mm->locked_vm twice even
though the two operations are on the same vma and both use the VM_LOCKED
flag.
The issue can be reproduced by the following code:
mlock2(p, 1024 * 64, MLOCK_ONFAULT); //VM_LOCKED|VM_LOCKONFAULT
mlock(p, 1024 * 64); //VM_LOCKED
Then check the increase of the VmLck field in /proc/pid/status (to 128k).
When a vma is set with different vm_flags and the new vm_flags include
VM_LOCKED, it is not necessarily a "new locked" vma. This patch corrects
the bug by preventing mm->locked_vm from being incremented when the old
vm_flags already include VM_LOCKED.
Link: http://lkml.kernel.org/r/1472554781-9835-3-git-send-email-wei.guo.simon@gmail.com
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Guo <wei.guo.simon@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
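The accounting rule in this commit message can be sketched in user space. This is a minimal illustration and not the kernel's mlock_fixup(): the flag values, the page shift, and the helper names `locked_vm_delta_buggy` and `locked_vm_delta_fixed` are assumptions made for the example. Both helpers model only the case where the vma flags actually change.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; the kernel's values differ. */
#define SKETCH_VM_LOCKED       0x1UL
#define SKETCH_VM_LOCKONFAULT  0x2UL
#define SKETCH_PAGE_SHIFT      12   /* assumes 4 KiB pages */

/* Before the fix: any flag change that ends in VM_LOCKED is charged as new. */
static long locked_vm_delta_buggy(unsigned long old_flags,
                                  unsigned long new_flags,
                                  unsigned long len)
{
    long nr_pages = (long)(len >> SKETCH_PAGE_SHIFT);

    /* The sketch expects page-aligned lengths. */
    assert((len & ((1UL << SKETCH_PAGE_SHIFT) - 1)) == 0);
    (void)old_flags;    /* the old state is ignored, which is the bug */
    return (new_flags & SKETCH_VM_LOCKED) ? nr_pages : -nr_pages;
}

/* After the fix: a vma that was already VM_LOCKED is not charged again. */
static long locked_vm_delta_fixed(unsigned long old_flags,
                                  unsigned long new_flags,
                                  unsigned long len)
{
    long nr_pages = (long)(len >> SKETCH_PAGE_SHIFT);

    assert((len & ((1UL << SKETCH_PAGE_SHIFT) - 1)) == 0);
    if (!(new_flags & SKETCH_VM_LOCKED))
        return -nr_pages;   /* unlocking: give the pages back */
    if (old_flags & SKETCH_VM_LOCKED)
        return 0;           /* already accounted, nothing new to charge */
    return nr_pages;        /* newly locked */
}
```

With the 64k region from the reproducer, the second call (VM_LOCKED|VM_LOCKONFAULT to VM_LOCKED) adds 16 pages in the buggy variant and 0 in the fixed one, which matches the 128k versus 64k VmLck values described above.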
||
|
|
0cf2f6f6dc |
mm: mlock: check against vma for actual mlock() size
In do_mlock(), the check against the locked memory limit has a hole
which makes the following case fail at step 3):
1) User has a memory chunk from addressA with 50k, and user mem lock
rlimit is 64k.
2) mlock(addressA, 30k)
3) mlock(addressA, 40k)
The 3rd step should have been allowed, since the 40k request overlaps
the 30k already locked at step 2), so the 3rd step actually locks only
the extra 10k of memory.
This patch checks the vma to calculate the actual "new" mlock size, if
necessary, and adjusts the logic to fix this issue.
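The difference between the two checks can be sketched with the numbers from the steps above. This is a user-space illustration under stated assumptions, not the kernel code: `struct locked_range`, `already_locked_bytes`, `old_limit_ok` and `new_limit_ok` are hypothetical names, and the sketch works in bytes rather than pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open byte range [start, end) that is already mlocked. */
struct locked_range {
    unsigned long start;
    unsigned long end;
};

/* Bytes of [start, start + len) that already lie inside a locked range.
 * The ranges are assumed not to overlap each other. */
static unsigned long already_locked_bytes(const struct locked_range *r, size_t n,
                                          unsigned long start, unsigned long len)
{
    unsigned long end = start + len;
    unsigned long total = 0;

    assert(end >= start);   /* no address wrap-around in this sketch */
    for (size_t i = 0; i < n; i++) {
        unsigned long lo = r[i].start > start ? r[i].start : start;
        unsigned long hi = r[i].end < end ? r[i].end : end;

        if (lo < hi)
            total += hi - lo;
    }
    return total;
}

/* Old check: the whole request counts as newly locked memory. */
static int old_limit_ok(unsigned long locked, unsigned long len,
                        unsigned long limit)
{
    return locked + len <= limit;
}

/* New check: only the part of the request not yet locked counts. */
static int new_limit_ok(const struct locked_range *r, size_t n,
                        unsigned long locked, unsigned long start,
                        unsigned long len, unsigned long limit)
{
    return locked + len - already_locked_bytes(r, n, start, len) <= limit;
}
```

For step 3), the old check sees 30k + 40k = 70k and rejects the call against the 64k limit, while the new check subtracts the 30k overlap and accepts 40k.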
[akpm@linux-foundation.org: clean up comment layout]
[wei.guo.simon@gmail.com: correct a typo in count_mm_mlocked_page_nr()]
Link: http://lkml.kernel.org/r/1473325970-11393-2-git-send-email-wei.guo.simon@gmail.com
Link: http://lkml.kernel.org/r/1472554781-9835-2-git-send-email-wei.guo.simon@gmail.com
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Guo <wei.guo.simon@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
599d0c954f |
mm, vmscan: move LRU lists to node
This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.
Unfortunately, due to reclaim and compaction retry logic, it is necessary
to account for the number of LRU pages on both zone and node logic. Most
reclaim logic is based on the node counters but the retry logic uses the
zone counters which do not distinguish inactive and active sizes. It
would be possible to leave the LRU counters on a per-zone basis but it's a
heavier calculation across multiple cache lines that is much more frequent
than the retry checks.
Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies. For example, the scans are
per-zone but using per-node counters. We also mark a node as congested
when a zone is congested. This causes weird problems that are fixed later
but is easier to review.
In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions
1. Long-term isolation of highmem pages when reclaim is lowmem
When pages are skipped, they are immediately added back onto the LRU
list. If lowmem reclaim persisted for long periods of time, the same
highmem pages get continually scanned. The idea would be that lowmem
keeps those pages on a separate list until a reclaim for highmem pages
arrives that splices the highmem pages back onto the LRU. It potentially
could be implemented similar to the UNEVICTABLE list.
That would reduce the skip rate with the potential corner case is that
highmem pages have to be scanned and reclaimed to free lowmem slab pages.
2. Linear scan lowmem pages if the initial LRU shrink fails
This will break LRU ordering but may be preferable and faster during
memory pressure than skipping LRU pages.
Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a52633d8e9 |
mm, vmscan: move lru_lock to the node
Node-based reclaim requires node-based LRUs and locking. This is a preparation patch that just moves the lru_lock to the node so later patches are easier to review. It is a mechanical change but note this patch makes contention worse because the LRU lock is hotter and direct reclaim and kswapd can contend on the same lock even when reclaiming from different zones. Link: http://lkml.kernel.org/r/1467970510-21195-3-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
dc0ef0df7b |
mm: make mmap_sem for write waits killable for mm syscalls
This is a follow up work for oom_reaper [1]. As the async OOM killing
depends on mmap_sem for read, we would really appreciate it if a holder for
write didn't stand in the way. This patchset is changing many of the
down_write calls to be killable to help those cases when the writer is
blocked and waiting for readers to release the lock and so help
__oom_reap_task to process the oom victim.
Most of the patches are really trivial because the lock is held from
shallow syscall paths where we can return EINTR trivially and allow the
current task to die (note that EINTR will never get to the userspace as
the task has fatal signal pending). Others seem to be easy as well as the
callers are already handling fatal errors and bail and return to userspace
which should be sufficient to handle the failure gracefully. I am not
familiar with all those code paths so a deeper review is really
appreciated.
As this work is touching more areas which are not directly connected I
have tried to keep the CC list as small as possible and people who I
believed would be familiar are CCed only to the specific patches (all
should have received the cover though).
This patchset is based on linux-next and it depends on
down_write_killable for rw_semaphores which got merged into tip
locking/rwsem branch and it is merged into this next tree. I guess it
would be easiest to route these patches via mmotm because of the
dependency on the tip tree but if respective maintainers prefer other way
I have no objections.
I haven't covered all the mmap_write(mm->mmap_sem) instances here
$ git grep "down_write(.*\<mmap_sem\>)" next/master | wc -l
98
$ git grep "down_write(.*\<mmap_sem\>)" | wc -l
62
I have tried to cover those which should be relatively easy to review in
this series because this alone should be a nice improvement. Other places
can be changed on top.
[0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org
[1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org
This patch (of 18):
This is the first step in making mmap_sem write waiters killable. It
focuses on the trivial ones which are taking the lock early after entering
the syscall and they are not changing state before.
Therefore it is very easy to change them to use down_write_killable and
immediately return with -EINTR. This will allow the waiter to pass away
without blocking the mmap_sem which might be required to make a forward
progress. E.g. the oom reaper will need the lock for reading to
dismantle the OOM victim address space.
The only tricky function in this patch is vm_mmap_pgoff which has many
call sites via vm_mmap. To reduce the risk keep vm_mmap with the original
non-killable semantic for now.
vm_munmap callers do not bother checking the return value so open code it
into the munmap syscall path for now for simplicity.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7162a1e87b |
mm: fix mlock accouting
Tetsuo Handa reported underflow of NR_MLOCK on munlock.
Testcase:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#define BASE ((void *)0x400000000000)
#define SIZE (1UL << 21)
int main(int argc, char *argv[])
{
void *addr;
system("grep Mlocked /proc/meminfo");
addr = mmap(BASE, SIZE, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE | MAP_LOCKED | MAP_FIXED,
-1, 0);
if (addr == MAP_FAILED)
printf("mmap() failed\n"), exit(1);
munmap(addr, SIZE);
system("grep Mlocked /proc/meminfo");
return 0;
}
It happens on munlock_vma_page() due to unfortunate choice of nr_pages
data type:
__mod_zone_page_state(zone, NR_MLOCK, -nr_pages);
For unsigned int nr_pages, implicitly cast to long in
__mod_zone_page_state(), it becomes something around UINT_MAX.
munlock_vma_page() is usually called for THP, as small pages go through
pagevec.
Let's make nr_pages signed int.
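The wrap-around is easy to reproduce outside the kernel. The following is a minimal sketch, assuming a platform where long is wider than unsigned int (for example LP64 Linux); `apply_delta`, `munlock_unsigned` and `munlock_signed` are hypothetical stand-ins for the counter update and are not kernel functions.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a counter update that takes a long delta. */
static long apply_delta(long counter, long delta)
{
    return counter + delta;
}

/* Buggy form: nr_pages is unsigned, so -nr_pages wraps modulo UINT_MAX + 1
 * in unsigned arithmetic before it is widened to long. */
static long munlock_unsigned(long counter, unsigned int nr_pages)
{
    /* The sketch assumes long is wider than unsigned int. */
    assert(sizeof(long) > sizeof(unsigned int));
    return apply_delta(counter, -nr_pages);
}

/* Fixed form: nr_pages is signed, so -nr_pages is a small negative value. */
static long munlock_signed(long counter, int nr_pages)
{
    return apply_delta(counter, -nr_pages);
}
```

Starting from a counter of 512 and unlocking 512 pages, the signed variant returns 0, while the unsigned variant returns UINT_MAX + 1 instead of decrementing the counter.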
Similar fixes in
|
||
|
|
7f43add451 |
mm/mlock.c: change can_do_mlock return value type to boolean
Since can_do_mlock() only returns 1 or 0, make it boolean.
No functional change.
[akpm@linux-foundation.org: update declaration in mm.h]
Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e90309c9f7 |
thp: allow mlocked THP again
Before THP refcounting rework, THP was not allowed to cross VMA
boundary. So, if we have THP and we split it, PG_mlocked can be safely
transferred to small pages.
With new THP refcounting and naive approach to mlocking we can end up
with this scenario:
1. we have an mlocked THP, which belongs to one VM_LOCKED VMA.
2. the process does munlock() on the *part* of the THP:
- the VMA is split into two, one of them VM_LOCKED;
- huge PMD split into PTE table;
- THP is still mlocked;
3. split_huge_page():
- it transfers PG_mlocked to *all* small pages regardless of whether
they belong to any VM_LOCKED VMA.
We probably could munlock() all small pages on split_huge_page(), but I
think we have accounting issue already on step two.
Instead of forbidding mlocked pages altogether, we just avoid mlocking
PTE-mapped THPs and munlock THPs on split_huge_pmd().
This means PTE-mapped THPs will be on normal lru lists and will be split
under memory pressure by vmscan. After the split vmscan will detect
unevictable small pages and mlock them.
With this approach we shouldn't hit situation like described above.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
7479df6da9 |
thp, mlock: do not allow huge pages in mlocked area
With the new refcounting, a THP can belong to several VMAs. This makes it
tricky to track THP pages when they are partially mlocked. It can lead to
leaking mlocked pages to non-VM_LOCKED vmas and other problems.
With this patch we will split all pages on mlock and avoid
fault-in/collapse of new THPs in VM_LOCKED vmas.
I've tried an alternative approach: do not mark THP pages mlocked and keep
them on normal LRUs. This way vmscan could try to split huge pages on
memory pressure and free up subpages which don't belong to VM_LOCKED
vmas. But this is a user-visible change: we screw up the Mlocked
accounting reported in meminfo, so I had to leave this approach aside.
We can bring something better later, but this should be good enough for
now.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ab7a5af7fd |
mm/mlock.c: drop unneeded initialization in munlock_vma_pages_range()
The page pointer is initialized to NULL and then reinitialized by
follow_page_mask() before it is used. Drop the useless initialization of
the page pointer at the beginning of the loop.
Signed-off-by: Alexey Klimov <klimov.linux@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b0f205c2a3 |
mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
The previous patch introduced a flag that specified pages in a VMA should
be placed on the unevictable LRU, but they should not be made present when
the area is created. This patch adds the ability to set this state via
the new mlock system calls.
We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall flags.
When used with MCL_CURRENT, all current mappings will be marked with
VM_LOCKED | VM_LOCKONFAULT. When used with MCL_FUTURE, the mm->def_flags
will be marked with VM_LOCKED | VM_LOCKONFAULT. When used with both
MCL_CURRENT and MCL_FUTURE, all current mappings and mm->def_flags will be
marked with VM_LOCKED | VM_LOCKONFAULT.
Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE. This behavior is
maintained after adding MCL_ONFAULT. If a call to mlockall(MCL_FUTURE) is
followed by mlockall(MCL_CURRENT), the mm->def_flags will be cleared and
new VMAs will be unlocked. This remains true with or without MCL_ONFAULT
in either mlockall() invocation.
munlock() will unconditionally clear both vma flags. munlockall()
unconditionally clears both VMA flags on all VMAs and in the mm->def_flags
field.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
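The flag combinations described in this commit message can be summarized in a small user-space sketch. It is an illustration only: the SKETCH_* constants, `struct mlockall_effect` and `sketch_mlockall_effect` are hypothetical names with made-up values, and the function models just the mapping from mlockall() flags to lock bits, not the kernel implementation.

```c
#include <assert.h>

/* Hypothetical values for illustration; the real constants live in the
 * kernel and libc headers and may differ. */
#define SKETCH_MCL_CURRENT     1
#define SKETCH_MCL_FUTURE      2
#define SKETCH_MCL_ONFAULT     4
#define SKETCH_VM_LOCKED       0x1UL
#define SKETCH_VM_LOCKONFAULT  0x2UL

struct mlockall_effect {
    unsigned long def_flags;     /* lock bits left in mm->def_flags */
    unsigned long current_flags; /* lock bits added to existing mappings;
                                    0 means they are left untouched */
};

static struct mlockall_effect sketch_mlockall_effect(int flags)
{
    /* The lock bits in def_flags are always cleared first. */
    struct mlockall_effect e = { 0, 0 };
    unsigned long lock = SKETCH_VM_LOCKED;

    /* MCL_ONFAULT is only a modifier for the other two flags. */
    assert(flags & (SKETCH_MCL_CURRENT | SKETCH_MCL_FUTURE));

    if (flags & SKETCH_MCL_ONFAULT)
        lock |= SKETCH_VM_LOCKONFAULT;
    if (flags & SKETCH_MCL_FUTURE)
        e.def_flags = lock;
    if (flags & SKETCH_MCL_CURRENT)
        e.current_flags = lock;
    return e;
}
```

Calling the helper with only SKETCH_MCL_CURRENT yields def_flags == 0, which mirrors the statement that mlockall(MCL_CURRENT) after mlockall(MCL_FUTURE) leaves new VMAs unlocked.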
||
|
|
de60f5f10c |
mm: introduce VM_LOCKONFAULT
The cost of faulting in all memory to be locked can be very high when
working with large mappings. If only portions of the mapping will be used
this can incur a high penalty for locking.
For the example of a large file, this is the usage pattern for a large
statistical language model (probably applies to other statistical or
graphical models as well). For the security example, any application
transacting in data that cannot be swapped out (credit card data, medical
records, etc).
This patch introduces the ability to request that pages are not
pre-faulted, but are placed on the unevictable LRU when they are finally
faulted in. The VM_LOCKONFAULT flag will be used together with VM_LOCKED
and has no effect when set without VM_LOCKED. Setting the VM_LOCKONFAULT
flag for a VMA will cause pages faulted into that VMA to be added to the
unevictable LRU when they are faulted or if they are already present, but
will not cause any missing pages to be faulted in.
Exposing this new lock state means that we cannot overload the meaning of
the FOLL_POPULATE flag any longer. Prior to this patch it was used to
mean that the VMA for a fault was locked. This means we need the new
FOLL_MLOCK flag to communicate the locked state of a VMA. FOLL_POPULATE
will now only control if the VMA should be populated and in the case of
VM_LOCKONFAULT, it will not be set.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a8ca5d0ecb |
mm: mlock: add new mlock system call
With the refactored mlock code, introduce a new system call for mlock. The new call will allow the user to specify what lock states are being added. mlock2 is trivial at the moment, but a follow on patch will add a new mlock state making it useful. Signed-off-by: Eric B Munson <emunson@akamai.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
1aab92ec3d |
mm: mlock: refactor mlock, munlock, and munlockall code
mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is allocated.
For large mappings where the entire area is not necessary this is not
ideal. Instead of forcing all locked pages to be present when they are
allocated, this set creates a middle ground. Pages are marked to be
placed on the unevictable LRU (locked) when they are first used, but they
are not faulted in by the mlock call.
This series introduces a new mlock() system call that takes a flags
argument along with the start address and size. This flags argument gives
the caller the ability to request memory be locked in the traditional way,
or to be locked after the page is faulted in. A new MCL flag is added to
mirror the lock on fault behavior from mlock() in mlockall().
There are two main use cases that this set covers. The first is the
security focussed mlock case. A buffer is needed that cannot be written
to swap. The maximum size is known, but on average the memory used is
significantly less than this maximum. With lock on fault, the buffer is
guaranteed to never be paged out without consuming the maximum size every
time such a buffer is created.
The second use case is focussed on performance. Portions of a large file
are needed and we want to keep the used portions in memory once accessed.
This is the case for large graphical models where the path through the
graph is not known until run time. The entire graph is unlikely to be
used in a given invocation, but once a node has been used it needs to stay
resident for further processing. Given these constraints we have a number
of options. We can potentially waste a large amount of memory by mlocking
the entire region (this can also cause a significant stall at startup as
the entire file is read in). We can mlock every page as we access them
without tracking if the page is already resident but this introduces large
overhead for each access. The third option is mapping the entire region
with PROT_NONE and using a signal handler for SIGSEGV to
mprotect(PROT_READ) and mlock() the needed page. Doing this page at a
time adds a significant performance penalty. Batching can be used to
mitigate this overhead, but in order to safely avoid trying to mprotect
pages outside of the mapping, the boundaries of each mapping to be used in
this way must be tracked and available to the signal handler. This is
precisely what the mm system in the kernel should already be doing.
For mlock(MLOCK_ONFAULT) the user is charged against RLIMIT_MEMLOCK as if
mlock(MLOCK_LOCKED) or mmap(MAP_LOCKED) was used, so when the VMA is
created not when the pages are faulted in. For mlockall(MCL_ONFAULT) the
user is charged as if MCL_FUTURE was used. This decision was made to keep
the accounting checks out of the page fault path.
To illustrate the benefit of this set I wrote a test program that mmaps a
5 GB file filled with random data and then makes 15,000,000 accesses to
random addresses in that mapping. The test program was run 20 times for
each setup. Results are reported for two program portions, setup and
execution. The setup phase is calling mmap and optionally mlock on the
entire region. For most experiments this is trivial, but it highlights
the cost of faulting in the entire region. Results are averages across
the 20 runs in milliseconds.
mmap with mlock(MLOCK_LOCKED) on entire range:
Setup avg: 8228.666
Processing avg: 8274.257
mmap with mlock(MLOCK_LOCKED) before each access:
Setup avg: 0.113
Processing avg: 90993.552
mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value in max_map_count, this gets ENOMEM as I attempt
to change the permissions, after upping the sysctl significantly I get:
Setup avg: 0.058
Processing avg: 69488.073
mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg: 0.068
Processing avg: 38204.116
mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg: 0.044
Processing avg: 29671.180
mmap with mlock(MLOCK_ONFAULT) on entire range:
Setup avg: 0.189
Processing avg: 17904.899
The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping. The first
step covers the page that caused the fault as we know that it will be
possible to lock. The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow. There may be a clever way
to avoid this without having the program track each mapping to be covered
by this handler in a globally accessible structure, but I could not find
it. It should be noted that with a large enough batch size this two step
fault handler can still cause the program to crash if it reaches far
beyond the end of the mapping.
These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise mlock(MLOCK_ONFAULT) is significantly faster.
The performance cost of these patches is minimal on the two benchmarks I
have tested (stream and kernbench). The following are the average values
across 20 runs of stream and 10 runs of kernbench after a warmup run whose
results were discarded.
Avg throughput in MB/s from stream using 1000000 element arrays
Test 4.2-rc1 4.2-rc1+lock-on-fault
Copy: 10,566.5 10,421
Scale: 10,685 10,503.5
Add: 12,044.1 11,814.2
Triad: 12,064.8 11,846.3
Kernbench optimal load
4.2-rc1 4.2-rc1+lock-on-fault
Elapsed Time 78.453 78.991
User Time 64.2395 65.2355
System Time 9.7335 9.7085
Context Switches 22211.5 22412.1
Sleeps 14965.3 14956.1
This patch (of 6):
Extending the mlock system call is very difficult because it currently
does not take a flags argument. A later patch in this set will extend
mlock to support a middle ground between pages that are locked and faulted
in immediately and unlocked pages. To pave the way for the new system
call, the code needs some reorganization so that all the actual entry
point handles is checking input and translating to VMA flags.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
8fd9e4883a |
mm/mlock: use offset_in_page macro
linux/mm.h provides the offset_in_page() macro. Let's use the already
predefined macro instead of (addr & ~PAGE_MASK).
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
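The equivalence this commit relies on can be sketched in userspace C. This is only an illustration: the 4 KiB page size is an assumption, the macros are local stand-ins modeled on the kernel names, and open_coded_offset() and page_align_range() are hypothetical helpers that do not appear in the patch.

```c
/* Assumed for illustration: 4 KiB pages. Real values are per-architecture. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Userspace stand-in modeled on the kernel macro of the same name. */
#define offset_in_page(p) ((unsigned long)(p) & ~PAGE_MASK)

/* The open-coded form that the commit replaces (hypothetical helper). */
static unsigned long open_coded_offset(unsigned long addr)
{
	return addr & ~PAGE_MASK;
}

/*
 * Hypothetical helper showing a typical use: widen [start, start + len)
 * to whole pages by rounding start down and rounding the length up.
 */
static void page_align_range(unsigned long *start, unsigned long *len)
{
	*len = (*len + offset_in_page(*start) + PAGE_SIZE - 1) & PAGE_MASK;
	*start &= PAGE_MASK;
}
```

Both forms return the low PAGE_SHIFT bits of the address, so the change is purely a readability cleanup.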
||
|
|
86d2adccfb |
mm/mlock.c: reorganize mlockall() return values and remove goto-out label
In the mlockall syscall wrapper, the code after the out label for goto
just does a return. Remove the goto out statements and return error values
directly. Also, instead of rewriting the ret variable before every
if-check, move the returns to the 'error'-like path under each if-check.
The objdump asm listing showed a reduction of a few asm lines. Object file
size decreased from 220592 bytes to 220528 bytes for me (for aarch64).
Signed-off-by: Alexey Klimov <klimov.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
19a809afe2 |
userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx
vma->vm_userfaultfd_ctx is yet another vma parameter that vma_merge must
be aware of so that we can merge vmas back like they were originally
before arming the userfaultfd on some memory range.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
acc3c8d15e |
mm: move mm_populate()-related code to mm/gup.c
It's odd that we have populate_vma_page_range() and __mm_populate() in
mm/mlock.c. It's an implementation of generic memory population, and
mlocking is one possible side effect if VM_LOCKED is set.
__get_user_pages() is the core of the implementation. Let's move the code
into mm/gup.c.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c561259ca7 |
mm: move gup() -> posix mlock() error conversion out of __mm_populate
This is preparation for moving mm_populate()-related code out of
mm/mlock.c.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
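The kind of conversion the subject line refers to can be sketched as below. This is a userspace approximation, not the patch itself: the function name mlock_posix_error_return() is illustrative, and the mapping shown (a fault on an unmapped range reported as ENOMEM, an allocation failure reported as EAGAIN) reflects the error codes POSIX documents for mlock().

```c
#include <errno.h>

/*
 * Translate a get_user_pages()-style negative errno into the value
 * that mlock() callers expect. Illustrative name; other values pass
 * through unchanged.
 */
static long mlock_posix_error_return(long retval)
{
	if (retval == -EFAULT)
		retval = -ENOMEM;	/* range not fully mapped */
	else if (retval == -ENOMEM)
		retval = -EAGAIN;	/* could not lock the memory right now */
	return retval;
}
```

Keeping this translation in the mlock() path rather than in the generic populate code lets the populate code be shared by callers that do not want mlock() error semantics.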
||
|
|
fc05f56621 |
mm: rename __mlock_vma_pages_range() to populate_vma_page_range()
__mlock_vma_pages_range() doesn't necessarily mlock pages. It depends on
vma flags. The same codepath is used for MAP_POPULATE.
Let's rename __mlock_vma_pages_range() to populate_vma_page_range().
This patch also drops mlock_vma_pages_range() references from
documentation. It has gone in
|
||
|
|
84d33df279 |
mm: rename FOLL_MLOCK to FOLL_POPULATE
After commit
|
||
|
|
a5a6579db3 |
mm: reorder can_do_mlock to fix audit denial
A userspace call to mmap(MAP_LOCKED) may result in the successful locking
of memory while also producing a confusing audit log denial. can_do_mlock
checks capable and rlimit. If either of these returns positive,
can_do_mlock returns true. The capable check leads to an LSM hook used by
AppArmor and SELinux, which produce the audit denial. Reordering so rlimit
is checked first eliminates the denial on success, only recording a denial
when the lock is unsuccessful as a result of the denial.
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Cassella <cassella@cray.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
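The effect of the reordering can be modeled in userspace. Everything below is a hypothetical stand-in: memlock_rlimit() and capable_ipc_lock() imitate the kernel's rlimit and capability checks, and the audit_denials counter imitates the audit record that the LSM hook emits when the capability check fails.

```c
#include <stdbool.h>

/* Hypothetical process state: non-zero memlock limit, no CAP_IPC_LOCK. */
static unsigned long fake_memlock_rlimit = 65536;
static bool fake_has_cap_ipc_lock = false;
static int audit_denials;

static unsigned long memlock_rlimit(void)
{
	return fake_memlock_rlimit;
}

/* A failed capability check is what produces the audit denial. */
static bool capable_ipc_lock(void)
{
	if (!fake_has_cap_ipc_lock)
		audit_denials++;
	return fake_has_cap_ipc_lock;
}

/* Old order: capability first, so a denial is logged even on success. */
static bool can_do_mlock_old(void)
{
	if (capable_ipc_lock())
		return true;
	if (memlock_rlimit() != 0)
		return true;
	return false;
}

/* New order: rlimit first, so the capability is only consulted if needed. */
static bool can_do_mlock_new(void)
{
	if (memlock_rlimit() != 0)
		return true;
	if (capable_ipc_lock())
		return true;
	return false;
}
```

With a non-zero limit and no capability, both versions allow the lock, but only the old ordering records a denial along the way.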
||
|
|
d6dd50e07c |
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- changes related to No-CBs CPUs and NO_HZ_FULL
- RCU-tasks implementation
- torture-test updates
- miscellaneous fixes
- locktorture updates
- RCU documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
  workqueue: Use cond_resched_rcu_qs macro
  workqueue: Add quiescent state between work items
  locktorture: Cleanup header usage
  locktorture: Cannot hold read and write lock
  locktorture: Fix __acquire annotation for spinlock irq
  locktorture: Support rwlocks
  rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
  locktorture: Document boot/module parameters
  rcutorture: Rename rcutorture_runnable parameter
  locktorture: Add test scenario for rwsem_lock
  locktorture: Add test scenario for mutex_lock
  locktorture: Make torture scripting account for new _runnable name
  locktorture: Introduce torture context
  locktorture: Support rwsems
  locktorture: Add infrastructure for torturing read locks
  torture: Address race in module cleanup
  locktorture: Make statistics generic
  locktorture: Teach about lock debugging
  locktorture: Support mutexes
  locktorture: Add documentation
  ... |
||
|
|
96dad67ff2 |
mm: use VM_BUG_ON_MM where possible
Dump the contents of the relevant mm_struct when we hit the bug condition.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
81d1b09c6b |
mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA
Trivially convert a few VM_BUG_ON calls to VM_BUG_ON_VMA to extract more
information when they trigger.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
bde6c3aa99 |
rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops
RCU-tasks requires the occasional voluntary context switch from CPU-bound
in-kernel tasks. In some cases, this requires instrumenting
cond_resched(). However, there is some reluctance to countenance
unconditionally instrumenting cond_resched() (see
http://lwn.net/Articles/603252/), so this commit creates a separate
cond_resched_rcu_qs() that may be used in place of cond_resched() in
locations prone to long-duration in-kernel looping.
This commit currently instruments only RCU-tasks. Future possibilities
include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
IPI usage.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> |
||
|
|
9a95f3cf7b |
mm: describe mmap_sem rules for __lock_page_or_retry() and callers
Add a comment describing the circumstances in which
__lock_page_or_retry() will or will not release the mmap_sem when
returning 0.
Add comments to lock_page_or_retry()'s callers (filemap_fault(),
do_swap_page()) noting the impact on VM_FAULT_RETRY returns.
Add comments on up the call tree, particularly replacing the false "We
return with mmap_sem still held" comments.
Signed-off-by: Paul Cassella <cassella@cray.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |