Commit Graph

199 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
291d853dff Merge 4.19.88 into android-4.19
Changes in 4.19.88
	clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate
	clocksource/drivers/mediatek: Fix error handling
	ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX
	ASoC: compress: fix unsigned integer overflow check
	reset: Fix memory leak in reset_control_array_put()
	clk: samsung: exynos5433: Fix error paths
	ASoC: kirkwood: fix external clock probe defer
	ASoC: kirkwood: fix device remove ordering
	clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
	pinctrl: cherryview: Allocate IRQ chip dynamic
	ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
	reset: fix reset_control_ops kerneldoc comment
	clk: at91: avoid sleeping early
	clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
	clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
	ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
	samples/bpf: fix build by setting HAVE_ATTR_TEST to zero
	powerpc/bpf: Fix tail call implementation
	idr: Fix integer overflow in idr_for_each_entry
	idr: Fix idr_alloc_u32 on 32-bit systems
	x86/resctrl: Prevent NULL pointer dereference when reading mondata
	clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call
	clk: ti: clkctrl: Fix failed to enable error with double udelay timeout
	net: fec: add missed clk_disable_unprepare in remove
	bridge: ebtables: don't crash when using dnat target in output chains
	can: peak_usb: report bus recovery as well
	can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open
	can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak
	can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max
	can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM
	can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors
	can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error
	can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error
	can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails
	can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition
	watchdog: meson: Fix the wrong value of left time
	ASoC: stm32: sai: add restriction on mmap support
	scripts/gdb: fix debugging modules compiled with hot/cold partitioning
	net: bcmgenet: use RGMII loopback for MAC reset
	net: bcmgenet: reapply manual settings to the PHY
	net: mscc: ocelot: fix __ocelot_rmw_ix prototype
	ceph: return -EINVAL if given fsc mount option on kernel w/o support
	net/fq_impl: Switch to kvmalloc() for memory allocation
	mac80211: fix station inactive_time shortly after boot
	block: drbd: remove a stray unlock in __drbd_send_protocol()
	pwm: bcm-iproc: Prevent unloading the driver module while in use
	scsi: target/tcmu: Fix queue_cmd_ring() declaration
	scsi: lpfc: Fix kernel Oops due to null pring pointers
	scsi: lpfc: Fix dif and first burst use in write commands
	ARM: dts: Fix up SQ201 flash access
	tracing: Lock event_mutex before synth_event_mutex
	ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed
	ARM: dts: imx51: Fix memory node duplication
	ARM: dts: imx53: Fix memory node duplication
	ARM: dts: imx31: Fix memory node duplication
	ARM: dts: imx35: Fix memory node duplication
	ARM: dts: imx7: Fix memory node duplication
	ARM: dts: imx6ul: Fix memory node duplication
	ARM: dts: imx6sx: Fix memory node duplication
	ARM: dts: imx6sl: Fix memory node duplication
	ARM: dts: imx50: Fix memory node duplication
	ARM: dts: imx23: Fix memory node duplication
	ARM: dts: imx1: Fix memory node duplication
	ARM: dts: imx27: Fix memory node duplication
	ARM: dts: imx25: Fix memory node duplication
	ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication
	parisc: Fix serio address output
	parisc: Fix HP SDC hpa address output
	ARM: dts: Fix hsi gdd range for omap4
	arm64: mm: Prevent mismatched 52-bit VA support
	arm64: smp: Handle errors reported by the firmware
	bus: ti-sysc: Check for no-reset and no-idle flags at the child level
	platform/x86: mlx-platform: Fix LED configuration
	ARM: OMAP1: fix USB configuration for device-only setups
	RDMA/hns: Fix the bug while use multi-hop of pbl
	arm64: preempt: Fix big-endian when checking preempt count in assembly
	RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
	PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
	xfs: zero length symlinks are not valid
	ARM: ks8695: fix section mismatch warning
	ACPI / LPSS: Ignore acpi_device_fix_up_power() return value
	scsi: lpfc: Enable Management features for IF_TYPE=6
	scsi: qla2xxx: Fix NPIV handling for FC-NVMe
	scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port
	nvme: provide fallback for discard alloc failure
	s390/zcrypt: make sysfs reset attribute trigger queue reset
	crypto: user - support incremental algorithm dumps
	arm64: dts: renesas: draak: Fix CVBS input
	mwifiex: fix potential NULL dereference and use after free
	mwifiex: debugfs: correct histogram spacing, formatting
	brcmfmac: set F2 watermark to 256 for 4373
	brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373
	rtl818x: fix potential use after free
	bcache: do not check if debug dentry is ERR or NULL explicitly on remove
	bcache: do not mark writeback_running too early
	xfs: require both realtime inodes to mount
	nvme: fix kernel paging oops
	ubifs: Fix default compression selection in ubifs
	ubi: Put MTD device after it is not used
	ubi: Do not drop UBI device reference before using
	microblaze: adjust the help to the real behavior
	microblaze: move "... is ready" messages to arch/microblaze/Makefile
	microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
	iwlwifi: move iwl_nvm_check_version() into dvm
	iwlwifi: mvm: force TCM re-evaluation on TCM resume
	iwlwifi: pcie: fix erroneous print
	iwlwifi: pcie: set cmd_len in the correct place
	gpio: pca953x: Fix AI overflow on PCAL6524
	gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB
	kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
	Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS"
	Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()"
	crypto/chelsio/chtls: listen fails with multiadapt
	VSOCK: bind to random port for VMADDR_PORT_ANY
	mmc: meson-gx: make sure the descriptor is stopped on errors
	mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET
	usb: ehci-omap: Fix deferred probe for phy handling
	btrfs: Check for missing device before bio submission in btrfs_map_bio
	btrfs: fix ncopies raid_attr for RAID56
	btrfs: dev-replace: set result code of cancel by status of scrub
	Btrfs: allow clear_extent_dirty() to receive a cached extent state record
	btrfs: only track ref_heads in delayed_ref_updates
	serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback
	HID: intel-ish-hid: fixes incorrect error handling
	gpio: raspberrypi-exp: decrease refcount on firmware dt node
	serial: 8250: Rate limit serial port rx interrupts during input overruns
	kprobes/x86/xen: blacklist non-attachable xen interrupt functions
	xen/pciback: Check dev_data before using it
	kprobes: Blacklist symbols in arch-defined prohibited area
	kprobes/x86: Show x86-64 specific blacklisted symbols correctly
	vfio-mdev/samples: Use u8 instead of char for handle functions
	memory: omap-gpmc: Get the header of the enum
	pinctrl: xway: fix gpio-hog related boot issues
	net/mlx5: Continue driver initialization despite debugfs failure
	netfilter: nf_nat_sip: fix RTP/RTCP source port translations
	exofs_mount(): fix leaks on failure exits
	bnxt_en: Return linux standard errors in bnxt_ethtool.c
	bnxt_en: Save ring statistics before reset.
	bnxt_en: query force speeds before disabling autoneg mode.
	KVM: s390: unregister debug feature on failing arch init
	pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width
	pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration
	pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10
	HID: doc: fix wrong data structure reference for UHID_OUTPUT
	dm flakey: Properly corrupt multi-page bios.
	gfs2: take jdata unstuff into account in do_grow
	dm raid: fix false -EBUSY when handling check/repair message
	xfs: Align compat attrlist_by_handle with native implementation.
	xfs: Fix bulkstat compat ioctls on x32 userspace.
	IB/qib: Fix an error code in qib_sdma_verbs_send()
	clocksource/drivers/fttmr010: Fix invalid interrupt register access
	vxlan: Fix error path in __vxlan_dev_create()
	powerpc/book3s/32: fix number of bats in p/v_block_mapped()
	powerpc/xmon: fix dump_segments()
	drivers/regulator: fix a missing check of return value
	Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading
	serial: max310x: Fix tx_empty() callback
	openrisc: Fix broken paths to arch/or32
	RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
	scsi: qla2xxx: deadlock by configfs_depend_item
	scsi: csiostor: fix incorrect dma device in case of vport
	brcmfmac: Fix access point mode
	ath6kl: Only use match sets when firmware supports it
	ath6kl: Fix off by one error in scan completion
	powerpc/perf: Fix unit_sel/cache_sel checks
	powerpc/32: Avoid unsupported flags with clang
	powerpc/prom: fix early DEBUG messages
	powerpc/mm: Make NULL pointer deferences explicit on bad page faults.
	powerpc/44x/bamboo: Fix PCI range
	vfio/spapr_tce: Get rid of possible infinite loop
	powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
	drbd: ignore "all zero" peer volume sizes in handshake
	drbd: reject attach of unsuitable uuids even if connected
	drbd: do not block when adjusting "disk-options" while IO is frozen
	drbd: fix print_st_err()'s prototype to match the definition
	IB/rxe: Make counters thread safe
	bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't
	regulator: tps65910: fix a missing check of return value
	powerpc/83xx: handle machine check caused by watchdog timer
	powerpc/pseries: Fix node leak in update_lmb_associativity_index()
	powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
	crypto: mxc-scc - fix build warnings on ARM64
	pwm: clps711x: Fix period calculation
	net/netlink_compat: Fix a missing check of nla_parse_nested
	net/net_namespace: Check the return value of register_pernet_subsys()
	f2fs: fix block address for __check_sit_bitmap
	f2fs: fix to dirty inode synchronously
	um: Include sys/uio.h to have writev()
	um: Make GCOV depend on !KCOV
	net: (cpts) fix a missing check of clk_prepare
	net: stmicro: fix a missing check of clk_prepare
	net: dsa: bcm_sf2: Propagate error value from mdio_write
	atl1e: checking the status of atl1e_write_phy_reg
	tipc: fix a missing check of genlmsg_put
	net: marvell: fix a missing check of acpi_match_device
	net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()
	ocfs2: clear journal dirty flag after shutdown journal
	vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
	mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
	mm/page_alloc.c: use a single function to free page
	mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free()
	tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures
	netfilter: nf_tables: fix a missing check of nla_put_failure
	xprtrdma: Prevent leak of rpcrdma_rep objects
	infiniband: bnxt_re: qplib: Check the return value of send_message
	infiniband/qedr: Potential null ptr dereference of qp
	firmware: arm_sdei: fix wrong of_node_put() in init function
	firmware: arm_sdei: Fix DT platform device creation
	lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk
	lib/genalloc.c: use vzalloc_node() to allocate the bitmap
	fork: fix some -Wmissing-prototypes warnings
	drivers/base/platform.c: kmemleak ignore a known leak
	lib/genalloc.c: include vmalloc.h
	mtd: Check add_mtd_device() ret code
	tipc: fix memory leak in tipc_nl_compat_publ_dump
	net/core/neighbour: tell kmemleak about hash tables
	ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
	PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
	net/core/neighbour: fix kmemleak minimal reference count for hash tables
	serial: 8250: Fix serial8250 initialization crash
	gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
	sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
	ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
	decnet: fix DN_IFREQ_SIZE
	net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
	net/smc: don't wait for send buffer space when data was already sent
	mm/hotplug: invalid PFNs from pfn_to_online_page()
	xfs: end sync buffer I/O properly on shutdown error
	net/smc: fix sender_free computation
	blktrace: Show requests without sector
	net/smc: fix byte_order for rx_curs_confirmed
	tipc: fix skb may be leaky in tipc_link_input
	ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
	sfc: initialise found bitmap in efx_ef10_mtd_probe
	geneve: change NET_UDP_TUNNEL dependency to select
	net: fix possible overflow in __sk_mem_raise_allocated()
	net: ip_gre: do not report erspan_ver for gre or gretap
	net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap
	sctp: don't compare hb_timer expire date before starting it
	bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
	mmc: core: align max segment size with logical block size
	net: dev: Use unsigned integer as an argument to left-shift
	kvm: properly check debugfs dentry before using it
	bpf: drop refcount if bpf_map_new_fd() fails in map_create()
	net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED
	net: hns3: fix PFC not setting problem for DCB module
	net: hns3: fix an issue for hclgevf_ae_get_hdev
	net: hns3: fix an issue for hns3_update_new_int_gl
	iommu/amd: Fix NULL dereference bug in match_hid_uid
	apparmor: delete the dentry in aafs_remove() to avoid a leak
	scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery
	ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
	ACPI / APEI: Switch estatus pool to use vmalloc memory
	scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned
	scsi: libsas: Check SMP PHY control function result
	RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
	RDMA/hns: Bugfix for the scene without receiver queue
	RDMA/hns: Fix the state of rereg mr
	RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
	ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board
	powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
	xdp: fix cpumap redirect SKB creation bug
	mtd: Remove a debug trace in mtdpart.c
	mm, gup: add missing refcount overflow checks on s390
	clk: at91: fix update bit maps on CFG_MOR write
	clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated()
	usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
	staging: rtl8192e: fix potential use after free
	staging: rtl8723bs: Drop ACPI device ids
	staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
	USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
	mei: bus: prefix device names on bus with the bus name
	mei: me: add comet point V device id
	thunderbolt: Power cycle the router if NVM authentication fails
	xfrm: Fix memleak on xfrm state destroy
	media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE
	net: macb: fix error format in dev_err()
	pwm: Clear chip_data in pwm_put()
	media: atmel: atmel-isc: fix asd memory allocation
	media: atmel: atmel-isc: fix INIT_WORK misplacement
	macvlan: schedule bc_work even if error
	net: psample: fix skb_over_panic
	openvswitch: fix flow command message size
	sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
	slip: Fix use-after-free Read in slip_open
	openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
	openvswitch: remove another BUG_ON()
	selftests: bpf: test_sockmap: handle file creation failures gracefully
	tipc: fix link name length check
	sctp: cache netns in sctp_ep_common
	net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
	net: macb: add missed tasklet_kill
	ext4: add more paranoia checking in ext4_expand_extra_isize handling
	watchdog: sama5d4: fix WDD value to be always set to max
	net: macb: Fix SUBNS increment and increase resolution
	net: macb driver, check for SKBTX_HW_TSTAMP
	mtd: rawnand: atmel: Fix spelling mistake in error message
	mtd: rawnand: atmel: fix possible object reference leak
	mtd: spi-nor: cast to u64 to avoid uint overflows
	drm/atmel-hlcdc: revert shift by 8
	mailbox: stm32_ipcc: add spinlock to fix channels concurrent access
	tcp: exit if nothing to retransmit on RTO timeout
	HID: core: check whether Usage Page item is after Usage ID items
	crypto: stm32/hash - Fix hmac issue more than 256 bytes
	media: stm32-dcmi: fix DMA corruption when stopping streaming
	media: stm32-dcmi: fix check of pm_runtime_get_sync return value
	hwrng: stm32 - fix unbalanced pm_runtime_enable
	clk: stm32mp1: fix HSI divider flag
	clk: stm32mp1: fix mcu divider table
	clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks
	clk: stm32mp1: parent clocks update
	mailbox: mailbox-test: fix null pointer if no mmio
	pinctrl: stm32: fix memory leak issue
	ASoC: stm32: i2s: fix dma configuration
	ASoC: stm32: i2s: fix 16 bit format support
	ASoC: stm32: i2s: fix IRQ clearing
	ASoC: stm32: sai: add missing put_device()
	dmaengine: stm32-dma: check whether length is aligned on FIFO threshold
	platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
	platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
	net: fec: fix clock count mis-match
	Linux 4.19.88

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd3801a77cb551be72788031e7fcfc8a1d4fd197
2019-12-05 12:02:49 +01:00
Wentao Wang
5643569bec mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free()
[ Upstream commit d31cfe7bff9109476da92c245b56083e9b48d60a ]

Link: http://lkml.kernel.org/r/C8ECE1B7A767434691FEEFA3A01765D72AFB8E78@MX203CL03.corp.emc.com
Signed-off-by: Wentao Wang <witallwang@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05 09:20:58 +01:00
Qian Cai
88299f649b BACKPORT: mm/memblock.c: skip kmemleak for kasan_init()
(Upstream commit fed84c78527009d4f799a3ed9a566502fa026d82).

Kmemleak does not play well with KASAN (tested on both HPE Apollo 70 and
Huawei TaiShan 2280 aarch64 servers).

After calling start_kernel()->setup_arch()->kasan_init(), kmemleak early
log buffer went from something like 280 to 260000 which caused kmemleak
disabled and crash dump memory reservation failed.  The multitude of
kmemleak_alloc() calls is from nested loops while KASAN is setting up full
memory mappings, so let early kmemleak allocations skip those
memblock_alloc_internal() calls came from kasan_init() given that those
early KASAN memory mappings should not reference to other memory.  Hence,
no kmemleak false positives.

kasan_init
  kasan_map_populate [1]
    kasan_pgd_populate [2]
      kasan_pud_populate [3]
        kasan_pmd_populate [4]
          kasan_pte_populate [5]
            kasan_alloc_zeroed_page
              memblock_alloc_try_nid
                memblock_alloc_internal
                  kmemleak_alloc

[1] for_each_memblock(memory, reg)
[2] while (pgdp++, addr = next, addr != end)
[3] while (pudp++, addr = next, addr != end && pud_none(READ_ONCE(*pudp)))
[4] while (pmdp++, addr = next, addr != end && pmd_none(READ_ONCE(*pmdp)))
[5] while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)))

Link: http://lkml.kernel.org/r/1543442925-17794-1-git-send-email-cai@gmx.us
Signed-off-by: Qian Cai <cai@gmx.us>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I2423d511c3938c882c738341673b13b3beff5475
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
2019-09-24 17:44:13 -07:00
Mike Rapoport
a36aab890c mm/memblock.c: replace u64 with phys_addr_t where appropriate
Most functions in memblock already use phys_addr_t to represent a
physical address with __memblock_free_late() being an exception.

This patch replaces u64 with phys_addr_t in __memblock_free_late() and
switches several format strings from %llx to %pa to avoid casting from
phys_addr_t to u64.

Link: http://lkml.kernel.org/r/1530637506-1256-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:30 -07:00
Linus Torvalds
e6ecec342f Merge tag 'docs-4.19' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
 "This was a moderately busy cycle for docs, with the usual collection
  of small fixes and updates.

  We also have new ktime_get_*() docs from Arnd, some kernel-doc fixes,
  a new set of Italian translations (non so se vale la pena, ma non fa
  male - speriamo bene), and some extensive early memory-management
  documentation improvements from Mike Rapoport"

* tag 'docs-4.19' of git://git.lwn.net/linux: (52 commits)
  Documentation: corrections to console/console.txt
  Documentation: add ioctl number entry for v4l2-subdev.h
  Remove gendered language from management style documentation
  scripts/kernel-doc: Escape all literal braces in regexes
  docs/mm: add description of boot time memory management
  docs/mm: memblock: add overview documentation
  docs/mm: memblock: add kernel-doc description for memblock types
  docs/mm: memblock: add kernel-doc comments for memblock_add[_node]
  docs/mm: memblock: update kernel-doc comments
  mm/memblock: add a name for memblock flags enumeration
  docs/mm: bootmem: add overview documentation
  docs/mm: bootmem: add kernel-doc description of 'struct bootmem_data'
  docs/mm: bootmem: fix kernel-doc warnings
  docs/mm: nobootmem: fixup kernel-doc comments
  mm/bootmem: drop duplicated kernel-doc comments
  Documentation: vm.txt: Adding 'nr_hugepages_mempolicy' parameter description.
  doc:it_IT: translation for kernel-hacking
  docs: Fix the reference labels in Locking.rst
  doc: tracing: Fix a typo of trace_stat
  mm: Introduce new type vm_fault_t
  ...
2018-08-14 14:29:31 -07:00
Mike Rapoport
3e039c5c0a docs/mm: memblock: add overview documentation
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-02 12:17:29 -06:00
Mike Rapoport
48a833cc74 docs/mm: memblock: add kernel-doc comments for memblock_add[_node]
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-02 12:17:28 -06:00
Mike Rapoport
47cec4432a docs/mm: memblock: update kernel-doc comments
* make memblock_discard description kernel-doc compatible
* add brief description for memblock_setclr_flag and describe its
  parameters
* fixup return value descriptions

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-02 12:17:28 -06:00
Mike Rapoport
e1720fee27 mm/memblock: add a name for memblock flags enumeration
Since kernel-doc does not like anonymous enums the name is required for
adding documentation. While on it, I've also updated all the function
declarations to use 'enum memblock_flags' instead of unsigned long.

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-02 12:17:27 -06:00
Mathieu Malaterre
1937367205 mm/memblock: add missing include <linux/bootmem.h>
Commit 26f09e9b3a ("mm/memblock: add memblock memory allocation apis")
introduced two new function definitions:

  memblock_virt_alloc_try_nid_nopanic()
  memblock_virt_alloc_try_nid()

and commit ea1f5f3712 ("mm: define memblock_virt_alloc_try_nid_raw")
introduced the following function definition:

  memblock_virt_alloc_try_nid_raw()

This commit adds an include of header file <linux/bootmem.h> to provide
the missing function prototypes.  This silences the following gcc warning
(W=1):

  mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes]
  mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes]
  mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes]

Also adds #ifdef blockers to prevent compilation failure on mips/ia64
where CONFIG_NO_BOOTMEM=n as could be seen in commit commit 6cc22dc08a
("revert "mm/memblock: add missing include <linux/bootmem.h>"").

Because Makefile already does:

  obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o

The #ifdef has been simplified from:

  #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)

to simply:

  #if defined(CONFIG_NO_BOOTMEM)

Link: http://lkml.kernel.org/r/20180626184422.24974-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 12:50:46 -07:00
Michal Hocko
e3d301cae0 mm/memblock.c: do not complain about top-down allocations for !MEMORY_HOTREMOVE
Mike Rapoport is converting architectures from bootmem to nobootmem
allocator.  While doing so for m68k Geert has noticed that he gets a
scary looking warning:

  WARNING: CPU: 0 PID: 0 at mm/memblock.c:230
  memblock_find_in_range_node+0x11c/0x1be
  memblock: bottom-up allocation failed, memory hotunplug may be affected
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted
  4.18.0-rc3-atari-01343-gf2fb5f2e09a97a3c-dirty #7
  Call Trace: __warn+0xa8/0xc2
    kernel_pg_dir+0x0/0x1000
    netdev_lower_get_next+0x2/0x22
    warn_slowpath_fmt+0x2e/0x36
    memblock_find_in_range_node+0x11c/0x1be
    memblock_find_in_range_node+0x11c/0x1be
    memblock_find_in_range_node+0x0/0x1be
    vprintk_func+0x66/0x6e
    memblock_virt_alloc_internal+0xd0/0x156
    netdev_lower_get_next+0x2/0x22
    netdev_lower_get_next+0x2/0x22
    kernel_pg_dir+0x0/0x1000
    memblock_virt_alloc_try_nid_nopanic+0x58/0x7a
    netdev_lower_get_next+0x2/0x22
    kernel_pg_dir+0x0/0x1000
    kernel_pg_dir+0x0/0x1000
    EXPTBL+0x234/0x400
    EXPTBL+0x234/0x400
    alloc_node_mem_map+0x4a/0x66
    netdev_lower_get_next+0x2/0x22
    free_area_init_node+0xe2/0x29e
    EXPTBL+0x234/0x400
    paging_init+0x430/0x462
    kernel_pg_dir+0x0/0x1000
    printk+0x0/0x1a
    EXPTBL+0x234/0x400
    setup_arch+0x1b8/0x22c
    start_kernel+0x4a/0x40a
    _sinittext+0x344/0x9e8

The warning is basically saying that a top-down allocation can break
memory hotremove because memblock allocation is not movable.  But m68k
doesn't even support MEMORY_HOTREMOVE so there is no point to warn about
it.

Make the warning conditional only to configurations that care.

Link: http://lkml.kernel.org/r/20180706061750.GH32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Sam Creasey <sammy@sammy.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-14 11:11:10 -07:00
Andrew Morton
6cc22dc08a revert "mm/memblock: add missing include <linux/bootmem.h>"
The patch fixed a W=1 warning but broke the ia64 build:

    CC      mm/memblock.o
  mm/memblock.c:1340: error: redefinition of `memblock_virt_alloc_try_nid_raw'
  ./include/linux/bootmem.h:335: error: previous definition of `memblock_virt_alloc_try_nid_raw' was here

Because inlcude/linux/bootmem.h says

	#if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)

whereas mm/Makefile says

	obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o

So revert 26f09e9b3a ("mm/memblock: add missing include
<linux/bootmem.h>") while a full fix can be worked on.

Fixes: 26f09e9b3a ("mm/memblock: add missing include <linux/bootmem.h>")
Reported-by: Tony Luck <tony.luck@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-19 07:43:44 +09:00
Joe Perches
0825a6f986 mm: use octal not symbolic permissions
mm/*.c files use symbolic and octal styles for permissions.

Using octal and not symbolic permissions is preferred by many as more
readable.

https://lkml.org/lkml/2016/8/2/1945

Prefer the direct use of octal for permissions.

Done using
$ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
and some typing.

Before:	 $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
44
After:	 $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
86

Miscellanea:

o Whitespace neatening around these conversions.

Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:25 +09:00
Mathieu Malaterre
69b5086b12 mm/memblock: add missing include <linux/bootmem.h>
Commit 26f09e9b3a ("mm/memblock: add memblock memory allocation apis")
introduced two new function definitions:

  memblock_virt_alloc_try_nid_nopanic()
  memblock_virt_alloc_try_nid()

Commit ea1f5f3712 ("mm: define memblock_virt_alloc_try_nid_raw")
introduced the following function definition:

  memblock_virt_alloc_try_nid_raw()

This commit adds an includeof header file <linux/bootmem.h> to provide
the missing function prototypes.  Silence the following gcc warning
(W=1):

  mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes]
  mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes]
  mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes]

Link: http://lkml.kernel.org/r/20180606194144.16990-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:24 +09:00
Minchan Kim
25cf23d7a9 mm/memblock: print memblock_remove
memblock_remove report is useful to see why MemTotal of /proc/meminfo
between two kernels makes difference.

Link: http://lkml.kernel.org/r/20180508104223.8028-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:36 -07:00
Stefan Agner
1c4bc43ddf mm/memblock: introduce PHYS_ADDR_MAX
So far code was using ULLONG_MAX and type casting to obtain a
phys_addr_t with all bits set.  The typecast is necessary to silence
compiler warnings on 32-bit platforms.

Use the simpler but still type safe approach "~(phys_addr_t)0" to create a
preprocessor define for all bits set.

Link: http://lkml.kernel.org/r/20180406213809.566-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan@agner.ch>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:35 -07:00
Linus Torvalds
49a695ba72 Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Support for 4PB user address space on 64-bit, opt-in via mmap().

   - Removal of POWER4 support, which was accidentally broken in 2016
     and no one noticed, and blocked use of some modern instructions.

   - Workarounds so that the hypervisor can enable Transactional Memory
     on Power9.

   - A series to disable the DAWR (Data Address Watchpoint Register) on
     Power9.

   - More information displayed in the meltdown/spectre_v1/v2 sysfs
     files.

   - A vpermxor (Power8 Altivec) implementation for the raid6 Q
     Syndrome.

   - A big series to make the allocation of our pacas (per cpu area),
     kernel page tables, and per-cpu stacks NUMA aware when using the
     Radix MMU on Power9.

  And as usual many fixes, reworks and cleanups.

  Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
  Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
  Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
  Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
  Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
  Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
  Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
  Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
  Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
  Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
  Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
  Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
  Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"

* tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
  powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
  powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
  powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
  powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
  Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
  powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
  powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
  cxl: Fix possible deadlock when processing page faults from cxllib
  powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
  powerpc/mm/radix: Update command line parsing for disable_radix
  powerpc/mm/radix: Parse disable_radix commandline correctly.
  powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
  powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
  powerpc/mm/keys: Update documentation and remove unnecessary check
  powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
  powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
  powerpc/powernv: Always stop secondaries before reboot/shutdown
  powerpc: hard disable irqs in smp_send_stop loop
  powerpc: use NMI IPI for smp_send_stop
  powerpc/powernv: Fix SMT4 forcing idle code
  ...
2018-04-07 12:08:19 -07:00
Stefan Agner
644d87dccd mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t
This fixes a warning shown when phys_addr_t is 32-bit int when compiling
with clang:

  mm/memblock.c:927:15: warning: implicit conversion from 'unsigned long long'
        to 'phys_addr_t' (aka 'unsigned int') changes value from
        18446744073709551615 to 4294967295 [-Wconstant-conversion]
                                  r->base : ULLONG_MAX;
                                            ^~~~~~~~~~
  ./include/linux/kernel.h:30:21: note: expanded from macro 'ULLONG_MAX'
  #define ULLONG_MAX      (~0ULL)
                           ^~~~~

Link: http://lkml.kernel.org/r/20180319005645.29051-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:27 -07:00
Randy Dunlap
514c603249 headers: untangle kmemleak.h from mm.h
Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
reason.  It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
don't already #include it.  Also remove <linux/kmemleak.h> from source
files that do not use it.

This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
be good to run it through the 0day bot for other $ARCHes.  I have
neither the horsepower nor the storage space for the other $ARCHes.

Update: This patch has been extensively build-tested by both the 0day
bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
for which patches are included here (in v2).

[ slab.h is the second most used header file after module.h; kernel.h is
  right there with slab.h. There could be some minor error in the
  counting due to some #includes having comments after them and I didn't
  combine all of those. ]

[akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:27 -07:00
Andy Shevchenko
5ad3509364 mm: reuse DEFINE_SHOW_ATTRIBUTE() macro
...instead of open coding file operations followed by custom ->open()
callbacks per each attribute.

[andriy.shevchenko@linux.intel.com: add tags, fix compilation issue]
  Link: http://lkml.kernel.org/r/20180217144253.58604-1-andriy.shevchenko@linux.intel.com
Link: http://lkml.kernel.org/r/20180214154644.54505-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:25 -07:00
Pavel Tatashin
f165b378bb mm: uninitialized struct page poisoning sanity checking
During boot we poison struct page memory in order to ensure that no one
is accessing this memory until the struct pages are initialized in
__init_single_page().

This patch adds more scrutiny to this checking by making sure that flags
do not equal the poison pattern when they are accessed.  The pattern is
all ones.

Since node id is also stored in struct page, and may be accessed quite
early, we add this enforcement into page_to_nid() function as well.
Note, this is applicable only when NODE_NOT_IN_PAGE_FLAGS=n

[pasha.tatashin@oracle.com: v4]
  Link: http://lkml.kernel.org/r/20180215165920.8570-4-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180213193159.14606-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:25 -07:00
Pavel Tatashin
c9e97a1997 mm: initialize pages on demand during boot
Deferred page initialization allows the boot cpu to initialize a small
subset of the system's pages early in boot, with other cpus doing the
rest later on.

It is, however, problematic to know how many pages the kernel needs
during boot.  Different modules and kernel parameters may change the
requirement, so the boot cpu either initializes too many pages or runs
out of memory.

To fix that, initialize early pages on demand.  This ensures the kernel
does the minimum amount of work to initialize pages during boot and
leaves the rest to be divided in the multithreaded initialization path
(deferred_init_memmap).

The on-demand code is permanently disabled using static branching once
deferred pages are initialized.  After the static branch is changed to
false, the overhead is up-to two branch-always instructions if the zone
watermark check fails or if rmqueue fails.

Sergey Senozhatsky noticed that while deferred pages currently make
sense only on NUMA machines (we start one thread per latency node),
CONFIG_NUMA is not a requirement for CONFIG_DEFERRED_STRUCT_PAGE_INIT,
so that is also must be addressed in the patch.

[akpm@linux-foundation.org: fix typo in comment, make deferred_pages static]
[pasha.tatashin@oracle.com: fix min() type mismatch warning]
  Link: http://lkml.kernel.org/r/20180212164543.26592-1-pasha.tatashin@oracle.com
[pasha.tatashin@oracle.com: use zone_to_nid() in deferred_grow_zone()]
  Link: http://lkml.kernel.org/r/20180214163343.21234-2-pasha.tatashin@oracle.com
[pasha.tatashin@oracle.com: might_sleep warning]
  Link: http://lkml.kernel.org/r/20180306192022.28289-1-pasha.tatashin@oracle.com
[akpm@linux-foundation.org: s/spin_lock/spin_lock_irq/ in page_alloc_init_late()]
[pasha.tatashin@oracle.com: v5]
  Link: http://lkml.kernel.org/r/20180309220807.24961-3-pasha.tatashin@oracle.com
[akpm@linux-foundation.org: tweak comments]
[pasha.tatashin@oracle.com: v6]
  Link: http://lkml.kernel.org/r/20180313182355.17669-3-pasha.tatashin@oracle.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180209192216.20509-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:25 -07:00
Nicholas Piggin
b575454fa3 mm: make memblock_alloc_base_nid() non-static
This will be used by powerpc to allocate per-cpu stacks and other
data structures node-local where possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop stray change to memblock_alloc_range() as noticed by akpm]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:25 +11:00
Daniel Vacek
f59f1caf72 Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
This reverts commit b92df1de5d ("mm: page_alloc: skip over regions of
invalid pfns where possible").  The commit is meant to be a boot init
speed up skipping the loop in memmap_init_zone() for invalid pfns.

But given some specific memory mapping on x86_64 (or more generally
theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
implementation also skips valid pfns which is plain wrong and causes
'kernel BUG at mm/page_alloc.c:1389!'

  crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
  kernel BUG at mm/page_alloc.c:1389!
  invalid opcode: 0000 [#1] SMP
  --
  RIP: 0010: move_freepages+0x15e/0x160
  --
  Call Trace:
    move_freepages_block+0x73/0x80
    __rmqueue+0x263/0x460
    get_page_from_freelist+0x7e1/0x9e0
    __alloc_pages_nodemask+0x176/0x420
  --

  crash> page_init_bug -v | grep RAM
  <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
  <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
  <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
  <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
  <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
  <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)

  crash> page_init_bug | head -6
  <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
  <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
  <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
  <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
  <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
  BUG, zones differ!

  crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
        PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
  ffffea0001e00000  78000000                0        0  0 0
  ffffea0001ed7fc0  7b5ff000                0        0  0 0
  ffffea0001ed8000  7b600000                0        0  0 0       <<<<
  ffffea0001ede1c0  7b787000                0        0  0 0
  ffffea0001ede200  7b788000                0        0  1 1fffff00000000

Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
Fixes: b92df1de5d ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-22 17:07:01 -07:00
Daniel Vacek
379b03b7fa mm/memblock.c: hardcode the end_pfn being -1
This is just a cleanup.  It aids handling the special end case in the
next commit.

[akpm@linux-foundation.org: make it work against current -linus, not against -mm]
[akpm@linux-foundation.org: make it work against current -linus, not against -mm some more]
Link: http://lkml.kernel.org/r/1ca478d4269125a99bcfb1ca04d7b88ac1aee924.1520011944.git.neelx@redhat.com
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-09 16:40:01 -08:00
Yaowei Bai
937f0c2675 mm/memblock: memblock_is_map/region_memory can be boolean
Make memblock_is_map/region_memory return bool due to these two
functions only using either true or false as its return value.

No functional change.

Link: http://lkml.kernel.org/r/1513266622-15860-2-git-send-email-baiyaowei@cmss.chinamobile.com
Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:47 -08:00
Pavel Tatashin
ea1f5f3712 mm: define memblock_virt_alloc_try_nid_raw
* A new variant of memblock_virt_alloc_* allocations:
memblock_virt_alloc_try_nid_raw()
    - Does not zero the allocated memory
    - Does not panic if request cannot be satisfied

* optimize early system hash allocations

Clients can call alloc_large_system_hash() with flag: HASH_ZERO to
specify that memory that was allocated for system hash needs to be
zeroed, otherwise the memory does not need to be zeroed, and client will
initialize it.

If memory does not need to be zero'd, call the new
memblock_virt_alloc_raw() interface, and thus improve the boot
performance.

* debug for raw alloctor

When CONFIG_DEBUG_VM is enabled, this patch sets all the memory that is
returned by memblock_virt_alloc_try_nid_raw() to ones to ensure that no
places excpect zeroed memory.

Link: http://lkml.kernel.org/r/20171013173214.27300-6-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Gioh Kim
66e8b438bd mm/memblock.c: make the index explicit argument of for_each_memblock_type
for_each_memblock_type macro function relies on idx variable defined in
the caller context.  Silent macro arguments are almost always wrong
thing to do.  They make code harder to read and easier to get wrong.
Let's use an explicit iterator parameter for for_each_memblock_type and
make the code more obious.  This patch is a mere cleanup and it
shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170913133029.28911-1-gi-oh.kim@profitbricks.com
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Pavel Tatashin
91b540f988 mm/memblock.c: reversed logic in memblock_discard()
In recently introduced memblock_discard() there is a reversed logic bug.
Memory is freed of static array instead of dynamically allocated one.

Link: http://lkml.kernel.org/r/1503511441-95478-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 3010f87650 ("mm: discard memblock data later")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-25 16:12:46 -07:00
Pavel Tatashin
3010f87650 mm: discard memblock data later
There is existing use after free bug when deferred struct pages are
enabled:

The memblock_add() allocates memory for the memory array if more than
128 entries are needed.  See comment in e820__memblock_setup():

  * The bootstrap memblock region count maximum is 128 entries
  * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
  * than that - so allow memblock resizing.

This memblock memory is freed here:
        free_low_memory_core_early()

We access the freed memblock.memory later in boot when deferred pages
are initialized in this path:

        deferred_init_memmap()
                for_each_mem_pfn_range()
                  __next_mem_pfn_range()
                    type = &memblock.memory;

One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been
exceeded at least on systems where deferred struct pages were enabled.

Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
and verifying in qemu that this code is getting excuted and that the
freed pages are sane.

Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 7e18adb4f8 ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18 15:32:01 -07:00
Michal Hocko
4932381ee2 mm, memory_hotplug: move movable_node to the hotplug proper
movable_node_is_enabled is defined in memblock proper while it is
initialized from the memory hotplug proper.  This is quite messy and it
makes a dependency between the two so move movable_node along with the
helper functions to memory_hotplug.

To make it more entertaining the kernel parameter is ignored unless
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y because we do not have the node
information for each memblock otherwise.  So let's warn when the option
is disabled.

Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Michal Hocko
f70029bbaa mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
Commit 20b2f52b73 ("numa: add CONFIG_MOVABLE_NODE for
movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
good explanation on why it is actually useful.

It makes a lot of sense to make movable node semantic opt in but we
already have that because the feature has to be explicitly enabled on
the kernel command line.  A config option on top only makes the
configuration space larger without a good reason.  It also adds an
additional ifdefery that pollutes the code.

Just drop the config option and make it de-facto always enabled.  This
shouldn't introduce any change to the semantic.

Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Michal Hocko
864b9a393d mm: consider memblock reservations for deferred memory initialization sizing
We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
	kthreadd cpuset=/ mems_allowed=7
	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
	Call Trace:
	  dump_stack+0xb0/0xf0 (unreliable)
	  dump_header+0xb0/0x258
	  out_of_memory+0x5f0/0x640
	  __alloc_pages_nodemask+0xa8c/0xc80
	  kmem_getpages+0x84/0x1a0
	  fallback_alloc+0x2a4/0x320
	  kmem_cache_alloc_node+0xc0/0x2e0
	  copy_process.isra.25+0x260/0x1b30
	  _do_fork+0x94/0x470
	  kernel_thread+0x48/0x60
	  kthreadd+0x264/0x330
	  ret_from_kernel_thread+0x5c/0xa4

	Mem-Info:
	active_anon:0 inactive_anon:0 isolated_anon:0
	 active_file:0 inactive_file:0 isolated_file:0
	 unevictable:0 dirty:0 writeback:0 unstable:0
	 slab_reclaimable:5 slab_unreclaimable:73
	 mapped:0 shmem:0 pagetables:0 bounce:0
	 free:0 free_pcp:0 free_cma:0
	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
	lowmem_reserve[]: 0 0 0 0
	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
	0 total pagecache pages
	0 pages in swap cache
	Swap cache stats: add 0, delete 0, find 0/0
	Free swap  = 0kB
	Total swap = 0kB
	819200 pages RAM
	0 pages HighMem/MovableOnly
	817481 pages reserved
	0 pages cma reserved
	0 pages hwpoisoned

the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done.  update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range.  In this particular case we've had

	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation.  Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address.  The cumulative size is than added on top
of the initial estimation.  This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.

Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
AKASHI Takahiro
c9ca9b4e21 memblock: add memblock_cap_memory_range()
Add memblock_cap_memory_range() which will remove all the memblock regions
except the memory range specified in the arguments. In addition, rework is
done on memblock_mem_limit_remove_map() to re-implement it using
memblock_cap_memory_range().

This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed
later as "device memory."
See the commit a571d4eb55 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").

This function is used, in a succeeding patch in the series of arm64 kdump
suuport, to limit the range of usable memory, or System RAM, on crash dump
kernel.
(Please note that "mem=" parameter is of little use for this purpose.)

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Dennis Chen <dennis.chen@arm.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-05 18:26:50 +01:00
AKASHI Takahiro
4c546b8a34 memblock: add memblock_clear_nomap()
This function, with a combination of memblock_mark_nomap(), will be used
in a later kdump patch for arm64 when it temporarily isolates some range
of memory from the other memory blocks in order to create a specific
kernel mapping at boot time.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-05 18:26:46 +01:00
AKASHI Takahiro
c9a1b80dae mm/memblock.c: fix memblock_next_valid_pfn()
Obviously, we should not access memblock.memory.regions[right] if
'right' is outside of [0..memblock.memory.cnt>.

Fixes: b92df1de5d ("mm: page_alloc: skip over regions of invalid pfns where possible")
Link: http://lkml.kernel.org/r/20170303023745.9104-1-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:10 -08:00
Heiko Carstens
0262d9c845 memblock: embed memblock type name within struct memblock_type
Provide the name of each memblock type with struct memblock_type.  This
allows to get rid of the function memblock_type_name() and duplicating
the type names in __memblock_dump_all().

The only memblock_type usage out of mm/memblock.c seems to be
arch/s390/kernel/crash_dump.c.  While at it, give it a name.

Link: http://lkml.kernel.org/r/20170120123456.46508-4-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Heiko Carstens
409efd4c9b memblock: also dump physmem list within __memblock_dump_all
Since commit 70210ed950 ("mm/memblock: add physical memory list") the
memblock structure knows about a physical memory list.

The physical memory list should also be dumped if memblock_dump_all() is
called in case memblock_debug is switched on.  This makes debugging a
bit easier.

Link: http://lkml.kernel.org/r/20170120123456.46508-3-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Heiko Carstens
7409c5f738 memblock: let memblock_type_name know about physmem type
Since commit 70210ed950 ("mm/memblock: add physical memory list") the
memblock structure knows about a physical memory list.

memblock_type_name() should return "physmem" instead of "unknown" if the
name of the physmem memblock_type is being asked for.

Link: http://lkml.kernel.org/r/20170120123456.46508-2-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:53 -08:00
Miles Chen
5d63f81c9e mm/memblock.c: remove unnecessary log and clean up
There is no variable named flags in memblock_add() and
memblock_reserve() so remove it from the log messages.

This patch also cleans up the type casting for phys_addr_t by using %pa
to print them.

Link: http://lkml.kernel.org/r/1484720165-25403-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Wei Yang
7d41c03e2d mm/memblock.c: check return value of memblock_reserve() in memblock_virt_alloc_internal()
memblock_reserve() would add a new range to memblock.reserved in case
the new range is not totally covered by any of the current
memblock.reserved range.  If the memblock.reserved is full and can't
resize, memblock_reserve() would fail.

This doesn't happen in real world now, I observed this during code
review.  While theoretically, it has the chance to happen.  And if it
happens, others would think this range of memory is still available and
may corrupt the memory.

This patch checks the return value and goto "done" after it succeeds.

Link: http://lkml.kernel.org/r/1482363033-24754-3-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:29 -08:00
Wei Yang
ef415ef411 mm/memblock.c: trivial code refine in memblock_is_region_memory()
memblock_is_region_memory() invoke memblock_search() to see whether the
base address is in the memory region.  If it fails, idx would be -1.
Then, it returns 0.

If the memblock_search() returns a valid index, it means the base
address is guaranteed to be in the range memblock.memory.regions[idx].
Because of this, it is not necessary to check the base again.

This patch removes the check on "base".

Link: http://lkml.kernel.org/r/1482363033-24754-2-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:29 -08:00
Paul Burton
b92df1de5d mm: page_alloc: skip over regions of invalid pfns where possible
When using a sparse memory model memmap_init_zone() when invoked with
the MEMMAP_EARLY context will skip over pages which aren't valid - ie.
which aren't in a populated region of the sparse memory map.  However if
the memory map is extremely sparse then it can spend a long time
linearly checking each PFN in a large non-populated region of the memory
map & skipping it in turn.

When CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, we have sufficient
information to quickly discover the next valid PFN given an invalid one
by searching through the list of memory regions & skipping forwards to
the first PFN covered by the memory region to the right of the
non-populated region.  Implement this in order to speed up
memmap_init_zone() for systems with extremely sparse memory maps.

James said "I have tested this patch on a virtual model of a Samurai CPU
with a sparse memory map.  The kernel boot time drops from 109 to
62 seconds. "

Link: http://lkml.kernel.org/r/20161125185518.29885-1-paul.burton@imgtec.com
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Tested-by: James Hartley <james.hartley@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:29 -08:00
Catalin Marinas
9099daed9c mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping
Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
physical address to a virtual one using __va().  However, such physical
addresses may sometimes be located in highmem and using __va() is
incorrect, leading to inconsistent object tracking in kmemleak.

The following functions have been added to the kmemleak API and they take
a physical address as the object pointer.  They only perform the
corresponding action if the address has a lowmem mapping:

kmemleak_alloc_phys
kmemleak_free_part_phys
kmemleak_not_leak_phys
kmemleak_ignore_phys

The affected calling places have been updated to use the new kmemleak
API.

Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Srikar Dronamraju
8907de5dc6 mm/memblock.c: expose total reserved memory
The total reserved memory in a system is accounted but not available for
use use outside mm/memblock.c.  By exposing the total reserved memory,
systems can better calculate the size of large hashes.

Link: http://lkml.kernel.org/r/1472476010-4709-3-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:28 -07:00
zijun_hu
e47608ab6d mm/memblock.c: fix NULL dereference error
It causes NULL dereference error and failure to get type_a->regions[0]
info if parameter type_b of __next_mem_range_rev() == NULL

Fix this by checking before dereferring and initializing idx_b to 0

The approach is tested by dumping all types of region via
__memblock_dump_all() and __next_mem_range_rev() fixed to UART
separately the result is okay after checking the logs.

Link: http://lkml.kernel.org/r/57A0320D.6070102@zoho.com
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Tested-by: zijun_hu <zijun_hu@htc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 20:02:09 -04:00
Alexander Kuleshov
412d0008d6 mm/memblock: fix a typo in a comment
s/accomodate/accommodate/

Link: http://lkml.kernel.org/r/20160804121824.18100-1-kuleshovmail@gmail.com
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 20:02:09 -04:00
zijun_hu
fb399b4854 mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
Fix region index adjustment error when parameter type_b of
__next_mem_range_rev() == NULL.

Signed-off-by: zijun_hu <zijun_hu@htc.com>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Richard Leitner <dev@g0hl1n.net>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Dennis Chen
a571d4eb55 mm/memblock.c: add new infrastructure to address the mem limit issue
In some cases, memblock is queried by kernel to determine whether a
specified address is RAM or not.  For example, the ACPI core needs this
information to determine which attributes to use when mapping ACPI
regions(acpi_os_ioremap).  Use of incorrect memory types can result in
faults, data corruption, or other issues.

Removing memory with memblock_enforce_memory_limit() throws away this
information, and so a kernel booted with 'mem=' may suffer from the
issues described above.  To avoid this, we need to keep those NOMAP
regions instead of removing all above the limit, which preserves the
information we need while preventing other use of those regions.

This patch adds new infrastructure to retain all NOMAP memblock regions
while removing others, to cater for this.

Link: http://lkml.kernel.org/r/1468475036-5852-2-git-send-email-dennis.chen@arm.com
Signed-off-by: Dennis Chen <dennis.chen@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Kaly Xin <kaly.xin@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Christoph Hellwig
c4c5ad6b35 memblock: include <asm/sections.h> instead of <asm-generic/sections.h>
asm-generic headers are generic implementations for architecture
specific code and should not be included by common code.  Thus use the
asm/ version of sections.h to get at the linker sections.

Link: http://lkml.kernel.org/r/1468285103-7470-1-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00