Commit Graph

39 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
a1757f43e8 Merge 4.19.247 into android-4.19-stable
Changes in 4.19.247
	binfmt_flat: do not stop relocating GOT entries prematurely on riscv
	ALSA: hda/realtek - Fix microphone noise on ASUS TUF B550M-PLUS
	USB: serial: option: add Quectel BG95 modem
	USB: new quirk for Dell Gen 2 devices
	ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
	ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
	btrfs: add "0x" prefix for unsupported optional features
	btrfs: repair super block num_devices automatically
	drm/virtio: fix NULL pointer dereference in virtio_gpu_conn_get_modes
	mwifiex: add mutex lock for call in mwifiex_dfs_chan_sw_work_queue
	b43legacy: Fix assigning negative value to unsigned variable
	b43: Fix assigning negative value to unsigned variable
	ipw2x00: Fix potential NULL dereference in libipw_xmit()
	ipv6: fix locking issues with loops over idev->addr_list
	fbcon: Consistently protect deferred_takeover with console_lock()
	ACPICA: Avoid cache flush inside virtual machines
	ALSA: jack: Access input_dev under mutex
	drm/amd/pm: fix double free in si_parse_power_table()
	ath9k: fix QCA9561 PA bias level
	media: venus: hfi: avoid null dereference in deinit
	media: pci: cx23885: Fix the error handling in cx23885_initdev()
	media: cx25821: Fix the warning when removing the module
	md/bitmap: don't set sb values if can't pass sanity check
	scsi: megaraid: Fix error check return value of register_chrdev()
	drm/plane: Move range check for format_count earlier
	drm/amd/pm: fix the compile warning
	ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL
	ASoC: dapm: Don't fold register value changes into notifications
	mlxsw: spectrum_dcb: Do not warn about priority changes
	ASoC: tscs454: Add endianness flag in snd_soc_component_driver
	s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES
	dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
	ipmi:ssif: Check for NULL msg when handling events and messages
	rtlwifi: Use pr_warn instead of WARN_ONCE
	media: cec-adap.c: fix is_configuring state
	openrisc: start CPU timer early in boot
	nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags
	ASoC: rt5645: Fix errorenous cleanup order
	net: phy: micrel: Allow probing without .driver_data
	media: exynos4-is: Fix compile warning
	hwmon: Make chip parameter for with_info API mandatory
	rxrpc: Return an error to sendmsg if call failed
	eth: tg3: silence the GCC 12 array-bounds warning
	ARM: dts: ox820: align interrupt controller node name with dtschema
	PM / devfreq: rk3399_dmc: Disable edev on remove()
	fs: jfs: fix possible NULL pointer dereference in dbFree()
	ARM: OMAP1: clock: Fix UART rate reporting algorithm
	fat: add ratelimit to fat*_ent_bread()
	ARM: versatile: Add missing of_node_put in dcscb_init
	ARM: dts: exynos: add atmel,24c128 fallback to Samsung EEPROM
	ARM: hisi: Add missing of_node_put after of_find_compatible_node
	PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
	tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
	powerpc/xics: fix refcount leak in icp_opal_init()
	macintosh/via-pmu: Fix build failure when CONFIG_INPUT is disabled
	RDMA/hfi1: Prevent panic when SDMA is disabled
	drm: fix EDID struct for old ARM OABI format
	ath9k: fix ar9003_get_eepmisc
	drm/edid: fix invalid EDID extension block filtering
	drm/bridge: adv7511: clean up CEC adapter when probe fails
	ASoC: mediatek: Fix error handling in mt8173_max98090_dev_probe
	ASoC: mediatek: Fix missing of_node_put in mt2701_wm8960_machine_probe
	x86/delay: Fix the wrong asm constraint in delay_loop()
	drm/mediatek: Fix mtk_cec_mask()
	drm/vc4: txp: Don't set TXP_VSTART_AT_EOF
	drm/vc4: txp: Force alpha to be 0xff if it's disabled
	nl80211: show SSID for P2P_GO interfaces
	spi: spi-ti-qspi: Fix return value handling of wait_for_completion_timeout
	NFC: NULL out the dev->rfkill to prevent UAF
	efi: Add missing prototype for efi_capsule_setup_info
	HID: hid-led: fix maximum brightness for Dream Cheeky
	HID: elan: Fix potential double free in elan_input_configured
	spi: img-spfi: Fix pm_runtime_get_sync() error checking
	ath9k_htc: fix potential out of bounds access with invalid rxstatus->rs_keyix
	inotify: show inotify mask flags in proc fdinfo
	fsnotify: fix wrong lockdep annotations
	of: overlay: do not break notify on NOTIFY_{OK|STOP}
	scsi: ufs: core: Exclude UECxx from SFR dump list
	x86/pm: Fix false positive kmemleak report in msr_build_context()
	x86/speculation: Add missing prototype for unpriv_ebpf_notify()
	drm/msm/disp/dpu1: set vbif hw config to NULL to avoid use after memory free during pm runtime resume
	drm/msm/dsi: fix error checks and return values for DSI xmit functions
	drm/msm/hdmi: check return value after calling platform_get_resource_byname()
	drm/rockchip: vop: fix possible null-ptr-deref in vop_bind()
	x86: Fix return value of __setup handlers
	irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
	x86/mm: Cleanup the control_va_addr_alignment() __setup handler
	drm/msm/mdp5: Return error code in mdp5_pipe_release when deadlock is detected
	drm/msm/mdp5: Return error code in mdp5_mixer_release when deadlock is detected
	drm/msm: return an error pointer in msm_gem_prime_get_sg_table()
	media: uvcvideo: Fix missing check to determine if element is found in list
	perf/amd/ibs: Use interrupt regs ip for stack unwinding
	ASoC: mxs-saif: Fix refcount leak in mxs_saif_probe
	regulator: pfuze100: Fix refcount leak in pfuze_parse_regulators_dt
	scripts/faddr2line: Fix overlapping text section failures
	media: st-delta: Fix PM disable depth imbalance in delta_probe
	media: exynos4-is: Change clk_disable to clk_disable_unprepare
	media: pvrusb2: fix array-index-out-of-bounds in pvr2_i2c_core_init
	media: vsp1: Fix offset calculation for plane cropping
	Bluetooth: fix dangling sco_conn and use-after-free in sco_sock_timeout
	m68k: math-emu: Fix dependencies of math emulation support
	sctp: read sk->sk_bound_dev_if once in sctp_rcv()
	ext4: reject the 'commit' option on ext2 filesystems
	drm: msm: fix possible memory leak in mdp5_crtc_cursor_set()
	ASoC: wm2000: fix missing clk_disable_unprepare() on error in wm2000_anc_transition()
	NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
	rxrpc: Fix listen() setting the bar too high for the prealloc rings
	rxrpc: Don't try to resend the request if we're receiving the reply
	soc: qcom: smp2p: Fix missing of_node_put() in smp2p_parse_ipc
	soc: qcom: smsm: Fix missing of_node_put() in smsm_parse_ipc
	PCI: cadence: Fix find_first_zero_bit() limit
	PCI: rockchip: Fix find_first_zero_bit() limit
	ARM: dts: bcm2835-rpi-zero-w: Fix GPIO line name for Wifi/BT
	ARM: dts: bcm2835-rpi-b: Fix GPIO line names
	crypto: marvell/cesa - ECB does not IV
	mfd: ipaq-micro: Fix error check return value of platform_get_irq()
	scsi: fcoe: Fix Wstringop-overflow warnings in fcoe_wwn_from_mac()
	firmware: arm_scmi: Fix list protocols enumeration in the base protocol
	pinctrl: mvebu: Fix irq_of_parse_and_map() return value
	drivers/base/node.c: fix compaction sysfs file leak
	dax: fix cache flush on PMD-mapped pages
	powerpc/8xx: export 'cpm_setbrg' for modules
	powerpc/idle: Fix return value of __setup() handler
	powerpc/4xx/cpm: Fix return value of __setup() handler
	proc: fix dentry/inode overinstantiating under /proc/${pid}/net
	tty: fix deadlock caused by calling printk() under tty_port->lock
	Input: sparcspkr - fix refcount leak in bbc_beep_probe
	powerpc/perf: Fix the threshold compare group constraint for power9
	powerpc/fsl_rio: Fix refcount leak in fsl_rio_setup
	mailbox: forward the hrtimer if not queued and under a lock
	RDMA/hfi1: Prevent use of lock before it is initialized
	f2fs: fix dereference of stale list iterator after loop body
	iommu/mediatek: Add list_del in mtk_iommu_remove
	i2c: at91: use dma safe buffers
	i2c: at91: Initialize dma_buf in at91_twi_xfer()
	NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
	video: fbdev: clcdfb: Fix refcount leak in clcdfb_of_vram_setup
	dmaengine: stm32-mdma: remove GISR1 register
	iommu/amd: Increase timeout waiting for GA log enablement
	perf c2c: Use stdio interface if slang is not supported
	perf jevents: Fix event syntax error caused by ExtSel
	f2fs: fix deadloop in foreground GC
	wifi: mac80211: fix use-after-free in chanctx code
	iwlwifi: mvm: fix assert 1F04 upon reconfig
	fs-writeback: writeback_sb_inodes:Recalculate 'wrote' according skipped pages
	netfilter: nf_tables: disallow non-stateful expression in sets earlier
	ext4: fix use-after-free in ext4_rename_dir_prepare
	ext4: fix bug_on in ext4_writepages
	ext4: verify dir block before splitting it
	ext4: avoid cycles in directory h-tree
	tracing: Fix potential double free in create_var_ref()
	PCI/PM: Fix bridge_d3_blacklist[] Elo i2 overwrite of Gigabyte X299
	PCI: qcom: Fix runtime PM imbalance on probe errors
	PCI: qcom: Fix unbalanced PHY init on probe errors
	dlm: fix plock invalid read
	dlm: fix missing lkb refcount handling
	ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
	scsi: dc395x: Fix a missing check on list iterator
	scsi: ufs: qcom: Add a readl() to make sure ref_clk gets enabled
	drm/amdgpu/cs: make commands with 0 chunks illegal behaviour.
	drm/nouveau/clk: Fix an incorrect NULL check on list iterator
	drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX
	md: fix an incorrect NULL check in does_sb_need_changing
	md: fix an incorrect NULL check in md_reload_sb
	media: coda: Fix reported H264 profile
	media: coda: Add more H264 levels for CODA960
	RDMA/hfi1: Fix potential integer multiplication overflow errors
	irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
	irqchip: irq-xtensa-mx: fix initial IRQ affinity
	mac80211: upgrade passive scan to active scan on DFS channels after beacon rx
	um: chan_user: Fix winch_tramp() return value
	um: Fix out-of-bounds read in LDT setup
	iommu/msm: Fix an incorrect NULL check on list iterator
	nodemask.h: fix compilation error with GCC12
	hugetlb: fix huge_pmd_unshare address update
	rtl818x: Prevent using not initialized queues
	ASoC: rt5514: Fix event generation for "DSP Voice Wake Up" control
	carl9170: tx: fix an incorrect use of list iterator
	gma500: fix an incorrect NULL check on list iterator
	arm64: dts: qcom: ipq8074: fix the sleep clock frequency
	phy: qcom-qmp: fix struct clk leak on probe errors
	docs/conf.py: Cope with removal of language=None in Sphinx 5.0.0
	dt-bindings: gpio: altera: correct interrupt-cells
	blk-iolatency: Fix inflight count imbalances and IO hangs on offline
	phy: qcom-qmp: fix reset-controller leak on probe errors
	RDMA/rxe: Generate a completion for unsupported/invalid opcode
	MIPS: IP27: Remove incorrect `cpu_has_fpu' override
	md: bcache: check the return value of kzalloc() in detached_dev_do_request()
	pcmcia: db1xxx_ss: restrict to MIPS_DB1XXX boards
	staging: greybus: codecs: fix type confusion of list iterator variable
	tty: goldfish: Use tty_port_destroy() to destroy port
	usb: usbip: fix a refcount leak in stub_probe()
	usb: usbip: add missing device lock on tweak configuration cmd
	USB: storage: karma: fix rio_karma_init return
	usb: musb: Fix missing of_node_put() in omap2430_probe
	pwm: lp3943: Fix duty calculation in case period was clamped
	rpmsg: qcom_smd: Fix irq_of_parse_and_map() return value
	usb: dwc3: pci: Fix pm_runtime_get_sync() error checking
	iio: adc: sc27xx: fix read big scale voltage not right
	rpmsg: qcom_smd: Fix returning 0 if irq_of_parse_and_map() fails
	coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
	soc: rockchip: Fix refcount leak in rockchip_grf_init
	clocksource/drivers/riscv: Events are stopped during CPU suspend
	rtc: mt6397: check return value after calling platform_get_resource()
	serial: meson: acquire port->lock in startup()
	serial: 8250_fintek: Check SER_RS485_RTS_* only with RS485
	serial: digicolor-usart: Don't allow CS5-6
	serial: txx9: Don't allow CS5-6
	serial: sh-sci: Don't allow CS5-6
	serial: st-asc: Sanitize CSIZE and correct PARENB for CS7
	serial: stm32-usart: Correct CSIZE, bits, and parity
	firmware: dmi-sysfs: Fix memory leak in dmi_sysfs_register_handle
	bus: ti-sysc: Fix warnings for unbind for serial
	clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
	s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
	net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()
	net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register
	modpost: fix removing numeric suffixes
	jffs2: fix memory leak in jffs2_do_fill_super
	ubi: ubi_create_volume: Fix use-after-free when volume creation failed
	nfp: only report pause frame configuration for physical device
	net/mlx5e: Update netdev features after changing XDP state
	tcp: tcp_rtx_synack() can be called from process context
	afs: Fix infinite loop found by xfstest generic/676
	tipc: check attribute length for bearer name
	perf c2c: Fix sorting in percent_rmt_hitm_cmp()
	mips: cpc: Fix refcount leak in mips_cpc_default_phys_base
	tracing: Fix sleeping function called from invalid context on RT kernel
	tracing: Avoid adding tracer option before update_tracer_options
	i2c: cadence: Increase timeout per message if necessary
	m68knommu: set ZERO_PAGE() to the allocated zeroed page
	m68knommu: fix undefined reference to `_init_sp'
	NFSv4: Don't hold the layoutget locks across multiple RPC calls
	video: fbdev: pxa3xx-gcu: release the resources correctly in pxa3xx_gcu_probe/remove()
	xprtrdma: treat all calls not a bcall when bc_serv is NULL
	ata: pata_octeon_cf: Fix refcount leak in octeon_cf_probe
	af_unix: Fix a data-race in unix_dgram_peer_wake_me().
	bpf, arm64: Clear prog->jited_len along prog->jited
	net/mlx4_en: Fix wrong return value on ioctl EEPROM query failure
	SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
	net: mdio: unexport __init-annotated mdio_bus_init()
	net: xfrm: unexport __init-annotated xfrm4_protocol_init()
	net: ipv6: unexport __init-annotated seg6_hmac_init()
	net/mlx5: Rearm the FW tracer after each tracer event
	ip_gre: test csum_start instead of transport header
	net: altera: Fix refcount leak in altera_tse_mdio_create
	drm: imx: fix compiler warning with gcc-12
	iio: dummy: iio_simple_dummy: check the return value of kstrdup()
	lkdtm/usercopy: Expand size of "out of frame" object
	tty: synclink_gt: Fix null-pointer-dereference in slgt_clean()
	tty: Fix a possible resource leak in icom_probe
	drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop()
	drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop()
	USB: host: isp116x: check return value after calling platform_get_resource()
	drivers: tty: serial: Fix deadlock in sa1100_set_termios()
	drivers: usb: host: Fix deadlock in oxu_bus_suspend()
	USB: hcd-pci: Fully suspend across freeze/thaw cycle
	usb: dwc2: gadget: don't reset gadget's driver->bus
	misc: rtsx: set NULL intfdata when probe fails
	extcon: Modify extcon device to be created after driver data is set
	clocksource/drivers/sp804: Avoid error on multiple instances
	staging: rtl8712: fix uninit-value in r871xu_drv_init()
	serial: msm_serial: disable interrupts in __msm_console_write()
	kernfs: Separate kernfs_pr_cont_buf and rename_lock.
	md: protect md_unregister_thread from reentrancy
	Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process"
	ceph: allow ceph.dir.rctime xattr to be updatable
	drm/radeon: fix a possible null pointer dereference
	modpost: fix undefined behavior of is_arm_mapping_symbol()
	nbd: call genl_unregister_family() first in nbd_cleanup()
	nbd: fix race between nbd_alloc_config() and module removal
	nbd: fix io hung while disconnecting device
	nodemask: Fix return values to be unsigned
	vringh: Fix loop descriptors check in the indirect cases
	ALSA: hda/conexant - Fix loopback issue with CX20632
	cifs: return errors during session setup during reconnects
	ata: libata-transport: fix {dma|pio|xfer}_mode sysfs files
	mmc: block: Fix CQE recovery reset success
	nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
	nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
	ixgbe: fix bcast packets Rx on VF after promisc removal
	ixgbe: fix unexpected VLAN Rx in promisc mode on VF
	Input: bcm5974 - set missing URB_NO_TRANSFER_DMA_MAP urb flag
	powerpc/32: Fix overread/overwrite of thread_struct via ptrace
	md/raid0: Ignore RAID0 layout if the second zone has only one device
	mtd: cfi_cmdset_0002: Move and rename chip_check/chip_ready/chip_good_for_write
	mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064N
	tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd
	Linux 4.19.247

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I58c002ddc38e389a13e2bdb9f291f05805718c9d
2022-06-14 17:16:36 +02:00
Randy Dunlap
61967ac7ba x86: Fix return value of __setup handlers
[ Upstream commit 12441ccdf5e2f5a01a46e344976cbbd3d46845c9 ]

__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled. A return
of 0 causes the boot option/value to be listed as an Unknown kernel
parameter and added to init's (limited) argument (no '=') or environment
(with '=') strings. So return 1 from these x86 __setup handlers.

Examples:

  Unknown kernel command line parameters "apicpmtimer
    BOOT_IMAGE=/boot/bzImage-517rc8 vdso=1 ring3mwait=disable", will be
    passed to user space.

  Run /sbin/init as init process
   with arguments:
     /sbin/init
     apicpmtimer
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc8
     vdso=1
     ring3mwait=disable

Fixes: 2aae950b21 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Fixes: 77b52b4c5c ("x86: add "debugpat" boot option")
Fixes: e16fd002af ("x86/cpufeature: Enable RING3MWAIT for Knights Landing")
Fixes: b8ce335906 ("x86_64: convert to clock events")
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lore.kernel.org/r/20220314012725.26661-1-rdunlap@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14 16:59:21 +02:00
Sean Christopherson
bd1b3b2bd4 x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
commit b6b4fbd90b155a0025223df2c137af8a701d53b3 upstream.

Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported.  This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not.  While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.

Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negigible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.

Fixes: a582c540ac ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22 10:59:26 +02:00
Chang S. Bae
32bc6b4d93 UPSTREAM: x86/vdso: Initialize the CPU/node NR segment descriptor earlier
Currently the CPU/node NR segment descriptor (GDT_ENTRY_CPU_NUMBER) is
initialized relatively late during CPU init, from the vCPU code, which
has a number of disadvantages, such as hotplug CPU notifiers and SMP
cross-calls.

Instead just initialize it much earlier, directly in cpu_init().

This reduces complexity and increases robustness.

[ mingo: Wrote new changelog. ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-9-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit b2e2ba578e016a091eb31565849990fe68c7c599)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 154668398
Change-Id: Ibbf40841e9ec5933a3723f15b20d152bbf9155b4
2020-04-27 22:51:56 -07:00
Chang S. Bae
bd3bc4995a UPSTREAM: x86/vdso: Introduce helper functions for CPU and node number
Clean up the CPU/node number related code a bit, to make it more apparent
how we are encoding/extracting the CPU and node fields from the
segment limit.

No change in functionality intended.

[ mingo: Wrote new changelog. ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/1537312139-5580-8-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ffebbaedc8616cffe648202e364dce6a045d65a2)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 154668398
Change-Id: I9e5a0fdf5be21deaf746142467b0a24b56378304
2020-04-27 22:51:56 -07:00
Chang S. Bae
fbc9e69b7a UPSTREAM: x86/segments/64: Rename the GDT PER_CPU entry to CPU_NUMBER
The old 'per CPU' naming was misleading: 64-bit kernels don't use this
GDT entry for per CPU data, but to store the CPU (and node) ID.

[ mingo: Wrote new changelog. ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/1537312139-5580-7-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit c4755613a1339ea77dbb15de75c9f74217209265)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 154668398
Change-Id: I6fa94f1e2370e991dfed14c318b0954a26926927
2020-04-27 22:51:56 -07:00
Linus Torvalds
051089a2ee Merge tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
 "Xen features and fixes for v4.15-rc1

  Apart from several small fixes it contains the following features:

   - a series by Joao Martins to add vdso support of the pv clock
     interface

   - a series by Juergen Gross to add support for Xen pv guests to be
     able to run on 5 level paging hosts

   - a series by Stefano Stabellini adding the Xen pvcalls frontend
     driver using a paravirtualized socket interface"

* tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
  xen/pvcalls: fix potential endless loop in pvcalls-front.c
  xen/pvcalls: Add MODULE_LICENSE()
  MAINTAINERS: xen, kvm: track pvclock-abi.h changes
  x86/xen/time: setup vcpu 0 time info page
  x86/xen/time: set pvclock flags on xen_time_init()
  x86/pvclock: add setter for pvclock_pvti_cpu0_va
  ptp_kvm: probe for kvm guest availability
  xen/privcmd: remove unused variable pageidx
  xen: select grant interface version
  xen: update arch/x86/include/asm/xen/cpuid.h
  xen: add grant interface version dependent constants to gnttab_ops
  xen: limit grant v2 interface to the v1 functionality
  xen: re-introduce support for grant v2 interface
  xen: support priv-mapping in an HVM tools domain
  xen/pvcalls: remove redundant check for irq >= 0
  xen/pvcalls: fix unsigned less than zero error check
  xen/time: Return -ENODEV from xen_get_wallclock()
  xen/pvcalls-front: mark expected switch fall-through
  xen: xenbus_probe_frontend: mark expected switch fall-throughs
  xen/time: do not decrease steal time after live migration on xen
  ...
2017-11-16 13:06:27 -08:00
Joao Martins
9f08890ab9 x86/pvclock: add setter for pvclock_pvti_cpu0_va
Right now there is only a pvclock_pvti_cpu0_va() which is defined
on kvmclock since:

commit dac16fba6f
("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")

The only user of this interface so far is kvm. This commit adds a
setter function for the pvti page and moves pvclock_pvti_cpu0_va
to pvclock, which is a more generic place to have it; and would
allow other PV clocksources to use it, such as Xen.

While moving pvclock_pvti_cpu0_va into pvclock, rename also this
function to pvclock_get_pvti_cpu0_va (including its call sites)
to be symmetric with the setter (pvclock_set_pvti_cpu0_va).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-08 16:33:14 -05:00
Brijesh Singh
819aeee065 X86/KVM: Clear encryption attribute when SEV is active
The guest physical memory area holding the struct pvclock_wall_clock and
struct pvclock_vcpu_time_info are shared with the hypervisor. It
periodically updates the contents of the memory.

When SEV is active, the encryption attributes from the shared memory pages
must be cleared so that both hypervisor and guest can access the data.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20171020143059.3291-18-brijesh.singh@amd.com
2017-11-07 15:36:00 +01:00
Thomas Gleixner
38e9e81f4c x86/gdt: Use bitfields for initialization
The GDT entry related code uses two ways to access entries via
union fields:

 - bitfields

 - macros which initialize the two 16-bit parts of the entry
   by magic shift and mask operations.

Clean it up and only use the bitfields to initialize and access entries.

( The old access patterns were partly done due to GCC optimizing bitfield
  accesses in a horrible way - that's mostly fixed these days and clarity
  of code in such low level accessors is very important. )

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.197673367@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:25 +02:00
Dmitry Safonov
280e87e98c ARM: 8683/1: ARM32: Support mremap() for sigpage/vDSO
CRIU restores application mappings on the same place where they
were before Checkpoint. That means, that we need to move vDSO
and sigpage during restore on exactly the same place where
they were before C/R.

Make mremap() code update mm->context.{sigpage,vdso} pointers
during VMA move. Sigpage is used for landing after handling
a signal - if the pointer is not updated during moving, the
application might crash on any signal after mremap().

vDSO pointer on ARM32 is used only for setting auxv at this moment,
update it during mremap() in case of future usage.

Without those updates, current work of CRIU on ARM32 is not reliable.
Historically, we error Checkpointing if we find vDSO page on ARM32
and suggest user to disable CONFIG_VDSO.
But that's not correct - it goes from x86 where signal processing
is ended in vDSO blob. For arm32 it's sigpage, which is not disabled
with `CONFIG_VDSO=n'.

Looks like C/R was working by luck - because userspace on ARM32 at
this moment always sets SA_RESTORER.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-21 13:02:58 +01:00
Linus Torvalds
d3b5d35290 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main x86 MM changes in this cycle were:

   - continued native kernel PCID support preparation patches to the TLB
     flushing code (Andy Lutomirski)

   - various fixes related to 32-bit compat syscall returning address
     over 4Gb in applications, launched from 64-bit binaries - motivated
     by C/R frameworks such as Virtuozzo. (Dmitry Safonov)

   - continued Intel 5-level paging enablement: in particular the
     conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)

   - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)

   - ... plus misc updates, fixes and cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
  x86/mm: Fix flush_tlb_page() on Xen
  x86/mm: Make flush_tlb_mm_range() more predictable
  x86/mm: Remove flush_tlb() and flush_tlb_current_task()
  x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
  x86/mm/64: Fix crash in remove_pagetable()
  Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
  x86/boot/e820: Remove a redundant self assignment
  x86/mm: Fix dump pagetables for 4 levels of page tables
  x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
  x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
  Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
  x86/espfix: Add support for 5-level paging
  x86/kasan: Extend KASAN to support 5-level paging
  x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
  x86/paravirt: Add 5-level support to the paravirt code
  x86/mm: Define virtual memory map for 5-level paging
  x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
  x86/boot: Detect 5-level paging support
  x86/mm/numa: Remove numa_nodemask_from_meminfo()
  ...
2017-05-01 23:54:56 -07:00
Thomas Garnier
69218e4799 x86: Remap GDT tables in the fixmap section
Each processor holds a GDT in its per-cpu structure. The sgdt
instruction gives the base address of the current GDT. This address can
be used to bypass KASLR memory randomization. With another bug, an
attacker could target other per-cpu structures or deduce the base of
the main memory section (PAGE_OFFSET).

This patch relocates the GDT table for each processor inside the
fixmap section. The space is reserved based on number of supported
processors.

For consistency, the remapping is done by default on 32 and 64-bit.

Each processor switches to its remapped GDT at the end of
initialization. For hibernation, the main processor returns with the
original GDT and switches back to the remapping at completion.

This patch was tested on both architectures. Hibernation and KVM were
both tested specially for their usage of the GDT.

Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
recommending changes for Xen support.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 09:06:35 +01:00
Vitaly Kuznetsov
90b20432ae x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding hvclock_page VVAR.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170303132142.25595-4-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-11 14:47:28 +01:00
Ingo Molnar
68db0cf106 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:36 +01:00
Mike Rapoport
897ab3e0c4 userfaultfd: non-cooperative: add event for memory unmaps
When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.

Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.

The event notification occurs after the mmap_sem has been released.

[arnd@arndb.de: fix nommu build]
  Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
  Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Thomas Gleixner
73c1b41e63 cpu/hotplug: Cleanup state names
When the state names got added a script was used to add the extra argument
to the calls. The script basically converted the state constant to a
string, but the cleanup to convert these strings into meaningful ones did
not happen.

Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
are used in all the other places already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-25 10:47:44 +01:00
Jan Kara
1a29d85eb0 mm: use vmf->address instead of of vmf->virtual_address
Every single user of vmf->virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety.  Just use masked
vmf->address which already has the appropriate type.

Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:09 -08:00
Dmitry Safonov
67dece7d4c x86/vdso: Set vDSO pointer only after success
Those pointers were initialized before call to _install_special_mapping()
after the commit:

  f7b6eb3fa0 ("x86: Set context.vdso before installing the mapping")

This is not required anymore as special mappings have their vma name and
don't use arch_vma_name() after commit:

  a62c34bd2a ("x86, mm: Improve _install_special_mapping and fix x86 vdso naming")

So, this way to init looks less entangled.

I even belive that we can remove NULL initializers:

 - on failure load_elf_binary() will not start a new thread;
 - arch_prctl will have the same pointers as before syscall.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/20161027141516.28447-3-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 08:15:55 +02:00
Ingo Molnar
3947f49302 x86/vdso: Only define map_vdso_randomized() if CONFIG_X86_64
... otherwise the compiler complains:

   arch/x86/entry/vdso/vma.c:252:12: warning: ‘map_vdso_randomized’ defined but not used [-Wunused-function]

But the #ifdeffery here is getting pretty ugly, so move around
vdso_addr() as well to cluster the dependencies a bit more.

It's still not particulary pretty though ...

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gorcunov@openvz.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: oleg@redhat.com
Cc: xemul@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 09:44:32 +02:00
Dmitry Safonov
2eefd87896 x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*
Add API to change vdso blob type with arch_prctl.
As this is usefull only by needs of CRIU, expose
this interface under CONFIG_CHECKPOINT_RESTORE.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-4-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 21:28:09 +02:00
Dmitry Safonov
576ebfefd3 x86/vdso: Replace calculate_addr in map_vdso() with addr
That will allow to specify address where to map vDSO blob.
For the randomized vDSO mappings introduce map_vdso_randomized()
which will simplify calls to map_vdso.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-3-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 21:28:09 +02:00
Dmitry Safonov
e38447ee1f x86/vdso: Unmap vdso blob on vvar mapping failure
If remapping of vDSO blob failed on vvar mapping,
we need to unmap previously mapped vDSO blob.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-2-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 21:28:08 +02:00
Linus Torvalds
a6408f6cb6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the next part of the hotplug rework.

   - Convert all notifiers with a priority assigned

   - Convert all CPU_STARTING/DYING notifiers

     The final removal of the STARTING/DYING infrastructure will happen
     when the merge window closes.

  Another 700 hundred line of unpenetrable maze gone :)"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  timers/core: Correct callback order during CPU hot plug
  leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
  powerpc/numa: Convert to hotplug state machine
  arm/perf: Fix hotplug state machine conversion
  irqchip/armada: Avoid unused function warnings
  ARC/time: Convert to hotplug state machine
  clocksource/atlas7: Convert to hotplug state machine
  clocksource/armada-370-xp: Convert to hotplug state machine
  clocksource/exynos_mct: Convert to hotplug state machine
  clocksource/arm_global_timer: Convert to hotplug state machine
  rcu: Convert rcutree to hotplug state machine
  KVM/arm/arm64/vgic-new: Convert to hotplug state machine
  smp/cfd: Convert core to hotplug state machine
  x86/x2apic: Convert to CPU hotplug state machine
  profile: Convert to hotplug state machine
  timers/core: Convert to hotplug state machine
  hrtimer: Convert to hotplug state machine
  x86/tboot: Convert to hotplug state machine
  arm64/armv8 deprecated: Convert to hotplug state machine
  hwtracing/coresight-etm4x: Convert to hotplug state machine
  ...
2016-07-29 13:55:30 -07:00
Sebastian Andrzej Siewior
07d36c9e84 x86/vdso: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153332.987560239@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:27 +02:00
Dmitry Safonov
b059a453b1 x86/vdso: Add mremap hook to vm_special_mapping
Add possibility for 32-bit user-space applications to move
the vDSO mapping.

Previously, when a user-space app called mremap() for the vDSO
address, in the syscall return path it would land on the previous
address of the vDSOpage, resulting in segmentation violation.

Now it lands fine and returns to userspace with a remapped vDSO.

This will also fix the context.vdso pointer for 64-bit, which does
not affect the user of vDSO after mremap() currently, but this
may change in the future.

As suggested by Andy, return -EINVAL for mremap() that would
split the vDSO image: that operation cannot possibly result in
a working system so reject it.

Renamed and moved the text_mapping structure declaration inside
map_vdso(), as it used only there and now it complements the
vvar_mapping variable.

There is still a problem for remapping the vDSO in glibc
applications: the linker relocates addresses for syscalls
on the vDSO page, so you need to relink with the new
addresses.

Without that the next syscall through glibc may fail:

  Program received signal SIGSEGV, Segmentation fault.
  #0  0xf7fd9b80 in __kernel_vsyscall ()
  #1  0xf7ec8238 in _exit () from /usr/lib32/libc.so.6

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08 14:17:51 +02:00
Michal Hocko
6904817607 vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
most architectures are relying on mmap_sem for write in their
arch_setup_additional_pages.  If the waiting task gets killed by the oom
killer it would block oom_reaper from asynchronous address space reclaim
and reduce the chances of timely OOM resolving.  Wait for the lock in
the killable mode and return with EINTR if the task got killed while
waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>	[x86 vdso]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Andy Lutomirski
1ed95e52d9 x86/vdso: Remove direct HPET access through the vDSO
Allowing user code to map the HPET is problematic.  HPET
implementations are notoriously buggy, and there are probably many
machines on which even MMIO reads from bogus HPET addresses are
problematic.

We have a report that the Dell Precision M2800 with:

  ACPI: HPET 0x00000000C8FE6238 000038 (v01 DELL   CBX3  01072009 AMI. 00000005)

is either so slow when accessing the HPET or actually hangs in some
regard, causing soft lockups to be reported if users do unexpected
things to the HPET.

The vclock HPET code has also always been a questionable speedup.
Accessing an HPET is exceedingly slow (on the order of several
microseconds), so the added overhead in requiring a syscall to read
the HPET is a small fraction of the total code of accessing it.

To avoid future problems, let's just delete the code entirely.

In the long run, this could actually be a speedup.  Waiman Long as a
patch to optimize the case where multiple CPUs contend for the HPET,
but that won't help unless all the accesses are mediated by the
kernel.

Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Waiman Long <waiman.long@hpe.com>
Link: http://lkml.kernel.org/r/d2f90bba98db9905041cff294646d290d378f67a.1460074438.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:28:34 +02:00
Borislav Petkov
8c72530699 x86/vdso: Use static_cpu_has()
... and simplify and speed up a tad.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:23 +01:00
Borislav Petkov
cd4d09ec6f x86/cpufeature: Carve out X86_FEATURE_*
Move them to a separate header and have the following
dependency:

  x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:17 +01:00
Andy Lutomirski
bd902c5362 x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.

While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track
all the mappings), we can do almost as well by tracking which
vclocks have ever been used and only allowing pages associated
with used vclocks to be faulted in.

This will cause rogue programs that try to peek at the HPET to
get SIGBUS instead on most systems.

We can't restrict faults to vclock pages that are associated
with the currently selected vclock due to a race: a process
could start to access the HPET for the first time and race
against a switch away from the HPET as the current clocksource.
We can't segfault the process trying to peek at the HPET in this
case, even though the process isn't going to do anything useful
with the data.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:35 +01:00
Andy Lutomirski
a48a704261 x86/vdso: Use ->fault() instead of remap_pfn_range() for the vvar mapping
This is IMO much less ugly, and it also opens the door to
disallowing unprivileged userspace HPET access on systems with
usable TSCs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c19c2909e5ee3c3d8742f916586676bb7c40345f.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:35 +01:00
Andy Lutomirski
05ef76b20f x86/vdso: Use .fault for the vDSO text mapping
The old scheme for mapping the vDSO text is rather complicated.
vdso2c generates a struct vm_special_mapping and a blank .pages
array of the correct size for each vdso image.  Init code in
vdso/vma.c populates the .pages array for each vDSO image, and
the mapping code selects the appropriate struct
vm_special_mapping.

With .fault, we can use a less roundabout approach: vdso_fault()
just returns the appropriate page for the selected vDSO image.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:34 +01:00
Andy Lutomirski
352b78c62f x86/vdso: Track each mm's loaded vDSO image as well as its base
As we start to do more intelligent things with the vDSO at
runtime (as opposed to just at mm initialization time), we'll
need to know which vDSO is in use.

In principle, we could guess based on the mm type, but that's
over-complicated and error-prone.  Instead, just track it in the
mmu context.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:34 +01:00
Andy Lutomirski
cc1e24fdb0 x86/vdso: Remove pvclock fixmap machinery
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Andy Lutomirski
dac16fba6f x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Andy Lutomirski
0a6d1fa0d2 x86/vdso: Remove runtime 32-bit vDSO selection
32-bit userspace will now always see the same vDSO, which is
exactly what used to be the int80 vDSO.  Subsequent patches will
clean it up and make it support SYSENTER and SYSCALL using
alternatives.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:08 +02:00
Brian Gerst
ab8b82ee6d x86/compat: Don't build the 32-bit VDSO if not needed
Build the 32-bit vdso only for native 32-bit or 32-bit compat is
enabled.  x32 should not force it to build.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-7-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:28:56 +02:00
Ingo Molnar
d603c8e184 x86/asm/entry, x86/vdso: Move the vDSO code to arch/x86/entry/vdso/
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 18:51:37 +02:00