Commit Graph

720729 Commits

Author SHA1 Message Date
Linus Torvalds
b630a23a73 Merge tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
 "This is the bulk of pin control changes for the v4.15 kernel cycle:

  Core:

   - The pin control Kconfig entry PINCTRL is now turned into a
     menuconfig option. This obviously has the implication of making the
     subsystem menu visible in menuconfig. This is happening because of
     two things:

      (a) Intel have started to deploy and depend on pin controllers in
          a way that is affecting users directly. This happens on the
          highly integrated laptop chipsets named after geographical
          places: baytrail, broxton, cannonlake, cedarfork, cherryview,
          denverton, geminilake, lewisburg, merrifield, sunrisepoint...
          It started a while back and now it is ever more evident that
          this is crucial infrastructure for x86 laptops and not an
          embedded obscurity anymore. Users need to be aware.

      (b) Pin control expanders on I2C and SPI that are arch-agnostic.
          Currently Semtech SX150X and Microchip MCP28x08 but more are
          expected. Users will have to be able to configure these in
          directly for their set-up.

   - Just go and select GPIOLIB now that we made sure that GPIOLIB is a
     very vanilla subsystem. Do not depend on it, if we need it, select
     it.

   - Exposing the pin control subsystem in menuconfig uncovered a bunch
     of obscure bugs that are now hopefully fixed, all more or less
     pertaining to Blackfin.

   - Unified namespace for cross-calls between pin control and GPIO.

   - New support for clock skew/delay generic DT bindings and generic
     pin config options for this.

   - Minor documentation improvements.

  Various:

   - The Renesas SH-PFC pin controller has evolved a lot. It seems
     Renesas are churning out new SoCs by the minute.

   - A bunch of non-critical fixes for the Rockchip driver.

   - Improve the use of library functions instead of open coding.

   - Support the MCP28018 variant in the MCP28x08 driver.

   - Static constifying"

* tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (91 commits)
  pinctrl: gemini: Fix missing pad descriptions
  pinctrl: Add some depends on HAS_IOMEM
  pinctrl: samsung/s3c24xx: add CONFIG_OF dependency
  pinctrl: gemini: Fix GMAC groups
  pinctrl: qcom: spmi-gpio: Add pmi8994 gpio support
  pinctrl: ti-iodelay: remove redundant unused variable dev
  pinctrl: max77620: Use common error handling code in max77620_pinconf_set()
  pinctrl: gemini: Implement clock skew/delay config
  pinctrl: gemini: Use generic DT parser
  pinctrl: Add skew-delay pin config and bindings
  pinctrl: armada-37xx: Add edge both type gpio irq support
  pinctrl: uniphier: remove eMMC hardware reset pin-mux
  pinctrl: rockchip: Add iomux-route switching support for rk3288
  pinctrl: intel: Add Intel Cedar Fork PCH pin controller support
  pinctrl: intel: Make offset to interrupt status register configurable
  pinctrl: sunxi: Enforce the strict mode by default
  pinctrl: sunxi: Disable strict mode for old pinctrl drivers
  pinctrl: sunxi: Introduce the strict flag
  pinctrl: sh-pfc: Save/restore registers for PSCI system suspend
  pinctrl: sh-pfc: r8a7796: Use generic IOCTRL register description
  ...
2017-11-16 10:57:11 -08:00
Linus Torvalds
9c7a867ebd Merge tag 'backlight-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:

   - handle 32bit overflow in pwm_bl

   - remove redundant code/checks in tps65217_bl and ili922x

* tag 'backlight-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: ili922x: Remove redundant variable len
  backlight: tps65217_bl: Remove unnecessary default brightness check
  backlight: pwm_bl: Fix overflow condition
2017-11-16 10:36:46 -08:00
James Smart
cce75291ff nvmet_fc: fix better length checking
Reorganize nvmet_fc_handle_fcp_rqst() so that the nvmet req.transfer_len
field is set after the call nvmet_req_init(). An update to nvmet now
has nvmet_req_init() clearing the field, thus the fc transport was losing
the value.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-16 11:27:04 -07:00
Linus Torvalds
d3092e4e99 Merge tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "New drivers:
   - Add support for Cherry Trail Dollar Cove TI PMIC
   - Add support for Add Spreadtrum SC27xx series PMICs

  New device support:
   - Add support Regulator to axp20x

  New functionality:
   - Add DT support; aspeed-scu sc27xx-pmic
   - Add power saving support; rts5249

  Fix-ups:
   - DT clean-up/rework; tps65217, max77693, iproc-cdru, iproc-mhb, tps65218
   - Staticise/constify; stw481x
   - Use new succinct IRQ API; fsl-imx25-tsadc
   - Kconfig fix-ups; MFD_TPS65218
   - Identify SPI method; lpc_ich
   - Use managed resources (devm_*) calls; ssbi
   - Remove unused/obsolete code/documentation; mc13xxx

  Bug fixes:
   - Fix typo in MAINTAINERS
   - Fix error handling; mxs-lradc
   - Clean-up IRQs on .remove; fsl-imx25-tsadc"

* tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (21 commits)
  dt-bindings: mfd: mc13xxx: Remove obsolete property
  mfd: axp20x: Add axp20x-regulator cell for AXP813
  mfd: Add Spreadtrum SC27xx series PMICs driver
  dt-bindings: mfd: Add Spreadtrum SC27xx PMIC documentation
  mfd: ssbi: Use devm_of_platform_populate()
  mfd: fsl-imx25: Clean up irq settings during removal
  mfd: mxs-lradc: Fix error handling in mxs_lradc_probe()
  mfd: lpc_ich: Avoton/Rangeley uses SPI_BYT method
  mfd: tps65218: Introduce dependency on CONFIG_OF
  mfd: tps65218: Correct the config description
  MAINTAINERS: Fix Dialog search term for watchdog binding file
  mfd: fsl-imx25: Set irq handler and data in one go
  mfd: rts5249: Add support for RTS5250S power saving
  ACPI / PMIC: Add opregion driver for Intel Dollar Cove TI PMIC
  mfd: Add support for Cherry Trail Dollar Cove TI PMIC
  syscon: dt-bindings: Add binding document for iProc MHB block
  syscon: dt-bindings: Add binding doc for Broadcom iProc CDRU
  mfd: max77693: Add muic of_compatible in mfd_cell
  mfd: stw481x: Make three arrays static const, reduces object code size
  mfd: tps65217: Introduce dependency on CONFIG_OF
  ...
2017-11-16 09:15:57 -08:00
Linus Torvalds
2bf16b7a73 Merge tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc updates from Greg KH:
 "Here is the big set of char/misc and other driver subsystem patches
  for 4.15-rc1.

  There are small changes all over here, hyperv driver updates, pcmcia
  driver updates, w1 driver updats, vme driver updates, nvmem driver
  updates, and lots of other little one-off driver updates as well. The
  shortlog has the full details.

  All of these have been in linux-next for quite a while with no
  reported issues"

* tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
  VME: Return -EBUSY when DMA list in use
  w1: keep balance of mutex locks and refcnts
  MAINTAINERS: Update VME subsystem tree.
  nvmem: sunxi-sid: add support for A64/H5's SID controller
  nvmem: imx-ocotp: Update module description
  nvmem: imx-ocotp: Enable i.MX7D OTP write support
  nvmem: imx-ocotp: Add i.MX7D timing write clock setup support
  nvmem: imx-ocotp: Move i.MX6 write clock setup to dedicated function
  nvmem: imx-ocotp: Add support for banked OTP addressing
  nvmem: imx-ocotp: Pass parameters via a struct
  nvmem: imx-ocotp: Restrict OTP write to IMX6 processors
  nvmem: uniphier: add UniPhier eFuse driver
  dt-bindings: nvmem: add description for UniPhier eFuse
  nvmem: set nvmem->owner to nvmem->dev->driver->owner if unset
  nvmem: qfprom: fix different address space warnings of sparse
  nvmem: mtk-efuse: fix different address space warnings of sparse
  nvmem: mtk-efuse: use stack for nvmem_config instead of malloc'ing it
  nvmem: imx-iim: use stack for nvmem_config instead of malloc'ing it
  thunderbolt: tb: fix use after free in tb_activate_pcie_devices
  MAINTAINERS: Add git tree for Thunderbolt development
  ...
2017-11-16 09:10:59 -08:00
Linus Torvalds
b9743042b3 Merge tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is the set of driver core / debugfs patches for 4.15-rc1.

  Not many here, mostly all are debugfs fixes to resolve some
  long-reported problems with files going away with references to them
  in userspace. There's also some SPDX cleanups for the debugfs code, as
  well as a few other minor driver core changes for issues reported by
  people.

  All of these have been in linux-next for a week or more with no
  reported issues"

* tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Fix device link deferred probe
  debugfs: Remove redundant license text
  debugfs: add SPDX identifiers to all debugfs files
  debugfs: defer debugfs_fsdata allocation to first usage
  debugfs: call debugfs_real_fops() only after debugfs_file_get()
  debugfs: purge obsolete SRCU based removal protection
  IB/hfi1: convert to debugfs_file_get() and -put()
  debugfs: convert to debugfs_file_get() and -put()
  debugfs: debugfs_real_fops(): drop __must_hold sparse annotation
  debugfs: implement per-file removal protection
  debugfs: add support for more elaborate ->d_fsdata
  driver core: Move device_links_purge() after bus_remove_device()
  arch_topology: Fix section miss match warning due to free_raw_capacity()
  driver-core: pr_err() strings should end with newlines
2017-11-16 08:55:30 -08:00
Masahiro Yamada
ba5b5034bd arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3
Commit 429f203eb7 ("arm64: dts: uniphier: route on-board device IRQ
to GPIO controller") missed to update this DTS.  It becames a real
problem when arm and arm64 trees are merged together.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-11-16 17:16:55 +01:00
Masahiro Yamada
7f855fc805 kbuild: move coccicheck help from scripts/Makefile.help to top Makefile
In my view, it is not helpful to have a separate file just for
the coccicheck help message.  Merge scripts/Makefile.help into
the top-level Makefile.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
2017-11-17 00:33:09 +09:00
Masahiro Yamada
52c291a36c sh: decompressor: add shipped files to .gitignore
These files are copied from arch/sh/lib, so should be ignored by git.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-11-17 00:33:08 +09:00
Masahiro Yamada
380a1edbcb frv: .gitignore: ignore vmlinux.lds
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-11-17 00:33:08 +09:00
Heiko Carstens
ab35727eb8 s390: remove unused parameter from Makefile
Remove unused parameter from the call function,
which I accidentally added.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:21 +01:00
Steffen Maier
5c13db9b5d zfcp: purely mechanical update using timer API, plus blank lines
erp_memwait only occurs in seldom memory pressure situations.
The typical case never uses the associated timer and thus also
does not need to initialize the timer.
Also, we don't want to re-initialize the timer each time we re-use an
erp_action in zfcp_erp_setup_act() [see also v4.14-rc7 commit ab31fd0ce6
("scsi: zfcp: fix erp_action use-before-initialize in REC action trace")
for erp_action life cycle].
Hence, retain the lazy inintialization of zfcp_erp_action.timer
in zfcp_erp_strategy_memwait().

Add an empty line after declarations in zfcp_erp_timeout_handler()
and zfcp_fsf_request_timeout_handler() even though it was also missing
before the timer conversion.

Fix checkpatch warning:
WARNING: function definition argument 'struct timer_list *' should also have an identifier name
+extern void zfcp_erp_timeout_handler(struct timer_list *);

Depends-on: v4.14-rc3 commit 686fef928b ("timer: Prepare to change timer callback argument type")
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:19 +01:00
Kees Cook
75492a5156 s390/scsi: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Steffen Maier <maier@linux.vnet.ibm.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:17 +01:00
Hendrik Brueckner
544e8dd7a8 s390/cpum_sf: correctly set the PID and TID in perf samples
The hardware sampler creates samples that are processed at a later
point in time.  The PID and TID values of the perf samples that are
created for hardware samples are initialized with values from the
current task.  Hence, the PID and TID values are not correct and
perf samples are associated with wrong processes.

The PID and TID values are obtained from the Host Program Parameter
(HPP) field in the basic-sampling data entries.  These PIDs are
valid in the init PID namespace.  Ensure that the PIDs in the perf
samples are resolved considering the PID namespace in which the
perf event was created.

To correct the PID and TID values in the created perf samples,
a special overflow handler is installed.  It replaces the default
overflow handler and does not become effective if any other
overflow handler is used.  With the special overflow handler most
of the perf samples are associated with the right processes.
For processes, that are no longer exist, the association might
still be wrong.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:17 +01:00
Hendrik Brueckner
d4c7e649d7 s390/cpum_sf: load program parameter at sampler enablement
The lpp instruction is used to place the PID of the current
task in the program-parameter (PP) register.  The register
contents is then included in the sampling data entries.

The lpp instruction loads the PP register only when at least
one sampling function is enabled.  Otherwise it is executed
as a no-op.

Linux calls lpp at context switch.  If the context switch
happens before the sampler is enabled, the PP register is
empty.  That means, the PID of the task that is sampled is
not stored in sampling data until the next context switch.

Hence, always call lpp when enabling the sampler.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:16 +01:00
Hendrik Brueckner
de9954b75e s390/perf: add perf register support for floating-point registers
For correct unwinding of user space processes, the floating-point
register contents are required.  For example, leaf functions might
use fp registers to temporarily store the return address.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:15 +01:00
Hendrik Brueckner
0da0017f72 s390/perf: extend perf_regs support to include floating-point registers
Extend the perf register support to also export floating-point register
contents for user space tasks.  Floating-point registers might be used
in leaf functions to contain the return address.  Hence, they are required
for proper DWARF unwinding.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:14 +01:00
Hendrik Brueckner
a9fc2db0a8 s390/perf: define common DWARF register string table
Instead of defining DWARF register to string table in dwarf-regs-table.h
and dwarf-regs.c, use a common table in dwarf-regs-table.h.

Ensure that the DWARF register table is up-to-date with
http://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_s390/x1542.html.

For unwinding with libdw, also ensure to correctly setup the DWARF
register frame according to the register mappings.  Currently, libdw
supports up to 32 registers only.

Suggested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:13 +01:00
Heiko Carstens
f704ef4460 s390/perf: add support for perf_regs and libdw
With support for perf_regs and libdw, you can record and report
call graphs for user space programs. Simply invoke perf with
the --call-graph=dwarf command line option.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[brueckner: added dwfl_thread_state_register_pc() call]
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:12 +01:00
Heiko Carstens
c33eff6005 s390/perf: add perf_regs support and user stack dump
Add s390 support to dump user stack to user space for DWARF
stack unwinding.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:11 +01:00
Hendrik Brueckner
9232c3c741 s390/cpum_sf: do not register PMU if no sampling mode is authorized
Previously, the cpum_sf PMU was registered even if there is no
sampling mode authorized.  Add a check and register cpum_sf only
at least one sampling mode is authorized.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:10 +01:00
Pu Hou
3d43b981eb s390/cpumf: remove raw event support in basic-only sampling mode
Raw sample was implemented to export the diagnostic samples.
With having this achieved with AUX buffers, there is no requirement
for basic samples to export raw data.  In particular, most basic
sampling information are consumed for creating the perf event sample.

Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:09 +01:00
Pu Hou
a3f22d505f s390/perf: add callback to perf to enable using AUX buffer
Perf tool need implement a callback to enable using AUX buffer. Perf
will do another mmap() to trigger the setup of AUX buffer in kernel
if there is such callback. The default size of the AUX buffer is set
properly according to the sampling frequency to avoid overflow. It
could also be manually set by -m option of perf.

The interface of perf is not changed. Diagnostic mode sampling
could be started by `perf record -e rBD000` like before.

Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:08 +01:00
Pu Hou
cbf6948f36 s390/cpumf: enable using AUX buffer
Modify PMU callback to use AUX buffer for diagnostic mode sampling.
Basic-mode sampling still use orignal way.

Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:07 +01:00
Pu Hou
ca5955cdea s390/cpumf: introduce AUX buffer for dump diagnostic sample data
Current implementation uses a private buffer for cpumf to dump samples.
Samples first go to this buffer. Then copy to ring buffer allocated
by perf core. With AUX buffer, this copy is not needed. AUX buffer is
shared and zero-copy mapped to user space. The trailer information at
the end of each SDB(sample data block) is also exported to user space.
AUX buffer is used when diagnostic sampling mode is enabled.

This patch contains functions to setup/free AUX buffer or to begin/end
sampling per-cpu. Also include function called in interrupt to
collect samples.

Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 15:06:06 +01:00
Radim Krčmář
a6014f1ab7 Merge tag 'kvm-s390-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: fixes and improvements for 4.15

- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
- merge of the sthyi tree from the base s390 team, which moves the sthyi
out of KVM into a shared function also for non-KVM
2017-11-16 14:39:46 +01:00
Vasily Gorbik
b192571d1a s390/disassembler: increase show_code buffer size
Current buffer size of 64 is too small. objdump shows that there are
instructions which would require up to 75 bytes buffer (with current
formating). 128 bytes "ought to be enough for anybody".

Also replaces 8 spaces with a single tab to reduce the memory footprint.

Fixes the following KASAN finding:

BUG: KASAN: stack-out-of-bounds in number+0x3fe/0x538
Write of size 1 at addr 000000005a4a75a0 by task bash/1282

CPU: 1 PID: 1282 Comm: bash Not tainted 4.14.0+ #215
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<000000000011eeb6>] show_stack+0x56/0x88)
 [<0000000000e1ce1a>] dump_stack+0x15a/0x1b0
 [<00000000004e2994>] print_address_description+0xf4/0x288
 [<00000000004e2cf2>] kasan_report+0x13a/0x230
 [<0000000000e38ae6>] number+0x3fe/0x538
 [<0000000000e3dfe4>] vsnprintf+0x194/0x948
 [<0000000000e3ea42>] sprintf+0xa2/0xb8
 [<00000000001198dc>] print_insn+0x374/0x500
 [<0000000000119346>] show_code+0x4ee/0x538
 [<000000000011f234>] show_registers+0x34c/0x388
 [<000000000011f2ae>] show_regs+0x3e/0xa8
 [<000000000011f502>] die+0x1ea/0x2e8
 [<0000000000138f0e>] do_no_context+0x106/0x168
 [<0000000000139a1a>] do_protection_exception+0x4da/0x7d0
 [<0000000000e55914>] pgm_check_handler+0x16c/0x1c0
 [<000000000090639e>] sysrq_handle_crash+0x46/0x58
([<0000000000000007>] 0x7)
 [<00000000009073fa>] __handle_sysrq+0x102/0x218
 [<0000000000907c06>] write_sysrq_trigger+0xd6/0x100
 [<000000000061d67a>] proc_reg_write+0xb2/0x128
 [<0000000000520be6>] __vfs_write+0xee/0x368
 [<0000000000521222>] vfs_write+0x21a/0x278
 [<000000000052156a>] SyS_write+0xda/0x178
 [<0000000000e555cc>] system_call+0xc4/0x270

The buggy address belongs to the page:
page:000003d1016929c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x0()
raw: 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000
raw: 0000000000000100 0000000000000200 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 000000005a4a7480: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
 000000005a4a7500: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00
>000000005a4a7580: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
                               ^
 000000005a4a7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8
 000000005a4a7680: f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f3 f3 f3 f3 00 00
==================================================================

Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 13:12:44 +01:00
Michael Holzheu
6470c0cc48 s390: Remove CONFIG_HARDENED_USERCOPY
When running the crash tool on a s390 live system we get a kernel panic
for reading memory within the kernel image:

 # uname -a
   Linux r3545011 4.14.0-rc8-00066-g1c9dbd4615fd #45 SMP PREEMPT Fri Nov 10 16:16:22 CET 2017 s390x s390x s390x GNU/Linux
 # crash /boot/vmlinux-devel /dev/mem
 # crash> rd 0x100000

 usercopy: kernel memory exposure attempt detected from 0000000000100000 (<kernel text>) (8 bytes)
 ------------[ cut here ]------------
 kernel BUG at mm/usercopy.c:72!
 illegal operation: 0001 ilc:1 [#1] PREEMPT SMP.
 Modules linked in:
 CPU: 0 PID: 1461 Comm: crash Not tainted 4.14.0-rc8-00066-g1c9dbd4615fd-dirty #46
 Hardware name: IBM 2827 H66 706 (z/VM 6.3.0)
 task: 000000001ad10100 task.stack: 000000001df78000
 Krnl PSW : 0704d00180000000 000000000038165c (__check_object_size+0x164/0x1d0)
            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
 Krnl GPRS: 0000000012440e1d 0000000080000000 0000000000000061 00000000001cabc0
            00000000001cc6d6 0000000000000000 0000000000cc4ed2 0000000000001000
            000003ffc22fdd20 0000000000000008 0000000000100008 0000000000000001
            0000000000000008 0000000000100000 0000000000381658 000000001df7bc90
 Krnl Code: 000000000038164c: c020004a1c4a        larl    %r2,cc4ee0
            0000000000381652: c0e5fff2581b        brasl   %r14,1cc688
           #0000000000381658: a7f40001            brc     15,38165a
           >000000000038165c: eb42000c000c        srlg    %r4,%r2,12
            0000000000381662: eb32001c000c        srlg    %r3,%r2,28
            0000000000381668: c0110003ffff        lgfi    %r1,262143
            000000000038166e: ec31ff752065        clgrj   %r3,%r1,2,381558
            0000000000381674: a7f4ff67            brc     15,381542
 Call Trace:
 ([<0000000000381658>] __check_object_size+0x160/0x1d0)
  [<000000000082263a>] read_mem+0xaa/0x130.
  [<0000000000386182>] __vfs_read+0x42/0x168.
  [<000000000038632e>] vfs_read+0x86/0x140.
  [<0000000000386a26>] SyS_read+0x66/0xc0.
  [<0000000000ace6a4>] system_call+0xc4/0x2b0.
 INFO: lockdep is turned off.
 Last Breaking-Event-Address:
  [<0000000000381658>] __check_object_size+0x160/0x1d0

 Kernel panic - not syncing: Fatal exception: panic_on_oops

With CONFIG_HARDENED_USERCOPY copy_to_user() checks in __check_object_size()
if the source address is within the kernel image. When the crash tool reads
from 0x100000, this check leads to the kernel BUG().

So disable the kernel config option until this bug is fixed.

Corresponding bug report on LKML: https://lkml.org/lkml/2017/11/10/341

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16 13:12:21 +01:00
Ming Lei
34d9715ac1 block: wake up all tasks blocked in get_request()
Once blk_set_queue_dying() is done in blk_cleanup_queue(), we call
blk_freeze_queue() and wait for q->q_usage_counter becoming zero. But
if there are tasks blocked in get_request(), q->q_usage_counter can
never become zero. So we have to wake up all these tasks in
blk_set_queue_dying() first.

Fixes: 3ef28e83ab ("block: generic request_queue reference counting")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-15 21:51:03 -07:00
Linus Torvalds
e60e1ee606 Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm pull request for v4.15.

  Core:
   - Atomic object lifetime fixes
   - Atomic iterator improvements
   - Sparse/smatch fixes
   - Legacy kms ioctls to be interruptible
   - EDID override improvements
   - fb/gem helper cleanups
   - Simple outreachy patches
   - Documentation improvements
   - Fix dma-buf rcu races
   - DRM mode object leasing for improving VR use cases.
   - vgaarb improvements for non-x86 platforms.

  New driver:
   - tve200: Faraday Technology TVE200 block.

     This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in
     the StorLink SL3516 (later Cortina Systems CS3516) as well as the
     Grain Media GM8180.

  New bridges:
   - SiI9234 support

  New panels:
   - S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba
     LT089AC19000, Innolux AT043TN24

  i915:
   - Remove Coffeelake from alpha support
   - Cannonlake workarounds
   - Infoframe refactoring for DisplayPort
   - VBT updates
   - DisplayPort vswing/emph/buffer translation refactoring
   - CCS fixes
   - Restore GPU clock boost on missed vblanks
   - Scatter list updates for userptr allocations
   - Gen9+ transition watermarks
   - Display IPC (Isochronous Priority Control)
   - Private PAT management
   - GVT: improved error handling and pci config sanitizing
   - Execlist refactoring
   - Transparent Huge Page support
   - User defined priorities support
   - HuC/GuC firmware refactoring
   - DP MST fixes
   - eDP power sequencing fixes
   - Use RCU instead of stop_machine
   - PSR state tracking support
   - Eviction fixes
   - BDW DP aux channel timeout fixes
   - LSPCON fixes
   - Cannonlake PLL fixes

  amdgpu:
   - Per VM BO support
   - Powerplay cleanups
   - CI powerplay support
   - PASID mgr for kfd
   - SR-IOV fixes
   - initial GPU reset for vega10
   - Prime mmap support
   - TTM updates
   - Clock query interface for Raven
   - Fence to handle ioctl
   - UVD encode ring support on Polaris
   - Transparent huge page DMA support
   - Compute LRU pipe tweaks
   - BO flag to allow buffers to opt out of implicit sync
   - CTX priority setting API
   - VRAM lost infrastructure plumbing

  qxl:
   - fix flicker since atomic rework

  amdkfd:
   - Further improvements from internal AMD tree
   - Usermode events
   - Drop radeon support

  nouveau:
   - Pascal temperature sensor support
   - Improved BAR2 handling
   - MMU rework to support Pascal MMU

  exynos:
   - Improved HDMI/mixer support
   - HDMI audio interface support

  tegra:
   - Prep work for tegra186
   - Cleanup/fixes

  msm:
   - Preemption support for a5xx
   - Display fixes for 8x96 (snapdragon 820)
   - Async cursor plane fixes
   - FW loading rework
   - GPU debugging improvements

  vc4:
   - Prep for DSI panels
   - fix T-format tiling scanout
   - New madvise ioctl

  Rockchip:
   - LVDS support

  omapdrm:
   - omap4 HDMI CEC support

  etnaviv:
   - GPU performance counters groundwork

  sun4i:
   - refactor driver load + TCON backend
   - HDMI improvements
   - A31 support
   - Misc fixes

  udl:
   - Probe/EDID read fixes.

  tilcdc:
   - Misc fixes.

  pl111:
   - Support more variants

  adv7511:
   - Improve EDID handling.
   - HDMI CEC support

  sii8620:
   - Add remote control support"

* tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits)
  drm/rockchip: analogix_dp: Use mutex rather than spinlock
  drm/mode_object: fix documentation for object lookups.
  drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
  drm/i915: Move init_clock_gating() back to where it was
  drm/i915: Prune the reservation shared fence array
  drm/i915: Idle the GPU before shinking everything
  drm/i915: Lock llist_del_first() vs llist_del_all()
  drm/i915: Calculate ironlake intermediate watermarks correctly, v2.
  drm/i915: Disable lazy PPGTT page table optimization for vGPU
  drm/i915/execlists: Remove the priority "optimisation"
  drm/i915: Filter out spurious execlists context-switch interrupts
  drm/amdgpu: use irq-safe lock for kiq->ring_lock
  drm/amdgpu: bypass lru touch for KIQ ring submission
  drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
  drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
  drm/amd/powerplay: initialize a variable before using it
  drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels
  drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition
  drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
  drm/rockchip: add CONFIG_OF dependency for lvds
  ...
2017-11-15 20:42:10 -08:00
Linus Torvalds
5d352e69c6 Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:

 - Documentation for digital TV (both kAPI and uAPI) are now in sync
   with the implementation (except for legacy/deprecated ioctls). This
   is a major step, as there were always a gap there

 - New sensor driver: imx274

 - New cec driver: cec-gpio

 - New platform driver for rockship rga and tegra CEC

 - New RC driver: tango-ir

 - Several cleanups at atomisp driver

 - Core improvements for RC, CEC, V4L2 async probing support and DVB

 - Lots of drivers cleanup, fixes and improvements.

* tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (332 commits)
  dvb_frontend: don't use-after-free the frontend struct
  media: dib0700: fix invalid dvb_detach argument
  media: v4l2-ctrls: Don't validate BITMASK twice
  media: s5p-mfc: fix lockdep warning
  media: dvb-core: always call invoke_release() in fe_free()
  media: usb: dvb-usb-v2: dvb_usb_core: remove redundant code in dvb_usb_fe_sleep
  media: au0828: make const array addr_list static
  media: cx88: make const arrays default_addr_list and pvr2000_addr_list static
  media: drxd: make const array fastIncrDecLUT static
  media: usb: fix spelling mistake: "synchronuously" -> "synchronously"
  media: ddbridge: fix build warnings
  media: av7110: avoid 2038 overflow in debug print
  media: Don't do DMA on stack for firmware upload in the AS102 driver
  media: v4l: async: fix unregister for implicitly registered sub-device notifiers
  media: v4l: async: fix return of unitialized variable ret
  media: imx274: fix missing return assignment from call to imx274_mode_regs
  media: camss-vfe: always initialize reg at vfe_set_xbar_cfg()
  media: atomisp: make function calls cleaner
  media: atomisp: get rid of storage_class.h
  media: atomisp: get rid of wrong stddef.h include
  ...
2017-11-15 20:30:12 -08:00
Linus Torvalds
93ea0eb7d7 Merge tag 'leaks-4.15-rc1' of git://github.com/tcharding/linux
Pull leaking_addresses script updates from Tobin Harding:
 "Here are development patches for the leaking_addresses.pl script.

  Changes include:

   - add summary reporting to the script

   - add 'SigIgn' to false positives

   - add a file read timeout so the script doesn't block indefinitely

   - add infrastructure to enable multi-arch support and add support for ppc

   - add some exclude files/paths suggested by various people

   - code clean up and refactoring

   - overhaul command line options"

* tag 'leaks-4.15-rc1' of git://github.com/tcharding/linux:
  leaking_addresses: add SigIgn to false positives
  leaking_addresses: add timeout on file read
  leaking_addresses: add support for ppc64
  leaking_addresses: add summary reporting options
  leaking_addresses: add to exclude files/paths list
  leaking_addresses: fix comment string typo
  leaking_addresses: remove command line options
  leaking_addresses: remove dead/unused code
  leaking_addresses: use tabs instead of spaces
2017-11-15 20:11:56 -08:00
Linus Torvalds
7c225c69f8 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2 updates

 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
2017-11-15 19:42:40 -08:00
Dave Airlie
49e37ba07a Merge branch 'drm-next-4.15-dc' of git://people.freedesktop.org/~agd5f/linux into drm-next
Various fixes for DC for 4.15.

* 'drm-next-4.15-dc' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/display: fix MST link training fail division by 0
  drm/amd/display: Fix formatting for null pointer dereference fix
  drm/amd/display: Remove dangling planes on dc commit state
  drm/amd/display: add flip_immediate to commit update for stream
  drm/amd/display: Miss register MST encoder cbs
  drm/amd/display: Fix warnings on S3 resume
  drm/amd/display: use num_timing_generator instead of pipe_count
  drm/amd/display: use configurable FBC option in dm
  drm/amd/display: fix AZ clock not enabled before program AZ endpoint
  amdgpu/dm: Don't use DRM_ERROR in amdgpu_dm_atomic_check
2017-11-16 12:39:40 +10:00
Fan Du
1b7176aea0 memory hotplug: fix comments when adding section
Here, pfn_to_node should be page_to_nid.

Link: http://lkml.kernel.org/r/1510735205-22540-1-git-send-email-fan.du@intel.com
Signed-off-by: Fan Du <fan.du@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Oscar Salvador
0cd842f970 mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
free_area_init_node() calls alloc_node_mem_map(), but this function does
nothing unless we have CONFIG_FLAT_NODE_MEM_MAP.

As a cleanup, we can move the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" within
alloc_node_mem_map() out of the function, and define a
alloc_node_mem_map() { } when CONFIG_FLAT_NODE_MEM_MAP is not present.

This also moves the printk that lays within the "#ifdef
CONFIG_FLAT_NODE_MEM_MAP" block from free_area_init_node() to
alloc_node_mem_map(), getting rid of the "#ifdef
CONFIG_FLAT_NODE_MEM_MAP" in free_area_init_node().

[akpm@linux-foundation.org: clean up the printk while we're there]
Link: http://lkml.kernel.org/r/20171114111935.GA11758@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@techadventures.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Michal Hocko
0205f75571 mm: simplify nodemask printing
alloc_warn() and dump_header() have to explicitly handle NULL nodemask
which forces both paths to use pr_cont.  We can do better.  printk
already handles NULL pointers properly so all we need is to teach
nodemask_pr_args to handle NULL nodemask carefully.  This allows
simplification of both alloc_warn() and dump_header() and gets rid of
pr_cont altogether.

This patch has been motivated by patch from Joe Perches

  http://lkml.kernel.org/r/b31236dfe3fc924054fd7842bde678e71d193638.1509991345.git.joe@perches.com

[akpm@linux-foundation.org: fix tile warning, per Arnd]
Link: http://lkml.kernel.org/r/20171109100531.3cn2hcqnuj7mjaju@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Tetsuo Handa
c50842c8e1 mm,oom_reaper: remove pointless kthread_run() error check
Since oom_init() is called before userspace processes start, memory
allocation failure for creating the OOM reaper kernel thread will let
the OOM killer call panic() rather than wake up the OOM reaper.

Link: http://lkml.kernel.org/r/1510137800-4602-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Jaewon Kim
e492080e64 mm/page_ext.c: check if page_ext is not prepared
online_page_ext() and page_ext_init() allocate page_ext for each
section, but they do not allocate if the first PFN is !pfn_present(pfn)
or !pfn_valid(pfn).  Then section->page_ext remains as NULL.
lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled.  For a
valid PFN, __set_page_owner will try to get page_ext through
lookup_page_ext.  Without CONFIG_DEBUG_VM lookup_page_ext will misuse
NULL pointer as value 0.  This incurrs invalid address access.

This is the panic example when PFN 0x100000 is not valid but PFN
0x13FC00 is being used for page_ext.  section->page_ext is NULL,
get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
0x13FC00.

To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
will be checked at all times.

  Unable to handle kernel paging request at virtual address 01dfa014
  ------------[ cut here ]------------
  Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
  Internal error: Oops: 96000045 [#1] PREEMPT SMP
  Modules linked in:
  PC is at __set_page_owner+0x48/0x78
  LR is at __set_page_owner+0x44/0x78
    __set_page_owner+0x48/0x78
    get_page_from_freelist+0x880/0x8e8
    __alloc_pages_nodemask+0x14c/0xc48
    __do_page_cache_readahead+0xdc/0x264
    filemap_fault+0x2ac/0x550
    ext4_filemap_fault+0x3c/0x58
    __do_fault+0x80/0x120
    handle_mm_fault+0x704/0xbb0
    do_page_fault+0x2e8/0x394
    do_mem_abort+0x88/0x124

Pre-4.7 kernels also need commit f86e427197 ("mm: check the return
value of lookup_page_ext for all call sites").

Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
Fixes: eefa864b70 ("mm/page_ext: resurrect struct page extending code for debugging")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: <stable@vger.kernel.org>	[depends on f86e427197, see above]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Wang Long
2bce774e82 writeback: remove unused function parameter
The parameter `struct bdi_writeback *wb` is not been used in the
function body.  Remove it.

Link: http://lkml.kernel.org/r/1509685485-15278-1-git-send-email-wanglong19@meituan.com
Signed-off-by: Wang Long <wanglong19@meituan.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Michal Hocko
0a7f682d04 mm: do not rely on preempt_count in print_vma_addr
The preempt count check on print_vma_addr has been added by commit
e8bff74afb ("x86: fix "BUG: sleeping function called from invalid
context" in print_vma_addr()") and it relied on the elevated preempt
count from preempt_conditional_sti because preempt_count check doesn't
work on non preemptive kernels by default.

The code has evolved though and commit d99e1bd175 ("x86/entry/traps:
Refactor preemption and interrupt flag handling") has replaced
preempt_conditional_sti by an explicit preempt_disable which is noop on
!PREEMPT so the check in print_vma_addr is broken.

Fix the issue by using trylock on mmap_sem rather than chacking the
preempt count.  The allocation we are relying on has to be GFP_NOWAIT as
well.  There is a chance that we won't dump the vma state if the lock is
contended or the memory short but this is acceptable outcome and much
less fragile than the not working preemption check or tricks around it.

Link: http://lkml.kernel.org/r/20171106134031.g6dbelg55mrbyc6i@dhcp22.suse.cz
Fixes: d99e1bd175 ("x86/entry/traps: Refactor preemption and interrupt flag handling")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Yang Shi <yang.s@alibaba-inc.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Michal Hocko
fcdaf842bd mm, sparse: do not swamp log with huge vmemmap allocation failures
While doing memory hotplug tests under heavy memory pressure we have
noticed too many page allocation failures when allocating vmemmap memmap
backed by huge page

  kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
  [...]
  Call Trace:
    dump_trace+0x59/0x310
    show_stack_log_lvl+0xea/0x170
    show_stack+0x21/0x40
    dump_stack+0x5c/0x7c
    warn_alloc_failed+0xe2/0x150
    __alloc_pages_nodemask+0x3ed/0xb20
    alloc_pages_current+0x7f/0x100
    vmemmap_alloc_block+0x79/0xb6
    __vmemmap_alloc_block_buf+0x136/0x145
    vmemmap_populate+0xd2/0x2b9
    sparse_mem_map_populate+0x23/0x30
    sparse_add_one_section+0x68/0x18e
    __add_pages+0x10a/0x1d0
    arch_add_memory+0x4a/0xc0
    add_memory_resource+0x89/0x160
    add_memory+0x6d/0xd0
    acpi_memory_device_add+0x181/0x251
    acpi_bus_attach+0xfd/0x19b
    acpi_bus_scan+0x59/0x69
    acpi_device_hotplug+0xd2/0x41f
    acpi_hotplug_work_fn+0x1a/0x23
    process_one_work+0x14e/0x410
    worker_thread+0x116/0x490
    kthread+0xbd/0xe0
    ret_from_fork+0x3f/0x70

and we do see many of those because essentially every allocation fails
for each memory section.  This is an excessive way to tell the user that
there is nothing to really worry about because we do have a fallback
mechanism to use base pages.  The only downside might be a performance
degradation due to TLB pressure.

This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn
explicitly once on the first allocation failure.  This will reduce the
noise in the kernel log considerably, while we still have an indication
that a performance might be impacted.

[mhocko@kernel.org: forgot to git add the follow up fix]
  Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Colin Ian King
fec11bc039 mm/hmm: remove redundant variable align_end
Variable align_end is assigned a value but it is never read, so the
variable is redundant and can be removed.  Cleans up the clang warning:
Value stored to 'align_end' is never read

Link: http://lkml.kernel.org/r/20171017143837.23207-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Gustavo A. R. Silva
5b568acc3c mm/list_lru.c: mark expected switch fall-through
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Link: http://lkml.kernel.org/r/20171020190754.GA24332@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Gustavo A. R. Silva
c8402871d5 mm/shmem.c: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Link: http://lkml.kernel.org/r/20171020191058.GA24427@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Pavel Tatashin
d135e57502 mm/page_alloc.c: broken deferred calculation
In reset_deferred_meminit() we determine number of pages that must not
be deferred.  We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.

The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes.  However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.

The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.

Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393d ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Tetsuo Handa
400e22499d mm: don't warn about allocations which stall for too long
Commit 63f53dea0c ("mm: warn about allocations which stall for too
long") was a great step for reducing possibility of silent hang up
problem caused by memory allocation stalls.  But this commit reverts it,
for it is possible to trigger OOM lockup and/or soft lockups when many
threads concurrently called warn_alloc() (in order to warn about memory
allocation stalls) due to current implementation of printk(), and it is
difficult to obtain useful information due to limitation of synchronous
warning approach.

Current printk() implementation flushes all pending logs using the
context of a thread which called console_unlock().  printk() should be
able to flush all pending logs eventually unless somebody continues
appending to printk() buffer.

Since warn_alloc() started appending to printk() buffer while waiting
for oom_kill_process() to make forward progress when oom_kill_process()
is processing pending logs, it became possible for warn_alloc() to force
oom_kill_process() loop inside printk().  As a result, warn_alloc()
significantly increased possibility of preventing oom_kill_process()
from making forward progress.

---------- Pseudo code start ----------
Before warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    }
    goto retry;

After warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else if (waited_for_10seconds()) {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

Although waited_for_10seconds() becomes true once per 10 seconds,
unbounded number of threads can call waited_for_10seconds() at the same
time.  Also, since threads doing waited_for_10seconds() keep doing
almost busy loop, the thread doing print_one_log() can use little CPU
resource.  Therefore, this situation can be simplified like

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

when printk() is called faster than print_one_log() can process a log.

One of possible mitigation would be to introduce a new lock in order to
make sure that no other series of printk() (either oom_kill_process() or
warn_alloc()) can append to printk() buffer when one series of printk()
(either oom_kill_process() or warn_alloc()) is already in progress.

Such serialization will also help obtaining kernel messages in readable
form.

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      mutex_lock(&oom_printk_lock);
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_printk_lock);
      mutex_unlock(&oom_lock)
    } else {
      if (mutex_trylock(&oom_printk_lock)) {
        atomic_inc(&printk_pending_logs);
        mutex_unlock(&oom_printk_lock);
      }
    }
    goto retry;
---------- Pseudo code end ----------

But this commit does not go that direction, for we don't want to
introduce a new lock dependency, and we unlikely be able to obtain
useful information even if we serialized oom_kill_process() and
warn_alloc().

Synchronous approach is prone to unexpected results (e.g.  too late [1],
too frequent [2], overlooked [3]).  As far as I know, warn_alloc() never
helped with providing information other than "something is going wrong".
I want to consider asynchronous approach which can obtain information
during stalls with possibly relevant threads (e.g.  the owner of
oom_lock and kswapd-like threads) and serve as a trigger for actions
(e.g.  turn on/off tracepoints, ask libvirt daemon to take a memory dump
of stalling KVM guest for diagnostic purpose).

This commit temporarily loses ability to report e.g.  OOM lockup due to
unable to invoke the OOM killer due to !__GFP_FS allocation request.
But asynchronous approach will be able to detect such situation and emit
warning.  Thus, let's remove warn_alloc().

[1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
[2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
[3] commit db73ee0d46 ("mm, vmscan: do not loop on too_many_isolated for ever"))

Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Johannes Weiner
df206988e0 fs: fuse: account fuse_inode slab memory as reclaimable
Fuse inodes are currently included in the unreclaimable slab counts -
SUnreclaim in /proc/meminfo, slab_unreclaimable in /proc/vmstat and the
per-cgroup memory.stat.  But they are reclaimable just like other
filesystems' inodes, and /proc/sys/vm/drop_caches frees them easily.

Mark the slab cache reclaimable.

Link: http://lkml.kernel.org/r/20171102202727.12539-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Vlastimil Babka
b050e3769c mm, page_alloc: fix potential false positive in __zone_watermark_ok
Since commit 97a16fc82a ("mm, page_alloc: only enforce watermarks for
order-0 allocations"), __zone_watermark_ok() check for high-order
allocations will shortcut per-migratetype free list checks for
ALLOC_HARDER allocations, and return true as long as there's free page
of any migratetype.  The intention is that ALLOC_HARDER can allocate
from MIGRATE_HIGHATOMIC free lists, while normal allocations can't.

However, as a side effect, the watermark check will then also return
true when there are pages only on the MIGRATE_ISOLATE list, or (prior to
CMA conversion to ZONE_MOVABLE) on the MIGRATE_CMA list.  Since the
allocation cannot actually obtain isolated pages, and might not be able
to obtain CMA pages, this can result in a false positive.

The condition should be rare and perhaps the outcome is not a fatal one.
Still, it's better if the watermark check is correct.  There also
shouldn't be a performance tradeoff here.

Link: http://lkml.kernel.org/r/20171102125001.23708-1-vbabka@suse.cz
Fixes: 97a16fc82a ("mm, page_alloc: only enforce watermarks for order-0 allocations")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Shakeel Butt
72b03fcd5d mm: mlock: remove lru_add_drain_all()
lru_add_drain_all() is not required by mlock() and it will drain
everything that has been cached at the time mlock is called.  And that
is not really related to the memory which will be faulted in (and
cached) and mlocked by the syscall itself.

If anything lru_add_drain_all() should be called _after_ pages have been
mlocked and faulted in but even that is not strictly needed because
those pages would get to the appropriate LRUs lazily during the reclaim
path.  Moreover follow_page_pte (gup) will drain the local pcp LRU
cache.

On larger machines the overhead of lru_add_drain_all() in mlock() can be
significant when mlocking data already in memory.  We have observed high
latency in mlock() due to lru_add_drain_all() when the users were
mlocking in memory tmpfs files.

[mhocko@suse.com: changelog fix]
Link: http://lkml.kernel.org/r/20171019222507.2894-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00