ksu
124 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2d76dea417 |
Merge 4.19.323 into android-4.19-stable
Changes in 4.19.323 staging: iio: frequency: ad9833: Get frequency value statically staging: iio: frequency: ad9833: Load clock using clock framework staging: iio: frequency: ad9834: Validate frequency parameter value usbnet: ipheth: fix carrier detection in modes 1 and 4 net: ethernet: use ip_hdrlen() instead of bit shift net: phy: vitesse: repair vsc73xx autonegotiation scripts: kconfig: merge_config: config files: add a trailing newline arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma net/mlx5: Update the list of the PCI supported devices net: ftgmac100: Enable TX interrupt to avoid TX timeout net: dpaa: Pad packets to ETH_ZLEN soundwire: stream: Revert "soundwire: stream: fix programming slave ports for non-continous port maps" selftests/vm: remove call to ksft_set_plan() selftests/kcmp: remove call to ksft_set_plan() ASoC: allow module autoloading for table db1200_pids pinctrl: at91: make it work with current gpiolib microblaze: don't treat zero reserved memory regions as error net: ftgmac100: Ensure tx descriptor updates are visible wifi: iwlwifi: mvm: fix iwl_mvm_max_scan_ie_fw_cmd_room() wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead ASoC: tda7419: fix module autoloading spi: bcm63xx: Enable module autoloading x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency ocfs2: add bounds checking to ocfs2_xattr_find_entry() ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry() gpio: prevent potential speculation leaks in gpio_device_get_desc() USB: serial: pl2303: add device id for Macrosilicon MS3020 ACPI: PMIC: Remove unneeded check in tps68470_pmic_opregion_probe() wifi: ath9k: fix parameter check in ath9k_init_debug() wifi: ath9k: Remove error checks when creating debugfs entries netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire wifi: cfg80211: fix UBSAN noise in cfg80211_wext_siwscan() wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop() can: bcm: Clear bo->bcm_proc_read after remove_proc_entry(). Bluetooth: btusb: Fix not handling ZPL/short-transfer block, bfq: fix possible UAF for bfqq->bic with merge chain block, bfq: choose the last bfqq from merge chain in bfq_setup_cooperator() block, bfq: don't break merge chain in bfq_split_bfqq() spi: ppc4xx: handle irq_of_parse_and_map() errors spi: ppc4xx: Avoid returning 0 when failed to parse and map IRQ ARM: versatile: fix OF node leak in CPUs prepare reset: berlin: fix OF node leak in probe() error path clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init() hwmon: (max16065) Fix overflows seen when writing limits mtd: slram: insert break after errors in parsing the map hwmon: (ntc_thermistor) fix module autoloading power: supply: max17042_battery: Fix SOC threshold calc w/ no current sense fbdev: hpfb: Fix an error handling path in hpfb_dio_probe() drm/stm: Fix an error handling path in stm_drm_platform_probe() drm/amd: fix typo drm/amdgpu: Replace one-element array with flexible-array member drm/amdgpu: properly handle vbios fake edid sizing drm/radeon: Replace one-element array with flexible-array member drm/radeon: properly handle vbios fake edid sizing drm/rockchip: vop: Allow 4096px width scaling drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets jfs: fix out-of-bounds in dbNextAG() and diAlloc() drm/msm/a5xx: properly clear preemption records on resume drm/msm/a5xx: fix races in preemption evaluation stage ipmi: docs: don't advertise deprecated sysfs entries drm/msm: fix %s null argument error xen: use correct end address of kernel for conflict checking xen/swiotlb: simplify range_straddles_page_boundary() xen/swiotlb: add alignment check for dma buffers selftests/bpf: Fix error compiling test_lru_map.c xz: cleanup CRC32 edits from 2018 kthread: add kthread_work tracepoints kthread: fix task state in kthread worker if being frozen jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers() ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipso ext4: avoid negative min_clusters in find_group_orlov() ext4: return error on ext4_find_inline_entry ext4: avoid OOB when system.data xattr changes underneath the filesystem nilfs2: fix potential null-ptr-deref in nilfs_btree_insert() nilfs2: determine empty node blocks as corrupted nilfs2: fix potential oob read in nilfs_btree_check_delete() perf sched timehist: Fix missing free of session in perf_sched__timehist() perf sched timehist: Fixed timestamp error when unable to confirm event sched_in time perf time-utils: Fix 32-bit nsec parsing clk: rockchip: Set parent rate for DCLK_VOP clock on RK3228 drivers: media: dvb-frontends/rtl2832: fix an out-of-bounds write error drivers: media: dvb-frontends/rtl2830: fix an out-of-bounds write error PCI: xilinx-nwl: Fix register misspelling RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency pinctrl: single: fix missing error code in pcs_probe() clk: ti: dra7-atl: Fix leak of of_nodes pinctrl: mvebu: Fix devinit_dove_pinctrl_probe function RDMA/cxgb4: Added NULL check for lookup_atid ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir() nfsd: call cache_put if xdr_reserve_space returns NULL f2fs: enhance to update i_mode and acl atomically in f2fs_setattr() f2fs: fix typo f2fs: fix to update i_ctime in __f2fs_setxattr() f2fs: remove unneeded check condition in __f2fs_setxattr() f2fs: reduce expensive checkpoint trigger frequency coresight: tmc: sg: Do not leak sg_table netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put() net: seeq: Fix use after free vulnerability in ether3 Driver Due to Race Condition tcp: introduce tcp_skb_timestamp_us() helper tcp: check skb is non-NULL in tcp_rto_delta_us() net: qrtr: Update packets cloning when broadcasting netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS crypto: aead,cipher - zeroize key buffer after use Remove *.orig pattern from .gitignore soc: versatile: integrator: fix OF node leak in probe() error path USB: appledisplay: close race between probe and completion handler USB: misc: cypress_cy7c63: check for short transfer firmware_loader: Block path traversal tty: rp2: Fix reset with non forgiving PCIe host bridges drbd: Fix atomicity violation in drbd_uuid_set_bm() drbd: Add NULL check for net_conf to prevent dereference in state validation ACPI: sysfs: validate return type of _STR method f2fs: prevent possible int overflow in dir_block_index() f2fs: avoid potential int overflow in sanity_check_area_boundary() vfs: fix race between evice_inodes() and find_inode()&iput() fs: Fix file_set_fowner LSM hook inconsistencies nfs: fix memory leak in error path of nfs4_do_reclaim PCI: xilinx-nwl: Use irq_data_get_irq_chip_data() PCI: xilinx-nwl: Fix off-by-one in INTx IRQ handler soc: versatile: realview: fix memory leak during device remove soc: versatile: realview: fix soc_dev leak during device remove usb: yurex: Replace snprintf() with the safer scnprintf() variant USB: misc: yurex: fix race between read and write pps: remove usage of the deprecated ida_simple_xx() API pps: add an error check in parport_attach i2c: aspeed: Update the stop sw state when the bus recovery occurs i2c: isch: Add missed 'else' usb: yurex: Fix inconsistent locking bug in yurex_read() mailbox: rockchip: fix a typo in module autoloading mailbox: bcm2835: Fix timeout during suspend mode ceph: remove the incorrect Fw reference check when dirtying pages netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED netfilter: nf_tables: prevent nf_skb_duplicated corruption r8152: Factor out OOB link list waits net: ethernet: lantiq_etop: fix memory disclosure net: avoid potential underflow in qdisc_pkt_len_init() with UFO net: add more sanity checks to qdisc_pkt_len_init() ipv4: ip_gre: Fix drops of small packets in ipgre_xmit sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin f2fs: Require FMODE_WRITE for atomic write ioctls wifi: ath9k: fix possible integer overflow in ath9k_get_et_stats() wifi: ath9k_htc: Use __skb_set_length() for resetting urb before resubmit net: hisilicon: hip04: fix OF node leak in probe() net: hisilicon: hns_dsaf_mac: fix OF node leak in hns_mac_get_info() net: hisilicon: hns_mdio: fix OF node leak in probe() ACPICA: Fix memory leak if acpi_ps_get_next_namepath() fails ACPICA: Fix memory leak if acpi_ps_get_next_field() fails ACPI: EC: Do not release locks during operation region accesses ACPICA: check null return of ACPI_ALLOCATE_ZEROED() in acpi_db_convert_to_package() tipc: guard against string buffer overrun net: mvpp2: Increase size of queue_name buffer ipv4: Check !in_dev earlier for ioctl(SIOCSIFADDR). ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process ACPICA: iasl: handle empty connection_node wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext() signal: Replace BUG_ON()s ALSA: asihpi: Fix potential OOB array access ALSA: hdsp: Break infinite MIDI input flush loop fbdev: pxafb: Fix possible use after free in pxafb_task() power: reset: brcmstb: Do not go into infinite loop if reset fails ata: sata_sil: Rename sil_blacklist to sil_quirks jfs: UBSAN: shift-out-of-bounds in dbFindBits jfs: Fix uaf in dbFreeBits jfs: check if leafidx greater than num leaves per dmap tree jfs: Fix uninit-value access of new_ea in ea_buffer drm/amd/display: Check stream before comparing them drm/amd/display: Fix index out of bounds in degamma hardware format translation drm/printer: Allow NULL data in devcoredump printer scsi: aacraid: Rearrange order of struct aac_srb_unit drm/radeon/r100: Handle unknown family in r100_cp_init_microcode() of/irq: Refer to actual buffer size in of_irq_parse_one() ext4: ext4_search_dir should return a proper error ext4: fix i_data_sem unlock order in ext4_ind_migrate() spi: s3c64xx: fix timeout counters in flush_fifo selftests: breakpoints: use remaining time to check if suspend succeed selftests: vDSO: fix vDSO symbols lookup for powerpc64 i2c: xiic: Wait for TX empty to avoid missed TX NAKs spi: bcm63xx: Fix module autoloading perf/core: Fix small negative period being ignored parisc: Fix itlb miss handler for 64-bit programs ALSA: core: add isascii() check to card ID generator ext4: no need to continue when the number of entries is 1 ext4: propagate errors from ext4_find_extent() in ext4_insert_range() ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space() ext4: aovid use-after-free in ext4_ext_insert_extent() ext4: fix double brelse() the buffer of the extents path ext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit() parisc: Fix 64-bit userspace syscall path of/irq: Support #msi-cells=<0> in of_msi_get_domain jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error ocfs2: fix the la space leak when unmounting an ocfs2 volume ocfs2: fix uninit-value in ocfs2_get_block() ocfs2: reserve space for inline xattr before attaching reflink tree ocfs2: cancel dqi_sync_work before freeing oinfo ocfs2: remove unreasonable unlock in ocfs2_read_blocks ocfs2: fix null-ptr-deref when journal load failed. ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate riscv: define ILLEGAL_POINTER_VALUE for 64bit aoe: fix the potential use-after-free problem in more places clk: rockchip: fix error for unknown clocks media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags media: venus: fix use after free bug in venus_remove due to race condition iio: magnetometer: ak8975: Fix reading for ak099xx sensors tomoyo: fallback to realpath if symlink's pathname does not exist Input: adp5589-keys - fix adp5589_gpio_get_value() btrfs: wait for fixup workers before stopping cleaner kthread during umount gpio: davinci: fix lazy disable ext4: avoid ext4_error()'s caused by ENOMEM in the truncate path ext4: fix slab-use-after-free in ext4_split_extent_at() ext4: update orig_path in ext4_find_extent() arm64: Add Cortex-715 CPU part definition arm64: cputype: Add Neoverse-N3 definitions arm64: errata: Expand speculative SSBS workaround once more uprobes: fix kernel info leak via "[uprobes]" vma nfsd: use ktime_get_seconds() for timestamps nfsd: fix delegation_blocked() to block correctly for at least 30 seconds rtc: at91sam9: drop platform_data support rtc: at91sam9: fix OF node leak in probe() error path ACPI: battery: Simplify battery hook locking ACPI: battery: Fix possible crash when unregistering a battery hook ext4: fix inode tree inconsistency caused by ENOMEM net: ethernet: cortina: Drop TSO support tracing: Remove precision vsnprintf() check from print event drm: Move drm_mode_setcrtc() local re-init to failure path drm/crtc: fix uninitialized variable use even harder virtio_console: fix misc probe bugs Input: synaptics-rmi4 - fix UAF of IRQ domain on driver removal bpf: Check percpu map value size first s390/facility: Disable compile time optimization for decompressor code s390/mm: Add cond_resched() to cmm_alloc/free_pages() ext4: nested locking for xattr inode s390/cpum_sf: Remove WARN_ON_ONCE statements ktest.pl: Avoid false positives with grub2 skip regex clk: bcm: bcm53573: fix OF node leak in init i2c: i801: Use a different adapter-name for IDF adapters PCI: Mark Creative Labs EMU20k2 INTx masking as broken media: videobuf2-core: clear memory related fields in __vb2_plane_dmabuf_put() usb: chipidea: udc: enable suspend interrupt after usb reset tools/iio: Add memory allocation failure check for trigger_name driver core: bus: Return -EIO instead of 0 when show/store invalid bus attribute fbdev: sisfb: Fix strbuf array overflow NFS: Remove print_overflow_msg() SUNRPC: Fix integer overflow in decode_rc_list() tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe netfilter: br_netfilter: fix panic with metadata_dst skb Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change gpio: aspeed: Add the flush write to ensure the write complete. clk: Add (devm_)clk_get_optional() functions clk: generalize devm_clk_get() a bit clk: Provide new devm_clk helpers for prepared and enabled clocks gpio: aspeed: Use devm_clk api to manage clock source igb: Do not bring the device up after non-fatal error net: ibm: emac: mal: fix wrong goto ppp: fix ppp_async_encode() illegal access net: ipv6: ensure we call ipv6_mc_down() at most once CDC-NCM: avoid overflow in sanity checking HID: plantronics: Workaround for an unexcepted opposite volume key Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant" usb: xhci: Fix problem with xhci resume from suspend usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip net: Fix an unsafe loop on the list posix-clock: Fix missing timespec64 check in pc_clock_settime() arm64: probes: Remove broken LDR (literal) uprobe support arm64: probes: Fix simulate_ldr*_literal() PCI: Add function 0 DMA alias quirk for Glenfly Arise chip fat: fix uninitialized variable KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin() net: dsa: mv88e6xxx: Fix out-of-bound access s390/sclp_vt220: Convert newlines to CRLF instead of LFCR KVM: s390: Change virtual to physical address access in diag 0x258 handler x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET drm/vmwgfx: Handle surface check failure correctly iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig iio: adc: ti-ads8688: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency() iio: light: opt3001: add missing full-scale range value Bluetooth: Remove debugfs directory on module init failure Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001 xhci: Fix incorrect stream context type macro USB: serial: option: add support for Quectel EG916Q-GL USB: serial: option: add Telit FN920C04 MBIM compositions parport: Proper fix for array out-of-bounds access x86/apic: Always explicitly disarm TSC-deadline timer nilfs2: propagate directory read errors from nilfs_find_entry() clk: Fix pointer casting to prevent oops in devm_clk_release() clk: Fix slab-out-of-bounds error in devm_clk_release() RDMA/bnxt_re: Fix incorrect AVID type in WQE structure RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP RDMA/bnxt_re: Return more meaningful error drm/msm/dsi: fix 32-bit signed integer extension in pclk_rate calculation macsec: don't increment counters for an unrelated SA net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit() net: systemport: fix potential memory leak in bcm_sysport_xmit() usb: typec: altmode should keep reference to parent Bluetooth: bnep: fix wild-memory-access in proto_unregister arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: probes: Fix uprobes for big-endian kernels KVM: s390: gaccess: Refactor gpa and length calculation KVM: s390: gaccess: Refactor access address range check KVM: s390: gaccess: Cleanup access to guest pages KVM: s390: gaccess: Check if guest address is in memslot udf: fix uninit-value use in udf_get_fileshortad jfs: Fix sanity check in dbMount net/sun3_82586: fix potential memory leak in sun3_82586_send_packet() be2net: fix potential memory leak in be_xmit() net: usb: usbnet: fix name regression posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime() ALSA: hda/realtek: Update default depop procedure drm/amd: Guard against bad data for ATIF ACPI method ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue nilfs2: fix kernel bug due to missing clearing of buffer delay flag hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event selinux: improve error checking in sel_write_load() arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning xfrm: validate new SA's prefixlen using SA family when sel.family is unset usb: dwc3: remove generic PHY calibrate() calls usb: dwc3: Add splitdisable quirk for Hisilicon Kirin Soc usb: dwc3: core: Stop processing of pending events if controller is halted cgroup: Fix potential overflow issue when checking max_depth wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keys gtp: simplify error handling code in 'gtp_encap_enable()' gtp: allow -1 to be specified as file description from userspace net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT bpf: Fix out-of-bounds write in trie_get_next_key() net: support ip generic csum processing in skb_csum_hwoffload_help net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension netfilter: nft_payload: sanitize offset and length before calling skb_checksum() firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() net: amd: mvme147: Fix probe banner message misc: sgi-gru: Don't disable preemption in GRU driver usbip: tools: Fix detach_port() invalid port error path usb: phy: Fix API devm_usb_put_phy() can not release the phy xhci: Fix Link TRB DMA in command ring stopped completion event Revert "driver core: Fix uevent_show() vs driver detach race" wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower wifi: ath10k: Fix memory leak in management tx wifi: iwlegacy: Clear stale interrupts before resuming device nilfs2: fix potential deadlock with newly created symlinks ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow nilfs2: fix kernel bug due to missing clearing of checked flag mm: shmem: fix data-race in shmem_getattr() vt: prevent kernel-infoleak in con_font_get() Linux 4.19.323 Change-Id: I2348f834187153067ab46b3b48b8fe7da9cee1f1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
6430d6a00b |
kthread: fix task state in kthread worker if being frozen
[ Upstream commit e16c7b07784f3fb03025939c4590b9a7c64970a7 ]
When analyzing a kernel waring message, Peter pointed out that there is a
race condition when the kworker is being frozen and falls into
try_to_freeze() with TASK_INTERRUPTIBLE, which could trigger a
might_sleep() warning in try_to_freeze(). Although the root cause is not
related to freeze()[1], it is still worthy to fix this issue ahead.
One possible race scenario:
CPU 0 CPU 1
----- -----
// kthread_worker_fn
set_current_state(TASK_INTERRUPTIBLE);
suspend_freeze_processes()
freeze_processes
static_branch_inc(&freezer_active);
freeze_kernel_threads
pm_nosig_freezing = true;
if (work) { //false
__set_current_state(TASK_RUNNING);
} else if (!freezing(current)) //false, been frozen
freezing():
if (static_branch_unlikely(&freezer_active))
if (pm_nosig_freezing)
return true;
schedule()
}
// state is still TASK_INTERRUPTIBLE
try_to_freeze()
might_sleep() <--- warning
Fix this by explicitly set the TASK_RUNNING before entering
try_to_freeze().
Link: https://lore.kernel.org/lkml/Zs2ZoAcUsZMX2B%2FI@chenyu5-mobl2/ [1]
Link: https://lkml.kernel.org/r/20240827112308.181081-1-yu.c.chen@intel.com
Fixes:
|
||
|
|
65c1957181 |
kthread: add kthread_work tracepoints
[ Upstream commit f630c7c6f10546ebff15c3a856e7949feb7a2372 ] While migrating some code from wq to kthread_worker, I found that I missed the execute_start/end tracepoints. So add similar tracepoints for kthread_work. And for completeness, queue_work tracepoint (although this one differs slightly from the matching workqueue tracepoint). Link: https://lkml.kernel.org/r/20201010180323.126634-1-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org> Cc: Rob Clark <robdclark@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Phil Auld <pauld@redhat.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Thara Gopinath <thara.gopinath@linaro.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vincent Donnefort <vincent.donnefort@arm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Ilias Stamatis <stamatis.iliass@gmail.com> Cc: Liang Chen <cl@rock-chips.com> Cc: Ben Dooks <ben.dooks@codethink.co.uk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "J. Bruce Fields" <bfields@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: e16c7b07784f ("kthread: fix task state in kthread worker if being frozen") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
ec10e35858 |
Merge 4.19.197 into android-4.19-stable
Changes in 4.19.197 mm: add VM_WARN_ON_ONCE_PAGE() macro mm/rmap: remove unneeded semicolon in page_not_mapped() mm/rmap: use page_not_mapped in try_to_unmap() mm/thp: fix __split_huge_pmd_locked() on shmem migration entry mm/thp: make is_huge_zero_pmd() safe and quicker mm/thp: try_to_unmap() use TTU_SYNC for safe splitting mm/thp: fix vma_address() if virtual address below file offset mm/thp: fix page_address_in_vma() on file THP tails mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split mm: page_vma_mapped_walk(): use page for pvmw->page mm: page_vma_mapped_walk(): settle PageHuge on entry mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block mm: page_vma_mapped_walk(): crossing page table boundary mm: page_vma_mapped_walk(): add a level of indentation mm: page_vma_mapped_walk(): use goto instead of while (1) mm: page_vma_mapped_walk(): get vma_address_end() earlier mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() mm, futex: fix shared futex pgoff on shmem huge page scsi: sr: Return appropriate error code when disk is ejected drm/nouveau: fix dma_address check for CPU/GPU sync ext4: eliminate bogus error in ext4_data_block_valid_rcu() KVM: SVM: Periodically schedule when unregistering regions on destroy ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment kthread_worker: split code for canceling the delayed work timer kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() xen/events: reset active flag for lateeoi events later KVM: SVM: Call SEV Guest Decommission if ASID binding fails ARM: OMAP: replace setup_irq() by request_irq() clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 Linux 4.19.197 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I733cdb08e37808851f4ff54a2e333f3c7d4e9257 |
||
|
|
6e2a98bc90 |
kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.
The system might hang with the following backtrace:
schedule+0x80/0x100
schedule_timeout+0x48/0x138
wait_for_common+0xa4/0x134
wait_for_completion+0x1c/0x2c
kthread_flush_work+0x114/0x1cc
kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
kthread_cancel_delayed_work_sync+0x18/0x2c
xxxx_pm_notify+0xb0/0xd8
blocking_notifier_call_chain_robust+0x80/0x194
pm_notifier_call_chain_robust+0x28/0x4c
suspend_prepare+0x40/0x260
enter_state+0x80/0x3f4
pm_suspend+0x60/0xdc
state_store+0x108/0x144
kobj_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():
CPU0 CPU1
Context: Thread A Context: Thread B
kthread_mod_delayed_work()
spin_lock()
__kthread_cancel_work()
spin_unlock()
del_timer_sync()
kthread_cancel_delayed_work_sync()
spin_lock()
__kthread_cancel_work()
spin_unlock()
del_timer_sync()
spin_lock()
work->canceling++
spin_unlock
spin_lock()
queue_delayed_work()
// dwork is put into the worker->delayed_work_list
spin_unlock()
kthread_flush_work()
// flush_work is put at the tail of the dwork
wait_for_completion()
Context: IRQ
kthread_delayed_work_timer_fn()
spin_lock()
list_del_init(&work->node);
spin_unlock()
BANG: flush_work is not longer linked and will never get proceed.
The problem is that kthread_mod_delayed_work() checks work->canceling
flag before canceling the timer.
A simple solution is to (re)check work->canceling after
__kthread_cancel_work(). But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.
The return value might be used for reference counting. The caller has
to know whether a new work has been queued or an existing one was
replaced.
The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set. The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.
Note that kthread_mod_delayed_work() could remove the timer and then
bail out. It is fine. The other canceling caller needs to cancel the
timer as well. The important thing is that the queue (list)
manipulation is done atomically under worker->lock.
Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
Fixes:
|
||
|
|
13bcf5aeb3 |
kthread_worker: split code for canceling the delayed work timer
commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream. Patch series "kthread_worker: Fix race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync()". This patchset fixes the race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync() including proper return value handling. This patch (of 2): Simple code refactoring as a preparation step for fixing a race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync(). It does not modify the existing behavior. Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: <jenhaochen@google.com> Cc: Martin Liu <liumartin@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
699a3fa25b |
Merge 4.19.174 into android-4.19-stable
Changes in 4.19.174
net: dsa: bcm_sf2: put device node before return
ibmvnic: Ensure that CRQ entry read are correctly ordered
ACPI: thermal: Do not call acpi_thermal_check() directly
sysctl: handle overflow in proc_get_long
net_sched: gen_estimator: support large ewma log
phy: cpcap-usb: Fix warning for missing regulator_disable
platform/x86: touchscreen_dmi: Add swap-x-y quirk for Goodix touchscreen on Estar Beauty HD tablet
platform/x86: intel-vbtn: Support for tablet mode on Dell Inspiron 7352
x86: __always_inline __{rd,wr}msr()
scsi: scsi_transport_srp: Don't block target in failfast state
scsi: libfc: Avoid invoking response handler twice if ep is already completed
mac80211: fix fast-rx encryption check
scsi: ibmvfc: Set default timeout to avoid crash during migration
selftests/powerpc: Only test lwm/stmw on big endian
objtool: Don't fail on missing symbol table
kthread: Extract KTHREAD_IS_PER_CPU
workqueue: Restrict affinity change to rescuer
Linux 4.19.174
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I97197c2b93cf098202c4cedbe80b86a1268a26c9
|
||
|
|
fbad32181a |
kthread: Extract KTHREAD_IS_PER_CPU
[ Upstream commit ac687e6e8c26181a33270efd1a2e2241377924b0 ] There is a need to distinguish geniune per-cpu kthreads from kthreads that happen to have a single CPU affinity. Geniune per-cpu kthreads are kthreads that are CPU affine for correctness, these will obviously have PF_KTHREAD set, but must also have PF_NO_SETAFFINITY set, lest userspace modify their affinity and ruins things. However, these two things are not sufficient, PF_NO_SETAFFINITY is also set on other tasks that have their affinities controlled through other means, like for instance workqueues. Therefore another bit is needed; it turns out kthread_create_per_cpu() already has such a bit: KTHREAD_IS_PER_CPU, which is used to make kthread_park()/kthread_unpark() work correctly. Expose this flag and remove the implicit setting of it from kthread_create_on_cpu(); the io_uring usage of it seems dubious at best. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
bc09bee25e |
Merge 4.19.156 into android-4.19-stable
Changes in 4.19.156 drm/i915: Break up error capture compression loops with cond_resched() tipc: fix use-after-free in tipc_bcast_get_mode ptrace: fix task_join_group_stop() for the case when current is traced cadence: force nonlinear buffers to be cloned chelsio/chtls: fix memory leaks caused by a race chelsio/chtls: fix always leaking ctrl_skb gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP gianfar: Account for Tx PTP timestamp in the skb headroom net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms sfp: Fix error handing in sfp_probe() blktrace: fix debugfs use after free btrfs: extent_io: Kill the forward declaration of flush_write_bio btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Revert "btrfs: flush write bio if we loop in extent_write_cache_pages" btrfs: flush write bio if we loop in extent_write_cache_pages btrfs: extent_io: Handle errors better in extent_write_full_page() btrfs: extent_io: Handle errors better in btree_write_cache_pages() btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io() Btrfs: fix unwritten extent buffers and hangs on future writeback attempts btrfs: Don't submit any btree write bio if the fs has errors btrfs: Move btrfs_check_chunk_valid() to tree-check.[ch] and export it btrfs: tree-checker: Make chunk item checker messages more readable btrfs: tree-checker: Make btrfs_check_chunk_valid() return EUCLEAN instead of EIO btrfs: tree-checker: Check chunk item at tree block read time btrfs: tree-checker: Verify dev item btrfs: tree-checker: Fix wrong check on max devid btrfs: tree-checker: Enhance chunk checker to validate chunk profile btrfs: tree-checker: Verify inode item btrfs: tree-checker: fix the error message for transid error Fonts: Replace discarded const qualifier ALSA: usb-audio: Add implicit feedback quirk for Zoom UAC-2 ALSA: usb-audio: add usb vendor id as DSD-capable for Khadas devices ALSA: usb-audio: Add implicit feedback quirk for Qu-16 ALSA: usb-audio: Add implicit feedback quirk for MODX mm: mempolicy: fix potential pte_unmap_unlock pte error lib/crc32test: remove extra local_irq_disable/enable kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled mm: always have io_remap_pfn_range() set pgprot_decrypted() gfs2: Wake up when sd_glock_disposal becomes zero ring-buffer: Fix recursion protection transitions between interrupt context ftrace: Fix recursion check for NMI test ftrace: Handle tracing when switching between context tracing: Fix out of bounds write in get_trace_buf futex: Handle transient "ownerless" rtmutex state correctly ARM: dts: sun4i-a10: fix cpu_alert temperature x86/kexec: Use up-to-dated screen_info copy to fill boot params of: Fix reserved-memory overlap detection blk-cgroup: Fix memleak on error path blk-cgroup: Pre-allocate tree node on blkg_conf_prep scsi: core: Don't start concurrent async scan on same host vsock: use ns_capable_noaudit() on socket create drm/vc4: drv: Add error handding for bind ACPI: NFIT: Fix comparison to '-ENXIO' vt: Disable KD_FONT_OP_COPY fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent serial: 8250_mtk: Fix uart_get_baud_rate warning serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init USB: serial: cyberjack: fix write-URB completion race USB: serial: option: add Quectel EC200T module support USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231 USB: serial: option: add Telit FN980 composition 0x1055 USB: Add NO_LPM quirk for Kingston flash drive usb: mtu3: fix panic in mtu3_gadget_stop() ARC: stack unwinding: avoid indefinite looping Revert "ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE" PM: runtime: Resume the device earlier in __device_release_driver() perf/core: Fix a memory leak in perf_event_parse_addr_filter() tools: perf: Fix build error in v4.19.y net: dsa: read mac address from DT for slave device arm64: dts: marvell: espressobin: Add ethernet switch aliases Linux 4.19.156 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I87af8871465f54de0332fa74bc1f342b7fe99061 |
||
|
|
68e8b8ed78 |
kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
commit 6993d0fdbee0eb38bfac350aa016f65ad11ed3b1 upstream.
There is a small race window when a delayed work is being canceled and
the work still might be queued from the timer_fn:
CPU0 CPU1
kthread_cancel_delayed_work_sync()
__kthread_cancel_work_sync()
__kthread_cancel_work()
work->canceling++;
kthread_delayed_work_timer_fn()
kthread_insert_work();
BUG: kthread_insert_work() should not get called when work->canceling is
set.
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
599bf02de4 |
Merge 4.19.142 into android-4.19-stable
Changes in 4.19.142
drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset()
perf probe: Fix memory leakage when the probe point is not found
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
btrfs: export helpers for subvolume name/id resolution
btrfs: don't show full path of bind mounts in subvol=
btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range
btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
btrfs: sysfs: use NOFS for device creation
romfs: fix uninitialized memory leak in romfs_dev_read()
kernel/relay.c: fix memleak on destroy relay channel
mm: include CMA pages in lowmem_reserve at boot
mm, page_alloc: fix core hung in free_pcppages_bulk()
ext4: fix checking of directory entry validity for inline directories
jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock()
scsi: zfcp: Fix use-after-free in request timeout handlers
drm/amd/display: fix pow() crashing when given base 0
kthread: Do not preempt current task if it is going to call schedule()
spi: Prevent adding devices below an unregistering controller
scsi: ufs: Add DELAY_BEFORE_LPM quirk for Micron devices
scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM
media: budget-core: Improve exception handling in budget_register()
rtc: goldfish: Enable interrupt in set_alarm() when necessary
media: vpss: clean up resources in init
Input: psmouse - add a newline when printing 'proto' by sysfs
m68knommu: fix overwriting of bits in ColdFire V3 cache control
svcrdma: Fix another Receive buffer leak
xfs: fix inode quota reservation checks
jffs2: fix UAF problem
ceph: fix use-after-free for fsc->mdsc
cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0
scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
virtio_ring: Avoid loop when vq is broken in virtqueue_poll
tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference
xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init
alpha: fix annotation of io{read,write}{16,32}be()
fs/signalfd.c: fix inconsistent return codes for signalfd4
ext4: fix potential negative array index in do_split()
ext4: don't allow overlapping system zones
ASoC: q6routing: add dummy register read/write function
i40e: Set RX_ONLY mode for unicast promiscuous on VLAN
i40e: Fix crash during removing i40e driver
net: fec: correct the error path for regulator disable in probe
bonding: show saner speed for broadcast mode
bonding: fix a potential double-unregister
s390/runtime_instrumentation: fix storage key handling
s390/ptrace: fix storage key handling
ASoC: msm8916-wcd-analog: fix register Interrupt offset
ASoC: intel: Fix memleak in sst_media_open
vfio/type1: Add proper error unwind for vfio_iommu_replay()
kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
kconfig: qconf: do not limit the pop-up menu to the first row
kconfig: qconf: fix signal connection to invalid slots
efi: avoid error message when booting under Xen
Fix build error when CONFIG_ACPI is not set/enabled:
RDMA/bnxt_re: Do not add user qps to flushlist
afs: Fix NULL deref in afs_dynroot_depopulate()
bonding: fix active-backup failover for current ARP slave
net: ena: Prevent reset after device destruction
net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
net: dsa: b53: check for timeout
powerpc/pseries: Do not initiate shutdown when system is running on UPS
efi: add missed destroy_workqueue when efisubsys_init fails
epoll: Keep a reference on files added to the check list
do_epoll_ctl(): clean the failure exits up a bit
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
xen: don't reschedule in preemption off sections
clk: Evict unregistered clks from parent caches
KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
Linux 4.19.142
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibfe4a0a4249f76ab35076f4b003e32cd6f9788a5
|
||
|
|
1c263d0e54 |
kthread: Do not preempt current task if it is going to call schedule()
commit 26c7295be0c5e6da3fa45970e9748be983175b1b upstream.
when we create a kthread with ktrhead_create_on_cpu(),the child thread
entry is ktread.c:ktrhead() which will be preempted by the parent after
call complete(done) while schedule() is not called yet,then the parent
will call wait_task_inactive(child) but the child is still on the runqueue,
so the parent will schedule_hrtimeout() for 1 jiffy,it will waste a lot of
time,especially on startup.
parent child
ktrhead_create_on_cpu()
wait_fo_completion(&done) -----> ktread.c:ktrhead()
|----- complete(done);--wakeup and preempted by parent
kthread_bind() <------------| |-> schedule();--dequeue here
wait_task_inactive(child) |
schedule_hrtimeout(1 jiffy) -|
So we hope the child just wakeup parent but not preempted by parent, and the
child is going to call schedule() soon,then the parent will not call
schedule_hrtimeout(1 jiffy) as the child is already dequeue.
The same issue for ktrhead_park()&&kthread_parkme().
This patch can save 120ms on rk312x startup with CONFIG_HZ=300.
Signed-off-by: Liang Chen <cl@rock-chips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20200306070133.18335-2-cl@rock-chips.com
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
3a905dc573 |
FROMLIST: refactor header includes to allow kthread.h inclusion in psi_types.h
kthread.h can't be included in psi_types.h because it creates a circular inclusion with kthread.h eventually including psi_types.h and complaining on kthread structures not being defined because they are defined further in the kthread.h. Resolve this by removing psi_types.h inclusion from the headers included from kthread.h. Signed-off-by: Suren Baghdasaryan <surenb@google.com> (not upstream yet, latest version published at: https://lore.kernel.org/patchwork/patch/1052417/) Bug: 127712811 Bug: 129157727 Test: lmkd in PSI mode Change-Id: I88cd99f41534f0b9df18043cde8d1ee54aaa93de Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
f7951c33f0 |
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner: - Cleanup and improvement of NUMA balancing - Refactoring and improvements to the PELT (Per Entity Load Tracking) code - Watchdog simplification and related cleanups - The usual pile of small incremental fixes and improvements * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) watchdog: Reduce message verbosity stop_machine: Reflow cpu_stop_queue_two_works() sched/numa: Move task_numa_placement() closer to numa_migrate_preferred() sched/numa: Use group_weights to identify if migration degrades locality sched/numa: Update the scan period without holding the numa_group lock sched/numa: Remove numa_has_capacity() sched/numa: Modify migrate_swap() to accept additional parameters sched/numa: Remove unused task_capacity from 'struct numa_stats' sched/numa: Skip nodes that are at 'hoplimit' sched/debug: Reverse the order of printing faults sched/numa: Use task faults only if numa_group is not yet set up sched/numa: Set preferred_node based on best_cpu sched/numa: Simplify load_too_imbalanced() sched/numa: Evaluate move once per node sched/numa: Remove redundant field sched/debug: Show the sum wait time of a task group sched/fair: Remove #ifdefs from scale_rt_capacity() sched/core: Remove get_cpu() from sched_fork() sched/cpufreq: Clarify sugov_get_util() sched/sysctl: Remove unused sched_time_avg_ms sysctl ... |
||
|
|
3e536e222f |
kthread, tracing: Don't expose half-written comm when creating kthreads
There is a window for racing when printing directly to task->comm, allowing other threads to see a non-terminated string. The vsnprintf function fills the buffer, counts the truncated chars, then finally writes the \0 at the end. creator other vsnprintf: fill (not terminated) count the rest trace_sched_waking(p): ... memcpy(comm, p->comm, TASK_COMM_LEN) write \0 The consequences depend on how 'other' uses the string. In our case, it was copied into the tracing system's saved cmdlines, a buffer of adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be): crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk' 0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12" ...and a strcpy out of there would cause stack corruption: [224761.522292] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffff9bf9783c78 crash-arm64> kbt | grep 'comm\|trace_print_context' #6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396) comm (char [16]) = "irq/497-pwr_even" crash-arm64> rd 0xffffffd4d0e17d14 8 ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_ ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16: ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`.. ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`.......... The workaround in |
||
|
|
f83ee19be4 |
kthread: Simplify kthread_park() completion
Oleg explains the reason we could hit park+park is that
smpboot_update_cpumask_percpu_thread()'s
for_each_cpu_and(cpu, &tmp, cpu_online_mask)
smpboot_park_kthread();
turns into:
for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
smpboot_park_kthread();
on UP, ignoring the mask. But since we just completely removed that
function, this is no longer relevant.
So revert commit:
|
||
|
|
1cef1150ef |
kthread, sched/core: Fix kthread_parkme() (again...)
Gaurav reports that commit: |
||
|
|
b1f5b378e1 |
kthread: Allow kthread_park() on a parked kthread
The following commit: |
||
|
|
85f1abe001 |
kthread, sched/wait: Fix kthread_parkme() completion issue
Even with the wait-loop fixed, there is a further issue with kthread_parkme(). Upon hotplug, when we do takedown_cpu(), smpboot_park_threads() can return before all those threads are in fact blocked, due to the placement of the complete() in __kthread_parkme(). When that happens, sched_cpu_dying() -> migrate_tasks() can end up migrating such a still runnable task onto another CPU. Normally the task will have hit schedule() and gone to sleep by the time we do kthread_unpark(), which will then do __kthread_bind() to re-bind the task to the correct CPU. However, when we loose the initial TASK_PARKED store to the concurrent wakeup issue described previously, do the complete(), get migrated, it is possible to either: - observe kthread_unpark()'s clearing of SHOULD_PARK and terminate the park and set TASK_RUNNING, or - __kthread_bind()'s wait_task_inactive() to observe the competing TASK_RUNNING store. Either way the WARN() in __kthread_bind() will trigger and fail to correctly set the CPU affinity. Fix this by only issuing the complete() when the kthread has scheduled out. This does away with all the icky 'still running' nonsense. The alternative is to promote TASK_PARKED to a special state, this guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING and we'll end up doing the right thing, but this preserves the whole icky business of potentially migating the still runnable thing. Reported-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
741a76b350 |
kthread, sched/wait: Fix kthread_parkme() wait-loop
Gaurav reported a problem with __kthread_parkme() where a concurrent
try_to_wake_up() could result in competing stores to ->state which,
when the TASK_PARKED store got lost bad things would happen.
The comment near set_current_state() actually mentions this competing
store, but only mentions the case against TASK_RUNNING. This same
store, with different timing, can happen against a subsequent !RUNNING
store.
This normally is not a problem, because as per that same comment, the
!RUNNING state store is inside a condition based wait-loop:
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (!need_sleep)
break;
schedule();
}
__set_current_state(TASK_RUNNING);
If we loose the (first) TASK_UNINTERRUPTIBLE store to a previous
(concurrent) wakeup, the schedule() will NO-OP and we'll go around the
loop once more.
The problem here is that the TASK_PARKED store is not inside the
KTHREAD_SHOULD_PARK condition wait-loop.
There is a genuine issue with sleeps that do not have a condition;
this is addressed in a subsequent patch.
Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
||
|
|
841b86f328 |
treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:
perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
$(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)
perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
$(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)
The now unused macros are also dropped from include/linux/timer.h.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
||
|
|
e2c5923c34 |
Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
|
||
|
|
e10237cc76 |
kthread: zero the kthread data structure
kthread() could bail out early before we initialize blkcg_css (if the
kthread is killed very early. Please see xchg() statement in kthread()),
which confuses free_kthread_struct. Instead of moving the blkcg_css
initialization early, we simply zero the whole 'self' data structure,
which doesn't sound much overhead.
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes:
|
||
|
|
fe5c3b69b5 |
kthread: Convert callback to use from_timer()
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch kthread to use from_timer() and pass the timer pointer explicitly. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: Len Brown <len.brown@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: linux1394-devel@lists.sourceforge.net Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-s390@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Harish Patil <harish.patil@cavium.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Manish Chopra <manish.chopra@cavium.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-pm@vger.kernel.org Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mark Gross <mark.gross@intel.com> Cc: linux-watchdog@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Michael Reed <mdr@sgi.com> Cc: netdev@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lkml.kernel.org/r/1507159627-127660-13-git-send-email-keescook@chromium.org |
||
|
|
0b508bc926 |
block: fix a build error
The code is only for blkcg not for all cgroups
Fixes:
|
||
|
|
05e3db95eb |
kthread: add a mechanism to store cgroup info
kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a machanism to record cgroup info of original threads in kthread context. Later we can retrieve the cgroup info and attach the cgroup info to jobs. Since this mechanism is only required by kthread, we store the cgroup info in kthread data instead of generic task_struct. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
22cf8bc6cb |
kernel/kthread.c: kthread_worker: don't hog the cpu
If the worker thread continues getting work, it will hog the cpu and rcu stall complains. Make it a good citizen. This is triggered in a loop block device test. Link: http://lkml.kernel.org/r/5de0a179b3184e1a2183fc503448b0269f24d75b.1503697127.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
77f88796ce |
cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator. Once the new kthread starts
running, it initializes itself and wakes up the creator. The creator
then can further configure the kthread and then let it start doing its
job by waking it up.
In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland. When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity. This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.
Unfortunately, the cgroup side of protection is racy. While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.
This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one. Per-cpu
workqueue workers got caught while being created and ended up with
incorrected CPU affinity breaking concurrency management and sometimes
stalling workqueue execution.
This patch adds task->no_cgroup_migration which disallows the task to
be migrated by userland. kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed. The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.
It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side. Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.
v2: Switch to a simpler implementation using a new task_struct bit
field suggested by Oleg.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-and-debugged-by: Chris Mason <clm@fb.com>
Cc: stable@vger.kernel.org # v4.3+ (we can't close the race on < v4.3)
Signed-off-by: Tejun Heo <tj@kernel.org>
|
||
|
|
299300258d |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
ae7e81c077 |
sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h>
We are going to move scheduler ABI details to <uapi/linux/sched/types.h>, which will be used from a number of .c files. Create empty placeholder header that maps to <linux/types.h>. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
dfb4357da6 |
time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:
kernel/time/timer_list.c print_timer():
SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
/proc/timer_list:
#11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570
Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
||
|
|
c0b942a763 |
kthread: add __printf attributes
When commit
|
||
|
|
8fb9dcbdc3 |
kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker()
kthread_create_on_cpu() sets KTHREAD_IS_PER_CPU and kthread->cpu, this only makes sense if this kthread can be parked/unparked by cpuhp code. kthread workers never call kthread_parkme() so this has no effect. Change __kthread_create_worker() to simply call kthread_bind(task, cpu). The very fact that kthread_create_on_cpu() doesn't accept a generic fmt shows that it should not be used outside of smpboot.c. Now, the only reason we can not unexport this helper and move it into smpboot.c is that it sets kthread->cpu and struct kthread is not exported. And the only reason we can not kill kthread->cpu is that kthread_unpark() is used by drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and thus we can not turn _unpark into kthread_unpark(struct smp_hotplug_thread *, cpu). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Petr Mladek <pmladek@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Chunming Zhou <David1.Zhou@amd.com> Cc: Roman Pen <roman.penyaev@profitbricks.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161129175110.GA5342@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
||
|
|
cf380a4a96 |
kthread: Don't use to_live_kthread() in kthread_[un]park()
Now that to_kthread() is always validm change kthread_park() and kthread_unpark() to use it and kill to_live_kthread(). The conversion of kthread_unpark() is trivial. If KTHREAD_IS_PARKED is set then the task has called complete(&self->parked) and there the function cannot race against a concurrent kthread_stop() and exit. kthread_park() is more tricky, because its semantics are not well defined. It returns -ENOSYS if the thread exited but this can never happen and as Roman pointed out kthread_park() can obviously block forever if it would race with the exiting kthread. The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c, stop_machine.c) is fine. It can never see an exiting/exited kthread, smpboot_destroy_threads() clears *ht->store, smpboot_park_thread() checks it is not NULL under the same smpboot_threads_lock. cpuhp_threads and cpu_stop_threads never exit, so other callers are fine too. But it has two more users: - watchdog_park_threads(): The code is actually correct, get_online_cpus() ensures that kthread_park() can't race with itself (note that kthread_park() can't handle this race correctly), but it should not use kthread_park() directly. - drivers/gpu/drm/amd/scheduler/gpu_scheduler.c should not use kthread_park() either. kthread_park() must not be called after amd_sched_fini() which does kthread_stop(), otherwise even to_live_kthread() is not safe because task_struct can be already freed and sched->thread can point to nowhere. The usage of kthread_park/unpark should either be restricted to core code which is properly protected against the exit race or made more robust so it is safe to use it in drivers. To catch eventual exit issues, add a WARN_ON(PF_EXITING) for now. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chunming Zhou <David1.Zhou@amd.com> Cc: Roman Pen <roman.penyaev@profitbricks.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161129175107.GA5339@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
||
|
|
efb29fbfa5 |
kthread: Don't use to_live_kthread() in kthread_stop()
kthread_stop() had to use to_live_kthread() simply because it was not possible to access kthread->exited after the exiting task clears task_struct->vfork_done. Now that to_kthread() is always valid, wake_up_process() + wait_for_completion() can be done ununconditionally. It's not an issue anymore if the task has already issued complete_vfork_done() or died. The exiting task can get the spurious wakeup after mm_release() but this is possible without this change too and is fine; do_task_dead() ensures that this can't make any harm. As a further enhancement this could be converted to task_work_add() later, so ->vfork_done can be avoided completely. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chunming Zhou <David1.Zhou@amd.com> Cc: Roman Pen <roman.penyaev@profitbricks.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161129175103.GA5336@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
||
|
|
eff9662547 |
Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function"
This reverts commit
|
||
|
|
1da5c46fa9 |
kthread: Make struct kthread kmalloc'ed
commit
|
||
|
|
dbf52682cb |
kthread: better support freezable kthread workers
This patch allows to make kthread worker freezable via a new @flags parameter. It will allow to avoid an init work in some kthreads. It currently does not affect the function of kthread_worker_fn() but it might help to do some optimization or fixes eventually. I currently do not know about any other use for the @flags parameter but I believe that we will want more flags in the future. Finally, I hope that it will not cause confusion with @flags member in struct kthread. Well, I guess that we will want to rework the basic kthreads implementation once all kthreads are converted into kthread workers or workqueues. It is possible that we will merge the two structures. Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9a6b06c8d9 |
kthread: allow to modify delayed kthread work
There are situations when we need to modify the delay of a delayed kthread work. For example, when the work depends on an event and the initial delay means a timeout. Then we want to queue the work immediately when the event happens. This patch implements kthread_mod_delayed_work() as inspired workqueues. It cancels the timer, removes the work from any worker list and queues it again with the given timeout. A very special case is when the work is being canceled at the same time. It might happen because of the regular kthread_cancel_delayed_work_sync() or by another kthread_mod_delayed_work(). In this case, we do nothing and let the other operation win. This should not normally happen as the caller is supposed to synchronize these operations a reasonable way. Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
37be45d49d |
kthread: allow to cancel kthread work
We are going to use kthread workers more widely and sometimes we will need to make sure that the work is neither pending nor running. This patch implements cancel_*_sync() operations as inspired by workqueues. Well, we are synchronized against the other operations via the worker lock, we use del_timer_sync() and a counter to count parallel cancel operations. Therefore the implementation might be easier. First, we check if a worker is assigned. If not, the work has newer been queued after it was initialized. Second, we take the worker lock. It must be the right one. The work must not be assigned to another worker unless it is initialized in between. Third, we try to cancel the timer when it exists. The timer is deleted synchronously to make sure that the timer call back is not running. We need to temporary release the worker->lock to avoid a possible deadlock with the callback. In the meantime, we set work->canceling counter to avoid any queuing. Fourth, we try to remove the work from a worker list. It might be the list of either normal or delayed works. Fifth, if the work is running, we call kthread_flush_work(). It might take an arbitrary time. We need to release the worker-lock again. In the meantime, we again block any queuing by the canceling counter. As already mentioned, the check for a pending kthread work is done under a lock. In compare with workqueues, we do not need to fight for a single PENDING bit to block other operations. Therefore we do not suffer from the thundering storm problem and all parallel canceling jobs might use kthread_flush_work(). Any queuing is blocked until the counter gets zero. Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
22597dc3d9 |
kthread: initial support for delayed kthread work
We are going to use kthread_worker more widely and delayed works will be pretty useful. The implementation is inspired by workqueues. It uses a timer to queue the work after the requested delay. If the delay is zero, the work is queued immediately. In compare with workqueues, each work is associated with a single worker (kthread). Therefore the implementation could be much easier. In particular, we use the worker->lock to synchronize all the operations with the work. We do not need any atomic operation with a flags variable. In fact, we do not need any state variable at all. Instead, we add a list of delayed works into the worker. Then the pending work is listed either in the list of queued or delayed works. And the existing check of pending works is the same even for the delayed ones. A work must not be assigned to another worker unless reinitialized. Therefore the timer handler might expect that dwork->work->worker is valid and it could simply take the lock. We just add some sanity checks to help with debugging a potential misuse. Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8197b3d43b |
kthread: detect when a kthread work is used by more workers
Nothing currently prevents a work from queuing for a kthread worker when it is already running on another one. This means that the work might run in parallel on more than one worker. Also some operations are not reliable, e.g. flush. This problem will be even more visible after we add kthread_cancel_work() function. It will only have "work" as the parameter and will use worker->lock to synchronize with others. Well, normally this is not a problem because the API users are sane. But bugs might happen and users also might be crazy. This patch adds a warning when we try to insert the work for another worker. It does not fully prevent the misuse because it would make the code much more complicated without a big benefit. It adds the same warning also into kthread_flush_work() instead of the repeated attempts to get the right lock. A side effect is that one needs to explicitly reinitialize the work if it must be queued into another worker. This is needed, for example, when the worker is stopped and started again. It is a bit inconvenient. But it looks like a good compromise between the stability and complexity. I have double checked all existing users of the kthread worker API and they all seems to initialize the work after the worker gets started. Just for completeness, the patch adds a check that the work is not already in a queue. The patch also puts all the checks into a separate function. It will be reused when implementing delayed works. Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
35033fe9cb |
kthread: add kthread_destroy_worker()
The current kthread worker users call flush() and stop() explicitly. This function does the same plus it frees the kthread_worker struct in one call. It is supposed to be used together with kthread_create_worker*() that allocates struct kthread_worker. Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
fbae2d44aa |
kthread: add kthread_create_worker*()
Kthread workers are currently created using the classic kthread API, namely kthread_run(). kthread_worker_fn() is passed as the @threadfn parameter. This patch defines kthread_create_worker() and kthread_create_worker_on_cpu() functions that hide implementation details. They enforce using kthread_worker_fn() for the main thread. But I doubt that there are any plans to create any alternative. In fact, I think that we do not want any alternative main thread because it would be hard to support consistency with the rest of the kthread worker API. The naming and function of kthread_create_worker() is inspired by the workqueues API like the rest of the kthread worker API. The kthread_create_worker_on_cpu() variant is motivated by the original kthread_create_on_cpu(). Note that we need to bind per-CPU kthread workers already when they are created. It makes the life easier. kthread_bind() could not be used later for an already running worker. This patch does _not_ convert existing kthread workers. The kthread worker API need more improvements first, e.g. a function to destroy the worker. IMPORTANT: kthread_create_worker_on_cpu() allows to use any format of the worker name, in compare with kthread_create_on_cpu(). The good thing is that it is more generic. The bad thing is that most users will need to pass the cpu number in two parameters, e.g. kthread_create_worker_on_cpu(cpu, "helper/%d", cpu). To be honest, the main motivation was to avoid the need for an empty va_list. The only legal way was to create a helper function that would be called with an empty list. Other attempts caused compilation warnings or even errors on different architectures. There were also other alternatives, for example, using #define or splitting __kthread_create_worker(). The used solution looked like the least ugly. Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
255451e453 |
kthread: allow to call __kthread_create_on_node() with va_list args
kthread_create_on_node() implements a bunch of logic to create the kthread. It is already called by kthread_create_on_cpu(). We are going to extend the kthread worker API and will need to call kthread_create_on_node() with va_list args there. This patch does only a refactoring and does not modify the existing behavior. Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a65d40961d |
kthread/smpboot: do not park in kthread_create_on_cpu()
kthread_create_on_cpu() was added by the commit
|
||
|
|
3989144f86 |
kthread: kthread worker API cleanup
A good practice is to prefix the names of functions by the name
of the subsystem.
The kthread worker API is a mix of classic kthreads and workqueues. Each
worker has a dedicated kthread. It runs a generic function that process
queued works. It is implemented as part of the kthread subsystem.
This patch renames the existing kthread worker API to use
the corresponding name from the workqueues API prefixed by
kthread_:
__init_kthread_worker() -> __kthread_init_worker()
init_kthread_worker() -> kthread_init_worker()
init_kthread_work() -> kthread_init_work()
insert_kthread_work() -> kthread_insert_work()
queue_kthread_work() -> kthread_queue_work()
flush_kthread_work() -> kthread_flush_work()
flush_kthread_worker() -> kthread_flush_worker()
Note that the names of DEFINE_KTHREAD_WORK*() macros stay
as they are. It is common that the "DEFINE_" prefix has
precedence over the subsystem names.
Note that INIT() macros and init() functions use different
naming scheme. There is no good solution. There are several
reasons for this solution:
+ "init" in the function names stands for the verb "initialize"
aka "initialize worker". While "INIT" in the macro names
stands for the noun "INITIALIZER" aka "worker initializer".
+ INIT() macros are used only in DEFINE() macros
+ init() functions are used close to the other kthread()
functions. It looks much better if all the functions
use the same scheme.
+ There will be also kthread_destroy_worker() that will
be used close to kthread_cancel_work(). It is related
to the init() function. Again it looks better if all
functions use the same naming scheme.
+ there are several precedents for such init() function
names, e.g. amd_iommu_init_device(), free_area_init_node(),
jump_label_init_type(), regmap_init_mmio_clk(),
+ It is not an argument but it was inconsistent even before.
[arnd@arndb.de: fix linux-next merge conflict]
Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
e700591ae0 |
kthread: rename probe_kthread_data() to kthread_probe_data()
Patch series "kthread: Kthread worker API improvements" The intention of this patchset is to make it easier to manipulate and maintain kthreads. Especially, I want to replace all the custom main cycles with a generic one. Also I want to make the kthreads sleep in a consistent state in a common place when there is no work. This patch (of 11): A good practice is to prefix the names of functions by the name of the subsystem. This patch fixes the name of probe_kthread_data(). The other wrong functions names are part of the kthread worker API and will be fixed separately. Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
23196f2e5f |
kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
get_task_struct(tsk) no longer pins tsk->stack so all users of to_live_kthread() should do try_get_task_stack/put_task_stack to protect "struct kthread" which lives on kthread's stack. TODO: Kill to_live_kthread(), perhaps we can even kill "struct kthread" too, and rework kthread_stop(), it can use task_work_add() to sync with the exiting kernel thread. Message-Id: <20160629180357.GA7178@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cb9b16bbc19d4aea4507ab0552e4644c1211d130.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
e9f069868d |
kernel/kthread.c:kthread_create_on_node(): clarify documentation
- Make it clear that the `node' arg refers to memory allocations only: kthread_create_on_node() does not pin the new thread to that node's CPUs. - Encourage the use of NUMA_NO_NODE. [nzimmer@sgi.com: use NUMA_NO_NODE in kthread_create() also] Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Tejun Heo <tj@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |