vic
33 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
6369256a83 |
Merge 4.19.269 into android-4.19-stable
Changes in 4.19.269 arm: dts: rockchip: fix node name for hym8563 rtc ARM: dts: rockchip: fix ir-receiver node names ARM: 9251/1: perf: Fix stacktraces for tracepoint events in THUMB2 kernels ARM: 9266/1: mm: fix no-MMU ZERO_PAGE() implementation ARM: dts: rockchip: disable arm_global_timer on rk3066 and rk3188 9p/fd: Use P9_HDRSZ for header size ALSA: seq: Fix function prototype mismatch in snd_seq_expand_var_event ASoC: soc-pcm: Add NULL check in BE reparenting regulator: twl6030: fix get status of twl6032 regulators fbcon: Use kzalloc() in fbcon_prepare_logo() 9p/xen: check logical size for buffer size net: usb: qmi_wwan: add u-blox 0x1342 composition xen/netback: Ensure protocol headers don't fall in the non-linear area xen/netback: do some code cleanup xen/netback: don't call kfree_skb() with interrupts disabled rcutorture: Automatically create initrd directory media: v4l2-dv-timings.c: fix too strict blanking sanity checks memcg: fix possible use-after-free in memcg_write_event_control() KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field HID: hid-lg4ff: Add check for empty lbuf HID: core: fix shift-out-of-bounds in hid_report_raw_event ieee802154: cc2520: Fix error return code in cc2520_hw_init() ca8210: Fix crash by zero initializing data gpio: amd8111: Fix PCI device reference count leak e1000e: Fix TX dispatch condition igb: Allocate MSI-X vector when testing Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn() Bluetooth: Fix not cleanup led when bt_init fails selftests: rtnetlink: correct xfrm policy rule in kci_test_ipsec_offload mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add() net: encx24j600: Add parentheses to fix precedence net: encx24j600: Fix invalid logic in reading of MISTAT register xen-netfront: Fix NULL sring after live migration net: mvneta: Prevent out of bounds read in mvneta_config_rss() i40e: Fix not setting default xps_cpus after reset i40e: Fix for VF MAC address 0 i40e: Disallow ip4 and ip6 l4_4_bytes NFC: nci: Bounds check struct nfc_target arrays nvme initialize core quirks before calling nvme_init_subsystem net: stmmac: fix "snps,axi-config" node property parsing net: hisilicon: Fix potential use-after-free in hisi_femac_rx() net: hisilicon: Fix potential use-after-free in hix5hd2_rx() tipc: Fix potential OOB in tipc_link_proto_rcv() ethernet: aeroflex: fix potential skb leak in greth_init_rings() xen/netback: fix build warning net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq() ipv6: avoid use-after-free in ip6_fragment() net: mvneta: Fix an out of bounds check can: esd_usb: Allow REC and TEC to return to zero Linux 4.19.269 Change-Id: Ie79e55bad6376d2314d44479ef1ec3c546d24030 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
e1ae97624e |
memcg: fix possible use-after-free in memcg_write_event_control()
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream. memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to |
||
|
|
ce7025b713 |
Merge 4.19.238 into android-4.19-stable
Changes in 4.19.238 USB: serial: pl2303: add IBM device IDs USB: serial: simple: add Nokia phone driver netdevice: add the case if dev is NULL xfrm: fix tunnel model fragmentation behavior virtio_console: break out of buf poll on remove ethernet: sun: Free the coherent when failing in probing spi: Fix invalid sgs value net:mcf8390: Use platform_get_irq() to get the interrupt spi: Fix erroneous sgs value with min_t() af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register fuse: fix pipe buffer lifetime for direct_io tpm: fix reference counting for struct tpm_chip block: Add a helper to validate the block size virtio-blk: Use blk_validate_block_size() to validate block size USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c xhci: make xhci_handshake timeout for xhci_reset() adjustable coresight: Fix TRCCONFIGR.QE sysfs interface iio: afe: rescale: use s64 for temporary scale calculations iio: inkern: apply consumer scale on IIO_VAL_INT cases iio: inkern: apply consumer scale when no channel scale is available iio: inkern: make a best effort on offset calculation clk: uniphier: Fix fixed-rate initialization ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE Documentation: add link to stable release candidate tree Documentation: update stable tree link SUNRPC: avoid race between mod_timer() and del_timer_sync() NFSD: prevent underflow in nfssvc_decode_writeargs() NFSD: prevent integer overflow on 32 bit systems f2fs: fix to unlock page correctly in error path of is_alive() pinctrl: samsung: drop pin banks references on error paths can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path jffs2: fix use-after-free in jffs2_clear_xattr_subsystem jffs2: fix memory leak in jffs2_do_mount_fs jffs2: fix memory leak in jffs2_scan_medium mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node mm: invalidate hwpoison page cache page in fault path mempolicy: mbind_range() set_policy() after vma_merge() scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands qed: display VF trust config qed: validate and restrict untrusted VFs vlan promisc mode Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads" ALSA: cs4236: fix an incorrect NULL check on list iterator ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020 mm,hwpoison: unmap poisoned page before invalidation drbd: fix potential silent data corruption powerpc/kvm: Fix kvm_use_magic_page ACPI: properties: Consistently return -ENOENT if there are no more references drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() block: don't merge across cgroup boundaries if blkcg is enabled drm/edid: check basic audio support on CEA extension block video: fbdev: sm712fb: Fix crash in smtcfb_read() video: fbdev: atari: Atari 2 bpp (STe) palette bugfix ARM: dts: at91: sama5d2: Fix PMERRLOC resource size ARM: dts: exynos: fix UART3 pins configuration in Exynos5250 ARM: dts: exynos: add missing HDMI supplies on SMDK5250 ARM: dts: exynos: add missing HDMI supplies on SMDK5420 carl9170: fix missing bit-wise or operator for tx_params thermal: int340x: Increase bitmap size lib/raid6/test: fix multiple definition linking error DEC: Limit PMAX memory probing to R3k systems media: davinci: vpif: fix unbalanced runtime PM get brcmfmac: firmware: Allocate space for default boardrev in nvram brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio PCI: pciehp: Clear cmd_busy bit in polling mode regulator: qcom_smd: fix for_each_child.cocci warnings crypto: authenc - Fix sleep in atomic context in decrypt_tail crypto: mxs-dcp - Fix scatterlist processing spi: tegra114: Add missing IRQ check in tegra_spi_probe selftests/x86: Add validity check and allow field splitting spi: pxa2xx-pci: Balance reference count for PCI DMA device hwmon: (pmbus) Add mutex to regulator ops hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING block: don't delete queue kobject before its children PM: hibernate: fix __setup handler error handling PM: suspend: fix return value of __setup handler hwrng: atmel - disable trng on failure path crypto: vmx - add missing dependencies clocksource/drivers/timer-of: Check return value of of_iomap in timer_of_base_init() ACPI: APEI: fix return value of __setup handlers crypto: ccp - ccp_dmaengine_unregister release dma channels hwmon: (pmbus) Add Vin unit off handling clocksource: acpi_pm: fix return value of __setup handler sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa perf/core: Fix address filter parser for multiple filters perf/x86/intel/pt: Fix address filter config for 32-bit kernel media: coda: Fix missing put_device() call in coda_get_vdoa_data video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe() video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name() ARM: dts: qcom: ipq4019: fix sleep clock soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe media: em28xx: initialize refcount before kref_get media: usb: go7007: s2250-board: fix leak in probe() ASoC: rt5663: check the return value of devm_kzalloc() in rt5663_parse_dp() ASoC: ti: davinci-i2s: Add check for clk_enable() ALSA: spi: Add check for clk_enable() arm64: dts: ns2: Fix spi-cpol and spi-cpha property arm64: dts: broadcom: Fix sata nodename printk: fix return value of printk.devkmsg __setup handler ASoC: mxs-saif: Handle errors for clk_enable ASoC: atmel_ssc_dai: Handle errors for clk_enable memory: emif: Add check for setup_interrupts memory: emif: check the pointer temp in get_device_details() ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe ASoC: wm8350: Handle error for wm8350_register_irq ASoC: fsi: Add check for clk_enable video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of ASoC: dmaengine: do not use a NULL prepare_slave_config() callback ASoC: mxs: Fix error handling in mxs_sgtl5000_probe ASoC: imx-es8328: Fix error return code in imx_es8328_probe() ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe mmc: davinci_mmc: Handle error for clk_enable drm/bridge: Fix free wrong object in sii8620_init_rcp_input_dev ath10k: fix memory overwrite of the WoWLAN wakeup packet pattern Bluetooth: hci_serdev: call init_rwsem() before p->open() mtd: onenand: Check for error irq drm/edid: Don't clear formats if using deep color drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes() ath9k_htc: fix uninit value bugs KVM: PPC: Fix vmx/vsx mixup in mmio emulation power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe ray_cs: Check ioremap return value power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports iwlwifi: Fix -EIO error code that is never returned dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS scsi: pm8001: Fix command initialization in pm80XX_send_read_log() scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req() scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config() scsi: pm8001: Fix abort all task initialization TOMOYO: fix __setup handlers return values ext2: correct max file size computing drm/tegra: Fix reference leak in tegra_dsi_ganged_probe power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return drm/bridge: cdns-dsi: Make sure to to create proper aliases for dt powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit KVM: x86: Fix emulation in writing cr8 KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor() hv_balloon: rate-limit "Unhandled message" warning i2c: xiic: Make bus names unique power: supply: wm8350-power: Handle error for wm8350_register_irq power: supply: wm8350-power: Add missing free in free_charger_irq PCI: Reduce warnings on possible RW1C corruption powerpc/sysdev: fix incorrect use to determine if list is empty mfd: mc13xxx: Add check for mc13xxx_irq_request vxcan: enable local echo for sent CAN frames MIPS: RB532: fix return value of __setup handler mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init USB: storage: ums-realtek: fix error code in rts51x_read_mem() af_netlink: Fix shift out of bounds in group mask calculation i2c: mux: demux-pinctrl: do not deactivate a master that is not active selftests/bpf/test_lirc_mode2.sh: Exit with proper code tcp: ensure PMTU updates are processed during fastopen mfd: asic3: Add missing iounmap() on error asic3_mfd_probe mxser: fix xmit_buf leak in activate when LSR == 0xff pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add() staging:iio:adc:ad7280a: Fix handing of device address bit reversing. clk: qcom: ipq8074: Use floor ops for SDCC1 clock serial: 8250_mid: Balance reference count for PCI DMA device serial: 8250: Fix race condition in RTS-after-send handling iio: adc: Add check for devm_request_threaded_irq dma-debug: fix return value of __setup handlers clk: qcom: clk-rcg2: Update the frac table for pixel clock remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region clk: actions: Terminate clk_div_table with sentinel element clk: loongson1: Terminate clk_div_table with sentinel element clk: clps711x: Terminate clk_div_table with sentinel element clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver NFS: remove unneeded check in decode_devicenotify_args() pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe tty: hvc: fix return value of __setup handler kgdboc: fix return value of __setup handler kgdbts: fix return value of __setup handler jfs: fix divide error in dbNextAG netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options clk: qcom: gcc-msm8994: Fix gpll4 width xen: fix is_xen_pmu() net: phy: broadcom: Fix brcm_fet_config_init() qlcnic: dcb: default to returning -EOPNOTSUPP net/x25: Fix null-ptr-deref caused by x25_disconnect NFSv4/pNFS: Fix another issue with a list iterator pointing to the head lib/test: use after free in register_test_dev_kmod() selinux: use correct type for context length loop: use sysfs_emit() in the sysfs xxx show() Fix incorrect type in assignment of ipv6 port for audit irqchip/qcom-pdc: Fix broken locking irqchip/nvic: Release nvic_base upon failure bfq: fix use-after-free in bfq_dispatch_request ACPICA: Avoid walking the ACPI Namespace if it is not there lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3 Revert "Revert "block, bfq: honor already-setup queue merges"" ACPI/APEI: Limit printable size of BERT table data PM: core: keep irq flags in device_pm_check_callbacks() spi: tegra20: Use of_device_get_match_data() ext4: don't BUG if someone dirty pages without asking ext4 first ntfs: add sanity check on allocation size video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow video: fbdev: w100fb: Reset global state video: fbdev: cirrusfb: check pixclock to avoid divide by zero video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960 ARM: dts: bcm2837: Add the missing L1/L2 cache information video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf() video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf() video: fbdev: udlfb: replace snprintf in show functions with sysfs_emit ASoC: soc-core: skip zero num_dai component in searching dai name media: cx88-mpeg: clear interrupt status register before streaming video ARM: tegra: tamonten: Fix I2C3 pad setting ARM: mmp: Fix failure to remove sram device video: fbdev: sm712fb: Fix crash in smtcfb_write() media: Revert "media: em28xx: add missing em28xx_close_extension" media: hdpvr: initialize dev->worker at hdpvr_register_videodev mmc: host: Return an error when ->enable_sdio_irq() ops is missing powerpc/lib/sstep: Fix 'sthcx' instruction powerpc/lib/sstep: Fix build errors with newer binutils powerpc: Fix build errors with newer binutils scsi: qla2xxx: Fix stuck session in gpdb scsi: qla2xxx: Fix warning for missing error code scsi: qla2xxx: Check for firmware dump already collected scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair() scsi: qla2xxx: Fix incorrect reporting of task management failure scsi: qla2xxx: Fix hang due to session stuck scsi: qla2xxx: Reduce false trigger to login scsi: qla2xxx: Use correct feature type field during RFF_ID processing KVM: Prevent module exit until all VMs are freed KVM: x86: fix sending PV IPI ubifs: rename_whiteout: Fix double free for whiteout_ui->data ubifs: Fix deadlock in concurrent rename whiteout and inode writeback ubifs: Add missing iput if do_tmpfile() failed in rename whiteout ubifs: setflags: Make dirtied_ino_d 8 bytes aligned ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock() ubifs: rename_whiteout: correct old_dir size computing can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path can: mcba_usb: properly check endpoint type gfs2: Make sure FITRIM minlen is rounded up to fs block size pinctrl: pinconf-generic: Print arguments for bias-pull-* ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl ACPI: CPPC: Avoid out of bounds access when parsing _CPC data mm/mmap: return 1 from stack_guard_gap __setup() handler mm/memcontrol: return 1 from cgroup.memory __setup() handler mm/usercopy: return 1 from hardened_usercopy __setup() handler bpf: Fix comment for helper bpf_current_task_under_cgroup() ubi: fastmap: Return error code if memory allocation fails in add_aeb() ASoC: topology: Allow TLV control to be either read or write ARM: dts: spear1340: Update serial node properties ARM: dts: spear13xx: Update SPI dma properties um: Fix uml_mconsole stop/go openvswitch: Fixed nd target mask field in the flow dump. KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated ubifs: Rectify space amount budget for mkdir/tmpfile operations rtc: wm8350: Handle error for wm8350_register_irq riscv module: remove (NOLOAD) ARM: 9187/1: JIVE: fix return value of __setup handler KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs drm: Add orientation quirk for GPD Win Max ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111 drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj ptp: replace snprintf with sysfs_emit powerpc: dts: t104xrdb: fix phy type for FMAN 4/5 scsi: mvsas: Replace snprintf() with sysfs_emit() scsi: bfa: Replace snprintf() with sysfs_emit() power: supply: axp20x_battery: properly report current when discharging powerpc: Set crashkernel offset to mid of RMA region PCI: aardvark: Fix support for MSI interrupts iommu/arm-smmu-v3: fix event handling soft lockup usb: ehci: add pci device support for Aspeed platforms PCI: pciehp: Add Qualcomm quirk for Command Completed erratum ipv4: Invalidate neighbour for broadcast address upon address addition dm ioctl: prevent potential spectre v1 gadget drm/amdkfd: make CRAT table missing message informational only scsi: pm8001: Fix pm8001_mpi_task_abort_resp() scsi: aha152x: Fix aha152x_setup() __setup handler return value net/smc: correct settings of RMB window update limit macvtap: advertise link netns via netlink bnxt_en: Eliminate unintended link toggle during FW reset MIPS: fix fortify panic when copying asm exception handlers scsi: libfc: Fix use after free in fc_exch_abts_resp() usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm xtensa: fix DTC warning unit_address_format Bluetooth: Fix use after free in hci_send_acl init/main.c: return 1 from handled __setup() functions minix: fix bug when opening a file with O_DIRECT w1: w1_therm: fixes w1_seq for ds28ea00 sensors NFSv4: Protect the state recovery thread against direct reclaim xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 clk: Enforce that disjoints limits are invalid SUNRPC/call_alloc: async tasks mustn't block waiting for memory NFS: swap IO handling is slightly different for O_DIRECT IO NFS: swap-out must always use STABLE writes. serial: samsung_tty: do not unlock port->lock for uart_write_wakeup() virtio_console: eliminate anonymous module_init & module_exit jfs: prevent NULL deref in diFree parisc: Fix CPU affinity for Lasi, WAX and Dino chips net: add missing SOF_TIMESTAMPING_OPT_ID support mm: fix race between MADV_FREE reclaim and blkdev direct IO read KVM: arm64: Check arm64_get_bp_hardening_data() didn't return NULL drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() Drivers: hv: vmbus: Fix potential crash on module unload scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one() net: stmmac: Fix unset max_speed difference between DT and non-DT platforms drm/imx: Fix memory leak in imx_pd_connector_get_modes net: openvswitch: don't send internal clone attribute to the userspace. rxrpc: fix a race in rxrpc_exit_net() qede: confirm skb is allocated before using spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op() drbd: Fix five use after free bugs in get_initial_state Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning" mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) mm/mempolicy: fix mpol_new leak in shared_policy_replace x86/pm: Save the MSR validity status at context setup x86/speculation: Restore speculation related MSRs during S3 resume btrfs: fix qgroup reserve overflow the qgroup limit arm64: patch_text: Fixup last cpu should be master ata: sata_dwc_460ex: Fix crash due to OOB write perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator irqchip/gic-v3: Fix GICR_CTLR.RWP polling tools build: Filter out options and warnings not supported by clang tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error" mm: don't skip swap entry even if zap_details specified arm64: module: remove (NOLOAD) from linker script mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning cgroup: Use open-time credentials for process migraton perm checks cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv cgroup: Use open-time cgroup namespace for process migration perm checks selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644 selftests: cgroup: Test open-time credential usage for migration checks selftests: cgroup: Test open-time cgroup namespace usage for migration checks xfrm: policy: match with both mark and mask on user interfaces drm/amdgpu: Check if fd really is an amdgpu fd. drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu Linux 4.19.238 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I55a3615d2fbf9bde9ac152456701b36a6c9d20b6 |
||
|
|
74ac12c718 |
cgroup: Use open-time cgroup namespace for process migration perm checks
commit e57457641613fef0d147ede8bd6a3047df588b95 upstream.
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's cgroup namespace which is
a potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes cgroup remember the cgroup namespace at the time of open
and uses it for migration permission checks instad of current's. Note that
this only applies to cgroup2 as cgroup1 doesn't have namespace support.
This also fixes a use-after-free bug on cgroupns reported in
https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Note that backporting this fix also requires the preceding patch.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Fixes:
|
||
|
|
de37e01dd2 |
cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
commit 0d2b5955b36250a9428c832664f2079cbf723bec upstream.
of->priv is currently used by each interface file implementation to store
private information. This patch collects the current two private data usages
into struct cgroup_file_ctx which is allocated and freed by the common path.
This allows generic private data which applies to multiple files, which will
be used to in the following patch.
Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.
v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
cgroup_file_ctx as suggested by Linus.
v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too.
Converted. Didn't change to embedded allocation as cgroup1 pidlists get
stored for caching.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
[mkoutny: v5.10: modify cgroup.pressure handlers, adjust context]
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[OP: backport to v4.19: drop changes to cgroup_pressure_*() functions]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
2d4721389f |
UPSTREAM: cgroup: implement __cgroup_task_count() helper
The helper is identical to the existing cgroup_task_count() except it doesn't take the css_set_lock by itself, assuming that the caller does. Also, move cgroup_task_count() implementation into kernel/cgroup/cgroup.c, as there is nothing specific to cgroup v1. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kernel-team@fb.com Change-Id: Iaa9085d2375d395a051543d2555389213c2892d6 (cherry picked from commit aade7f9efba098859681f8e88d81a5b44ad09b12) Bug: 154548692 Signed-off-by: Marco Ballesio <balejs@google.com> |
||
|
|
8e0f48a73d |
UPSTREAM: cgroup: saner refcounting for cgroup_root
* make the reference from superblock to cgroup_root counting - do cgroup_put() in cgroup_kill_sb() whether we'd done percpu_ref_kill() or not; matching grab is done when we allocate a new root. That gives the same refcounting rules for all callers of cgroup_do_mount() - a reference to cgroup_root has been grabbed by caller and it either is transferred to new superblock or dropped. * have cgroup_kill_sb() treat an already killed refcount as "just don't bother killing it, then". * after successful cgroup_do_mount() have cgroup1_mount() recheck if we'd raced with mount/umount from somebody else and cgroup_root got killed. In that case we drop the superblock and bugger off with -ERESTARTSYS, same as if we'd found it in the list already dying. * don't bother with delayed initialization of refcount - it's unreliable and not needed. No need to prevent attempts to bump the refcount if we find cgroup_root of another mount in progress - sget will reuse an existing superblock just fine and if the other sb manages to die before we get there, we'll catch that immediately after cgroup_do_mount(). * don't bother with kernfs_pin_sb() - no need for doing that either. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I8e088dfc516b76c42d9d4b34db7f49f0cebc5414 (cherry picked from commit 35ac1184244f1329783e1d897f74926d8bb1103a) Bug: 154548692 Signed-off-by: Marco Ballesio <balejs@google.com> |
||
|
|
e4f8d81c73 |
cgroup/tracing: Move taking of spin lock out of trace event handlers
It is unwise to take spin locks from the handlers of trace events. Mainly, because they can introduce lockups, because it introduces locks in places that are normally not tested. Worse yet, because trace events are tucked away in the include/trace/events/ directory, locks that are taken there are forgotten about. As a general rule, I tell people never to take any locks in a trace event handler. Several cgroup trace event handlers call cgroup_path() which eventually takes the kernfs_rename_lock spinlock. This injects the spinlock in the code without people realizing it. It also can cause issues for the PREEMPT_RT patch, as the spinlock becomes a mutex, and the trace event handlers are called with preemption disabled. By moving the calculation of the cgroup_path() out of the trace event handlers and into a macro (surrounded by a trace_cgroup_##type##_enabled()), then we could place the cgroup_path into a string, and pass that to the trace event. Not only does this remove the taking of the spinlock out of the trace event handler, but it also means that the cgroup_path() only needs to be called once (it is currently called twice, once to get the length to reserver the buffer for, and once again to get the path itself. Now it only needs to be done once. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
9f25a8da42 |
Merge branch 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo: - For cpustat, cgroup has a percpu hierarchical stat mechanism which propagates up the hierarchy lazily. This contains commits to factor out and generalize the mechanism so that it can be used for other cgroup stats too. The original intention was to update memcg stats to use it but memcg went for a different approach, so still the only user is cpustat. The factoring out and generalization still make sense and it's likely that this can be used for other purposes in the future. - cgroup uses kernfs_notify() (which uses fsnotify()) to inform user space of certain events. A rate limiting mechanism is added. - Other misc changes. * 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: css_set_lock should nest inside tasklist_lock rdmacg: Convert to use match_string() helper cgroup: Make cgroup_rstat_updated() ready for root cgroup usage cgroup: Add memory barriers to plug cgroup_rstat_updated() race window cgroup: Add cgroup_subsys->css_rstat_flush() cgroup: Replace cgroup_rstat_mutex with a spinlock cgroup: Factor out and expose cgroup_rstat_*() interface functions cgroup: Reorganize kernel/cgroup/rstat.c cgroup: Distinguish base resource stat implementation from rstat cgroup: Rename stat to rstat cgroup: Rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c cgroup: Limit event generation frequency cgroup: Explicitly remove core interface files |
||
|
|
3f3942aca6 |
proc: introduce proc_create_single{,_data}
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
||
|
|
6162cef0f7 |
cgroup: Factor out and expose cgroup_rstat_*() interface functions
cgroup_rstat is being generalized so that controllers can use it too. This patch factors out and exposes the following interface functions. * cgroup_rstat_updated(): Renamed from cgroup_rstat_cpu_updated() for consistency. * cgroup_rstat_flush_hold/release(): Factored out from base stat implementation. * cgroup_rstat_flush(): Verbatim expose. While at it, drop assert on cgroup_rstat_mutex in cgroup_base_stat_flush() as it crosses layers and make a minor comment update. v2: Added EXPORT_SYMBOL_GPL(cgroup_rstat_updated) to fix a build bug. Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
a17556f8d9 |
cgroup: Reorganize kernel/cgroup/rstat.c
Currently, rstat.c has rstat and base stat implementations intermixed. Collect base stat implementation at the end of the file. Also, reorder the prototypes. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
d4ff749b5e |
cgroup: Distinguish base resource stat implementation from rstat
Base resource stat accounts universial (not specific to any controller) resource consumptions on top of rstat. Currently, its implementation is intermixed with rstat implementation making the code confusing to follow. This patch clarifies the distintion by doing the followings. * Encapsulate base resource stat counters, currently only cputime, in struct cgroup_base_stat. * Move prev_cputime into struct cgroup and initialize it with cgroup. * Rename the related functions so that they start with cgroup_base_stat. * Prefix the related variables and field names with b. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
c58632b363 |
cgroup: Rename stat to rstat
stat is too generic a name and ends up causing subtle confusions. It'll be made generic so that controllers can plug into it, which will make the problem worse. Let's rename it to something more specific - cgroup_rstat for cgroup recursive stat. This patch does the following renames. No other changes. * cpu_stat -> rstat_cpu * stat -> rstat * ?cstat -> ?rstatc Note that the renames are selective. The unrenamed are the ones which implement basic resource statistics on top of rstat. This will be further cleaned up in the following patches. Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
22714a2ba4 |
Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Cgroup2 cpu controller support is finally merged.
- Basic cpu statistics support to allow monitoring by default without
the CPU controller enabled.
- cgroup2 cpu controller support.
- /sys/kernel/cgroup files to help dealing with new / optional
features"
* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: export list of cgroups v2 features using sysfs
cgroup: export list of delegatable control files using sysfs
cgroup: mark @cgrp __maybe_unused in cpu_stat_show()
MAINTAINERS: relocate cpuset.c
cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
sched: Implement interface for cgroup unified hierarchy
sched: Misc preps for cgroup unified hierarchy interface
sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
cgroup: statically initialize init_css_set->dfl_cgrp
cgroup: Implement cgroup2 basic CPU usage accounting
cpuacct: Introduce cgroup_account_cputime[_field]()
sched/cputime: Expose cputime_adjust()
|
||
|
|
b24413180f |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
d41bf8c9de |
cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
The basic cpu stat is currently shown with "cpu." prefix in cgroup.stat, and the same information is duplicated in cpu.stat when cpu controller is enabled. This is ugly and not very scalable as we want to expand the coverage of stat information which is always available. This patch makes cgroup core always create "cpu.stat" file and show the basic cpu stat there and calls the cpu controller to show the extra stats when enabled. This ensures that the same information isn't presented in multiple places and makes future expansion of basic stats easier. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> |
||
|
|
041cd640b2 |
cgroup: Implement cgroup2 basic CPU usage accounting
In cgroup1, while cpuacct isn't actually controlling any resources, it
is a separate controller due to combination of two factors -
1. enabling cpu controller has significant side effects, and 2. we
have to pick one of the hierarchies to account CPU usages on. cpuacct
controller is effectively used to designate a hierarchy to track CPU
usages on.
cgroup2's unified hierarchy removes the second reason and we can
account basic CPU usages by default. While we can use cpuacct for
this purpose, both its interface and implementation leave a lot to be
desired - it collects and exposes two sources of truth which don't
agree with each other and some of the exposed statistics don't make
much sense. Also, it propagates all the way up the hierarchy on each
accounting event which is unnecessary.
This patch adds basic resource accounting mechanism to cgroup2's
unified hierarchy and accounts CPU usages using it.
* All accountings are done per-cpu and don't propagate immediately.
It just bumps the per-cgroup per-cpu counters and links to the
parent's updated list if not already on it.
* On a read, the per-cpu counters are collected into the global ones
and then propagated upwards. Only the per-cpu counters which have
changed since the last read are propagated.
* CPU usage stats are collected and shown in "cgroup.stat" with "cpu."
prefix. Total usage is collected from scheduling events. User/sys
breakdown is sourced from tick sampling and adjusted to the usage
using cputime_adjust().
This keeps the accounting side hot path O(1) and per-cpu and the read
side O(nr_updated_since_last_read).
v2: Minor changes and documentation updates as suggested by Waiman and
Roman.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
|
||
|
|
608c1d3c17 |
Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Several notable changes this cycle:
- Thread mode was merged. This will be used for cgroup2 support for
CPU and possibly other controllers. Unfortunately, CPU controller
cgroup2 support didn't make this pull request but most contentions
have been resolved and the support is likely to be merged before
the next merge window.
- cgroup.stat now shows the number of descendant cgroups.
- cpuset now can enable the easier-to-configure v2 behavior on v1
hierarchy"
* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Allow v2 behavior in v1 cgroup
cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup
cgroup: remove unneeded checks
cgroup: misc changes
cgroup: short-circuit cset_cgroup_from_root() on the default hierarchy
cgroup: re-use the parent pointer in cgroup_destroy_locked()
cgroup: add cgroup.stat interface with basic hierarchy stats
cgroup: implement hierarchy limits
cgroup: keep track of number of descent cgroups
cgroup: add comment to cgroup_enable_threaded()
cgroup: remove unnecessary empty check when enabling threaded mode
cgroup: update debug controller to print out thread mode information
cgroup: implement cgroup v2 thread support
cgroup: implement CSS_TASK_ITER_THREADED
cgroup: introduce cgroup->dom_cgrp and threaded css_set handling
cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS
cgroup: reorganize cgroup.procs / task write path
cgroup: replace css_set walking populated test with testing cgrp->nr_populated_csets
cgroup: distinguish local and children populated states
cgroup: remove now unused list_head @pending in cgroup_apply_cftypes()
...
|
||
|
|
7a0cf0e74a |
cgroup: update debug controller to print out thread mode information
Update debug controller so that it prints out debug info about thread
mode.
1) The relationship between proc_cset and threaded_csets are displayed.
2) The status of being a thread root or threaded cgroup is displayed.
This patch is extracted from Waiman's larger patch.
v2: - Removed [thread root] / [threaded] from debug.cgroup_css_links
file as the same information is available from cgroup.type.
Suggested by Waiman.
- Threaded marking is moved to the previous patch.
Patch-originally-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
||
|
|
8cfd8147df |
cgroup: implement cgroup v2 thread support
This patch implements cgroup v2 thread support. The goal of the
thread mode is supporting hierarchical accounting and control at
thread granularity while staying inside the resource domain model
which allows coordination across different resource controllers and
handling of anonymous resource consumptions.
A cgroup is always created as a domain and can be made threaded by
writing to the "cgroup.type" file. When a cgroup becomes threaded, it
becomes a member of a threaded subtree which is anchored at the
closest ancestor which isn't threaded.
The threads of the processes which are in a threaded subtree can be
placed anywhere without being restricted by process granularity or
no-internal-process constraint. Note that the threads aren't allowed
to escape to a different threaded subtree. To be used inside a
threaded subtree, a controller should explicitly support threaded mode
and be able to handle internal competition in the way which is
appropriate for the resource.
The root of a threaded subtree, the nearest ancestor which isn't
threaded, is called the threaded domain and serves as the resource
domain for the whole subtree. This is the last cgroup where domain
controllers are operational and where all the domain-level resource
consumptions in the subtree are accounted. This allows threaded
controllers to operate at thread granularity when requested while
staying inside the scope of system-level resource distribution.
As the root cgroup is exempt from the no-internal-process constraint,
it can serve as both a threaded domain and a parent to normal cgroups,
so, unlike non-root cgroups, the root cgroup can have both domain and
threaded children.
Internally, in a threaded subtree, each css_set has its ->dom_cset
pointing to a matching css_set which belongs to the threaded domain.
This ensures that thread root level cgroup_subsys_state for all
threaded controllers are readily accessible for domain-level
operations.
This patch enables threaded mode for the pids and perf_events
controllers. Neither has to worry about domain-level resource
consumptions and it's enough to simply set the flag.
For more details on the interface and behavior of the thread mode,
please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
by this patch.
v5: - Dropped silly no-op ->dom_cgrp init from cgroup_create().
Spotted by Waiman.
- Documentation updated as suggested by Waiman.
- cgroup.type content slightly reformatted.
- Mark the debug controller threaded.
v4: - Updated to the general idea of marking specific cgroups
domain/threaded as suggested by PeterZ.
v3: - Dropped "join" and always make mixed children join the parent's
threaded subtree.
v2: - After discussions with Waiman, support for mixed thread mode is
added. This should address the issue that Peter pointed out
where any nesting should be avoided for thread subtrees while
coexisting with other domain cgroups.
- Enabling / disabling thread mode now piggy backs on the existing
control mask update mechanism.
- Bug fixes and cleanup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
|
||
|
|
715c809d9a |
cgroup: reorganize cgroup.procs / task write path
Currently, writes "cgroup.procs" and "cgroup.tasks" files are all
handled by __cgroup_procs_write() on both v1 and v2. This patch
reoragnizes the write path so that there are common helper functions
that different write paths use.
While this somewhat increases LOC, the different paths are no longer
intertwined and each path has more flexibility to implement different
behaviors which will be necessary for the planned v2 thread support.
v3: - Restructured so that cgroup_procs_write_permission() takes
@src_cgrp and @dst_cgrp.
v2: - Rolled in Waiman's task reference count fix.
- Updated on top of nsdelegate changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
|
||
|
|
610467270f |
cgroup: don't call migration methods if there are no tasks to migrate
Subsystem migration methods shouldn't be called for empty migrations. cgroup_migrate_execute() implements this guarantee by bailing early if there are no source css_sets. This used to be correct before |
||
|
|
a28f8f5e99 |
cgroup: Move debug cgroup to its own file
The debug cgroup currently resides within cgroup-v1.c and is enabled only for v1 cgroup. To enable the debug cgroup also for v2, it makes sense to put the code into its own file as it will no longer be v1 specific. There is no change to the debug cgroup specific code. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
9732adc5d6 |
cgroup: avoid attaching a cgroup root to two different superblocks, take 2
Commit
|
||
|
|
4b9502e63b |
kernel: convert css_set.refcount from atomic_t to refcount_t
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
bfc2cf6f61 |
cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
Currently, subsys->*attach() callbacks are called for all subsystems which are attached to the hierarchy on which the migration is taking place. With cgroup_migrate_prepare_dst() filtering out identity migrations, v1 hierarchies can avoid spurious ->*attach() callback invocations where the source and destination csses are identical; however, this isn't enough on v2 as only a subset of the attached controllers can be affected on controller enable/disable. While spurious ->*attach() invocations aren't critically broken, they're unnecessary overhead and can lead to temporary overcharges on certain controllers. Fix it by tracking which subsystems are affected by a migration and invoking ->*attach() callbacks only on those subsystems. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com> |
||
|
|
e595cd7069 |
cgroup: track migration context in cgroup_mgctx
cgroup migration is performed in four steps - css_set preloading, addition of target tasks, actual migration, and clean up. A list named preloaded_csets is used to track the preloading. This is a bit too restricted and the code is already depending on the subtlety that all source css_sets appear before destination ones. Let's create struct cgroup_mgctx which keeps track of everything during migration. Currently, it has separate preload lists for source and destination csets and also embeds cgroup_taskset which is used during the actual migration. This moves struct cgroup_taskset definition to cgroup-internal.h. This patch doesn't cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com> |
||
|
|
dcfe149b9f |
cgroup: move namespace code to kernel/cgroup/namespace.c
get/put_css_set() get exposed in cgroup-internal.h in the process. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Acked-by: Zefan Li <lizefan@huawei.com> |
||
|
|
d62beb7f3d |
cgroup: rename functions for consistency
Now that v1 functions are separated out, rename some functions for consistency. cgroup_dfl_base_files -> cgroup_base_files cgroup_legacy_base_files -> cgroup1_base_files cgroup_ssid_no_v1() -> cgroup1_ssid_disabled() cgroup_pidlist_destroy_all -> cgroup1_pidlist_destroy_all() cgroup_release_agent() -> cgroup1_release_agent() check_for_release() -> cgroup1_check_for_release() Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Acked-by: Zefan Li <lizefan@huawei.com> |
||
|
|
1592c9b223 |
cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
Now that the v1 mount code is split into separate functions, move them to kernel/cgroup/cgroup-v1.c along with the mount option handling code. As this puts all v1-only kernfs_syscall_ops in cgroup-v1.c, move cgroup1_kf_syscall_ops to cgroup-v1.c too. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Acked-by: Zefan Li <lizefan@huawei.com> |
||
|
|
fa069904dd |
cgroup: separate out cgroup1_kf_syscall_ops
Currently, cgroup_kf_syscall_ops is shared by v1 and v2 and the specific methods test the version and take different actions. Split out v1 functions and put them in cgroup1_kf_syscall_ops and remove the now unnecessary explicit branches in specific methods. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Acked-by: Zefan Li <lizefan@huawei.com> |
||
|
|
0a268dbd79 |
cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
cgroup.c is getting too unwieldy. Let's move out cgroup v1 specific
code along with the debug controller into kernel/cgroup/cgroup-v1.c.
v2: cgroup_mutex and css_set_lock made available in cgroup-internal.h
regardless of CONFIG_PROVE_RCU.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>
|