This adds a sysfs entry to call checkpoint during fsync() in order to avoid
long elapsed time to run roll-forward recovery when booting the device.
Default value doesn't enforce the limitation which is same as before.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Once F2FS_IPU_FORCE policy is enabled in some cases:
a) f2fs forces to use F2FS_IPU_FORCE in a small-sized volume
b) user sets F2FS_IPU_FORCE policy via sysfs
Then we may fail to defragment file due to IPU policy check, it doesn't
make sense, let's introduce a new IPU policy to allow OPU during file
defragmentation.
In small-sized volume, let's enable F2FS_IPU_HONOR_OPU_WRITE policy
by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs rw_semaphores work better if writers can starve readers,
especially for the checkpoint thread, because writers are strictly
more important than reader threads. This prevents significant priority
inversion between low-priority readers that blocked while trying to
acquire the read lock and a second acquisition of the write lock that
might be blocking high priority work.
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
https://bugzilla.kernel.org/show_bug.cgi?id=204137
With below script, we will hit panic during new segment allocation:
DISK=bingo.img
MOUNT_DIR=/mnt/f2fs
dd if=/dev/zero of=$DISK bs=1M count=105
mkfs.f2fe -a 1 -o 19 -t 1 -z 1 -f -q $DISK
mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"
for (( i = 0; i < 4096; i++ )); do
name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
mkdir $MOUNT_DIR/$name
done
umount $MOUNT_DIR
rm $DISK
--- Core dump ---
Call Trace:
allocate_segment_by_default+0x9d/0x100 [f2fs]
f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
do_write_page+0x62/0x110 [f2fs]
f2fs_outplace_write_data+0x43/0xc0 [f2fs]
f2fs_do_write_data_page+0x386/0x560 [f2fs]
__write_data_page+0x706/0x850 [f2fs]
f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
do_writepages+0x1c/0x70
__filemap_fdatawrite_range+0xaa/0xe0
filemap_fdatawrite+0x1f/0x30
f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
block_operations+0xdc/0x350 [f2fs]
f2fs_write_checkpoint+0x104/0x1150 [f2fs]
f2fs_sync_fs+0xa2/0x120 [f2fs]
f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
do_writepages+0x1c/0x70
__writeback_single_inode+0x45/0x320
writeback_sb_inodes+0x273/0x5c0
wb_writeback+0xff/0x2e0
wb_workfn+0xa1/0x370
process_one_work+0x138/0x350
worker_thread+0x4d/0x3d0
kthread+0x109/0x140
ret_from_fork+0x25/0x30
The root cause here is, with IO alignment feature enables, in worst
case, we need F2FS_IO_SIZE() free blocks space for single one 4k write
due to IO alignment feature will fill dummy pages to make IO being
aligned.
So we will easily run out of free segments during non-inline directory's
data writeback, even in process of foreground GC.
In order to fix this issue, I just propose to reserve additional free
space for IO alignment feature to handle worst case of free space usage
ratio during FGGC.
Fixes: 0a595ebaaa ("f2fs: support IO alignment for DATA and NODE writes")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs: support fault injection for f2fs_trylock_op()
This patch supports to inject fault into f2fs_trylock_op().
Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Added a new sysfs node called gc_urgent_high_remaining. The user can
set the trial count limit for GC urgent high mode with this value. If
GC thread gets to the limit, the mode will turn back to GC normal mode.
By default, the value is zero, which means there is no limit like before.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
if2fs_fill_super
-> f2fs_build_segment_manager
-> create_discard_cmd_control
-> f2fs_start_discard_thread
It invokes kthread_run to create a thread and run issue_discard_thread.
However, if f2fs_build_node_manager fails, the control flow goes to
free_nm and calls f2fs_destroy_node_manager. This function will free
sbi->nm_info. However, if issue_discard_thread accesses sbi->nm_info
after the deallocation, but before the f2fs_stop_discard_thread, it will
cause UAF(Use-after-free).
-> f2fs_destroy_segment_manager
-> destroy_discard_cmd_control
-> f2fs_stop_discard_thread
Fix this by stopping discard thread before f2fs_destroy_node_manager.
Note that, the commit d6d2b491a82e1 introduces the call of
f2fs_available_free_memory into issue_discard_thread.
Cc: stable@vger.kernel.org
Fixes: d6d2b491a82e ("f2fs: allow to change discard policy based on cached discard cmds")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a new function f2fs_dquot_initialize() to wrap
dquot_initialize(), and it supports to inject fault into
f2fs_dquot_initialize() to simulate inner failure occurs in
dquot_initialize().
Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Pavel Machek reported in [1]
This code looks quite confused: part of function returns 1 on
corruption, part returns -errno. The problem is not stable-specific.
[1] https://lkml.org/lkml/2021/9/19/207
Let's fix to make 'insane cp_payload case' to return 1 rater than
EFSCORRUPTED, so that return value can be kept consistent for all
error cases, it can avoid confusion of code logic.
Fixes: 65ddf6564843 ("f2fs: fix to do sanity check for sb/cp fields correctly")
Reported-by: Pavel Machek <pavel@denx.de>
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit 3c62be17d4 ("f2fs: support multiple devices") missed
to support direct IO for multiple device feature, this patch
adds to support the missing part of multidevice feature.
In addition, for multiple device image, we should be aware of
any issued direct write IO rather than just buffered write IO,
so that fsync and syncfs can issue a preflush command to the
device where direct write IO goes, to persist user data for
posix compliant.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Added two options into "mode=" mount option to make it possible for
developers to simulate filesystem fragmentation/after-GC situation
itself. The developers use these modes to understand filesystem
fragmentation/after-GC condition well, and eventually get some
insights to handle them better.
"fragment:segment": f2fs allocates a new segment in ramdom position.
With this, we can simulate the after-GC condition.
"fragment:block" : We can scatter block allocation with
"max_fragment_chunk" and "max_fragment_hole" sysfs
nodes. f2fs will allocate 1..<max_fragment_chunk>
blocks in a chunk and make a hole in the length of
1..<max_fragment_hole> by turns in a newly allocated
free segment. Plus, this mode implicitly enables
"fragment:segment" option for more randomness.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since active_logs can be set to 2 or 4 or NR_CURSEG_PERSIST_TYPE(6),
it cannot be set to NR_CURSEG_TYPE(8).
That is, whint_mode is always off.
Therefore, the condition is changed from NR_CURSEG_TYPE to NR_CURSEG_PERSIST_TYPE.
Cc: Chao Yu <chao@kernel.org>
Fixes: d0b9e42ab615 (f2fs: introduce inmem curseg)
Reported-by: tanghuan <tanghuan@vivo.com>
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Let's only enable realtime discard if and only if device supports
discard functionality.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We must flush all the dirty data when enabling checkpoint back. Let's guarantee
that first by adding a retry logic on sync_inodes_sb(). In addition to that,
this patch adds to flush data in fsync when checkpoint is disabled, which can
mitigate the sync_inodes_sb() failures in advance.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Don't create discard thread when device doesn't support realtime discard
or user specifies nodiscard mount option.
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use rwsem to ensure serialization of the callers and to avoid
starvation of high priority tasks, when the system is under
heavy IO workload.
Change-Id: Ifac519c3de127f79d8613ee742a68f7fc0377e36
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Whenever we notice some sluggish issues on our machines, we are always
curious about how well all types of I/O in the f2fs filesystem are
handled. But, it's hard to get this kind of real data. First of all,
we need to reproduce the issue while turning on the profiling tool like
blktrace, but the issue doesn't happen again easily. Second, with the
intervention of any tools, the overall timing of the issue will be
slightly changed and it sometimes makes us hard to figure it out.
So, I added the feature printing out IO latency statistics tracepoint
events, which are minimal things to understand filesystem's I/O related
behaviors, into F2FS_IOSTAT kernel config. With "iostat_enable" sysfs
node on, we can get this statistics info in a periodic way and it
would cause the least overhead.
[samples]
f2fs_ckpt-254:1-507 [003] .... 2842.439683: f2fs_iostat_latency:
dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4],
wr_sync_data [164/16/3331], wr_sync_node [152/3/648],
wr_sync_meta [160/2/4243], wr_async_data [24/13/15],
wr_async_node [0/0/0], wr_async_meta [0/0/0]
f2fs_ckpt-254:1-507 [002] .... 2845.450514: f2fs_iostat_latency:
dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
rd_data [60/3/456], rd_node [60/3/1258], rd_meta [0/0/1],
wr_sync_data [120/12/2285], wr_sync_node [88/5/428],
wr_sync_meta [52/6/2990], wr_async_data [4/1/3],
wr_async_node [0/0/0], wr_async_meta [0/0/0]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Added F2FS_IOSTAT config option to support getting IO statistics through
sysfs and printing out periodic IO statistics tracepoint events and
moved I/O statistics related codes into separate files for better
maintenance.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: set default=y]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch supports to inject fault into f2fs_kmem_cache_alloc().
Usage:
a) echo 32768 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=32768 <dev> <mountpoint>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes below problems of sb/cp sanity check:
- in sanity_check_raw_superi(), it missed to consider log header
blocks while cp_payload check.
- in f2fs_sanity_check_ckpt(), it missed to check nat_bits_blocks.
Cc: <stable@kernel.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.
Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().
In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_remount(), return value of test_opt() is an unsigned int type
variable, however when we compare it to a bool type variable, it cause
wrong result, fix it.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changes in 4.19.198
scsi: core: Retry I/O for Notify (Enable Spinup) Required error
ALSA: usb-audio: fix rate on Ozone Z90 USB headset
ALSA: usb-audio: Fix OOB access at proc output
media: dvb-usb: fix wrong definition
Input: usbtouchscreen - fix control-request directions
net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
usb: gadget: eem: fix echo command packet response issue
USB: cdc-acm: blacklist Heimann USB Appset device
usb: dwc3: Fix debugfs creation flow
usb: typec: Add the missed altmode_id_remove() in typec_register_altmode()
xhci: solve a double free problem while doing s4
ntfs: fix validity check for file name attribute
iov_iter_fault_in_readable() should do nothing in xarray case
Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl
arm_pmu: Fix write counter incorrect in ARMv7 big-endian mode
ARM: dts: at91: sama5d4: fix pinctrl muxing
btrfs: send: fix invalid path for unlink operations after parent orphanization
btrfs: clear defrag status of a root if starting transaction fails
ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle
ext4: fix kernel infoleak via ext4_extent_header
ext4: return error code when ext4_fill_flex_info() fails
ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
ext4: remove check for zero nr_to_scan in ext4_es_scan()
ext4: fix avefreec in find_group_orlov
ext4: use ext4_grp_locked_error in mb_find_extent
can: bcm: delay release of struct bcm_op after synchronize_rcu()
can: gw: synchronize rcu operations before removing gw job entry
can: peak_pciefd: pucan_handle_status(): fix a potential starvation issue in TX path
SUNRPC: Fix the batch tasks count wraparound.
SUNRPC: Should wake up the privileged task firstly.
s390/cio: dont call css_wait_for_slow_path() inside a lock
rtc: stm32: Fix unbalanced clk_disable_unprepare() on probe error path
iio: light: tcs3472: do not free unallocated IRQ
iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and PS_DATA as volatile, too
iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
iio: ltr501: ltr501_read_ps(): add missing endianness conversion
serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
serial_cs: Add Option International GSM-Ready 56K/ISDN modem
serial_cs: remove wrong GLOBETROTTER.cis entry
ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
ssb: sdio: Don't overwrite const buffer if block_write fails
rsi: Assign beacon rate settings to the correct rate_info descriptor field
rsi: fix AP mode with WPA failure due to encrypted EAPOL
tracing/histograms: Fix parsing of "sym-offset" modifier
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
evm: Execute evm_inode_init_security() only when an HMAC key is loaded
evm: Refuse EVM_ALLOW_METADATA_WRITES only if an HMAC key is loaded
fuse: check connected before queueing on fpq->io
spi: Make of_register_spi_device also set the fwnode
spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
spi: spi-topcliff-pch: Fix potential double free in pch_spi_process_messages()
spi: omap-100k: Fix the length judgment problem
regulator: uniphier: Add missing MODULE_DEVICE_TABLE
crypto: nx - add missing MODULE_DEVICE_TABLE
media: cpia2: fix memory leak in cpia2_usb_probe
media: cobalt: fix race condition in setting HPD
media: pvrusb2: fix warning in pvr2_i2c_core_done
crypto: qat - check return code of qat_hal_rd_rel_reg()
crypto: qat - remove unused macro in FW loader
sched/fair: Fix ascii art by relpacing tabs
media: em28xx: Fix possible memory leak of em28xx struct
media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
media: bt8xx: Fix a missing check bug in bt878_probe
media: st-hva: Fix potential NULL pointer dereferences
media: dvd_usb: memory leak in cinergyt2_fe_attach
mmc: via-sdmmc: add a check against NULL pointer dereference
crypto: shash - avoid comparing pointers to exported functions under CFI
media: dvb_net: avoid speculation from net slot
media: siano: fix device register error path
media: imx-csi: Skip first few frames from a BT.656 source
btrfs: fix error handling in __btrfs_update_delayed_inode
btrfs: abort transaction if we fail to update the delayed inode
btrfs: disable build on platforms having page size 256K
regulator: da9052: Ensure enough delay time for .set_voltage_time_sel
HID: do not use down_interruptible() when unbinding devices
EDAC/ti: Add missing MODULE_DEVICE_TABLE
ACPI: processor idle: Fix up C-state latency if not ordered
hv_utils: Fix passing zero to 'PTR_ERR' warning
lib: vsprintf: Fix handling of number field widths in vsscanf
ACPI: EC: Make more Asus laptops use ECDT _GPE
block_dump: remove block_dump feature in mark_inode_dirty()
fs: dlm: cancel work sync othercon
random32: Fix implicit truncation warning in prandom_seed_state()
fs: dlm: fix memory leak when fenced
ACPICA: Fix memory leak caused by _CID repair function
ACPI: bus: Call kobject_put() in acpi_init() error path
platform/x86: toshiba_acpi: Fix missing error code in toshiba_acpi_setup_keyboard()
clocksource: Retry clock read if long delays detected
ACPI: tables: Add custom DSDT file as makefile prerequisite
HID: wacom: Correct base usage for capacitive ExpressKey status bits
ia64: mca_drv: fix incorrect array size calculation
media: s5p_cec: decrement usage count if disabled
crypto: ixp4xx - dma_unmap the correct address
crypto: ux500 - Fix error return code in hash_hw_final()
sata_highbank: fix deferred probing
pata_rb532_cf: fix deferred probing
media: I2C: change 'RST' to "RSET" to fix multiple build errors
pata_octeon_cf: avoid WARN_ON() in ata_host_activate()
evm: fix writing <securityfs>/evm overflow
crypto: ccp - Fix a resource leak in an error handling path
media: rc: i2c: Fix an error message
pata_ep93xx: fix deferred probing
media: exynos4-is: Fix a use after free in isp_video_release
media: tc358743: Fix error return code in tc358743_probe_of()
media: gspca/gl860: fix zero-length control requests
media: siano: Fix out-of-bounds warnings in smscore_load_firmware_family2()
mmc: usdhi6rol0: fix error return code in usdhi6_probe()
media: s5p-g2d: Fix a memory leak on ctx->fh.m2m_ctx
hwmon: (max31722) Remove non-standard ACPI device IDs
hwmon: (max31790) Fix fan speed reporting for fan7..12
btrfs: clear log tree recovering status if starting transaction fails
spi: spi-sun6i: Fix chipselect/clock bug
crypto: nx - Fix RCU warning in nx842_OF_upd_status
ACPI: sysfs: Fix a buffer overrun problem with description_show()
blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
blk-wbt: make sure throttle is enabled properly
ocfs2: fix snprintf() checking
net: mvpp2: Put fwnode in error case during ->probe()
net: pch_gbe: Propagate error from devm_gpio_request_one()
drm/rockchip: cdn-dp-core: add missing clk_disable_unprepare() on error in cdn_dp_grf_write()
ehea: fix error return code in ehea_restart_qps()
RDMA/rxe: Fix failure during driver load
drm: qxl: ensure surf.data is ininitialized
tools/bpftool: Fix error return code in do_batch()
wireless: carl9170: fix LEDS build errors & warnings
ieee802154: hwsim: Fix possible memory leak in hwsim_subscribe_all_others
wcn36xx: Move hal_buf allocation to devm_kmalloc in probe
ssb: Fix error return code in ssb_bus_scan()
brcmfmac: fix setting of station info chains bitmask
brcmfmac: correctly report average RSSI in station info
brcmsmac: mac80211_if: Fix a resource leak in an error handling path
ath10k: Fix an error code in ath10k_add_interface()
netlabel: Fix memory leak in netlbl_mgmt_add_common
RDMA/mlx5: Don't add slave port to unaffiliated list
netfilter: nft_exthdr: check for IPv6 packet before further processing
netfilter: nft_osf: check for TCP packet before further processing
netfilter: nft_tproxy: restrict support to TCP and UDP transport protocols
RDMA/rxe: Fix qp reference counting for atomic ops
samples/bpf: Fix the error return code of xdp_redirect's main()
net: ethernet: aeroflex: fix UAF in greth_of_remove
net: ethernet: ezchip: fix UAF in nps_enet_remove
net: ethernet: ezchip: fix error handling
pkt_sched: sch_qfq: fix qfq_change_class() error path
vxlan: add missing rcu_read_lock() in neigh_reduce()
net/ipv4: swap flow ports when validating source
ieee802154: hwsim: Fix memory leak in hwsim_add_one
ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl()
mac80211: remove iwlwifi specific workaround NDPs of null_response
net: bcmgenet: Fix attaching to PYH failed on RPi 4B
ipv6: exthdrs: do not blindly use init_net
bpf: Do not change gso_size during bpf_skb_change_proto()
i40e: Fix error handling in i40e_vsi_open
i40e: Fix autoneg disabling for non-10GBaseT links
Revert "ibmvnic: remove duplicate napi_schedule call in open function"
ibmvnic: free tx_pool if tso_pool alloc fails
ipv6: fix out-of-bound access in ip6_parse_tlv()
Bluetooth: mgmt: Fix slab-out-of-bounds in tlv_data_is_valid
Bluetooth: Fix handling of HCI_LE_Advertising_Set_Terminated event
writeback: fix obtain a reference to a freeing memcg css
net: lwtunnel: handle MTU calculation in forwading
net: sched: fix warning in tcindex_alloc_perfect_hash
RDMA/mlx5: Don't access NULL-cleared mpi pointer
tty: nozomi: Fix a resource leak in an error handling function
mwifiex: re-fix for unaligned accesses
iio: adis_buffer: do not return ints in irq handlers
iio: accel: bma180: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: bma220: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: hid: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: kxcjk-1013: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: stk8312: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: accel: stk8ba50: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads1015: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: vf610: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: gyro: bmg160: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: humidity: am2315: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: srf08: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: pulsed-light: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: prox: as3935: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: isl29125: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: tcs3414: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: light: tcs3472: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: potentiostat: lmp91000: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
ASoC: hisilicon: fix missing clk_disable_unprepare() on error in hi6210_i2s_startup()
ASoC: rsnd: tidyup loop on rsnd_adg_clk_query()
Input: hil_kbd - fix error return code in hil_dev_connect()
char: pcmcia: error out if 'num_bytes_read' is greater than 4 in set_protocol()
tty: nozomi: Fix the error handling path of 'nozomi_card_init()'
scsi: FlashPoint: Rename si_flags field
fsi: core: Fix return of error values on failures
fsi: scom: Reset the FSI2PIB engine for any error
fsi/sbefifo: Clean up correct FIFO when receiving reset request from SBE
fsi/sbefifo: Fix reset timeout
visorbus: fix error return code in visorchipset_init()
s390: appldata depends on PROC_SYSCTL
eeprom: idt_89hpesx: Put fwnode in matching case during ->probe()
eeprom: idt_89hpesx: Restore printing the unsupported fwnode name
iio: adc: hx711: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: mxs-lradc: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
iio: adc: ti-ads8688: Fix alignment of buffer in iio_push_to_buffers_with_timestamp()
staging: gdm724x: check for buffer overflow in gdm_lte_multi_sdu_pkt()
staging: gdm724x: check for overflow in gdm_lte_netif_rx()
staging: mt7621-dts: fix pci address for PCI memory range
serial: 8250: Actually allow UPF_MAGIC_MULTIPLIER baud rates
iio: prox: isl29501: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
ASoC: cs42l42: Correct definition of CS42L42_ADC_PDN_MASK
of: Fix truncation of memory sizes on 32-bit platforms
mtd: rawnand: marvell: add missing clk_disable_unprepare() on error in marvell_nfc_resume()
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
phy: ti: dm816x: Fix the error handling path in 'dm816x_usb_phy_probe()
extcon: sm5502: Drop invalid register write in sm5502_reg_data
extcon: max8997: Add missing modalias string
ASoC: atmel-i2s: Fix usage of capture and playback at the same time
configfs: fix memleak in configfs_release_bin_file
leds: as3645a: Fix error return code in as3645a_parse_node()
leds: ktd2692: Fix an error handling path
powerpc: Offline CPU in stop_this_cpu()
serial: mvebu-uart: correctly calculate minimal possible baudrate
arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART
vfio/pci: Handle concurrent vma faults
mm/huge_memory.c: don't discard hugepage if other processes are mapping it
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
perf llvm: Return -ENOMEM when asprintf() fails
mmc: block: Disable CMDQ on the ioctl path
mmc: vub3000: fix control-request direction
drm/mxsfb: Don't select DRM_KMS_FB_HELPER
drm/zte: Don't select DRM_KMS_FB_HELPER
drm/amd/amdgpu/sriov disable all ip hw status by default
net: pch_gbe: Use proper accessors to BE data in pch_ptp_match()
drm/amd/display: fix use_max_lb flag for 420 pixel formats
hugetlb: clear huge pte during flush function on mips platform
atm: iphase: fix possible use-after-free in ia_module_exit()
mISDN: fix possible use-after-free in HFC_cleanup()
atm: nicstar: Fix possible use-after-free in nicstar_cleanup()
net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
reiserfs: add check for invalid 1st journal block
drm/virtio: Fix double free on probe failure
udf: Fix NULL pointer dereference in udf_symlink function
e100: handle eeprom as little endian
clk: renesas: r8a77995: Add ZA2 clock
clk: tegra: Ensure that PLLU configuration is applied properly
ipv6: use prandom_u32() for ID generation
RDMA/cxgb4: Fix missing error code in create_qp()
dm space maps: don't reset space map allocation cursor when committing
pinctrl: mcp23s08: fix race condition in irq handler
ice: set the value of global config lock timeout longer
virtio_net: Remove BUG() to avoid machine dead
net: bcmgenet: check return value after calling platform_get_resource()
net: mvpp2: check return value after calling platform_get_resource()
net: micrel: check return value after calling platform_get_resource()
fjes: check return value after calling platform_get_resource()
selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
xfrm: Fix error reporting in xfrm_state_construct.
wlcore/wl12xx: Fix wl12xx get_mac error if device is in ELP
wl1251: Fix possible buffer overflow in wl1251_cmd_scan
cw1200: add missing MODULE_DEVICE_TABLE
net: fix mistake path for netdev_features_strings
rtl8xxxu: Fix device info for RTL8192EU devices
MIPS: add PMD table accounting into MIPS'pmd_alloc_one
atm: nicstar: use 'dma_free_coherent' instead of 'kfree'
atm: nicstar: register the interrupt handler in the right place
vsock: notify server to shutdown when client has pending signal
RDMA/rxe: Don't overwrite errno from ib_umem_get()
iwlwifi: mvm: don't change band on bound PHY contexts
iwlwifi: pcie: free IML DMA memory allocation
sfc: avoid double pci_remove of VFs
sfc: error code if SRIOV cannot be disabled
wireless: wext-spy: Fix out-of-bounds warning
media, bpf: Do not copy more entries than user space requested
net: ip: avoid OOM kills with large UDP sends over loopback
RDMA/cma: Fix rdma_resolve_route() memory leak
Bluetooth: Fix the HCI to MGMT status conversion table
Bluetooth: Shutdown controller after workqueues are flushed or cancelled
Bluetooth: btusb: fix bt fiwmare downloading failure issue for qca btsoc.
sctp: validate from_addr_param return
sctp: add size validation when walking chunks
MIPS: set mips32r5 for virt extensions
fscrypt: don't ignore minor_hash when hash is 0
bdi: Do not use freezable workqueue
serial: mvebu-uart: clarify the baud rate derivation
serial: mvebu-uart: fix calculation of clock divisor
fuse: reject internal errno
powerpc/barrier: Avoid collision with clang's __lwsync macro
usb: gadget: f_fs: Fix setting of device and driver data cross-references
drm/radeon: Add the missed drm_gem_object_put() in radeon_user_framebuffer_create()
drm/amd/display: fix incorrrect valid irq check
pinctrl/amd: Add device HID for new AMD GPIO controller
drm/msm/mdp4: Fix modifier support enabling
mmc: sdhci: Fix warning message when accessing RPMB in HS400 mode
mmc: core: clear flags before allowing to retune
mmc: core: Allow UHS-I voltage switch for SDSC cards if supported
ata: ahci_sunxi: Disable DIPM
cpu/hotplug: Cure the cpusets trainwreck
clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
ASoC: tegra: Set driver_name=tegra for all machine drivers
qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute
ipmi/watchdog: Stop watchdog timer when the current action is 'none'
power: supply: ab8500: Fix an old bug
seq_buf: Fix overflow in seq_buf_putmem_hex()
tracing: Simplify & fix saved_tgids logic
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe
coresight: tmc-etf: Fix global-out-of-bounds in tmc_update_etf_buffer()
dm btree remove: assign new_root only when removal succeeds
PCI: Leave Apple Thunderbolt controllers on for s2idle or standby
PCI: aardvark: Fix checking for PIO Non-posted Request
media: subdev: disallow ioctl for saa6588/davinci
media: dtv5100: fix control-request directions
media: zr364xx: fix memory leak in zr364xx_start_readpipe
media: gspca/sq905: fix control-request direction
media: gspca/sunplus: fix zero-length control requests
media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
pinctrl: mcp23s08: Fix missing unlock on error in mcp23s08_irq()
jfs: fix GPF in diFree
smackfs: restrict bytes count in smk_set_cipso()
KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled
KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run()
scsi: core: Fix bad pointer dereference when ehandler kthread is invalid
tracing: Do not reference char * as a string in histograms
PCI: aardvark: Don't rely on jiffies while holding spinlock
PCI: aardvark: Fix kernel panic during PIO transfer
tty: serial: fsl_lpuart: fix the potential risk of division or modulo by zero
misc/libmasm/module: Fix two use after free in ibmasm_init_one
Revert "ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro"
w1: ds2438: fixing bug that would always get page0
scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology
scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize the SGLs
scsi: core: Cap scsi_host cmd_per_lun at can_queue
ALSA: ac97: fix PM reference leak in ac97_bus_remove()
tty: serial: 8250: serial_cs: Fix a memory leak in error handling path
scsi: scsi_dh_alua: Check for negative result value
fs/jfs: Fix missing error code in lmLogInit()
scsi: iscsi: Add iscsi_cls_conn refcount helpers
scsi: iscsi: Fix conn use after free during resets
scsi: iscsi: Fix shost->max_id use
scsi: qedi: Fix null ref during abort handling
mfd: da9052/stmpe: Add and modify MODULE_DEVICE_TABLE
s390/sclp_vt220: fix console name to match device
selftests: timers: rtcpie: skip test if default RTC device does not exist
ALSA: sb: Fix potential double-free of CSP mixer elements
powerpc/ps3: Add dma_mask to ps3_dma_region
gpio: zynq: Check return value of pm_runtime_get_sync
ALSA: ppc: fix error return code in snd_pmac_probe()
selftests/powerpc: Fix "no_handler" EBB selftest
gpio: pca953x: Add support for the On Semi pca9655
ASoC: soc-core: Fix the error return code in snd_soc_of_parse_audio_routing()
Input: hideep - fix the uninitialized use in hideep_nvm_unlock()
ALSA: bebob: add support for ToneWeal FW66
usb: gadget: f_hid: fix endianness issue with descriptors
usb: gadget: hid: fix error return code in hid_bind()
powerpc/boot: Fixup device-tree on little endian
backlight: lm3630a: Fix return code of .update_status() callback
ALSA: hda: Add IRQ check for platform_get_irq()
staging: rtl8723bs: fix macro value for 2.4Ghz only device
intel_th: Wait until port is in reset before programming it
i2c: core: Disable client irq on reboot/shutdown
lib/decompress_unlz4.c: correctly handle zero-padding around initrds.
pwm: spear: Don't modify HW state in .remove callback
power: supply: ab8500: Avoid NULL pointers
power: supply: max17042: Do not enforce (incorrect) interrupt trigger type
power: reset: gpio-poweroff: add missing MODULE_DEVICE_TABLE
ARM: 9087/1: kprobes: test-thumb: fix for LLVM_IAS=1
watchdog: Fix possible use-after-free in wdt_startup()
watchdog: sc520_wdt: Fix possible use-after-free in wdt_turnoff()
watchdog: Fix possible use-after-free by calling del_timer_sync()
watchdog: iTCO_wdt: Account for rebooting on second timeout
x86/fpu: Return proper error codes from user access functions
PCI: tegra: Add missing MODULE_DEVICE_TABLE
orangefs: fix orangefs df output.
ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty
NFS: nfs_find_open_context() may only select open files
power: supply: charger-manager: add missing MODULE_DEVICE_TABLE
power: supply: ab8500: add missing MODULE_DEVICE_TABLE
pwm: tegra: Don't modify HW state in .remove callback
ACPI: AMBA: Fix resource name in /proc/iomem
ACPI: video: Add quirk for the Dell Vostro 3350
virtio-blk: Fix memory leak among suspend/resume procedure
virtio_net: Fix error handling in virtnet_restore()
virtio_console: Assure used length from device is limited
f2fs: add MODULE_SOFTDEP to ensure crc32 is included in the initramfs
PCI/sysfs: Fix dsm_label_utf16s_to_utf8s() buffer overrun
power: supply: rt5033_battery: Fix device tree enumeration
NFSv4: Initialise connection to the server in nfs4_alloc_client()
um: fix error return code in slip_open()
um: fix error return code in winch_tramp()
watchdog: aspeed: fix hardware timeout calculation
nfs: fix acl memory leak of posix_acl_create()
ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode
PCI: iproc: Fix multi-MSI base vector number allocation
PCI: iproc: Support multi-MSI only on uniprocessor kernel
x86/fpu: Limit xstate copy size in xstateregs_set()
virtio_net: move tx vq operation under tx queue lock
ALSA: isa: Fix error return code in snd_cmi8330_probe()
NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
hexagon: use common DISCARDS macro
reset: a10sr: add missing of_match_table reference
ARM: dts: exynos: fix PWM LED max brightness on Odroid XU/XU3
ARM: dts: exynos: fix PWM LED max brightness on Odroid HC1
ARM: dts: exynos: fix PWM LED max brightness on Odroid XU4
memory: atmel-ebi: add missing of_node_put for loop iteration
rtc: fix snprintf() checking in is_rtc_hctosys()
arm64: dts: renesas: v3msk: Fix memory size
ARM: dts: r8a7779, marzen: Fix DU clock names
ARM: dts: BCM5301X: Fixup SPI binding
reset: bail if try_module_get() fails
memory: fsl_ifc: fix leak of IO mapping on probe failure
memory: fsl_ifc: fix leak of private memory on probe failure
ARM: dts: am335x: align ti,pindir-d0-out-d1-in property with dt-shema
ARM: dts: am437x: align ti,pindir-d0-out-d1-in property with dt-shema
ARM: dts: imx6q-dhcom: Fix ethernet reset time properties
ARM: dts: imx6q-dhcom: Fix ethernet plugin detection problems
ARM: dts: imx6q-dhcom: Add gpios pinctrl for i2c bus recovery
scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
mips: always link byteswap helpers into decompressor
mips: disable branch profiling in boot/decompress.o
MIPS: vdso: Invalid GIC access through VDSO
net: bridge: multicast: fix PIM hello router port marking race
scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg()
seq_file: disallow extremely large seq buffer allocations
Linux 4.19.198
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaa8a95c4d30ca85021bae6c60b4818038797e04e
[ Upstream commit 0dd571785d61528d62cdd8aa49d76bc6085152fe ]
As marcosfrm reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213089
Initramfs generators rely on "pre" softdeps (and "depends") to include
additional required modules.
F2FS does not declare "pre: crc32" softdep. Then every generator (dracut,
mkinitcpio...) has to maintain a hardcoded list for this purpose.
Hence let's use MODULE_SOFTDEP("pre: crc32") in f2fs code.
Fixes: 43b6573bac ("f2fs: use cryptoapi crc32 functions")
Reported-by: marcosfrm <marcosfrm@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
When we create a directory with enable compression, all file write into
directory will try to compress.But sometimes we may know, new file
cannot meet compression ratio requirements.
We need a nocompress extension to skip those files to avoid unnecessary
compress page test.
After add nocompress_extension, the priority should be:
dir_flag < comp_extention,nocompress_extension < comp_file_flag,
no_comp_file_flag.
Priority in between FS_COMPR_FL, FS_NOCOMP_FS, extensions:
* compress_extension=so; nocompress_extension=zip; chattr +c dir;
touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so
and baz.txt should be compresse, bar.zip should be non-compressed.
chattr +c dir/bar.zip can enable compress on bar.zip.
* compress_extension=so; nocompress_extension=zip; chattr -c dir;
touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so
should be compresse, bar.zip and baz.txt should be non-compressed.
chattr+c dir/bar.zip; chattr+c dir/baz.txt; can enable compress on
bar.zip and baz.txt.
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add a slab cache: "f2fs_casefolded_name" for memory allocation
of casefold name.
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support to use address space of inner inode to cache compressed block,
in order to improve cache hit ratio of random read.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Given RO feature in superblock, we don't need to check provisioning/reserve
spaces and SSA area.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Update the logging uses that have unnecessary newlines as the f2fs_printk
function and so its f2fs_<level> macro callers already adds one.
This allows searching single line logging entries with an easier grep and
also avoids unnecessary blank lines in the logging.
Miscellanea:
o Coalesce formats
o Align to open parenthesis
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Quoted from [1]
"I do remember that I've added this code back then because otherwise
orphan cleanup was losing updates to quota files. But you're right
that now I don't see how that could be happening and it would be nice
if we could get rid of this hack"
[1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00
Related fix in ext4 by
commit 72ffb49a7b62 ("ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()").
f2fs has the same hack implementation in
- f2fs_recover_orphan_inodes()
- f2fs_recover_fsync_data()
- f2fs_disable_checkpoint()
Let's get rid of this hack as well in f2fs.
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As marcosfrm reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213089
Initramfs generators rely on "pre" softdeps (and "depends") to include
additional required modules.
F2FS does not declare "pre: crc32" softdep. Then every generator (dracut,
mkinitcpio...) has to maintain a hardcoded list for this purpose.
Hence let's use MODULE_SOFTDEP("pre: crc32") in f2fs code.
Fixes: 43b6573bac ("f2fs: use cryptoapi crc32 functions")
Reported-by: marcosfrm <marcosfrm@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As syzbot reported, there is an use-after-free issue during f2fs recovery:
Use-after-free write at 0xffff88823bc16040 (in kfence-#10):
kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486
f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869
f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945
mount_bdev+0x26c/0x3a0 fs/super.c:1367
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1497
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3235
do_mount fs/namespace.c:3248 [inline]
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root cause is multi f2fs filesystem instances can race on accessing
global fsync_entry_slab pointer, result in use-after-free issue of slab
cache, fixes to init/destroy this slab cache only once during module
init/destroy procedure to avoid this issue.
Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch combined the below three clean-up patches.
- modify open brace '{' following function definitions
- ERROR: spaces required around that ':'
- ERROR: spaces required before the open parenthesis '('
- ERROR: spaces prohibited before that ','
- Made suggested modifications from checkpatch in reference to WARNING:
Missing a blank line after declarations
Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com>
Signed-off-by: Jia Yang <jiayang5@huawei.com>
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Once we introduced checkpoint_merge, we've seen some contention w/o the option.
In order to avoid it, let's set it by default.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When we mount an unclean f2fs image in a readonly block device, let's
make mount() succeed only when there is no recoverable data in that
image, otherwise after mount(), file fsyned won't be recovered as user
expected.
Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this patch, we will add two new mount options: "gc_merge" and
"nogc_merge", when background_gc is on, "gc_merge" option can be
set to let background GC thread to handle foreground GC requests,
it can eliminate the sluggish issue caused by slow foreground GC
operation when GC is triggered from a process with limited I/O
and CPU resources.
Original idea is from Xiang.
Signed-off-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In error path of f2fs_remount(), it missed to restart/stop kernel thread
or enable/disable checkpoint, then mount option status may not be
consistent with real condition of filesystem, so let's reorder remount
flow a bit as below and do recovery correctly in error path:
1) handle gc thread
2) handle ckpt thread
3) handle flush thread
4) handle checkpoint disabling
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In readonly mountpoint, there should be no write IOs include checkpoint
IO, so that it's not needed to create kernel checkpoint thread.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
LKP reports:
fs/f2fs/super.c:1516:20: warning: unused function 'f2fs_show_compress_options' [-Wunused-function]
static inline void f2fs_show_compress_options(struct seq_file *seq,
Fix this issue by covering f2fs_show_compress_options() with
CONFIG_F2FS_FS_COMPRESSION macro.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
F2FS_IOC_FLUSH_DEVICE/F2FS_IOC_RESIZE_FS needs to migrate all blocks of
target segment to other place, no matter the segment has partially or fully
valid blocks.
However, after commit 803e74be04b3 ("f2fs: stop GC when the victim becomes
fully valid"), we may skip migration due to target segment is fully valid,
result in failing the ioctl interface, fix this.
Fixes: 803e74be04b3 ("f2fs: stop GC when the victim becomes fully valid")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Let's allow mounting readonly partition. We're able to recovery later once we
have it as read-write back.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We've added a new mount options, "checkpoint_merge" and "nocheckpoint_merge",
which creates a kernel daemon and makes it to merge concurrent checkpoint
requests as much as possible to eliminate redundant checkpoint issues. Plus,
we can eliminate the sluggish issue caused by slow checkpoint operation
when the checkpoint is done in a process context in a cgroup having
low i/o budget and cpu shares. To make this do better, we set the
default i/o priority of the kernel daemon to "3", to give one higher
priority than other kernel threads. The below verification result
explains this.
The basic idea has come from https://opensource.samsung.com.
[Verification]
Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
Create two I/O cgroups (fg w/ weight 100, bg w/ wight 20)
Set "strict_guarantees" to "1" in BFQ tunables
In "fg" cgroup,
- thread A => trigger 1000 checkpoint operations
"for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
done"
- thread B => gererating async. I/O
"fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
--filename=test_img --name=test"
In "bg" cgroup,
- thread C => trigger repeated checkpoint operations
"echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
fsync test_dir2; done"
We've measured thread A's execution time.
[ w/o patch ]
Elapsed Time: Avg. 68 seconds
[ w/ patch ]
Elapsed Time: Avg. 48 seconds
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During checkpoint=disable period, f2fs bypasses all the synchronous IOs such as
sync and fsync. So, when enabling it back, we must flush all of them in order
to keep the data persistent. Otherwise, suddern power-cut right after enabling
checkpoint will cause data loss.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During checkpoint=disable period, f2fs bypasses all the synchronous IOs such as
sync and fsync. So, when enabling it back, we must flush all of them in order
to keep the data persistent. Otherwise, suddern power-cut right after enabling
checkpoint will cause data loss.
Bug: 171063590
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 8d52dbb373579b48f5758dd0cdd2ac0fb4e5be7f git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: Iaca2d6fc1841fffa8677d5d592732c94241fb3fb