Export function for use at external invoke.
Change-Id: Ifa51a5dd55511ffcca538c73a3e97db9d9b92531
Chang-eId: Ib028e5f8ada5632edf1de6c024ac51c1f6d4303d
Signed-off-by: Tengfei Fan <tengfeif@codeaurora.org>
* refs/heads/tmp-cab4399:
Linux 4.19.47
NFS: Fix a double unlock from nfs_match,get_client
drm/sun4i: dsi: Enforce boundaries on the start delay
vfio-ccw: Prevent quiesce function going into an infinite loop
drm/sun4i: dsi: Change the start delay calculation
drm: Wake up next in drm_read() chain if we are forced to putback the event
drm/drv: Hold ref on parent device during drm_device lifetime
drm/v3d: Handle errors from IRQ setup.
ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM
spi: Fix zero length xfer bug
spi: imx: stop buffer overflow in RX FIFO flush
spi: rspi: Fix sequencer reset during initialization
drm/omap: dsi: Fix PM for display blank with paired dss_pll calls
spi : spi-topcliff-pch: Fix to handle empty DMA buffers
scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices
media: saa7146: avoid high stack usage with clang
scsi: lpfc: Fix fc4type information for FDMI
scsi: lpfc: Fix FDMI manufacturer attribute value
media: vimc: zero the media_device on probe
media: go7007: avoid clang frame overflow warning with KASAN
media: gspca: do not resubmit URBs when streaming has stopped
media: vimc: stream: fix thread state before sleep
scsi: ufs: fix a missing check of devm_reset_control_get
drm/amd/display: Set stream->mode_changed when connectors change
drm/amd/display: Fix Divide by 0 in memory calculations
media: staging: davinci_vpfe: disallow building with COMPILE_TEST
media: m88ds3103: serialize reset messages in m88ds3103_set_frontend
media: dvbsky: Avoid leaking dvb frontend
media: si2165: fix a missing check of return value
igb: Exclude device from suspend direct complete optimization
tinydrm/mipi-dbi: Use dma-safe buffers for all SPI transfers
e1000e: Disable runtime PM on CNP+
thunderbolt: property: Fix a NULL pointer dereference
drm/amd/display: fix releasing planes when exiting odm
thunderbolt: Fix to check for kmemdup failure
thunderbolt: Fix to check return value of ida_simple_get
hwrng: omap - Set default quality
dmaengine: tegra210-adma: use devm_clk_*() helpers
batman-adv: allow updating DAT entry timeouts on incoming ARP Replies
selinux: avoid uninitialized variable warning
scsi: lpfc: avoid uninitialized variable warning
scsi: qla4xxx: avoid freeing unallocated dma memory
usb: core: Add PM runtime calls to usb_hcd_platform_shutdown
rcuperf: Fix cleanup path for invalid perf_type strings
x86/mce: Handle varying MCA bank counts
rcutorture: Fix cleanup path for invalid torture_type strings
x86/mce: Fix machine_check_poll() tests for error types
overflow: Fix -Wtype-limits compilation warnings
tty: ipwireless: fix missing checks for ioremap
virtio_console: initialize vtermno value for ports
scsi: qedf: Add missing return in qedf_post_io_req() in the fcport offload check
timekeeping: Force upper bound for setting CLOCK_REALTIME
thunderbolt: Fix to check the return value of kmemdup
thunderbolt: property: Fix a missing check of kzalloc
efifb: Omit memory map check on legacy boot
media: gspca: Kill URBs on USB device disconnect
media: wl128x: prevent two potential buffer overflows
media: video-mux: fix null pointer dereferences
kobject: Don't trigger kobject_uevent(KOBJ_REMOVE) twice.
spi: tegra114: reset controller on probe
HID: logitech-hidpp: change low battery level threshold from 31 to 30 percent
cxgb3/l2t: Fix undefined behaviour
ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put
ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put
HID: core: move Usage Page concatenation to Main item
sh: sh7786: Add explicit I/O cast to sh7786_mm_sel()
RDMA/hns: Fix bad endianess of port_pd variable
chardev: add additional check for minor range overlap
x86/uaccess: Fix up the fixup
x86/ia32: Fix ia32_restore_sigcontext() AC leak
x86/uaccess, signal: Fix AC=1 bloat
x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
wil6210: fix return code of wmi_mgmt_tx and wmi_mgmt_tx_ext
arm64: cpu_ops: fix a leaked reference by adding missing of_node_put
drm/panel: otm8009a: Add delay at the end of initialization
scsi: ufs: Avoid configuring regulator with undefined voltage range
scsi: ufs: Fix regulator load and icc-level configuration
rtlwifi: fix potential NULL pointer dereference
rtc: xgene: fix possible race condition
brcmfmac: fix Oops when bringing up interface during USB disconnect
brcmfmac: fix race during disconnect when USB completion is in progress
brcmfmac: fix WARNING during USB disconnect in case of unempty psq
brcmfmac: convert dev_init_lock mutex to completion
b43: shut up clang -Wuninitialized variable warning
brcmfmac: fix missing checks for kmemdup
mwifiex: Fix mem leak in mwifiex_tm_cmd
rtlwifi: fix a potential NULL pointer dereference
selftests/bpf: ksym_search won't check symbols exists
iio: adc: ti-ads7950: Fix improper use of mlock
iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data
iio: hmc5843: fix potential NULL pointer dereferences
iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
drm/pl111: fix possible object reference leak
x86/build: Keep local relocations with ld.lld
block: sed-opal: fix IOC_OPAL_ENABLE_DISABLE_MBR
cpufreq: kirkwood: fix possible object reference leak
cpufreq: pmac32: fix possible object reference leak
cpufreq/pasemi: fix possible object reference leak
cpufreq: ppc_cbe: fix possible object reference leak
qmi_wwan: Add quirk for Quectel dynamic config
selftests: cgroup: fix cleanup path in test_memcg_subtree_control()
s390: cio: fix cio_irb declaration
s390/mm: silence compiler warning when compiling without CONFIG_PGSTE
x86/microcode: Fix the ancient deprecated microcode loading method
s390: zcrypt: initialize variables before_use
clk: rockchip: Make rkpwm a critical clock on rk3288
extcon: arizona: Disable mic detect if running when driver is removed
clk: rockchip: Fix video codec clocks on rk3288
PM / core: Propagate dev->power.wakeup_path when no callbacks
drm/amdgpu: fix old fence check in amdgpu_fence_emit
mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support
mmc: sdhci-of-esdhc: add erratum A-009204 support
mmc: sdhci-of-esdhc: add erratum eSDHC5 support
mmc_spi: add a status check for spi_sync_locked
mmc: core: make pwrseq_emmc (partially) support sleepy GPIO controllers
scsi: libsas: Do discovery on empty PHY to update PHY info
hwmon: (f71805f) Use request_muxed_region for Super-IO accesses
hwmon: (pc87427) Use request_muxed_region for Super-IO accesses
hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses
hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses
hwmon: (vt1211) Use request_muxed_region for Super-IO accesses
perf/x86/intel/cstate: Add Icelake support
perf/x86/intel/rapl: Add Icelake support
perf/x86/msr: Add Icelake support
RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
arm64: vdso: Fix clock_getres() for CLOCK_REALTIME
ACPI/IORT: Reject platform device creation on NUMA node mapping failure
i40e: don't allow changes to HW VLAN stripping on active port VLANs
i40e: Able to add up to 16 MAC filters on an untrusted VF
phy: mapphone-mdm6600: add gpiolib dependency
phy: sun4i-usb: Make sure to disable PHY0 passby for peripheral mode
drm: etnaviv: avoid DMA API warning when importing buffers
x86/irq/64: Limit IST stack overflow check to #DB stack
USB: core: Don't unbind interfaces following device reset failure
s390/qeth: handle error from qeth_update_from_chp_desc()
thunderbolt: Take domain lock in switch sysfs attribute callbacks
irq_work: Do not raise an IPI when queueing work on the local CPU
drm/msm: a5xx: fix possible object reference leak
staging: vc04_services: handle kzalloc failure
sched/core: Handle overflow in cpu_shares_write_u64
sched/rt: Check integer overflow at usec to nsec conversion
sched/core: Check quota and period overflow at usec to nsec conversion
cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock
random: add a spinlock_t to struct batched_entropy
random: fix CRNG initialization when random.trust_cpu=1
powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX
powerpc/numa: improve control of topology updates
block: fix use-after-free on gendisk
iio: adc: stm32-dfsdm: fix unmet direct dependencies detected
media: pvrusb2: Prevent a buffer overflow
media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable()
media: stm32-dcmi: fix crash when subdev do not expose any formats
audit: fix a memory leak bug
media: ov2659: make S_FMT succeed even if requested format doesn't match
media: au0828: stop video streaming only when last user stops
media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper
media: coda: clear error return value before picture run
dmaengine: at_xdmac: remove BUG_ON macro in tasklet
perf/arm-cci: Remove broken race mitigation
clk: rockchip: undo several noc and special clocks as critical on rk3288
pinctrl: samsung: fix leaked of_node references
pinctrl: pistachio: fix leaked of_node references
HID: logitech-hidpp: use RAP instead of FAP to get the protocol version
Bluetooth: hci_qca: Give enough time to ROME controller to bootup.
mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()
smpboot: Place the __percpu annotation correctly
x86/build: Move _etext to actual end of .text
vfio-ccw: Release any channel program when releasing/removing vfio-ccw mdev
vfio-ccw: Do not call flush_workqueue while holding the spinlock
RDMA/cma: Consider scope_id while binding to ipv6 ll address
bcache: avoid clang -Wunintialized warning
bcache: add failure check to run_cache_set() for journal replay
bcache: fix failure in journal relplay
bcache: return error immediately in bch_journal_replay()
bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set
crypto: sun4i-ss - Fix invalid calculation of hash end
nvme-rdma: fix a NULL deref when an admin connect times out
nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE
net: cw1200: fix a NULL pointer dereference
rsi: Fix NULL pointer dereference in kmalloc
mwifiex: prevent an array overflow
ASoC: fsl_sai: Update is_slave_mode with correct value
slimbus: fix a potential NULL pointer dereference in of_qcom_slim_ngd_register
libbpf: fix samples/bpf build failure due to undefined UINT32_MAX
mac80211/cfg80211: update bss channel on channel switch
dmaengine: pl330: _stop: clear interrupt status
s390: qeth: address type mismatch warning
w1: fix the resume command API
sched/nohz: Run NOHZ idle load balancer on HK_FLAG_MISC CPUs
s390/kexec_file: Fix detection of text segment in ELF loader
scsi: qedi: Abort ep termination if offload not scheduled
rtc: stm32: manage the get_irq probe defer case
rtc: 88pm860x: prevent use-after-free on device remove
iwlwifi: pcie: don't crash on invalid RX interrupt
btrfs: Don't panic when we can't find a root key
btrfs: fix panic during relocation after ENOSPC before writeback happens
Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota reserve
x86/modules: Avoid breaking W^X while loading modules
scsi: qla2xxx: Fix hardirq-unsafe locking
scsi: qla2xxx: Avoid that lockdep complains about unsafe locking in tcm_qla2xxx_close_session()
scsi: qla2xxx: Fix abort handling in tcm_qla2xxx_write_pending()
scsi: qla2xxx: Fix a qla24xx_enable_msix() error path
sched/cpufreq: Fix kobject memleak
powerpc/watchdog: Use hrtimers for per-CPU heartbeat
arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable
ARM: vdso: Remove dependency with the arch_timer driver internals
media: stm32-dcmi: return appropriate error codes during probe
drm/nouveau/bar/nv50: ensure BAR is mapped
ACPI / property: fix handling of data_nodes in acpi_get_next_subnode()
brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler()
spi: pxa2xx: fix SCR (divisor) calculation
ASoC: imx: fix fiq dependencies
powerpc/perf: Fix loop exit condition in nest_imc_event_init
powerpc/boot: Fix missing check of lseek() return value
powerpc/perf: Return accordingly on invalid chip-id in
ASoC: hdmi-codec: unlock the device on startup errors
usb: dwc3: move synchronize_irq() out of the spinlock protected block
usb: dwc2: gadget: Increase descriptors count for ISOC's
ASoC: Intel: kbl_da7219_max98357a: Map BTN_0 to KEY_PLAYPAUSE
pinctrl: zte: fix leaked of_node references
Bluetooth: Ignore CC events not matching the last HCI command
hv_netvsc: fix race that may miss tx queue wakeup
net: ena: gcc 8: fix compilation warning
dmaengine: tegra210-dma: free dma controller in remove()
bpftool: exclude bash-completion/bpftool from .gitignore pattern
selftests/bpf: set RLIMIT_MEMLOCK properly for test_libbpf_open.c
tools/bpf: fix perf build error with uClibc (seen on ARC)
mmc: core: Verify SD bus width
gfs2: Fix occasional glock use-after-free
IB/hfi1: Fix WQ_MEM_RECLAIM warning
NFS: make nfs_match_client killable
cxgb4: Fix error path in cxgb4_init_module
gfs2: Fix lru_count going negative
Revert "btrfs: Honour FITRIM range constraints during free space trim"
acct_on(): don't mess with freeze protection
at76c50x-usb: Don't register led_trigger if usb_register_driver failed
batman-adv: mcast: fix multicast tt/tvlv worker locking
bpf: devmap: fix use-after-free Read in __dev_map_entry_free
ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit
media: vivid: use vfree() instead of kfree() for dev->bitmap_cap
media: vb2: add waiting_in_dqbuf flag
media: serial_ir: Fix use-after-free in serial_ir_init_module
media: cpia2: Fix use-after-free in cpia2_exit
fbdev: fix WARNING in __alloc_pages_nodemask bug
ovl: relax WARN_ON() for overlapping layers use case
btrfs: honor path->skip_locking in backref code
arm64: errata: Add workaround for Cortex-A76 erratum #1463225
brcmfmac: add subtype check for event handling in data path
brcmfmac: assure SSID length from firmware is limited
bpf: add bpf_jit_limit knob to restrict unpriv allocations
NFSv4.1 fix incorrect return value in copy_file_range
NFSv4.2 fix unnecessary retry in nfs4_copy_file_range
fbdev: fix divide error in fb_var_to_videomode
udlfb: fix some inconsistent NULL checking
btrfs: sysfs: don't leak memory when failing add fsid
btrfs: sysfs: Fix error path kobject memory leak
Btrfs: fix race between ranged fsync and writeback of adjacent ranges
Btrfs: avoid fallback to transaction commit during fsync of files with holes
Btrfs: do not abort transaction at btrfs_update_root() after failure to COW path
btrfs: don't double unlock on error in btrfs_punch_hole
gfs2: Fix sign extension bug in gfs2_update_stats
arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable
arm64/kernel: kaslr: reduce module randomization range to 2 GB
libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead
kvm: svm/avic: fix off-by-one in checking host APIC ID
mmc: sdhci-iproc: Set NO_HISPD bit to fix HS50 data hold time problem
mmc: sdhci-iproc: cygnus: Set NO_HISPD bit to fix HS50 data hold time problem
crypto: vmx - CTR: always increment IV as quadword
Revert "scsi: sd: Keep disk read-only when re-reading partition"
sbitmap: fix improper use of smp_mb__before_atomic()
bio: fix improper use of smp_mb__before_atomic()
KVM: x86: fix return value for reserved EFER
f2fs: Fix use of number of devices
ext4: wait for outstanding dio during truncate in nojournal mode
ext4: do not delete unlinked inode from orphan list on failed truncate
x86: Hide the int3_emulate_call/jmp functions from UML
x86: Hide the int3_emulate_call/jmp functions from UML
f2fs: link f2fs quota ops for sysfile
Conflicts:
arch/arm64/Kconfig
arch/arm64/include/asm/cpucaps.h
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/pgtable.h
arch/arm64/kernel/cpu_errata.c
arch/arm64/mm/dma-mapping.c
arch/x86/include/asm/text-patching.h
drivers/scsi/ufs/ufshcd.c
drivers/slimbus/qcom-ngd-ctrl.c
kernel/sched/fair.c
Change-Id: Ie70cae851ffdbac1b9dbf611fc361c275f4826fd
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
This is a snapshot of the crypto drivers as of msm-4.14 commit
367c46b1 (Enable hardware based FBE on f2fs and adapt ext4 fs).
Change-Id: Ifb52ed101d6e971c5823037f7895049b830c78c5
Signed-off-by: Zhen Kong <zkong@codeaurora.org>
Now only used by the bounce code, so move it there and mark the function
static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add and use a new op_stat_group() function for indexing partition stat
fields rather than indexing them by rq_data_dir() or bio_data_dir().
This function works similarly to op_is_sync() in that it takes the
request::cmd_flags or bio::bi_opf flags and determines which stats
should et updated.
In addition, the second parameter to generic_start_io_acct() and
generic_end_io_acct() is now a REQ_OP rather than simply a read or
write bit and it uses op_stat_group() on the parameter to determine
the stat group.
Note that the partition in_flight counts are not part of the per-cpu
statistics and as such are not indexed via this function. It's now
indexed by op_is_write().
tj: Refreshed on top of v4.17. Updated to pass around REQ_OP.
Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For backcharging we need to know who the page belongs to when swapping
it out. We don't worry about things that do ->rw_page (zram etc) at the
moment, we're only worried about pages that actually go to a block
device.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently io.low uses a bi_cg_private to stash its private data for the
blkg, however other blkcg policies may want to use this as well. Since
we can get the private data out of the blkg, move this to bi_blkg in the
bio and make it generic, then we can use bio_associate_blkg() to attach
the blkg to the bio.
Theoretically we could simply replace the bi_css with this since we can
get to all the same information from the blkg, however you have to
lookup the blkg, so for example wbc_init_bio() would have to lookup and
possibly allocate the blkg for the css it was trying to attach to the
bio. This could be problematic and result in us either not attaching
the css at all to the bio, or falling back to the root blkcg if we are
unable to allocate the corresponding blkg.
So for now do this, and in the future if possible we could just replace
the bi_css with bi_blkg and update the helpers to do the correct
translation.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block fixes from Jens Axboe:
"A few fixes for this merge window, where some of them should go in
sooner rather than later, hence a new pull this week. This pull
request contains:
- Set of NVMe fixes, mostly follow up cleanups/fixes to the queue
changes, but also teardown/removal and misc changes (Christop/Dan/
Johannes/Sagi/Steve).
- Two lightnvm fixes for issues that showed up in this window
(Colin/Wei).
- Failfast/driver flags inheritance for flush requests (Hannes).
- The md device put sanitization and fix (Kent).
- dm bio_set inheritance fix (me).
- nbd discard granularity fix (Josef).
- nbd consistency in command printing (Kevin).
- Loop recursion validation fix (Ted).
- Partition overlap check (Wang)"
[ .. and now my build is warning-free again thanks to the md fix - Linus ]
* tag 'for-linus-20180608' of git://git.kernel.dk/linux-block: (22 commits)
nvme: cleanup double shift issue
nvme-pci: make CMB SQ mod-param read-only
nvme-pci: unquiesce dead controller queues
nvme-pci: remove HMB teardown on reset
nvme-pci: queue creation fixes
nvme-pci: remove unnecessary completion doorbell check
nvme-pci: remove unnecessary nested locking
nvmet: filter newlines from user input
nvme-rdma: correctly check for target keyed sgl support
nvme: don't hold nvmf_transports_rwsem for more than transport lookups
nvmet: return all zeroed buffer when we can't find an active namespace
md: Unify mddev destruction paths
dm: use bioset_init_from_src() to copy bio_set
block: add bioset_init_from_src() helper
block: always set partition number to '0' in blk_partition_remap()
block: pass failfast and driver-specific flags to flush requests
nbd: set discard_alignment to the granularity
nbd: Consistently use request pointer in debug messages.
block: add verifier for cmdline partition
lightnvm: pblk: fix resource leak of invalid_bitmap
...
Pull xfs updates from Darrick Wong:
"New features this cycle include the ability to relabel mounted
filesystems, support for fallocated swapfiles, and using FUA for pure
data O_DSYNC directio writes. With this cycle we begin to integrate
online filesystem repair and refactor the growfs code in preparation
for eventual subvolume support, though the road ahead for both
features is quite long.
There are also numerous refactorings of the iomap code to remove
unnecessary log overhead, to disentangle some of the quota code, and
to prepare for buffer head removal in a future upstream kernel.
Metadata validation continues to improve, both in the hot path
veifiers and the online filesystem check code. I anticipate sending a
second pull request in a few days with more metadata validation
improvements.
This series has been run through a full xfstests run over the weekend
and through a quick xfstests run against this morning's master, with
no major failures reported.
Summary:
- Strengthen inode number and structure validation when allocating
inodes.
- Reduce pointless buffer allocations during cache miss
- Use FUA for pure data O_DSYNC directio writes
- Various iomap refactorings
- Strengthen quota metadata verification to avoid unfixable broken
quota
- Make AGFL block freeing a deferred operation to avoid blowing out
transaction reservations when running complex operations
- Get rid of the log item descriptors to reduce log overhead
- Fix various reflink bugs where inodes were double-joined to
transactions
- Don't issue discards when trimming unwritten extents
- Refactor incore dquot initialization and retrieval interfaces
- Fix some locking problmes in the quota scrub code
- Strengthen btree structure checks in scrub code
- Rewrite swapfile activation to use iomap and support unwritten
extents
- Make scrub exit to userspace sooner when corruptions or
cross-referencing problems are found
- Make scrub invoke the data fork scrubber directly on metadata
inodes
- Don't do background reclamation of post-eof and cow blocks when the
fs is suspended
- Fix secondary superblock buffer lifespan hinting
- Refactor growfs to use table-dispatched functions instead of long
stringy functions
- Move growfs code to libxfs
- Implement online fs label getting and setting
- Introduce online filesystem repair (in a very limited capacity)
- Fix unit conversion problems in the realtime freemap iteration
functions
- Various refactorings and cleanups in preparation to remove buffer
heads in a future release
- Reimplement the old bmap call with iomap
- Remove direct buffer head accesses from seek hole/data
- Various bug fixes"
* tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits)
fs: use ->is_partially_uptodate in page_cache_seek_hole_data
fs: remove the buffer_unwritten check in page_seek_hole_data
fs: move page_cache_seek_hole_data to iomap.c
xfs: use iomap_bmap
iomap: add an iomap-based bmap implementation
iomap: add a iomap_sector helper
iomap: use __bio_add_page in iomap_dio_zero
iomap: move IOMAP_F_BOUNDARY to gfs2
iomap: fix the comment describing IOMAP_NOWAIT
iomap: inline data should be an iomap type, not a flag
mm: split ->readpages calls to avoid non-contiguous pages lists
mm: return an unsigned int from __do_page_cache_readahead
mm: give the 'ret' variable a better name __do_page_cache_readahead
block: add a lower-level bio_add_page interface
xfs: fix error handling in xfs_refcount_insert()
xfs: fix xfs_rtalloc_rec units
xfs: strengthen rtalloc query range checks
xfs: xfs_rtbuf_get should check the bmapi_read results
xfs: xfs_rtword_t should be unsigned, not signed
dax: change bdev_dax_supported() to support boolean returns
...
For the upcoming removal of buffer heads in XFS we need to keep track of
the number of outstanding writeback requests per page. For this we need
to know if bio_add_page merged a region with the previous bvec or not.
Instead of adding additional arguments this refactors bio_add_page to
be implemented using three lower level helpers which users like XFS can
use directly if they care about the merge decisions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All users have been converted to bioset_init(), kill off the
old API.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert the core block functionality to embedded bio sets.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Found a bug (with ASAN) where we were passing a bio to bio_copy_data()
with bi_next not NULL, when it should have been - a driver had left
bi_next set to something after calling bio_endio().
Since the normal case is only copying single bios, split out
bio_list_copy_data() to avoid more bugs like this in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add versions that take bvec_iter args instead of using bio->bi_iter - to
be used by bcachefs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Minor optimization - remove a pointer indirection when using fs_bio_set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Similarly to mempool_init()/mempool_exit(), take a pointer indirection
out of allocation/freeing by allowing biosets to be embedded in other
structs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Minor performance improvement by getting rid of pointer indirections
from allocation/freeing fastpaths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_devname use __bdevname to display the device name, and can
only show the major and minor of the part0,
Fix this by using disk_name to display the correct name.
Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block updates from Jens Axboe:
"This is the main pull request for block IO related changes for the
4.16 kernel. Nothing major in this pull request, but a good amount of
improvements and fixes all over the map. This contains:
- BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
Paolo.
- Support for SMR zones for deadline and mq-deadline from Damien and
Christoph.
- Set of fixes for bcache by way of Michael Lyle, including fixes
from himself, Kent, Rui, Tang, and Coly.
- Series from Matias for lightnvm with fixes from Hans Holmberg,
Javier, and Matias. Mostly centered around pblk, and the removing
rrpc 1.2 in preparation for supporting 2.0.
- A couple of NVMe pull requests from Christoph. Nothing major in
here, just fixes and cleanups, and support for command tracing from
Johannes.
- Support for blk-throttle for tracking reads and writes separately.
From Joseph Qi. A few cleanups/fixes also for blk-throttle from
Weiping.
- Series from Mike Snitzer that enables dm to register its queue more
logically, something that's alwways been problematic on dm since
it's a stacked device.
- Series from Ming cleaning up some of the bio accessor use, in
preparation for supporting multipage bvecs.
- Various fixes from Ming closing up holes around queue mapping and
quiescing.
- BSD partition fix from Richard Narron, fixing a problem where we
can't mount newer (10/11) FreeBSD partitions.
- Series from Tejun reworking blk-mq timeout handling. The previous
scheme relied on atomic bits, but it had races where we would think
a request had timed out if it to reused at the wrong time.
- null_blk now supports faking timeouts, to enable us to better
exercise and test that functionality separately. From me.
- Kill the separate atomic poll bit in the request struct. After
this, we don't use the atomic bits on blk-mq anymore at all. From
me.
- sgl_alloc/free helpers from Bart.
- Heavily contended tag case scalability improvement from me.
- Various little fixes and cleanups from Arnd, Bart, Corentin,
Douglas, Eryu, Goldwyn, and myself"
* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
block: remove smart1,2.h
nvme: add tracepoint for nvme_complete_rq
nvme: add tracepoint for nvme_setup_cmd
nvme-pci: introduce RECONNECTING state to mark initializing procedure
nvme-rdma: remove redundant boolean for inline_data
nvme: don't free uuid pointer before printing it
nvme-pci: Suspend queues after deleting them
bsg: use pr_debug instead of hand crafted macros
blk-mq-debugfs: don't allow write on attributes with seq_operations set
nvme-pci: Fix queue double allocations
block: Set BIO_TRACE_COMPLETION on new bio during split
blk-throttle: use queue_is_rq_based
block: Remove kblockd_schedule_delayed_work{,_on}()
blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
lib/scatterlist: Fix chaining support in sgl_alloc_order()
blk-throttle: track read and write request individually
block: add bdev_read_only() checks to common helpers
block: fail op_is_write() requests to read-only partitions
blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
...
bcache is the only user of bio_alloc_pages(), so move this function into
bcache, and avoid it being misused in the future.
Also rename it to bch_bio_allo_pages() since it is bcache only.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The following helpers are introduced for converting current users of
direct access to bvec table, and prepares for supporting multipage bvec:
bio_pages_all()
bio_first_bvec_all()
bio_first_page_all()
bio_last_bvec_all()
All are named as bio_*_all() to following bio_for_each_segment_all(),
they can only be used on bio of !bio_flagged(bio, BIO_CLONED), that means
the whole bvec table is covered.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a bio is throttled and split after throttling, the bio could be
resubmited and enters the throttling again. This will cause part of the
bio to be charged multiple times. If the cgroup has an IO limit, the
double charge will significantly harm the performance. The bio split
becomes quite common after arbitrary bio size change.
To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
If the bio is cloned/split, we copy the flag to new bio too to avoid a
double charge. However, cloned bio could be directed to a new disk,
keeping the flag be a problem. The observation is we always set new disk
for the bio in this case, so we can clear the flag in bio_set_dev().
This issue exists for a long time, arbitrary bio size change just makes
it worse, so this should go into stable at least since v4.2.
V1-> V2: Not add extra field in bio based on discussion with Tejun
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull iov_iter updates from Al Viro:
- bio_{map,copy}_user_iov() series; those are cleanups - fixes from the
same pile went into mainline (and stable) in late September.
- fs/iomap.c iov_iter-related fixes
- new primitive - iov_iter_for_each_range(), which applies a function
to kernel-mapped segments of an iov_iter.
Usable for kvec and bvec ones, the latter does kmap()/kunmap() around
the callback. _Not_ usable for iovec- or pipe-backed iov_iter; the
latter is not hard to fix if the need ever appears, the former is by
design.
Another related primitive will have to wait for the next cycle - it
passes page + offset + size instead of pointer + size, and that one
will be usable for everything _except_ kvec. Unfortunately, that one
didn't get exposure in -next yet, so...
- a bit more lustre iov_iter work, including a use case for
iov_iter_for_each_range() (checksum calculation)
- vhost/scsi leak fix in failure exit
- misc cleanups and detritectomy...
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
iomap_dio_actor(): fix iov_iter bugs
switch ksocknal_lib_recv_...() to use of iov_iter_for_each_range()
lustre: switch struct ksock_conn to iov_iter
vhost/scsi: switch to iov_iter_get_pages()
fix a page leak in vhost_scsi_iov_to_sgl() error recovery
new primitive: iov_iter_for_each_range()
lnet_return_rx_credits_locked: don't abuse list_entry
xen: don't open-code iov_iter_kvec()
orangefs: remove detritus from struct orangefs_kiocb_s
kill iov_shorten()
bio_alloc_map_data(): do bmd->iter setup right there
bio_copy_user_iov(): saner bio size calculation
bio_map_user_iov(): get rid of copying iov_iter
bio_copy_from_iter(): get rid of copying iov_iter
move more stuff down into bio_copy_user_iov()
blk_rq_map_user_iov(): move iov_iter_advance() down
bio_map_user_iov(): get rid of the iov_for_each()
bio_map_user_iov(): move alignment check into the main loop
don't rely upon subsequent bio_add_pc_page() calls failing
... and with iov_iter_get_pages_alloc() it becomes even simpler
...
This helper doesn't buy us much over calling kmap_atomic directly.
In fact in the only caller it does a bit of useless work as the
caller already has the bvec at hand, and said caller would even
buggy for a multi-segment bio due to the use of this helper.
So just remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No functional change in this patch, just in preparation for
basing the inflight mechanism on the queue in question.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
And instead call directly into the integrity code from bio_end_io.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some ->bi_end_io handlers (for example: pi_verify or decrypt handlers)
need to know original data vector, but after bio traverse io-stack it may
be advanced, splited and relocated many times so it is hard to guess
original iterator. Let's add 'bi_done' conter which accounts number
of bytes iterator was advanced during it's evolution. Later end_io handler
may easily restore original iterator by rewinding iterator to
iter->bi_done.
Note: this change makes sizeof (struct bvec_iter) multiple to 8
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
[hch: switched to true/false return]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently if some one try to advance bvec beyond it's size we simply
dump WARN_ONCE and continue to iterate beyond bvec array boundaries.
This simply means that we endup dereferencing/corrupting random memory
region.
Sane reaction would be to propagate error back to calling context
But bvec_iter_advance's calling context is not always good for error
handling. For safity reason let truncate iterator size to zero which
will break external iteration loop which prevent us from unpredictable
memory range corruption. And even it caller ignores an error, it will
corrupt it's own bvecs, not others.
This patch does:
- Return error back to caller with hope that it will react on this
- Truncate iterator size
Code was added long time ago here 4550dd6c, luckily no one hit it
in real life :)
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
[hch: switch to true/false returns instead of errno values]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore it's code and fail bio with EIO. Let's return real error to
upper layer, so later caller may react accordingly.
In fact no one want to use bio_integrity_prep() w/o bio_integrity_enabled,
so it is reasonable to fold it in to one function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
[hch: merged with the latest block tree,
return bool from bio_integrity_prep]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_integrity_trim inherent it's interface from bio_trim and accept
offset and size, but this API is error prone because data offset
must always be insync with bio's data offset. That is why we have
integrity update hook in bio_advance()
So only meaningful values are: offset == 0, sectors == bio_sectors(bio)
Let's just remove them completely.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull core block/IO updates from Jens Axboe:
"This is the main pull request for the block layer for 4.13. Not a huge
round in terms of features, but there's a lot of churn related to some
core cleanups.
Note this depends on the UUID tree pull request, that Christoph
already sent out.
This pull request contains:
- A series from Christoph, unifying the error/stats codes in the
block layer. We now use blk_status_t everywhere, instead of using
different schemes for different places.
- Also from Christoph, some cleanups around request allocation and IO
scheduler interactions in blk-mq.
- And yet another series from Christoph, cleaning up how we handle
and do bounce buffering in the block layer.
- A blk-mq debugfs series from Bart, further improving on the support
we have for exporting internal information to aid debugging IO
hangs or stalls.
- Also from Bart, a series that cleans up the request initialization
differences across types of devices.
- A series from Goldwyn Rodrigues, allowing the block layer to return
failure if we will block and the user asked for non-blocking.
- Patch from Hannes for supporting setting loop devices block size to
that of the underlying device.
- Two series of patches from Javier, fixing various issues with
lightnvm, particular around pblk.
- A series from me, adding support for write hints. This comes with
NVMe support as well, so applications can help guide data placement
on flash to improve performance, latencies, and write
amplification.
- A series from Ming, improving and hardening blk-mq support for
stopping/starting and quiescing hardware queues.
- Two pull requests for NVMe updates. Nothing major on the feature
side, but lots of cleanups and bug fixes. From the usual crew.
- A series from Neil Brown, greatly improving the bio rescue set
support. Most notably, this kills the bio rescue work queues, if we
don't really need them.
- Lots of other little bug fixes that are all over the place"
* 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
lightnvm: pblk: set line bitmap check under debug
lightnvm: pblk: verify that cache read is still valid
lightnvm: pblk: add initialization check
lightnvm: pblk: remove target using async. I/Os
lightnvm: pblk: use vmalloc for GC data buffer
lightnvm: pblk: use right metadata buffer for recovery
lightnvm: pblk: schedule if data is not ready
lightnvm: pblk: remove unused return variable
lightnvm: pblk: fix double-free on pblk init
lightnvm: pblk: fix bad le64 assignations
nvme: Makefile: remove dead build rule
blk-mq: map all HWQ also in hyperthreaded system
nvmet-rdma: register ib_client to not deadlock in device removal
nvme_fc: fix error recovery on link down.
nvmet_fc: fix crashes on bad opcodes
nvme_fc: Fix crash when nvme controller connection fails.
nvme_fc: replace ioabort msleep loop with completion
nvme_fc: fix double calls to nvme_cleanup_cmd()
nvme-fabrics: verify that a controller returns the correct NQN
nvme: simplify nvme_dev_attrs_are_visible
...
Wen reports significant memory leaks with DIF and O_DIRECT:
"With nvme devive + T10 enabled, On a system it has 256GB and started
logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
leaking.
/proc/meminfo | grep SUnreclaim...
SUnreclaim: 6752128 kB
SUnreclaim: 6874880 kB
SUnreclaim: 7238080 kB
....
SUnreclaim: 22307264 kB
SUnreclaim: 22485888 kB
SUnreclaim: 22720256 kB
When testcases with T10 enabled call into __blkdev_direct_IO_simple,
code doesn't free memory allocated by bio_integrity_alloc. The patch
fixes the issue. HTX has been run with +60 hours without failure."
Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
doesn't go through the regular bio free. This means that any ancillary
data allocated with the bio through the stack is not freed. Hence, we
can leak the integrity data associated with the bio, if the device is
using DIF/DIX.
Fix this by providing a bio_uninit() and export it, so that we can use
it to free this data. Note that this is a minimal fix for this issue.
Any current user of bio's that are allocated outside of
bio_alloc_bioset() suffers from this issue, most notably some drivers.
We will fix those in a more comprehensive patch for 4.13. This also
means that the commit marked as being fixed by this isn't the real
culprit, it's just the most obvious one out there.
Fixes: 542ff7bf18 ("block: new direct I/O implementation")
Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A new bio operation flag REQ_NOWAIT is introduced to identify bio's
orignating from iocb with IOCB_NOWAIT. This flag indicates
to return immediately if a request cannot be made instead
of retrying.
Stacked devices such as md (the ones with make_request_fn hooks)
currently are not supported because it may block for housekeeping.
For example, an md can have a part of the device suspended.
For this reason, only request based devices are supported.
In the future, this feature will be expanded to stacked devices
by teaching them how to handle the REQ_NOWAIT flags.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_clone() is no longer used.
Only bio_clone_bioset() or bio_clone_fast().
This is for the best, as bio_clone() used fs_bio_set,
and filesystems are unlikely to want to use bio_clone().
So remove bio_clone() and all references.
This includes a fix to some incorrect documentation.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch converts bioset_create() to not create a workqueue by
default, so alloctions will never trigger punt_bios_to_rescuer(). It
also introduces a new flag BIOSET_NEED_RESCUER which tells
bioset_create() to preserve the old behavior.
All callers of bioset_create() that are inside block device drivers,
are given the BIOSET_NEED_RESCUER flag.
biosets used by filesystems or other top-level users do not
need rescuing as the bio can never be queued behind other
bios. This includes fs_bio_set, blkdev_dio_pool,
btrfs_bioset, xfs_ioend_bioset, and one allocated by
target_core_iblock.c.
biosets used by md/raid do not need rescuing as
their usage was recently audited and revised to never
risk deadlock.
It is hoped that most, if not all, of the remaining biosets
can end up being the non-rescued version.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().
To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().
Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
This reverts commit 6f8802852f.
bio_copy_data_partial() is no longer needed.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>