The arm64 module region is a 128 MB region that is kept close to
the core kernel, in order to ensure that relative branches are
always in range. So using the same region for programs that do
not have this restriction is wasteful, and preferably avoided.
Now that the core BPF JIT code permits the alloc/free routines to
be overridden, implement them by vmalloc()/vfree() calls from a
dedicated 128 MB region set aside for BPF programs. This ensures
that BPF programs are still in branching range of each other, which
is something the JIT currently depends upon (and is not guaranteed
when using module_alloc() on KASLR kernels like we do currently).
It also ensures that placement of BPF programs does not correlate
with the placement of the core kernel or modules, making it less
likely that leaking the former will reveal the latter.
This also solves an issue under KASAN, where shadow memory is
needlessly allocated for all BPF programs (which don't require KASAN
shadow pages since they are not KASAN instrumented)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Change-Id: Id3a066651f282249e8825b7bcdf9a78e84c1f878
Changes in 4.19.247
binfmt_flat: do not stop relocating GOT entries prematurely on riscv
ALSA: hda/realtek - Fix microphone noise on ASUS TUF B550M-PLUS
USB: serial: option: add Quectel BG95 modem
USB: new quirk for Dell Gen 2 devices
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
btrfs: add "0x" prefix for unsupported optional features
btrfs: repair super block num_devices automatically
drm/virtio: fix NULL pointer dereference in virtio_gpu_conn_get_modes
mwifiex: add mutex lock for call in mwifiex_dfs_chan_sw_work_queue
b43legacy: Fix assigning negative value to unsigned variable
b43: Fix assigning negative value to unsigned variable
ipw2x00: Fix potential NULL dereference in libipw_xmit()
ipv6: fix locking issues with loops over idev->addr_list
fbcon: Consistently protect deferred_takeover with console_lock()
ACPICA: Avoid cache flush inside virtual machines
ALSA: jack: Access input_dev under mutex
drm/amd/pm: fix double free in si_parse_power_table()
ath9k: fix QCA9561 PA bias level
media: venus: hfi: avoid null dereference in deinit
media: pci: cx23885: Fix the error handling in cx23885_initdev()
media: cx25821: Fix the warning when removing the module
md/bitmap: don't set sb values if can't pass sanity check
scsi: megaraid: Fix error check return value of register_chrdev()
drm/plane: Move range check for format_count earlier
drm/amd/pm: fix the compile warning
ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL
ASoC: dapm: Don't fold register value changes into notifications
mlxsw: spectrum_dcb: Do not warn about priority changes
ASoC: tscs454: Add endianness flag in snd_soc_component_driver
s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES
dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
ipmi:ssif: Check for NULL msg when handling events and messages
rtlwifi: Use pr_warn instead of WARN_ONCE
media: cec-adap.c: fix is_configuring state
openrisc: start CPU timer early in boot
nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags
ASoC: rt5645: Fix errorenous cleanup order
net: phy: micrel: Allow probing without .driver_data
media: exynos4-is: Fix compile warning
hwmon: Make chip parameter for with_info API mandatory
rxrpc: Return an error to sendmsg if call failed
eth: tg3: silence the GCC 12 array-bounds warning
ARM: dts: ox820: align interrupt controller node name with dtschema
PM / devfreq: rk3399_dmc: Disable edev on remove()
fs: jfs: fix possible NULL pointer dereference in dbFree()
ARM: OMAP1: clock: Fix UART rate reporting algorithm
fat: add ratelimit to fat*_ent_bread()
ARM: versatile: Add missing of_node_put in dcscb_init
ARM: dts: exynos: add atmel,24c128 fallback to Samsung EEPROM
ARM: hisi: Add missing of_node_put after of_find_compatible_node
PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
powerpc/xics: fix refcount leak in icp_opal_init()
macintosh/via-pmu: Fix build failure when CONFIG_INPUT is disabled
RDMA/hfi1: Prevent panic when SDMA is disabled
drm: fix EDID struct for old ARM OABI format
ath9k: fix ar9003_get_eepmisc
drm/edid: fix invalid EDID extension block filtering
drm/bridge: adv7511: clean up CEC adapter when probe fails
ASoC: mediatek: Fix error handling in mt8173_max98090_dev_probe
ASoC: mediatek: Fix missing of_node_put in mt2701_wm8960_machine_probe
x86/delay: Fix the wrong asm constraint in delay_loop()
drm/mediatek: Fix mtk_cec_mask()
drm/vc4: txp: Don't set TXP_VSTART_AT_EOF
drm/vc4: txp: Force alpha to be 0xff if it's disabled
nl80211: show SSID for P2P_GO interfaces
spi: spi-ti-qspi: Fix return value handling of wait_for_completion_timeout
NFC: NULL out the dev->rfkill to prevent UAF
efi: Add missing prototype for efi_capsule_setup_info
HID: hid-led: fix maximum brightness for Dream Cheeky
HID: elan: Fix potential double free in elan_input_configured
spi: img-spfi: Fix pm_runtime_get_sync() error checking
ath9k_htc: fix potential out of bounds access with invalid rxstatus->rs_keyix
inotify: show inotify mask flags in proc fdinfo
fsnotify: fix wrong lockdep annotations
of: overlay: do not break notify on NOTIFY_{OK|STOP}
scsi: ufs: core: Exclude UECxx from SFR dump list
x86/pm: Fix false positive kmemleak report in msr_build_context()
x86/speculation: Add missing prototype for unpriv_ebpf_notify()
drm/msm/disp/dpu1: set vbif hw config to NULL to avoid use after memory free during pm runtime resume
drm/msm/dsi: fix error checks and return values for DSI xmit functions
drm/msm/hdmi: check return value after calling platform_get_resource_byname()
drm/rockchip: vop: fix possible null-ptr-deref in vop_bind()
x86: Fix return value of __setup handlers
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
x86/mm: Cleanup the control_va_addr_alignment() __setup handler
drm/msm/mdp5: Return error code in mdp5_pipe_release when deadlock is detected
drm/msm/mdp5: Return error code in mdp5_mixer_release when deadlock is detected
drm/msm: return an error pointer in msm_gem_prime_get_sg_table()
media: uvcvideo: Fix missing check to determine if element is found in list
perf/amd/ibs: Use interrupt regs ip for stack unwinding
ASoC: mxs-saif: Fix refcount leak in mxs_saif_probe
regulator: pfuze100: Fix refcount leak in pfuze_parse_regulators_dt
scripts/faddr2line: Fix overlapping text section failures
media: st-delta: Fix PM disable depth imbalance in delta_probe
media: exynos4-is: Change clk_disable to clk_disable_unprepare
media: pvrusb2: fix array-index-out-of-bounds in pvr2_i2c_core_init
media: vsp1: Fix offset calculation for plane cropping
Bluetooth: fix dangling sco_conn and use-after-free in sco_sock_timeout
m68k: math-emu: Fix dependencies of math emulation support
sctp: read sk->sk_bound_dev_if once in sctp_rcv()
ext4: reject the 'commit' option on ext2 filesystems
drm: msm: fix possible memory leak in mdp5_crtc_cursor_set()
ASoC: wm2000: fix missing clk_disable_unprepare() on error in wm2000_anc_transition()
NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
rxrpc: Fix listen() setting the bar too high for the prealloc rings
rxrpc: Don't try to resend the request if we're receiving the reply
soc: qcom: smp2p: Fix missing of_node_put() in smp2p_parse_ipc
soc: qcom: smsm: Fix missing of_node_put() in smsm_parse_ipc
PCI: cadence: Fix find_first_zero_bit() limit
PCI: rockchip: Fix find_first_zero_bit() limit
ARM: dts: bcm2835-rpi-zero-w: Fix GPIO line name for Wifi/BT
ARM: dts: bcm2835-rpi-b: Fix GPIO line names
crypto: marvell/cesa - ECB does not IV
mfd: ipaq-micro: Fix error check return value of platform_get_irq()
scsi: fcoe: Fix Wstringop-overflow warnings in fcoe_wwn_from_mac()
firmware: arm_scmi: Fix list protocols enumeration in the base protocol
pinctrl: mvebu: Fix irq_of_parse_and_map() return value
drivers/base/node.c: fix compaction sysfs file leak
dax: fix cache flush on PMD-mapped pages
powerpc/8xx: export 'cpm_setbrg' for modules
powerpc/idle: Fix return value of __setup() handler
powerpc/4xx/cpm: Fix return value of __setup() handler
proc: fix dentry/inode overinstantiating under /proc/${pid}/net
tty: fix deadlock caused by calling printk() under tty_port->lock
Input: sparcspkr - fix refcount leak in bbc_beep_probe
powerpc/perf: Fix the threshold compare group constraint for power9
powerpc/fsl_rio: Fix refcount leak in fsl_rio_setup
mailbox: forward the hrtimer if not queued and under a lock
RDMA/hfi1: Prevent use of lock before it is initialized
f2fs: fix dereference of stale list iterator after loop body
iommu/mediatek: Add list_del in mtk_iommu_remove
i2c: at91: use dma safe buffers
i2c: at91: Initialize dma_buf in at91_twi_xfer()
NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
video: fbdev: clcdfb: Fix refcount leak in clcdfb_of_vram_setup
dmaengine: stm32-mdma: remove GISR1 register
iommu/amd: Increase timeout waiting for GA log enablement
perf c2c: Use stdio interface if slang is not supported
perf jevents: Fix event syntax error caused by ExtSel
f2fs: fix deadloop in foreground GC
wifi: mac80211: fix use-after-free in chanctx code
iwlwifi: mvm: fix assert 1F04 upon reconfig
fs-writeback: writeback_sb_inodes:Recalculate 'wrote' according skipped pages
netfilter: nf_tables: disallow non-stateful expression in sets earlier
ext4: fix use-after-free in ext4_rename_dir_prepare
ext4: fix bug_on in ext4_writepages
ext4: verify dir block before splitting it
ext4: avoid cycles in directory h-tree
tracing: Fix potential double free in create_var_ref()
PCI/PM: Fix bridge_d3_blacklist[] Elo i2 overwrite of Gigabyte X299
PCI: qcom: Fix runtime PM imbalance on probe errors
PCI: qcom: Fix unbalanced PHY init on probe errors
dlm: fix plock invalid read
dlm: fix missing lkb refcount handling
ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
scsi: dc395x: Fix a missing check on list iterator
scsi: ufs: qcom: Add a readl() to make sure ref_clk gets enabled
drm/amdgpu/cs: make commands with 0 chunks illegal behaviour.
drm/nouveau/clk: Fix an incorrect NULL check on list iterator
drm/bridge: analogix_dp: Grab runtime PM reference for DP-AUX
md: fix an incorrect NULL check in does_sb_need_changing
md: fix an incorrect NULL check in md_reload_sb
media: coda: Fix reported H264 profile
media: coda: Add more H264 levels for CODA960
RDMA/hfi1: Fix potential integer multiplication overflow errors
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
irqchip: irq-xtensa-mx: fix initial IRQ affinity
mac80211: upgrade passive scan to active scan on DFS channels after beacon rx
um: chan_user: Fix winch_tramp() return value
um: Fix out-of-bounds read in LDT setup
iommu/msm: Fix an incorrect NULL check on list iterator
nodemask.h: fix compilation error with GCC12
hugetlb: fix huge_pmd_unshare address update
rtl818x: Prevent using not initialized queues
ASoC: rt5514: Fix event generation for "DSP Voice Wake Up" control
carl9170: tx: fix an incorrect use of list iterator
gma500: fix an incorrect NULL check on list iterator
arm64: dts: qcom: ipq8074: fix the sleep clock frequency
phy: qcom-qmp: fix struct clk leak on probe errors
docs/conf.py: Cope with removal of language=None in Sphinx 5.0.0
dt-bindings: gpio: altera: correct interrupt-cells
blk-iolatency: Fix inflight count imbalances and IO hangs on offline
phy: qcom-qmp: fix reset-controller leak on probe errors
RDMA/rxe: Generate a completion for unsupported/invalid opcode
MIPS: IP27: Remove incorrect `cpu_has_fpu' override
md: bcache: check the return value of kzalloc() in detached_dev_do_request()
pcmcia: db1xxx_ss: restrict to MIPS_DB1XXX boards
staging: greybus: codecs: fix type confusion of list iterator variable
tty: goldfish: Use tty_port_destroy() to destroy port
usb: usbip: fix a refcount leak in stub_probe()
usb: usbip: add missing device lock on tweak configuration cmd
USB: storage: karma: fix rio_karma_init return
usb: musb: Fix missing of_node_put() in omap2430_probe
pwm: lp3943: Fix duty calculation in case period was clamped
rpmsg: qcom_smd: Fix irq_of_parse_and_map() return value
usb: dwc3: pci: Fix pm_runtime_get_sync() error checking
iio: adc: sc27xx: fix read big scale voltage not right
rpmsg: qcom_smd: Fix returning 0 if irq_of_parse_and_map() fails
coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
soc: rockchip: Fix refcount leak in rockchip_grf_init
clocksource/drivers/riscv: Events are stopped during CPU suspend
rtc: mt6397: check return value after calling platform_get_resource()
serial: meson: acquire port->lock in startup()
serial: 8250_fintek: Check SER_RS485_RTS_* only with RS485
serial: digicolor-usart: Don't allow CS5-6
serial: txx9: Don't allow CS5-6
serial: sh-sci: Don't allow CS5-6
serial: st-asc: Sanitize CSIZE and correct PARENB for CS7
serial: stm32-usart: Correct CSIZE, bits, and parity
firmware: dmi-sysfs: Fix memory leak in dmi_sysfs_register_handle
bus: ti-sysc: Fix warnings for unbind for serial
clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()
net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register
modpost: fix removing numeric suffixes
jffs2: fix memory leak in jffs2_do_fill_super
ubi: ubi_create_volume: Fix use-after-free when volume creation failed
nfp: only report pause frame configuration for physical device
net/mlx5e: Update netdev features after changing XDP state
tcp: tcp_rtx_synack() can be called from process context
afs: Fix infinite loop found by xfstest generic/676
tipc: check attribute length for bearer name
perf c2c: Fix sorting in percent_rmt_hitm_cmp()
mips: cpc: Fix refcount leak in mips_cpc_default_phys_base
tracing: Fix sleeping function called from invalid context on RT kernel
tracing: Avoid adding tracer option before update_tracer_options
i2c: cadence: Increase timeout per message if necessary
m68knommu: set ZERO_PAGE() to the allocated zeroed page
m68knommu: fix undefined reference to `_init_sp'
NFSv4: Don't hold the layoutget locks across multiple RPC calls
video: fbdev: pxa3xx-gcu: release the resources correctly in pxa3xx_gcu_probe/remove()
xprtrdma: treat all calls not a bcall when bc_serv is NULL
ata: pata_octeon_cf: Fix refcount leak in octeon_cf_probe
af_unix: Fix a data-race in unix_dgram_peer_wake_me().
bpf, arm64: Clear prog->jited_len along prog->jited
net/mlx4_en: Fix wrong return value on ioctl EEPROM query failure
SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
net: mdio: unexport __init-annotated mdio_bus_init()
net: xfrm: unexport __init-annotated xfrm4_protocol_init()
net: ipv6: unexport __init-annotated seg6_hmac_init()
net/mlx5: Rearm the FW tracer after each tracer event
ip_gre: test csum_start instead of transport header
net: altera: Fix refcount leak in altera_tse_mdio_create
drm: imx: fix compiler warning with gcc-12
iio: dummy: iio_simple_dummy: check the return value of kstrdup()
lkdtm/usercopy: Expand size of "out of frame" object
tty: synclink_gt: Fix null-pointer-dereference in slgt_clean()
tty: Fix a possible resource leak in icom_probe
drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop()
drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop()
USB: host: isp116x: check return value after calling platform_get_resource()
drivers: tty: serial: Fix deadlock in sa1100_set_termios()
drivers: usb: host: Fix deadlock in oxu_bus_suspend()
USB: hcd-pci: Fully suspend across freeze/thaw cycle
usb: dwc2: gadget: don't reset gadget's driver->bus
misc: rtsx: set NULL intfdata when probe fails
extcon: Modify extcon device to be created after driver data is set
clocksource/drivers/sp804: Avoid error on multiple instances
staging: rtl8712: fix uninit-value in r871xu_drv_init()
serial: msm_serial: disable interrupts in __msm_console_write()
kernfs: Separate kernfs_pr_cont_buf and rename_lock.
md: protect md_unregister_thread from reentrancy
Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process"
ceph: allow ceph.dir.rctime xattr to be updatable
drm/radeon: fix a possible null pointer dereference
modpost: fix undefined behavior of is_arm_mapping_symbol()
nbd: call genl_unregister_family() first in nbd_cleanup()
nbd: fix race between nbd_alloc_config() and module removal
nbd: fix io hung while disconnecting device
nodemask: Fix return values to be unsigned
vringh: Fix loop descriptors check in the indirect cases
ALSA: hda/conexant - Fix loopback issue with CX20632
cifs: return errors during session setup during reconnects
ata: libata-transport: fix {dma|pio|xfer}_mode sysfs files
mmc: block: Fix CQE recovery reset success
nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
ixgbe: fix bcast packets Rx on VF after promisc removal
ixgbe: fix unexpected VLAN Rx in promisc mode on VF
Input: bcm5974 - set missing URB_NO_TRANSFER_DMA_MAP urb flag
powerpc/32: Fix overread/overwrite of thread_struct via ptrace
md/raid0: Ignore RAID0 layout if the second zone has only one device
mtd: cfi_cmdset_0002: Move and rename chip_check/chip_ready/chip_good_for_write
mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064N
tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd
Linux 4.19.247
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I58c002ddc38e389a13e2bdb9f291f05805718c9d
[ Upstream commit 10f3b29c65bb2fe0d47c2945cd0b4087be1c5218 ]
syzbot reported an illegal copy_to_user() attempt
from bpf_prog_get_info_by_fd() [1]
There was no repro yet on this bug, but I think
that commit 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
is exposing a prior bug in bpf arm64.
bpf_prog_get_info_by_fd() looks at prog->jited_len
to determine if the JIT image can be copied out to user space.
My theory is that syzbot managed to get a prog where prog->jited_len
has been set to 43, while prog->bpf_func has ben cleared.
It is not clear why copy_to_user(uinsns, NULL, ulen) is triggering
this particular warning.
I thought find_vma_area(NULL) would not find a vm_struct.
As we do not hold vmap_area_lock spinlock, it might be possible
that the found vm_struct was garbage.
[1]
usercopy: Kernel memory exposure attempt detected from vmalloc (offset 792633534417210172, size 43)!
kernel BUG at mm/usercopy.c:101!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 25002 Comm: syz-executor.1 Not tainted 5.18.0-syzkaller-10139-g8291eaafed36 #0
Hardware name: linux,dummy-virt (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usercopy_abort+0x90/0x94 mm/usercopy.c:101
lr : usercopy_abort+0x90/0x94 mm/usercopy.c:89
sp : ffff80000b773a20
x29: ffff80000b773a30 x28: faff80000b745000 x27: ffff80000b773b48
x26: 0000000000000000 x25: 000000000000002b x24: 0000000000000000
x23: 00000000000000e0 x22: ffff80000b75db67 x21: 0000000000000001
x20: 000000000000002b x19: ffff80000b75db3c x18: 00000000fffffffd
x17: 2820636f6c6c616d x16: 76206d6f72662064 x15: 6574636574656420
x14: 74706d6574746120 x13: 2129333420657a69 x12: 73202c3237313031
x11: 3237313434333533 x10: 3336323937207465 x9 : 657275736f707865
x8 : ffff80000a30c550 x7 : ffff80000b773830 x6 : ffff80000b773830
x5 : 0000000000000000 x4 : ffff00007fbbaa10 x3 : 0000000000000000
x2 : 0000000000000000 x1 : f7ff000028fc0000 x0 : 0000000000000064
Call trace:
usercopy_abort+0x90/0x94 mm/usercopy.c:89
check_heap_object mm/usercopy.c:186 [inline]
__check_object_size mm/usercopy.c:252 [inline]
__check_object_size+0x198/0x36c mm/usercopy.c:214
check_object_size include/linux/thread_info.h:199 [inline]
check_copy_size include/linux/thread_info.h:235 [inline]
copy_to_user include/linux/uaccess.h:159 [inline]
bpf_prog_get_info_by_fd.isra.0+0xf14/0xfdc kernel/bpf/syscall.c:3993
bpf_obj_get_info_by_fd+0x12c/0x510 kernel/bpf/syscall.c:4253
__sys_bpf+0x900/0x2150 kernel/bpf/syscall.c:4956
__do_sys_bpf kernel/bpf/syscall.c:5021 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5019 [inline]
__arm64_sys_bpf+0x28/0x40 kernel/bpf/syscall.c:5019
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52
el0_svc_common.constprop.0+0x44/0xec arch/arm64/kernel/syscall.c:142
do_el0_svc+0xa0/0xc0 arch/arm64/kernel/syscall.c:206
el0_svc+0x44/0xb0 arch/arm64/kernel/entry-common.c:624
el0t_64_sync_handler+0x1ac/0x1b0 arch/arm64/kernel/entry-common.c:642
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581
Code: aa0003e3 d00038c0 91248000 97fff65f (d4210000)
Fixes: db496944fd ("bpf: arm64: add JIT support for multi-function programs")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220531215113.1100754-1-eric.dumazet@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 4.19.207
ext4: fix race writing to an inline_data file while its xattrs are changing
xtensa: fix kconfig unmet dependency warning for HAVE_FUTEX_CMPXCHG
gpu: ipu-v3: Fix i.MX IPU-v3 offset calculations for (semi)planar U/V formats
qed: Fix the VF msix vectors flow
net: macb: Add a NULL check on desc_ptp
qede: Fix memset corruption
perf/x86/intel/pt: Fix mask of num_address_ranges
perf/x86/amd/ibs: Work around erratum #1197
cryptoloop: add a deprecation warning
ARM: 8918/2: only build return_address() if needed
ALSA: pcm: fix divide error in snd_pcm_lib_ioctl
clk: fix build warning for orphan_list
media: stkwebcam: fix memory leak in stk_camera_probe
ARM: imx: add missing clk_disable_unprepare()
ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init
igmp: Add ip_mc_list lock in ip_check_mc_rcu
USB: serial: mos7720: improve OOM-handling in read_mos_reg()
ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2)
SUNRPC/nfs: Fix return value for nfs4_callback_compound()
crypto: talitos - reduce max key size for SEC1
powerpc/module64: Fix comment in R_PPC64_ENTRY handling
powerpc/boot: Delete unneeded .globl _zimage_start
net: ll_temac: Remove left-over debug message
mm/page_alloc: speed up the iteration of max_order
Revert "btrfs: compression: don't try to compress if we don't have enough pages"
ALSA: usb-audio: Add registration quirk for JBL Quantum 800
usb: host: xhci-rcar: Don't reload firmware after the completion
usb: mtu3: use @mult for HS isoc or intr
usb: mtu3: fix the wrong HS mult value
x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions
PCI: Call Max Payload Size-related fixup quirks early
locking/mutex: Fix HANDOFF condition
regmap: fix the offset of register error log
crypto: mxs-dcp - Check for DMA mapping errors
sched/deadline: Fix reset_on_fork reporting of DL tasks
power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors
crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop()
sched/deadline: Fix missing clock update in migrate_task_rq_dl()
hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
udf: Check LVID earlier
isofs: joliet: Fix iocharset=utf8 mount option
bcache: add proper error unwinding in bcache_device_init
nvme-rdma: don't update queue count when failing to set io queues
power: supply: max17042_battery: fix typo in MAx17042_TOFF
s390/cio: add dev_busid sysfs entry for each subchannel
libata: fix ata_host_start()
crypto: qat - do not ignore errors from enable_vf2pf_comms()
crypto: qat - handle both source of interrupt in VF ISR
crypto: qat - fix reuse of completion variable
crypto: qat - fix naming for init/shutdown VF to PF notifications
crypto: qat - do not export adf_iov_putmsg()
fcntl: fix potential deadlock for &fasync_struct.fa_lock
udf_get_extendedattr() had no boundary checks.
m68k: emu: Fix invalid free in nfeth_cleanup()
spi: spi-fsl-dspi: Fix issue with uninitialized dma_slave_config
spi: spi-pic32: Fix issue with uninitialized dma_slave_config
lib/mpi: use kcalloc in mpi_resize
clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel
crypto: qat - use proper type for vf_mask
certs: Trigger creation of RSA module signing key if it's not an RSA key
spi: sprd: Fix the wrong WDG_LOAD_VAL
media: TDA1997x: enable EDID support
soc: rockchip: ROCKCHIP_GRF should not default to y, unconditionally
media: dvb-usb: fix uninit-value in dvb_usb_adapter_dvb_init
media: dvb-usb: fix uninit-value in vp702x_read_mac_addr
media: go7007: remove redundant initialization
Bluetooth: sco: prevent information leak in sco_conn_defer_accept()
tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
net: cipso: fix warnings in netlbl_cipsov4_add_std
i2c: highlander: add IRQ check
media: em28xx-input: fix refcount bug in em28xx_usb_disconnect
media: venus: venc: Fix potential null pointer dereference on pointer fmt
PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
PCI: PM: Enable PME if it can be signaled from D3cold
soc: qcom: smsm: Fix missed interrupts if state changes while masked
Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflow
drm/msm/dpu: make dpu_hw_ctl_clear_all_blendstages clear necessary LMs
arm64: dts: exynos: correct GIC CPU interfaces address range on Exynos7
Bluetooth: fix repeated calls to sco_sock_kill
drm/msm/dsi: Fix some reference counted resource leaks
usb: gadget: udc: at91: add IRQ check
usb: phy: fsl-usb: add IRQ check
usb: phy: twl6030: add IRQ checks
Bluetooth: Move shutdown callback before flushing tx and rx queue
usb: host: ohci-tmio: add IRQ check
usb: phy: tahvo: add IRQ check
mac80211: Fix insufficient headroom issue for AMSDU
usb: gadget: mv_u3d: request_irq() after initializing UDC
Bluetooth: add timeout sanity check to hci_inquiry
i2c: iop3xx: fix deferred probing
i2c: s3c2410: fix IRQ check
mmc: dw_mmc: Fix issue with uninitialized dma_slave_config
mmc: moxart: Fix issue with uninitialized dma_slave_config
CIFS: Fix a potencially linear read overflow
i2c: mt65xx: fix IRQ check
usb: ehci-orion: Handle errors of clk_prepare_enable() in probe
usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available
tty: serial: fsl_lpuart: fix the wrong mapbase value
ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point()
bcma: Fix memory leak for internally-handled cores
ipv4: make exception cache less predictible
net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed
net: qualcomm: fix QCA7000 checksum handling
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
netns: protect netns ID lookups with RCU
fscrypt: add fscrypt_symlink_getattr() for computing st_size
ext4: report correct st_size for encrypted symlinks
f2fs: report correct st_size for encrypted symlinks
ubifs: report correct st_size for encrypted symlinks
tty: Fix data race between tiocsti() and flush_to_ldisc()
x86/resctrl: Fix a maybe-uninitialized build warning treated as error
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
IMA: remove -Wmissing-prototypes warning
IMA: remove the dependency on CRYPTO_MD5
fbmem: don't allow too huge resolutions
backlight: pwm_bl: Improve bootloader/kernel device handover
clk: kirkwood: Fix a clocking boot regression
rtc: tps65910: Correct driver module alias
btrfs: reset replace target device to allocation state on close
blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
PCI/MSI: Skip masking MSI-X on Xen PV
powerpc/perf/hv-gpci: Fix counter value parsing
xen: fix setting of max_pfn in shared_info
include/linux/list.h: add a macro to test if entry is pointing to the head
9p/xen: Fix end of loop tests for list_for_each_entry
bpf/verifier: per-register parent pointers
bpf: correct slot_type marking logic to allow more stack slot sharing
bpf: Support variable offset stack access from helpers
bpf: Reject indirect var_off stack access in raw mode
bpf: Reject indirect var_off stack access in unpriv mode
bpf: Sanity check max value for var_off stack access
selftests/bpf: Test variable offset stack access
bpf: track spill/fill of constants
selftests/bpf: fix tests due to const spill/fill
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
bpf: Fix leakage due to insufficient speculative store bypass mitigation
bpf: verifier: Allocate idmap scratch in verifier env
bpf: Fix pointer arithmetic mask tightening under state pruning
tools/thermal/tmon: Add cross compiling support
soc: aspeed: lpc-ctrl: Fix boundary check for mmap
arm64: head: avoid over-mapping in map_memory
crypto: public_key: fix overflow during implicit conversion
block: bfq: fix bfq_set_next_ioprio_data()
power: supply: max17042: handle fails of reading status register
dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
VMCI: fix NULL pointer dereference when unmapping queue pair
media: uvc: don't do DMA on stack
media: rc-loopback: return number of emitters rather than error
libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
ARM: 9105/1: atags_to_fdt: don't warn about stack size
PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported
PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure
PCI: xilinx-nwl: Enable the clock through CCF
PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response
PCI: aardvark: Fix masking and unmasking legacy INTx interrupts
HID: input: do not report stylus battery state as "full"
RDMA/iwcm: Release resources if iw_cm module initialization fails
docs: Fix infiniband uverbs minor number
pinctrl: samsung: Fix pinctrl bank pin count
vfio: Use config not menuconfig for VFIO_NOIOMMU
powerpc/stacktrace: Include linux/delay.h
openrisc: don't printk() unconditionally
pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry()
scsi: qedi: Fix error codes in qedi_alloc_global_queues()
platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
fscache: Fix cookie key hashing
f2fs: fix to account missing .skipped_gc_rwsem
f2fs: fix to unmap pages from userspace process in punch_hole()
MIPS: Malta: fix alignment of the devicetree buffer
userfaultfd: prevent concurrent API initialization
media: dib8000: rewrite the init prbs logic
crypto: mxs-dcp - Use sg_mapping_iter to copy data
PCI: Use pci_update_current_state() in pci_enable_device_flags()
tipc: keep the skb in rcv queue until the whole data is read
iio: dac: ad5624r: Fix incorrect handling of an optional regulator.
ARM: dts: qcom: apq8064: correct clock names
video: fbdev: kyro: fix a DoS bug by restricting user input
netlink: Deal with ESRCH error in nlmsg_notify()
Smack: Fix wrong semantics in smk_access_entry()
usb: host: fotg210: fix the endpoint's transactional opportunities calculation
usb: host: fotg210: fix the actual_length of an iso packet
usb: gadget: u_ether: fix a potential null pointer dereference
usb: gadget: composite: Allow bMaxPower=0 if self-powered
staging: board: Fix uninitialized spinlock when attaching genpd
tty: serial: jsm: hold port lock when reporting modem line changes
drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex
bpf/tests: Fix copy-and-paste error in double word test
bpf/tests: Do not PASS tests without actually testing the result
video: fbdev: asiliantfb: Error out if 'pixclock' equals zero
video: fbdev: kyro: Error out if 'pixclock' equals zero
video: fbdev: riva: Error out if 'pixclock' equals zero
ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()
flow_dissector: Fix out-of-bounds warnings
s390/jump_label: print real address in a case of a jump label bug
serial: 8250: Define RX trigger levels for OxSemi 950 devices
xtensa: ISS: don't panic in rs_init
hvsi: don't panic on tty_register_driver failure
serial: 8250_pci: make setup_port() parameters explicitly unsigned
staging: ks7010: Fix the initialization of the 'sleep_status' structure
samples: bpf: Fix tracex7 error raised on the missing argument
ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init()
Bluetooth: skip invalid hci_sync_conn_complete_evt
bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output
media: imx258: Rectify mismatch of VTS value
media: imx258: Limit the max analogue gain to 480
media: v4l2-dv-timings.c: fix wrong condition in two for-loops
media: TDA1997x: fix tda1997x_query_dv_timings() return value
media: tegra-cec: Handle errors of clk_prepare_enable()
ARM: dts: imx53-ppd: Fix ACHC entry
arm64: dts: qcom: sdm660: use reg value for memory node
net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe()
Bluetooth: schedule SCO timeouts with delayed_work
Bluetooth: avoid circular locks in sco_sock_connect
gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port()
ARM: tegra: tamonten: Fix UART pad setting
Bluetooth: Fix handling of LE Enhanced Connection Complete
serial: sh-sci: fix break handling for sysrq
tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
rpc: fix gss_svc_init cleanup on failure
staging: rts5208: Fix get_ms_information() heap buffer size
gfs2: Don't call dlm after protocol is unmounted
of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS
mmc: sdhci-of-arasan: Check return value of non-void funtions
mmc: rtsx_pci: Fix long reads when clock is prescaled
selftests/bpf: Enlarge select() timeout for test_maps
mmc: core: Return correct emmc response in case of ioctl error
cifs: fix wrong release in sess_alloc_buffer() failed path
Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set"
usb: musb: musb_dsps: request_irq() after initializing musb
usbip: give back URBs for unsent unlink requests during cleanup
usbip:vhci_hcd USB port can get stuck in the disabled state
ASoC: rockchip: i2s: Fix regmap_ops hang
ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B
parport: remove non-zero check on count
ath9k: fix OOB read ar9300_eeprom_restore_internal
ath9k: fix sleeping in atomic context
net: fix NULL pointer reference in cipso_v4_doi_free
net: w5100: check return value after calling platform_get_resource()
parisc: fix crash with signals and alloca
ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
scsi: BusLogic: Fix missing pr_cont() use
scsi: qla2xxx: Sync queue idx with queue_pair_map idx
cpufreq: powernv: Fix init_chip_info initialization in numa=off
mm/hugetlb: initialize hugetlb_usage in mm_init
memcg: enable accounting for pids in nested pid namespaces
platform/chrome: cros_ec_proto: Send command again when timeout occurs
drm/amdgpu: Fix BUG_ON assert
dm thin metadata: Fix use-after-free in dm_bm_set_read_only
xen: reset legacy rtc flag for PV domU
bnx2x: Fix enabling network interfaces without VFs
arm64/sve: Use correct size when reinitialising SVE state
PM: base: power: don't try to use non-existing RTC for storing data
PCI: Add AMD GPU multi-function power dependencies
x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
tipc: fix an use-after-free issue in tipc_recvmsg
net-caif: avoid user-triggerable WARN_ON(1)
ptp: dp83640: don't define PAGE0
dccp: don't duplicate ccid when cloning dccp sock
net/l2tp: Fix reference count leak in l2tp_udp_recv_core
r6040: Restore MDIO clock frequency after MAC reset
tipc: increase timeout in tipc_sk_enqueue()
perf machine: Initialize srcline string member in add_location struct
net/mlx5: Fix potential sleeping in atomic context
events: Reuse value read using READ_ONCE instead of re-reading it
net/af_unix: fix a data-race in unix_dgram_poll
net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
qed: Handle management FW error
ibmvnic: check failover_pending in login response
net: hns3: pad the short tunnel frame before sending to hardware
mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation
mfd: Don't use irq_create_mapping() to resolve a mapping
PCI: Add ACS quirks for Cavium multi-function devices
net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920
block, bfq: honor already-setup queue merges
ethtool: Fix an error code in cxgb2.c
NTB: perf: Fix an error code in perf_setup_inbuf()
mfd: axp20x: Update AXP288 volatile ranges
PCI: Fix pci_dev_str_match_path() alloc while atomic bug
KVM: arm64: Handle PSCI resets before userspace touches vCPU state
PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n
mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
ARC: export clear_user_page() for modules
net: dsa: b53: Fix calculating number of switch ports
netfilter: socket: icmp6: fix use-after-scope
fq_codel: reject silly quantum parameters
qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom
ip_gre: validate csum_start only on pull
net: renesas: sh_eth: Fix freeing wrong tx descriptor
s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
Linux 4.19.207
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I18108cb47ba9e95838ebe55aaabe34de345ee846
commit f5e81d1117501546b7be050c5fbafa6efd2c722c upstream.
In case of JITs, each of the JIT backends compiles the BPF nospec instruction
/either/ to a machine instruction which emits a speculation barrier /or/ to
/no/ machine instruction in case the underlying architecture is not affected
by Speculative Store Bypass or has different mitigations in place already.
This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence'
instruction for mitigation. In case of arm64, we rely on the firmware mitigation
as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled,
it works for all of the kernel code with no need to provide any additional
instructions here (hence only comment in arm64 JIT). Other archs can follow
as needed. The BPF nospec instruction is specifically targeting Spectre v4
since i) we don't use a serialization barrier for the Spectre v1 case, and
ii) mitigation instructions for v1 and v4 might be different on some archs.
The BPF nospec is required for a future commit, where the BPF verifier does
annotate intermediate BPF programs with speculation barriers.
Co-developed-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
[OP: adjusted context for 4.19, drop riscv and ppc32 changes]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Implement arch_bpf_jit_check_func to check that pointers to jited BPF
functions are correctly aligned and point to the BPF JIT region. This
narrows down the attack surface on the stored pointer.
Bug: 140377409
Change-Id: I10c448eda6a8b0bf4c16ee591fc65974696216b9
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
commit 34b8ab091f9ef57a2bb3c8c8359a0a03a8abf2f9 upstream.
Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016,
lets add support for STADD and use that in favor of LDXR / STXR loop for
the XADD mapping if available. STADD is encoded as an alias for LDADD with
XZR as the destination register, therefore add LDADD to the instruction
encoder along with STADD as special case and use it in the JIT for CPUs
that advertise LSE atomics in CPUID register. If immediate offset in the
BPF XADD insn is 0, then use dst register directly instead of temporary
one.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8968c67a82ab7501bc3b9439c3624a49b42fe54c upstream.
Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.
Fixes: 85f68fe898 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can trivially save 4 bytes in prologue for cBPF since tail calls
can never be used from there. The register push/pop is pairwise,
here, x25 (fp) and x26 (tcc), so no point in changing that, only
reset to zero is not needed.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Improve the JIT to emit 64 and 32 bit immediates, the current
algorithm is not optimal and we often emit more instructions
than actually needed. arm64 has movz, movn, movk variants but
for the current 64 bit immediates we only use movz with a
series of movk when needed.
For example loading ffffffffffffabab emits the following 4
instructions in the JIT today:
* movz: abab, shift: 0, result: 000000000000abab
* movk: ffff, shift: 16, result: 00000000ffffabab
* movk: ffff, shift: 32, result: 0000ffffffffabab
* movk: ffff, shift: 48, result: ffffffffffffabab
Whereas after the patch the same load only needs a single
instruction:
* movn: 5454, shift: 0, result: ffffffffffffabab
Another example where two extra instructions can be saved:
* movz: abab, shift: 0, result: 000000000000abab
* movk: 1f2f, shift: 16, result: 000000001f2fabab
* movk: ffff, shift: 32, result: 0000ffff1f2fabab
* movk: ffff, shift: 48, result: ffffffff1f2fabab
After the patch:
* movn: e0d0, shift: 16, result: ffffffff1f2fffff
* movk: abab, shift: 0, result: ffffffff1f2fabab
Another example with movz, before:
* movz: 0000, shift: 0, result: 0000000000000000
* movk: fea0, shift: 32, result: 0000fea000000000
After:
* movz: fea0, shift: 32, result: 0000fea000000000
Moreover, reuse emit_a64_mov_i() for 32 bit immediates that
are loaded via emit_a64_mov_i64() which is a similar optimization
as done in 6fe8b9c1f4 ("bpf, x64: save several bytes by using
mov over movabsq when possible"). On arm64, the latter allows to
use a single instruction with movn due to zero extension where
otherwise two would be needed. And last but not least add a
missing optimization in emit_a64_mov_i() where movn is used but
the subsequent movk not needed. With some of the Cilium programs
in use, this shrinks the needed instructions by about three
percent. Tested on Cavium ThunderX CN8890.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Follow-up to 816d9ef32a ("bpf, arm64: remove ld_abs/ld_ind") in
that the extra 4 byte JIT scratchpad is not needed anymore since it
was in ld_abs/ld_ind as stack buffer for bpf_load_pointer().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since LD_ABS/LD_IND instructions are now removed from the core and
reimplemented through a combination of inlined BPF instructions and
a slow-path helper, we can get rid of the complexity from arm64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I recently noticed a crash on arm64 when feeding a bogus index
into BPF tail call helper. The crash would not occur when the
interpreter is used, but only in case of JIT. Output looks as
follows:
[ 347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
[...]
[ 347.043065] [fffb850e96492510] address between user and kernel address ranges
[ 347.050205] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 347.190829] x13: 0000000000000000 x12: 0000000000000000
[ 347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
[ 347.201427] x9 : 0000000000000000 x8 : 0000000000000000
[ 347.206726] x7 : 0000000000000000 x6 : 001c991738000000
[ 347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
[ 347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
[ 347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
[ 347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
[ 347.235221] Call trace:
[ 347.237656] 0xffff000002f3a4fc
[ 347.240784] bpf_test_run+0x78/0xf8
[ 347.244260] bpf_prog_test_run_skb+0x148/0x230
[ 347.248694] SyS_bpf+0x77c/0x1110
[ 347.251999] el0_svc_naked+0x30/0x34
[ 347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
[...]
In this case the index used in BPF r3 is the same as in r1
at the time of the call, meaning we fed a pointer as index;
here, it had the value 0xffff808fd7cf0500 which sits in x2.
While I found tail calls to be working in general (also for
hitting the error cases), I noticed the following in the code
emission:
# bpftool p d j i 988
[...]
38: ldr w10, [x1,x10]
3c: cmp w2, w10
40: b.ge 0x000000000000007c <-- signed cmp
44: mov x10, #0x20 // #32
48: cmp x26, x10
4c: b.gt 0x000000000000007c
50: add x26, x26, #0x1
54: mov x10, #0x110 // #272
58: add x10, x1, x10
5c: lsl x11, x2, #3
60: ldr x11, [x10,x11] <-- faulting insn (f86b694b)
64: cbz x11, 0x000000000000007c
[...]
Meaning, the tests passed because commit ddb55992b0 ("arm64:
bpf: implement bpf_tail_call() helper") was using signed compares
instead of unsigned which as a result had the test wrongly passing.
Change this but also the tail call count test both into unsigned
and cap the index as u32. Latter we did as well in 90caccdd8c
("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here,
too. Tested on HiSilicon Hi1616.
Result after patch:
# bpftool p d j i 268
[...]
38: ldr w10, [x1,x10]
3c: add w2, w2, #0x0
40: cmp w2, w10
44: b.cs 0x0000000000000080
48: mov x10, #0x20 // #32
4c: cmp x26, x10
50: b.hi 0x0000000000000080
54: add x26, x26, #0x1
58: mov x10, #0x110 // #272
5c: add x10, x1, x10
60: lsl x11, x2, #3
64: ldr x11, [x10,x11]
68: cbz x11, 0x0000000000000080
[...]
Fixes: ddb55992b0 ("arm64: bpf: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from arm64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-01-19
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) bpf array map HW offload, from Jakub.
2) support for bpf_get_next_key() for LPM map, from Yonghong.
3) test_verifier now runs loaded programs, from Alexei.
4) xdp cpumap monitoring, from Jesper.
5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.
6) user space can now retrieve HW JITed program, from Jiong.
Note there is a minor conflict between Russell's arm32 JIT fixes
and removal of bpf_jit_enable variable by Daniel which should
be resolved by keeping Russell's comment and removing that variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The BPF verifier conflict was some minor contextual issue.
The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.
Signed-off-by: David S. Miller <davem@davemloft.net>
Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Using dynamic stack_depth tracking in arm64 JIT is currently broken in
combination with tail calls. In prologue, we cache ctx->stack_size and
adjust SP reg for setting up function call stack, and tearing it down
again in epilogue. Problem is that when doing a tail call, the cached
ctx->stack_size might not be the same.
One way to fix the problem with minimal overhead is to re-adjust SP in
emit_bpf_tail_call() and properly adjust it to the current program's
ctx->stack_size. Tested on Cavium ThunderX ARMv8.
Fixes: f1c9eed7f4 ("bpf, arm64: take advantage of stack_depth tracking")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
fix the following issue:
arch/arm64/net/bpf_jit_comp.c: In function 'bpf_int_jit_compile':
arch/arm64/net/bpf_jit_comp.c:982:18: error: 'image_size' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
Fixes: db496944fd ("bpf: arm64: add JIT support for multi-function programs")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
similar to x64 add support for bpf-to-bpf calls.
When program has calls to in-kernel helpers the target call offset
is known at JIT time and arm64 architecture needs 2 passes.
With bpf-to-bpf calls the dynamically allocated function start
is unknown until all functions of the program are JITed.
Therefore (just like x64) arm64 JIT needs one extra pass over
the program to emit correct call offsets.
Implementation detail:
Avoid being too clever in 64-bit immediate moves and
always use 4 instructions (instead of 3-4 depending on the address)
to make sure only one extra pass is needed.
If some future optimization would make it worth while to optimize
'call 64-bit imm' further, the JIT would need to do 4 passes
over the program instead of 3 as in this patch.
For typical bpf program address the mov needs 3 or 4 insns,
so unconditional 4 insns to save extra pass is a worthy trade off
at this state of JIT.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
global bpf_jit_enable variable is tested multiple times in JITs,
blinding and verifier core. The malicious root can try to toggle
it while loading the programs. This race condition was accounted
for and there should be no issues, but it's safer to avoid
this race condition.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the arm64 eBPF JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct jit_ctx::image is used the store a pointer to the jitted
intructions, which are always little-endian. These instructions
are thus correctly converted from native order to little-endian
before being stored but the pointer 'image' is declared as for
native order values.
Fix this by declaring the field as __le32* instead of u32*.
Same for the pointer used in jit_fill_hole() to initialize
the image.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Make use of recently implemented stack_depth tracking for arm64 JIT,
so that stack usage can be reduced heavily for programs not using
tail calls at least.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Will reported that in BPF_XADD we must use a different register in stxr
instruction for the status flag due to otherwise CONSTRAINED UNPREDICTABLE
behavior per architecture. Reference manual says [1]:
If s == t, then one of the following behaviors must occur:
* The instruction is UNDEFINED.
* The instruction executes as a NOP.
* The instruction performs the store to the specified address, but
the value stored is UNKNOWN.
Thus, use a different temporary register for the status flag to fix it.
Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:
[...]
0000003c: c85f7d4b ldxr x11, [x10]
00000040: 8b07016b add x11, x11, x7
00000044: c80c7d4b stxr w12, x11, [x10]
00000048: 35ffffac cbnz w12, 0x0000003c
[...]
[1] https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf, p.6132
Fixes: 85f68fe898 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add jited_len to struct bpf_prog. It will be
useful for the struct bpf_prog_info which will
be added in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shubham was recently asking on netdev why in arm64 JIT we don't multiply
the index for accessing the tail call map by 8. That led me into testing
out arm64 JIT wrt tail calls and it turned out I got a NULL pointer
dereference on the tail call.
The buggy access is at:
prog = array->ptrs[index];
if (prog == NULL)
goto out;
[...]
00000060: d2800e0a mov x10, #0x70 // #112
00000064: f86a682a ldr x10, [x1,x10]
00000068: f862694b ldr x11, [x10,x2]
0000006c: b40000ab cbz x11, 0x00000080
[...]
The code triggering the crash is f862694b. x1 at the time contains the
address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning,
above we load the pointer to the program at map slot 0 into x10. x10
can then be NULL if the slot is not occupied, which we later on try to
access with a user given offset in x2 that is the map index.
Fix this by emitting the following instead:
[...]
00000060: d2800e0a mov x10, #0x70 // #112
00000064: 8b0a002a add x10, x1, x10
00000068: d37df04b lsl x11, x2, #3
0000006c: f86b694b ldr x11, [x10,x11]
00000070: b40000ab cbz x11, 0x00000084
[...]
This basically adds the offset to ptrs to the base address of the bpf
array we got and we later on access the map with an index * 8 offset
relative to that. The tail call map itself is basically one large area
with meta data at the head followed by the array of prog pointers.
This makes tail calls working again, tested on Cavium ThunderX ARMv8.
Fixes: ddb55992b0 ("arm64: bpf: implement bpf_tail_call() helper")
Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For both cases, the verifier is already rejecting such invalid
formed instructions. Thus, remove these artifacts from old times
and align it with ppc64, sparc64 and s390x JITs that don't have
them in the first place.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric and Willem reported that they recently saw random crashes when
JIT was in use and bisected this to 74451e66d5 ("bpf: make jited
programs visible in traces"). Issue was that the consolidation part
added bpf_jit_binary_unlock_ro() that would unlock previously made
read-only memory back to read-write. However, DEBUG_SET_MODULE_RONX
cannot be used for this to test for presence of set_memory_*()
functions. We need to use ARCH_HAS_SET_MEMORY instead to fix this;
also add the corresponding bpf_jit_binary_lock_ro() to filter.h.
Fixes: 74451e66d5 ("bpf: make jited programs visible in traces")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Bisected-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove superfluous stack frame, saving us 3 instructions for every
LD_ABS or LD_IND.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove superfluous stack frame, saving us 3 instructions for
every JMP_CALL.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for JMP_CALL_X (tail call) introduced by commit 04fd61ab36
("bpf: allow bpf programs to tail-call other bpf programs").
bpf_tail_call() arguments:
ctx - context pointer passed to next program
array - pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY
index - index inside array that selects specific program to run
In this implementation arm64 JIT jumps into callee program after prologue,
so callee program reuses the same stack. For tail_call_cnt, we use the
callee-saved R26 (which was already saved/restored but previously unused
by JIT).
With this patch a tail call generates the following code on arm64:
if (index >= array->map.max_entries)
goto out;
34: mov x10, #0x10 // #16
38: ldr w10, [x1,x10]
3c: cmp w2, w10
40: b.ge 0x0000000000000074
if (tail_call_cnt > MAX_TAIL_CALL_CNT)
goto out;
tail_call_cnt++;
44: mov x10, #0x20 // #32
48: cmp x26, x10
4c: b.gt 0x0000000000000074
50: add x26, x26, #0x1
prog = array->ptrs[index];
if (prog == NULL)
goto out;
54: mov x10, #0x68 // #104
58: ldr x10, [x1,x10]
5c: ldr x11, [x10,x2]
60: cbz x11, 0x0000000000000074
goto *(prog->bpf_func + prologue_size);
64: mov x10, #0x20 // #32
68: ldr x10, [x11,x10]
6c: add x10, x10, #0x20
70: br x10
74:
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current implementation of ARM64 eBPF JIT, R23 and R24 are used for
tmp registers, which are callee-saved registers. This leads to variable size
of JIT prologue and epilogue. The latest blinding constant change prefers to
constant size of prologue and epilogue. AAPCS reserves R9 ~ R15 for temp
registers which not need to be saved/restored during function call. So, replace
R23 and R24 to R10 and R11, and remove tmp_used flag to save 2 instructions for
some jited BPF program.
CC: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds recently added constant blinding helpers into the
arm64 eBPF JIT. In the bpf_int_jit_compile() path, requirements are
to utilize bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair for rewriting the program into a blinded one, and to map the
BPF_REG_AX register to a CPU register. The mapping is on x9.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Yang Shi <yang.shi@linaro.org>
Tested-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the blinding is strictly only called from inside eBPF JITs,
we need to change signatures for bpf_int_jit_compile() and
bpf_prog_select_runtime() first in order to prepare that the
eBPF program we're dealing with can change underneath. Hence,
for call sites, we need to return the latest prog. No functional
change in this patch.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is never such a situation, where bpf_int_jit_compile() is
called with either prog as NULL or len as 0, so the tests are
unnecessary and confusing as people would just copy them. s390
doesn't have them, so no change is needed there.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Original implementation commit e54bcde3d6 ("arm64: eBPF JIT compiler")
had the relevant code paths, but due to an oversight always fail jiting.
As a result, we had been falling back to BPF interpreter whenever a BPF
program has JMP_JSET_{X,K} instructions.
With this fix, we confirm that the corresponding tests in lib/test_bpf
continue to pass, and also jited.
...
[ 2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
[ 2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
[ 2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
...
[ 3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
[ 3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS
[ 3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
[ 3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS
...
Fixes: e54bcde3d6 ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code generation functions in arch/arm64/kernel/insn.c previously
BUG_ON invalid parameters. Following change of that behavior, now we
need to handle the error case where AARCH64_BREAK_FAULT is returned.
Instead of error-handling on every emit() in JIT, we add a new
validation pass at the end of JIT compilation. There's no point in
running JITed code at run-time only to trap due to AARCH64_BREAK_FAULT.
Instead, we drop this failed JIT compilation and allow the system to
gracefully fallback on the BPF interpreter.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.
This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3 ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.
Lets move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Acked-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aarch64 doesn't have native store immediate instruction, such operation
has to be implemented by the below instruction sequence:
Load immediate to register
Store register
Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>