Commit Graph

462 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
1427d077a0 Revert "xfrm: fix a data-race in xfrm_gen_index()"
This reverts commit 17c75411e2 which is
commit 3e4bc23926b83c3c67e5f61ae8571602754131a6 upstream.

It breaks the android ABI and if this is needed in the future, can be
brought back in an abi-safe way.

Bug: 161946584
Change-Id: I6af8ce540570c756ea9f16526c36f8815971e216
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-10-27 07:58:48 +00:00
Greg Kroah-Hartman
24a799db09 Merge 4.19.297 into android-4.19-stable
Changes in 4.19.297
	indirect call wrappers: helpers to speed-up indirect calls of builtin
	net: use indirect calls helpers at the socket layer
	net: fix kernel-doc warnings for socket.c
	net: prevent rewrite of msg_name in sock_sendmsg()
	RDMA/cxgb4: Check skb value for failure to allocate
	HID: logitech-hidpp: Fix kernel crash on receiver USB disconnect
	quota: Fix slow quotaoff
	net: prevent address rewrite in kernel_bind()
	drm: etvnaviv: fix bad backport leading to warning
	drm/msm/dsi: skip the wait for video mode done if not applicable
	ieee802154: ca8210: Fix a potential UAF in ca8210_probe
	xen-netback: use default TX queue size for vifs
	drm/vmwgfx: fix typo of sizeof argument
	ixgbe: fix crash with empty VF macvlan list
	net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn()
	nfc: nci: assert requested protocol is valid
	workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()
	sched,idle,rcu: Push rcu_idle deeper into the idle path
	dmaengine: stm32-mdma: abort resume if no ongoing transfer
	usb: xhci: xhci-ring: Use sysdev for mapping bounce buffer
	net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read
	usb: dwc3: Soft reset phy on probe for host
	usb: musb: Get the musb_qh poniter after musb_giveback
	usb: musb: Modify the "HWVers" register address
	iio: pressure: bmp280: Fix NULL pointer exception
	iio: pressure: ms5611: ms5611_prom_is_valid false negative bug
	mcb: remove is_added flag from mcb_device struct
	ceph: fix incorrect revoked caps assert in ceph_fill_file_size()
	Input: powermate - fix use-after-free in powermate_config_complete
	Input: psmouse - fix fast_reconnect function for PS/2 mode
	Input: xpad - add PXN V900 support
	cgroup: Remove duplicates in cgroup v1 tasks file
	pinctrl: avoid unsafe code pattern in find_pinctrl()
	x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
	usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
	usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call
	powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
	x86/alternatives: Disable KASAN in apply_alternatives()
	dev_forward_skb: do not scrub skb mark within the same name space
	usb: hub: Guard against accesses to uninitialized BOS descriptors
	Bluetooth: hci_event: Ignore NULL link key
	Bluetooth: Reject connection with the device which has same BD_ADDR
	Bluetooth: Fix a refcnt underflow problem for hci_conn
	Bluetooth: vhci: Fix race when opening vhci device
	Bluetooth: hci_event: Fix coding style
	Bluetooth: avoid memcmp() out of bounds warning
	nfc: nci: fix possible NULL pointer dereference in send_acknowledge()
	regmap: fix NULL deref on lookup
	KVM: x86: Mask LVTPC when handling a PMI
	netfilter: nft_payload: fix wrong mac header matching
	xfrm: fix a data-race in xfrm_gen_index()
	xfrm: interface: use DEV_STATS_INC()
	net: ipv4: fix return value check in esp_remove_trailer
	net: ipv6: fix return value check in esp_remove_trailer
	net: rfkill: gpio: prevent value glitch during probe
	tcp: fix excessive TLP and RACK timeouts from HZ rounding
	tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
	net: usb: smsc95xx: Fix an error code in smsc95xx_reset()
	i40e: prevent crash on probe if hw registers have invalid values
	net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
	netfilter: nft_set_rbtree: .deactivate fails if element has expired
	net: pktgen: Fix interface flags printing
	libceph: fix unaligned accesses in ceph_entity_addr handling
	libceph: use kernel_connect()
	ARM: dts: ti: omap: Fix noisy serial with overrun-throttle-ms for mapphone
	btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
	btrfs: initialize start_slot in btrfs_log_prealloc_extents
	i2c: mux: Avoid potential false error message in i2c_mux_add_adapter
	overlayfs: set ctime when setting mtime and atime
	gpio: timberdale: Fix potential deadlock on &tgpio->lock
	ata: libata-eh: Fix compilation warning in ata_eh_link_report()
	tracing: relax trace_event_eval_update() execution with cond_resched()
	HID: holtek: fix slab-out-of-bounds Write in holtek_kbd_input_event
	Bluetooth: Avoid redundant authentication
	Bluetooth: hci_core: Fix build warnings
	wifi: mac80211: allow transmitting EAPOL frames with tainted key
	wifi: cfg80211: avoid leaking stack data into trace
	sky2: Make sure there is at least one frag_addr available
	drm: panel-orientation-quirks: Add quirk for One Mix 2S
	btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c
	Bluetooth: hci_event: Fix using memcmp when comparing keys
	mtd: rawnand: qcom: Unmap the right resource upon probe failure
	mtd: spinand: micron: correct bitmask for ecc status
	mmc: core: Capture correct oemid-bits for eMMC cards
	Revert "pinctrl: avoid unsafe code pattern in find_pinctrl()"
	ACPI: irq: Fix incorrect return value in acpi_register_gsi()
	USB: serial: option: add Telit LE910C4-WWX 0x1035 composition
	USB: serial: option: add entry for Sierra EM9191 with new firmware
	USB: serial: option: add Fibocom to DELL custom modem FM101R-GL
	perf: Disallow mis-matched inherited group reads
	s390/pci: fix iommu bitmap allocation
	gpio: vf610: set value before the direction to avoid a glitch
	ASoC: pxa: fix a memory leak in probe()
	phy: mapphone-mdm6600: Fix runtime PM for remove
	Bluetooth: hci_sock: fix slab oob read in create_monitor_event
	Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX name
	xfrm6: fix inet6_dev refcount underflow problem
	Linux 4.19.297

Change-Id: I495e8b8fbb6416ec3f94094fa905bdde364618b4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-10-25 11:43:43 +00:00
Eric Dumazet
17c75411e2 xfrm: fix a data-race in xfrm_gen_index()
commit 3e4bc23926b83c3c67e5f61ae8571602754131a6 upstream.

xfrm_gen_index() mutual exclusion uses net->xfrm.xfrm_policy_lock.

This means we must use a per-netns idx_generator variable,
instead of a static one.
Alternative would be to use an atomic variable.

syzbot reported:

BUG: KCSAN: data-race in xfrm_sk_policy_insert / xfrm_sk_policy_insert

write to 0xffffffff87005938 of 4 bytes by task 29466 on cpu 0:
xfrm_gen_index net/xfrm/xfrm_policy.c:1385 [inline]
xfrm_sk_policy_insert+0x262/0x640 net/xfrm/xfrm_policy.c:2347
xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639
do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943
ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012
rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffffffff87005938 of 4 bytes by task 29460 on cpu 1:
xfrm_sk_policy_insert+0x13e/0x640
xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639
do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943
ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012
rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00006ad8 -> 0x00006b18

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 29460 Comm: syz-executor.1 Not tainted 6.5.0-rc5-syzkaller-00243-g9106536c1aa3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023

Fixes: 1121994c80 ("netns xfrm: policy insertion in netns")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25 11:16:43 +02:00
Greg Kroah-Hartman
813e482b1b Merge 4.19.291 into android-4.19-stable
Changes in 4.19.291
	gfs2: Don't deref jdesc in evict
	x86/smp: Use dedicated cache-line for mwait_play_dead()
	video: imsttfb: check for ioremap() failures
	fbdev: imsttfb: Fix use after free bug in imsttfb_probe
	drm/edid: Fix uninitialized variable in drm_cvt_modes()
	scripts/tags.sh: Resolve gtags empty index generation
	drm/amdgpu: Validate VM ioctl flags.
	treewide: Remove uninitialized_var() usage
	md/raid10: check slab-out-of-bounds in md_bitmap_get_counter
	md/raid10: fix overflow of md/safe_mode_delay
	md/raid10: fix wrong setting of max_corr_read_errors
	md/raid10: fix io loss while replacement replace rdev
	irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
	irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
	clocksource/drivers: Unify the names to timer-* format
	clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
	clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe
	PM: domains: fix integer overflow issues in genpd_parse_state()
	ARM: 9303/1: kprobes: avoid missing-declaration warnings
	evm: Complete description of evm_inode_setattr()
	wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation
	wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
	samples/bpf: Fix buffer overflow in tcp_basertt
	wifi: mwifiex: Fix the size of a memory allocation in mwifiex_ret_802_11_scan()
	nfc: constify several pointers to u8, char and sk_buff
	nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect()
	wifi: orinoco: Fix an error handling path in spectrum_cs_probe()
	wifi: orinoco: Fix an error handling path in orinoco_cs_probe()
	wifi: atmel: Fix an error handling path in atmel_probe()
	wl3501_cs: Fix a bunch of formatting issues related to function docs
	wl3501_cs: Remove unnecessary NULL check
	wl3501_cs: Fix misspelling and provide missing documentation
	net: create netdev->dev_addr assignment helpers
	wl3501_cs: use eth_hw_addr_set()
	wifi: wl3501_cs: Fix an error handling path in wl3501_probe()
	wifi: ray_cs: Utilize strnlen() in parse_addr()
	wifi: ray_cs: Drop useless status variable in parse_addr()
	wifi: ray_cs: Fix an error handling path in ray_probe()
	wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes
	wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown
	watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
	watchdog/perf: more properly prevent false positives with turbo modes
	kexec: fix a memory leak in crash_shrink_memory()
	memstick r592: make memstick_debug_get_tpc_name() static
	wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key()
	wifi: ath9k: convert msecs to jiffies where needed
	netlink: fix potential deadlock in netlink_set_err()
	netlink: do not hard code device address lenth in fdb dumps
	gtp: Fix use-after-free in __gtp_encap_destroy().
	lib/ts_bm: reset initial match offset for every block of text
	netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
	ipvlan: Fix return value of ipvlan_queue_xmit()
	netlink: Add __sock_i_ino() for __netlink_diag_dump().
	radeon: avoid double free in ci_dpm_init()
	Input: drv260x - sleep between polling GO bit
	ARM: dts: BCM5301X: Drop "clock-names" from the SPI node
	Input: adxl34x - do not hardcode interrupt trigger type
	drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H
	ARM: ep93xx: fix missing-prototype warnings
	ASoC: es8316: Increment max value for ALC Capture Target Volume control
	soc/fsl/qe: fix usb.c build errors
	IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
	arm64: dts: renesas: ulcb-kf: Remove flow control for SCIF1
	fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe()
	drm/radeon: fix possible division-by-zero errors
	ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer
	scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
	PCI: Add pci_clear_master() stub for non-CONFIG_PCI
	pinctrl: cherryview: Return correct value if pin in push-pull mode
	perf dwarf-aux: Fix off-by-one in die_get_varname()
	pinctrl: at91-pio4: check return value of devm_kasprintf()
	hwrng: virtio - add an internal buffer
	hwrng: virtio - don't wait on cleanup
	hwrng: virtio - don't waste entropy
	hwrng: virtio - always add a pending request
	hwrng: virtio - Fix race on data_avail and actual data
	crypto: nx - fix build warnings when DEBUG_FS is not enabled
	modpost: fix section mismatch message for R_ARM_ABS32
	modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}
	ARCv2: entry: comments about hardware auto-save on taken interrupts
	ARCv2: entry: push out the Z flag unclobber from common EXCEPTION_PROLOGUE
	ARCv2: entry: avoid a branch
	ARCv2: entry: rewrite to enable use of double load/stores LDD/STD
	ARC: define ASM_NL and __ALIGN(_STR) outside #ifdef __ASSEMBLY__ guard
	USB: serial: option: add LARA-R6 01B PIDs
	block: change all __u32 annotations to __be32 in affs_hardblocks.h
	w1: fix loop in w1_fini()
	sh: j2: Use ioremap() to translate device tree address into kernel memory
	media: usb: Check az6007_read() return value
	media: videodev2.h: Fix struct v4l2_input tuner index comment
	media: usb: siano: Fix warning due to null work_func_t function pointer
	extcon: Fix kernel doc of property fields to avoid warnings
	extcon: Fix kernel doc of property capability fields to avoid warnings
	usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe()
	mfd: rt5033: Drop rt5033-battery sub-device
	KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes
	mfd: intel-lpss: Add missing check for platform_get_resource
	mfd: stmpe: Only disable the regulators if they are enabled
	rtc: st-lpc: Release some resources in st_rtc_probe() in case of error
	sctp: fix potential deadlock on &net->sctp.addr_wq_lock
	Add MODULE_FIRMWARE() for FIRMWARE_TG357766.
	spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
	mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0
	f2fs: fix error path handling in truncate_dnode()
	powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
	net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
	tcp: annotate data races in __tcp_oow_rate_limited()
	net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
	sh: dma: Fix DMA channel offset calculation
	i2c: xiic: Defer xiic_wakeup() and __xiic_start_xfer() in xiic_process()
	i2c: xiic: Don't try to handle more interrupt events after error
	ALSA: jack: Fix mutex call in snd_jack_report()
	NFSD: add encoding of op_recall flag for write delegation
	mmc: core: disable TRIM on Kingston EMMC04G-M627
	mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M
	bcache: Remove unnecessary NULL point check in node allocations
	integrity: Fix possible multiple allocation in integrity_inode_get()
	jffs2: reduce stack usage in jffs2_build_xattr_subsystem()
	btrfs: fix race when deleting quota root from the dirty cow roots list
	ARM: orion5x: fix d2net gpio initialization
	spi: spi-fsl-spi: remove always-true conditional in fsl_spi_do_one_msg
	spi: spi-fsl-spi: relax message sanity checking a little
	spi: spi-fsl-spi: allow changing bits_per_word while CS is still active
	netfilter: nf_tables: fix nat hook table deletion
	netfilter: nf_tables: add rescheduling points during loop detection walks
	netfilter: nftables: add helper function to set the base sequence number
	netfilter: add helper function to set up the nfnetlink header and use it
	netfilter: nf_tables: use net_generic infra for transaction data
	netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
	netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
	netfilter: nf_tables: reject unbound anonymous set before commit phase
	netfilter: nf_tables: unbind non-anonymous set if rule construction fails
	netfilter: nf_tables: fix scheduling-while-atomic splat
	netfilter: conntrack: Avoid nf_ct_helper_hash uses after free
	netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
	net: lan743x: Don't sleep in atomic context
	workqueue: clean up WORK_* constant types, clarify masking
	net: mvneta: fix txq_map in case of txq_number==1
	vrf: Increment Icmp6InMsgs on the original netdev
	icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev().
	udp6: fix udp6_ehashfn() typo
	ntb: idt: Fix error handling in idt_pci_driver_init()
	NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
	ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
	NTB: ntb_transport: fix possible memory leak while device_register() fails
	NTB: ntb_tool: Add check for devm_kcalloc
	ipv6/addrconf: fix a potential refcount underflow for idev
	wifi: airo: avoid uninitialized warning in airo_get_rate()
	net/sched: make psched_mtu() RTNL-less safe
	pinctrl: amd: Fix mistake in handling clearing pins at startup
	pinctrl: amd: Detect internal GPIO0 debounce handling
	pinctrl: amd: Only use special debounce behavior for GPIO 0
	tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
	net: bcmgenet: Ensure MDIO unregistration has clocks enabled
	SUNRPC: Fix UAF in svc_tcp_listen_data_ready()
	perf intel-pt: Fix CYC timestamps after standalone CBR
	ext4: fix wrong unit use in ext4_mb_clear_bb
	ext4: only update i_reserved_data_blocks on successful block allocation
	jfs: jfs_dmap: Validate db_l2nbperpage while mounting
	PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9235
	PCI: qcom: Disable write access to read only registers for IP v2.3.3
	PCI: rockchip: Assert PCI Configuration Enable bit after probe
	PCI: rockchip: Write PCI Device ID to correct register
	PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
	PCI: rockchip: Fix legacy IRQ generation for RK3399 PCIe endpoint core
	PCI: rockchip: Use u32 variable to access 32-bit registers
	misc: pci_endpoint_test: Free IRQs before removing the device
	misc: pci_endpoint_test: Re-init completion for every test
	md/raid0: add discard support for the 'original' layout
	fs: dlm: return positive pid value for F_GETLK
	serial: atmel: don't enable IRQs prematurely
	hwrng: imx-rngc - fix the timeout for init and self check
	ceph: don't let check_caps skip sending responses for revoke msgs
	meson saradc: fix clock divider mask length
	Revert "8250: add support for ASIX devices with a FIFO bug"
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk
	ring-buffer: Fix deadloop issue on reading trace_pipe
	xtensa: ISS: fix call to split_if_spec
	scsi: qla2xxx: Wait for io return on terminate rport
	scsi: qla2xxx: Fix potential NULL pointer dereference
	scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport()
	scsi: qla2xxx: Pointer may be dereferenced
	drm/atomic: Fix potential use-after-free in nonblocking commits
	tracing/histograms: Add histograms to hist_vars if they have referenced variables
	perf probe: Add test for regression introduced by switch to die_get_decl_file()
	fuse: revalidate: don't invalidate if interrupted
	can: bcm: Fix UAF in bcm_proc_show()
	ext4: correct inline offset when handling xattrs in inode body
	debugobjects: Recheck debug_objects_enabled before reporting
	nbd: Add the maximum limit of allocated index in nbd_dev_add
	md: fix data corruption for raid456 when reshape restart while grow up
	md/raid10: prevent soft lockup while flush writes
	posix-timers: Ensure timer ID search-loop limit is valid
	sched/fair: Don't balance task to its current running CPU
	bpf: Address KCSAN report on bpf_lru_list
	wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point()
	wifi: iwlwifi: mvm: avoid baid size integer overflow
	igb: Fix igb_down hung on surprise removal
	spi: bcm63xx: fix max prepend length
	fbdev: imxfb: warn about invalid left/right margin
	pinctrl: amd: Use amd_pinconf_set() for all config options
	net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()
	net:ipv6: check return value of pskb_trim()
	Revert "tcp: avoid the lookup process failing to get sk in ehash table"
	fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe
	llc: Don't drop packet from non-root netns.
	netfilter: nf_tables: fix spurious set element insertion failure
	netfilter: nf_tables: can't schedule in nft_chain_validate
	net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX
	tcp: annotate data-races around tp->linger2
	tcp: annotate data-races around rskq_defer_accept
	tcp: annotate data-races around tp->notsent_lowat
	tcp: annotate data-races around fastopenq.max_qlen
	tracing/histograms: Return an error if we fail to add histogram to hist_vars list
	gpio: tps68470: Make tps68470_gpio_output() always set the initial value
	bcache: use MAX_CACHES_PER_SET instead of magic number 8 in __bch_bucket_alloc_set
	bcache: remove 'int n' from parameter list of bch_bucket_alloc_set()
	bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent
	btrfs: fix extent buffer leak after tree mod log failure at split_node()
	ext4: rename journal_dev to s_journal_dev inside ext4_sb_info
	ext4: Fix reusing stale buffer heads from last failed mounting
	PCI: Rework pcie_retrain_link() wait loop
	PCI/ASPM: Return 0 or -ETIMEDOUT from pcie_retrain_link()
	PCI/ASPM: Factor out pcie_wait_for_retrain()
	PCI/ASPM: Avoid link retraining race
	dlm: cleanup plock_op vs plock_xop
	dlm: rearrange async condition return
	fs: dlm: interrupt posix locks only when process is killed
	ftrace: Add information on number of page groups allocated
	ftrace: Check if pages were allocated before calling free_pages()
	ftrace: Store the order of pages allocated in ftrace_page
	ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
	scsi: qla2xxx: Fix inconsistent format argument type in qla_os.c
	scsi: qla2xxx: Array index may go out of bound
	ext4: fix to check return value of freeze_bdev() in ext4_shutdown()
	i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()
	phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
	ethernet: atheros: fix return value check in atl1e_tso_csum()
	ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address
	tcp: Reduce chance of collisions in inet6_hashfn().
	bonding: reset bond's flags when down link is P2P device
	team: reset team's flags when down link is P2P device
	platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100
	net/sched: mqprio: refactor nlattr parsing to a separate function
	net/sched: mqprio: add extack to mqprio_parse_nlattr()
	net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
	benet: fix return value check in be_lancer_xmit_workarounds()
	RDMA/mlx4: Make check for invalid flags stricter
	drm/msm: Fix IS_ERR_OR_NULL() vs NULL check in a5xx_submit_in_rb()
	ASoC: fsl_spdif: Silence output on stop
	block: Fix a source code comment in include/uapi/linux/blkzoned.h
	dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths
	ata: pata_ns87415: mark ns87560_tf_read static
	ring-buffer: Fix wrong stat of cpu_buffer->read
	tracing: Fix warning in trace_buffered_event_disable()
	USB: serial: option: support Quectel EM060K_128
	USB: serial: option: add Quectel EC200A module support
	USB: serial: simple: add Kaufmann RKS+CAN VCP
	USB: serial: simple: sort driver entries
	can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED
	Revert "usb: dwc3: core: Enable AutoRetry feature in the controller"
	usb: dwc3: pci: skip BYT GPIO lookup table for hardwired phy
	usb: dwc3: don't reset device side if dwc3 was configured as host-only
	usb: ohci-at91: Fix the unhandle interrupt when resume
	USB: quirks: add quirk for Focusrite Scarlett
	usb: xhci-mtk: set the dma max_seg_size
	Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
	Documentation: security-bugs.rst: clarify CVE handling
	staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext()
	hwmon: (nct7802) Fix for temp6 (PECI1) processed even if PECI1 disabled
	btrfs: check for commit error at btrfs_attach_transaction_barrier()
	tpm_tis: Explicitly check for error code
	irq-bcm6345-l1: Do not assume a fixed block to cpu mapping
	serial: 8250_dw: split Synopsys DesignWare 8250 common functions
	serial: 8250_dw: Preserve original value of DLF register
	virtio-net: fix race between set queues and probe
	s390/dasd: fix hanging device after quiesce/resume
	ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register
	dm cache policy smq: ensure IO doesn't prevent cleaner policy progress
	drm/client: Fix memory leak in drm_client_target_cloned
	net/sched: cls_fw: Fix improper refcount update leads to use-after-free
	net/sched: sch_qfq: account for stab overhead in qfq_enqueue
	ASoC: cs42l51: fix driver to properly autoload with automatic module loading
	net/sched: cls_u32: Fix reference counter leak leading to overflow
	perf: Fix function pointer case
	loop: Select I/O scheduler 'none' from inside add_disk()
	word-at-a-time: use the same return type for has_zero regardless of endianness
	KVM: s390: fix sthyi error handling
	net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
	perf test uprobe_from_different_cu: Skip if there is no gcc
	net: sched: cls_u32: Fix match key mis-addressing
	net: add missing data-race annotations around sk->sk_peek_off
	net: add missing data-race annotation for sk_ll_usec
	net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
	net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
	ip6mr: Fix skb_under_panic in ip6mr_cache_report()
	tcp_metrics: fix addr_same() helper
	tcp_metrics: annotate data-races around tm->tcpm_stamp
	tcp_metrics: annotate data-races around tm->tcpm_lock
	tcp_metrics: annotate data-races around tm->tcpm_vals[]
	tcp_metrics: annotate data-races around tm->tcpm_net
	tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
	scsi: zfcp: Defer fc_rport blocking until after ADISC response
	libceph: fix potential hang in ceph_osdc_notify()
	USB: zaurus: Add ID for A-300/B-500/C-700
	fs/sysv: Null check to prevent null-ptr-deref bug
	Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
	net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
	ext2: Drop fragment support
	test_firmware: fix a memory leak with reqs buffer
	test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
	mtd: rawnand: omap_elm: Fix incorrect type in assignment
	powerpc/mm/altmap: Fix altmap boundary check
	PM / wakeirq: support enabling wake-up irq after runtime_suspend called
	PM: sleep: wakeirq: fix wake irq arming
	ARM: dts: imx6sll: Make ssi node name same as other platforms
	ARM: dts: imx: add usb alias
	ARM: dts: imx6sll: fixup of operating points
	ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
	drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions
	arm64: dts: stratix10: fix incorrect I2C property for SCL signal
	drm/edid: fix objtool warning in drm_cvt_modes()
	Linux 4.19.291

Change-Id: I4f78e25efd18415989ecf5e227a17e05b0d6386c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-08-25 11:24:56 +00:00
Pablo Neira Ayuso
88bae77d66 netfilter: nf_tables: use net_generic infra for transaction data
[ 0854db2aaef3fcdd3498a9d299c60adea2aa3dc6 ]

This moves all nf_tables pernet data from struct net to a net_generic
extension, with the exception of the gencursor.

The latter is used in the data path and also outside of the nf_tables
core. All others are only used from the configuration plane.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 11:45:16 +02:00
Greg Kroah-Hartman
0e19062e22 Merge 4.19.287 into android-4.19-stable
Changes in 4.19.287
	power: supply: ab8500: Fix external_power_changed race
	power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule()
	ARM: dts: vexpress: add missing cache properties
	power: supply: Ratelimit no data debug output
	regulator: Fix error checking for debugfs_create_dir
	irqchip/meson-gpio: Mark OF related data as maybe unused
	power: supply: Fix logic checking if system is running from battery
	parisc: Improve cache flushing for PCXL in arch_sync_dma_for_cpu()
	MIPS: Alchemy: fix dbdma2
	mips: Move initrd_start check after initrd address sanitisation.
	xen/blkfront: Only check REQ_FUA for writes
	ocfs2: fix use-after-free when unmounting read-only filesystem
	ocfs2: check new file size on fallocate call
	nios2: dts: Fix tse_mac "max-frame-size" property
	nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
	nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
	kexec: support purgatories with .text.hot sections
	powerpc/purgatory: remove PGO flags
	nouveau: fix client work fence deletion race
	RDMA/uverbs: Restrict usage of privileged QKEYs
	net: usb: qmi_wwan: add support for Compal RXM-G1
	Remove DECnet support from kernel
	USB: serial: option: add Quectel EM061KGL series
	usb: dwc3: gadget: Reset num TRBs before giving back the request
	usb: gadget: f_ncm: Add OS descriptor support
	usb: gadget: f_ncm: Fix NTP-32 support
	netfilter: nfnetlink: skip error delivery on batch in case of ENOMEM
	ping6: Fix send to link-local addresses with VRF.
	RDMA/rxe: Remove the unused variable obj
	RDMA/rxe: Removed unused name from rxe_task struct
	RDMA/rxe: Fix the use-before-initialization error of resp_pkts
	IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
	IB/isert: Fix dead lock in ib_isert
	IB/isert: Fix possible list corruption in CMA handler
	IB/isert: Fix incorrect release of isert connection
	sctp: fix an error code in sctp_sf_eat_auth()
	igb: fix nvm.ops.read() error handling
	drm/nouveau/dp: check for NULL nv_connector->native_mode
	drm/nouveau/kms: Don't change EDID when it hasn't actually changed
	drm/nouveau: add nv_encoder pointer check for NULL
	net: lapbether: only support ethernet devices
	net: tipc: resize nlattr array to correct size
	selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
	neighbour: Remove unused inline function neigh_key_eq16()
	net: Remove unused inline function dst_hold_and_use()
	neighbour: delete neigh_lookup_nodev as not used
	drm/nouveau/kms: Fix NULL pointer dereference in nouveau_connector_detect_depth
	powerpc: Fix defconfig choice logic when cross compiling
	mmc: block: ensure error propagation for non-blk
	Linux 4.19.287

Change-Id: Ib4119b05e8fe06820cd2d6f3aa66a7e7e8cc5100
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-21 16:30:35 +00:00
Stephen Hemminger
3e77bbc873 Remove DECnet support from kernel
commit 1202cdd665315c525b5237e96e0bedc76d7e754f upstream.

DECnet is an obsolete network protocol that receives more attention
from kernel janitors than users. It belongs in computer protocol
history museum not in Linux kernel.

It has been "Orphaned" in kernel since 2010. The iproute2 support
for DECnet was dropped in 5.0 release. The documentation link on
Sourceforge says it is abandoned there as well.

Leave the UAPI alone to keep userspace programs compiling.
This means that there is still an empty neighbour table
for AF_DECNET.

The table of /proc/sys/net entries was updated to match
current directories and reformatted to be alphabetical.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Ahern <dsahern@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-21 15:39:57 +02:00
Greg Kroah-Hartman
fcab486168 Revert "net: xfrm: Localize sequence counter per network namespace"
This reverts commit 0cc68d05c0 as it
breaks the abi and we can't allow that at this point in time.

Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I21e6c33ac7011d8489e766fd9b5e32bd528dc456
2021-04-16 07:42:49 +02:00
Greg Kroah-Hartman
e0524263c8 Merge 4.19.187 into android-4.19-stable
Changes in 4.19.187
	ALSA: aloop: Fix initialization of controls
	ASoC: intel: atom: Stop advertising non working S24LE support
	nfc: fix refcount leak in llcp_sock_bind()
	nfc: fix refcount leak in llcp_sock_connect()
	nfc: fix memory leak in llcp_sock_connect()
	nfc: Avoid endless loops caused by repeated llcp_sock_connect()
	xen/evtchn: Change irq_info lock to raw_spinlock_t
	net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
	ia64: fix user_stack_pointer() for ptrace()
	nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
	ocfs2: fix deadlock between setattr and dio_end_io_write
	fs: direct-io: fix missing sdio->boundary
	parisc: parisc-agp requires SBA IOMMU driver
	parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers
	ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
	batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
	ice: Increase control queue timeout
	net: hso: fix null-ptr-deref during tty device unregistration
	net: ensure mac header is set in virtio_net_hdr_to_skb()
	net: sched: sch_teql: fix null-pointer dereference
	net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()
	usbip: add sysfs_lock to synchronize sysfs code paths
	usbip: stub-dev synchronize sysfs code paths
	usbip: vudc synchronize sysfs code paths
	usbip: synchronize event handler with sysfs code paths
	i2c: turn recovery error on init to debug
	virtio_net: Add XDP meta data support
	xfrm: interface: fix ipv4 pmtu check to honor ip header df
	regulator: bd9571mwv: Fix AVS and DVFS voltage range
	net: xfrm: Localize sequence counter per network namespace
	ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
	i40e: Added Asym_Pause to supported link modes
	i40e: Fix kernel oops when i40e driver removes VF's
	amd-xgbe: Update DMA coherency values
	sch_red: fix off-by-one checks in red_check_params()
	gianfar: Handle error code at MAC address change
	cxgb4: avoid collecting SGE_QBASE regs during traffic
	net:tipc: Fix a double free in tipc_sk_mcast_rcv
	ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces
	net/ncsi: Avoid channel_monitor hrtimer deadlock
	ASoC: sunxi: sun4i-codec: fill ASoC card owner
	soc/fsl: qbman: fix conflicting alignment attributes
	clk: fix invalid usage of list cursor in register
	clk: fix invalid usage of list cursor in unregister
	workqueue: Move the position of debug_work_activate() in __queue_work()
	s390/cpcmd: fix inline assembly register clobbering
	net/mlx5: Fix placement of log_max_flow_counter
	net/mlx5: Fix PBMC register mapping
	RDMA/cxgb4: check for ipv6 address properly while destroying listener
	clk: socfpga: fix iomem pointer cast on 64-bit
	net: sched: bump refcount for new action in ACT replace mode
	cfg80211: remove WARN_ON() in cfg80211_sme_connect
	net: tun: set tun->dev->addr_len during TUNSETLINK processing
	drivers: net: fix memory leak in atusb_probe
	drivers: net: fix memory leak in peak_usb_create_dev
	net: mac802154: Fix general protection fault
	net: ieee802154: nl-mac: fix check on panid
	net: ieee802154: fix nl802154 del llsec key
	net: ieee802154: fix nl802154 del llsec dev
	net: ieee802154: fix nl802154 add llsec key
	net: ieee802154: fix nl802154 del llsec devkey
	net: ieee802154: forbid monitor for set llsec params
	net: ieee802154: forbid monitor for del llsec seclevel
	net: ieee802154: stop dump llsec params for monitors
	Revert "cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath."
	Linux 4.19.187

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If3e6588f5ba37d646e090aeec39e733817e6abda
2021-04-16 07:42:26 +02:00
Ahmed S. Darwish
0cc68d05c0 net: xfrm: Localize sequence counter per network namespace
[ Upstream commit e88add19f68191448427a6e4eb059664650a837f ]

A sequence counter write section must be serialized or its internal
state can get corrupted. The "xfrm_state_hash_generation" seqcount is
global, but its write serialization lock (net->xfrm.xfrm_state_lock) is
instantiated per network namespace. The write protection is thus
insufficient.

To provide full protection, localize the sequence counter per network
namespace instead. This should be safe as both the seqcount read and
write sections access data exclusively within the network namespace. It
also lays the foundation for transforming "xfrm_state_hash_generation"
data type from seqcount_t to seqcount_LOCKNAME_t in further commits.

Fixes: b65e3d7be0 ("xfrm: state: add sequence count to detect hash resizes")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-14 08:22:34 +02:00
Suren Baghdasaryan
f32f289c4f ANDROID: GKI: net: remove conditional members causing ABI diffs
Remove conditions for including struct net members based on kernel configs
CONFIG_IP_SCTP, CONFIG_NETFILTER_FAMILY_BRIDGE and
CONFIG_BRIDGE_NF_EBTABLES.

Bug: 151115806
Test: build
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6b0e75f0976ab79ea933fe64bc73ba08c9e7c71a
2020-03-23 17:13:05 -07:00
Eric Dumazet
7f9f8a37e5 tcp: add tcp_min_snd_mss sysctl
commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream.

Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.

This forces the stack to send packets with a very high network/cpu
overhead.

Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.

In some cases, it can be useful to increase the minimal value
to a saner value.

We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.

Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f1 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.

We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.

CVE-2019-11479 -- tcp mss hardcoded to 48

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-17 19:51:56 +02:00
Eric Dumazet
07480da0c8 inet: switch IP ID generator to siphash
[ Upstream commit df453700e8d81b1bdafdf684365ee2b9431fb702 ]

According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.

Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.

It is time to switch to siphash and its 128bit keys.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-04 08:02:30 +02:00
Eric Dumazet
a1c2f32297 netns: provide pure entropy for net_hash_mix()
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]

net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)

I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.

Also provide entropy regardless of CONFIG_NET_NS.

Fixes: 0b4419162a ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-17 08:38:40 +02:00
Virgile Jarry
e6f86b0f7a ipv6: Add icmp_echo_ignore_all support for ICMPv6
Preventing the kernel from responding to ICMP Echo Requests messages
can be useful in several ways. The sysctl parameter
'icmp_echo_ignore_all' can be used to prevent the kernel from
responding to IPv4 ICMP echo requests. For IPv6 pings, such
a sysctl kernel parameter did not exist.

Add the ability to prevent the kernel from responding to IPv6
ICMP echo requests through the use of the following sysctl
parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all.
Update the documentation to reflect this change.

Signed-off-by: Virgile Jarry <virgile@acceis.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-13 08:42:25 -07:00
Petr Machata
432e05d328 net: ipv4: Control SKB reprioritization after forwarding
After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.

Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.

Therefore introduce a sysctl that controls this behavior. Keep the
default value at 1 to maintain backward-compatible behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:52:30 -07:00
Florian Westphal
f102d66b33 netfilter: nf_tables: use dedicated mutex to guard transactions
Continue to use nftnl subsys mutex to protect (un)registration of hook types,
expressions and so on, but force batch operations to do their own
locking.

This allows distinct net namespaces to perform transactions in parallel.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:48 +02:00
David S. Miller
5cd3da4ba2 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Simple overlapping changes in stmmac driver.

Adjust skb_gro_flush_final_remcsum function signature to make GRO list
changes in net-next, as per Stephen Rothwell's example merge
resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-03 10:29:26 +09:00
Eric Dumazet
5424ea2739 netns: get more entropy from net_hash_mix()
struct net are effectively allocated from order-1 pages on x86,
with one object per slab, meaning that the 13 low order bits
of their addresses are zero.

Once shifted by L1_CACHE_SHIFT, this leaves 7 zero-bits,
meaning that net_hash_mix() does not help spreading
objects on various hash tables.

For example, TCP listen table has 32 buckets, meaning that
all netns use the same bucket for port 80 or port 443.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23 10:59:56 +09:00
Eric Dumazet
9ce7bc036a netfilter: ipv6: nf_defrag: reduce struct net memory waste
It is a waste of memory to use a full "struct netns_sysctl_ipv6"
while only one pointer is really used, considering netns_sysctl_ipv6
keeps growing.

Also, since "struct netns_frags" has cache line alignment,
it is better to move the frags_hdr pointer outside, otherwise
we spend a full cache line for this pointer.

This saves 192 bytes of memory per netns.

Fixes: c038a767cd ("ipv6: add a new namespace for nf_conntrack_reasm")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-18 14:13:25 +02:00
Pablo Neira Ayuso
a654de8fdc netfilter: nf_tables: fix chain dependency validation
The following ruleset:

 add table ip filter
 add chain ip filter input { type filter hook input priority 4; }
 add chain ip filter ap
 add rule ip filter input jump ap
 add rule ip filter ap masquerade

results in a panic, because the masquerade extension should be rejected
from the filter chain. The existing validation is missing a chain
dependency check when the rule is added to the non-base chain.

This patch fixes the problem by walking down the rules from the
basechains, searching for either immediate or lookup expressions, then
jumping to non-base chains and again walking down the rules to perform
the expression validation, so we make sure the full ruleset graph is
validated. This is done only once from the commit phase, in case of
problem, we abort the transaction and perform fine grain validation for
error reporting. This patch requires 003087911a ("netfilter:
nfnetlink: allow commit to fail") to achieve this behaviour.

This patch also adds a cleanup callback to nfnl batch interface to reset
the validate state from the exit path.

As a result of this patch, nf_tables_check_loops() doesn't use
->validate to check for loops, instead it just checks for immediate
expressions.

Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01 09:46:22 +02:00
David S. Miller
fb83eb93c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal.

2) Add support for map lookups to numgen, random and hash expressions,
   from Laura Garcia.

3) Allow to register nat hooks for iptables and nftables at the same
   time. Patchset from Florian Westpha.

4) Timeout support for rbtree sets.

5) ip6_rpfilter works needs interface for link-local addresses, from
   Vincent Bernat.

6) Add nf_ct_hook and nf_nat_hook structures and use them.

7) Do not drop packets on packets raceing to insert conntrack entries
   into hashes, this is particularly a problem in nfqueue setups.

8) Address fallout from xt_osf separation to nf_osf, patches
   from Florian Westphal and Fernando Mancera.

9) Remove reference to struct nft_af_info, which doesn't exist anymore.
   From Taehee Yoo.

This batch comes with is a conflict between 25fd386e0b ("netfilter:
core: add missing __rcu annotation") in your tree and 2c205dd398
("netfilter: add struct nf_nat_hook and use it") coming in this batch.
This conflict can be solved by leaving the __rcu tag on
__netfilter_net_init() - added by 25fd386e0b - and remove all code
related to nf_nat_decode_session_hook - which is gone after
2c205dd398, as described by:

diff --cc net/netfilter/core.c
index e0ae4aae96f5,206fb2c4c319..168af54db975
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo
  EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
  #endif /* CONFIG_NF_CONNTRACK */

- static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max)
 -#ifdef CONFIG_NF_NAT_NEEDED
 -void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
 -EXPORT_SYMBOL(nf_nat_decode_session_hook);
 -#endif
 -
+ static void __net_init
+ __netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
  {
  	int h;

I can also merge your net-next tree into nf-next, solve the conflict and
resend the pull request if you prefer so.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 16:37:11 -04:00
Taehee Yoo
0c6bca7471 netfilter: nf_tables: remove nft_af_info.
The struct nft_af_info was removed.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23 12:16:25 +02:00
Eric Dumazet
9c21d2fc41 tcp: add tcp_comp_sack_nr sysctl
This per netns sysctl allows for TCP SACK compression fine-tuning.

This limits number of SACK that can be compressed.
Using 0 disables SACK compression.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Eric Dumazet
6d82aa2420 tcp: add tcp_comp_sack_delay_ns sysctl
This per netns sysctl allows for TCP SACK compression fine-tuning.

Its default value is 1,000,000, or 1 ms to meet TSO autosizing period.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18 11:40:27 -04:00
Ahmed Abdelsalam
b5facfdba1 ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode
ECMP (equal-cost multipath) hashes are typically computed on the packets'
5-tuple(src IP, dst IP, src port, dst port, L4 proto).

For encapsulated packets, the L4 data is not readily available and ECMP
hashing will often revert to (src IP, dst IP). This will lead to traffic
polarization on a single ECMP path, causing congestion and waste of network
capacity.

In IPv6, the 20-bit flow label field is also used as part of the ECMP hash.
In the lack of L4 data, the hashing will be on (src IP, dst IP, flow
label). Having a non-zero flow label is thus important for proper traffic
load balancing when L4 data is unavailable (i.e., when packets are
encapsulated).

Currently, the seg6_do_srh_encap() function extracts the original packet's
flow label and set it as the outer IPv6 flow label. There are two issues
with this behaviour:

a) There is no guarantee that the inner flow label is set by the source.
b) If the original packet is not IPv6, the flow label will be set to
zero (e.g., IPv4 or L2 encap).

This patch adds a function, named seg6_make_flowlabel(), that computes a
flow label from a given skb. It supports IPv6, IPv4 and L2 payloads, and
leverages the per namespace 'seg6_flowlabel" sysctl value.

The currently support behaviours are as follows:
-1 set flowlabel to zero.
0 copy flowlabel from Inner paceket in case of Inner IPv6
(Set flowlabel to 0 in case IPv4/L2)
1 Compute the flowlabel using seg6_make_flowlabel()

This patch has been tested for IPv6, IPv4, and L2 traffic.

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25 13:02:15 -04:00
David Ahern
8d1c802b28 net/ipv6: Flip FIB entries to fib6_info
Convert all code paths referencing a FIB entry from
rt6_info to fib6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 23:41:18 -04:00
David Ahern
421842edea net/ipv6: Add fib6_null_entry
ip6_null_entry will stay a dst based return for lookups that fail to
match an entry.

Add a new fib6_null_entry which constitutes the root node and leafs
for fibs. Replace existing references to ip6_null_entry with the
new fib6_null_entry when dealing with FIBs.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 23:41:17 -04:00
Yuval Mintz
088aa3eec2 ip6mr: Support fib notifications
In similar fashion to ipmr, support fib notifications for ip6mr mfc and
vif related events. This would later allow drivers to react to said
notifications and offload the IPv6 mroutes.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 13:14:43 -04:00
Kirill Tkhai
d9ff304973 net: Replace ip_ra_lock with per-net mutex
Since ra_chain is per-net, we may use per-net mutexes
to protect them in ip_ra_control(). This improves
scalability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 15:12:56 -04:00
Kirill Tkhai
5796ef75ec net: Make ip_ra_chain per struct net
This is optimization, which makes ip_call_ra_chain()
iterate less sockets to find the sockets it's looking for.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 15:12:56 -04:00
Tonghao Zhang
1e80295158 udp: Move the udp sysctl to namespace.
This patch moves the udp_rmem_min, udp_wmem_min
to namespace and init the udp_l3mdev_accept explicitly.

The udp_rmem_min/udp_wmem_min affect udp rx/tx queue,
with this patch namespaces can set them differently.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 12:03:30 -04:00
David Ahern
b4bac172e9 net/ipv6: Add support for path selection using hash of 5-tuple
Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow the label. To that end
add support to IPv6 for multipath hash policy similar to bf4e0a3db9
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 13:04:23 -05:00
Yuval Mintz
b70432f731 mroute*: Make mr_table a common struct
Following previous changes to ip6mr, mr_table and mr6_table are
basically the same [up to mr6_table having additional '6' suffixes to
its variable names].
Move the common structure definition into a common header; This
requires renaming all references in ip6mr to variables that had the
distinct suffix.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Roopa Prabhu
5e5d6fed37 ipv6: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:44:44 -05:00
Roopa Prabhu
e37b1e978b ipv6: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations (Thanks to Nikolay Aleksandrov for some review here).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:44:44 -05:00
David S. Miller
cbcbeedbfd Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. Basically, a new extension for ip6tables, simplification work of
nf_tables that saves us 500 LoC, allow raw table registration before
defragmentation, conversion of the SNMP helper to use the ASN.1 code
generator, unique 64-bit handle for all nf_tables objects and fixes to
address fallout from previous nf-next batch.  More specifically, they
are:

1) Seven patches to remove family abstraction layer (struct nft_af_info)
   in nf_tables, this simplifies our codebase and it saves us 64 bytes per
   net namespace.

2) Add IPv6 segment routing header matching for ip6tables, from Ahmed
   Abdelsalam.

3) Allow to register iptable_raw table before defragmentation, some
   people do not want to waste cycles on defragmenting traffic that is
   going to be dropped, hence add a new module parameter to enable this
   behaviour in iptables and ip6tables. From Subash Abhinov
   Kasiviswanathan. This patch needed a couple of follow up patches to
   get things tidy from Arnd Bergmann.

4) SNMP helper uses the ASN.1 code generator, from Taehee Yoo. Several
   patches for this helper to prepare this change are also part of this
   patch series.

5) Add 64-bit handles to uniquely objects in nf_tables, from Harsha
   Sharma.

6) Remove log message that several netfilter subsystems print at
   boot/load time.

7) Restore x_tables module autoloading, that got broken in a previous
   patch to allow singleton NAT hook callback registration per hook
   spot, from Florian Westphal. Moreover, return EBUSY to report that
   the singleton NAT hook slot is already in instead.

8) Several fixes for the new nf_tables flowtable representation,
   including incorrect error check after nf_tables_flowtable_lookup(),
   missing Kconfig dependencies that lead to build breakage and missing
   initialization of priority and hooknum in flowtable object.

9) Missing NETFILTER_FAMILY_ARP dependency in Kconfig for the clusterip
   target. This is due to recent updates in the core to shrink the hook
   array size and compile it out if no specific family is enabled via
   .config file. Patch from Florian Westphal.

10) Remove duplicated include header files, from Wei Yongjun.

11) Sparse warning fix for the NFPROTO_INET handling from the core
    due to missing static function definition, also from Wei Yongjun.

12) Restore ICMPv6 Parameter Problem error reporting when
    defragmentation fails, from Subash Abhinov Kasiviswanathan.

13) Remove obsolete owner field initialization from struct
    file_operations, patch from Alexey Dobriyan.

14) Use boolean datatype where needed in the Netfilter codebase, from
    Gustavo A. R. Silva.

15) Remove double semicolon in dynset nf_tables expression, from
    Luis de Bethencourt.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 11:35:34 -05:00
David S. Miller
79d891c1bb Merge tag 'linux-can-next-for-4.16-20180105' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:

====================
pull-request: can-next 2017-12-01,Re: pull-request: can-next

this is a pull request of 7 patches for net-next/master.

All patches are by me. Patch 6 is for the "can_raw" protocol and add
error checking to the bind() function. All other patches clean up the
coding style and remove unused parameters in various CAN drivers and
infrastructure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15 16:13:34 -05:00
Pablo Neira Ayuso
dd4cbef723 netfilter: nf_tables: get rid of pernet families
Now that we have a single table list for each netns, we can get rid of
one pointer per family and the global afinfo list, thus, shrinking
struct netns for nftables that now becomes 64 bytes smaller.

And call __nft_release_afinfo() from __net_exit path accordingly to
release netnamespace objects on removal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10 15:32:10 +01:00
Pablo Neira Ayuso
36596dadf5 netfilter: nf_tables: add single table list for all families
Place all existing user defined tables in struct net *, instead of
having one list per family. This saves us from one level of indentation
in netlink dump functions.

Place pointer to struct nft_af_info in struct nft_table temporarily, as
we still need this to put back reference module reference counter on
table removal.

This patch comes in preparation for the removal of struct nft_af_info.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10 15:32:08 +01:00
Florian Westphal
2a95183a5e netfilter: don't allocate space for arp/bridge hooks unless needed
no need to define hook points if the family isn't supported.
Because we need these hooks for either nftables, arp/ebtables
or the 'call-iptables' hack we have in the bridge layer add two
new dependencies, NETFILTER_FAMILY_{ARP,BRIDGE}, and have the
users select them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:11 +01:00
Florian Westphal
bb4badf3a3 netfilter: don't allocate space for decnet hooks unless needed
no need to define hook points if the family isn't supported.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:10 +01:00
Florian Westphal
ef57170bbf netfilter: reduce hook array sizes to what is needed
Not all families share the same hook count, adjust sizes to what is
needed.

struct net before:
/* size: 6592, cachelines: 103, members: 46 */
after:
/* size: 5952, cachelines: 93, members: 46 */

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:09 +01:00
Florian Westphal
b0f38338ae netfilter: reduce size of hook entry point locations
struct net contains:

struct nf_hook_entries __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS];

which store the hook entry point locations for the various protocol
families and the hooks.

Using array results in compact c code when doing accesses, i.e.
  x = rcu_dereference(net->nf.hooks[pf][hook]);

but its also wasting a lot of memory, as most families are
not used.

So split the array into those families that are used, which
are only 5 (instead of 13).  In most cases, the 'pf' argument is
constant, i.e. gcc removes switch statement.

struct net before:
 /* size: 5184, cachelines: 81, members: 46 */
after:
 /* size: 4672, cachelines: 73, members: 46 */

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:08 +01:00
Marc Kleine-Budde
ff847ee47b can: af_can: give struct holding the CAN per device receive lists a sensible name
This patch adds a "can_" prefix to the "struct dev_rcv_lists" to better
reflect the meaning and improbe code readability.

The conversion is done with:

	sed -i \
		-e "s/struct dev_rcv_lists/struct can_dev_rcv_lists/g" \
		net/can/*.[ch] include/net/netns/can.h

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-01-05 11:12:08 +01:00
Tonghao Zhang
398b841e4a sock: Hide unused variable when !CONFIG_PROC_FS.
When CONFIG_PROC_FS is disabled, we will not use the prot_inuse
counter. This adds an #ifdef to hide the variable definition in
that case. This is not a bugfix. But we can save bytes when there
are many network namespace.

Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 09:58:14 -05:00
Tonghao Zhang
648845ab7e sock: Move the socket inuse to namespace.
In some case, we want to know how many sockets are in use in
different _net_ namespaces. It's a key resource metric.

This patch add a member in struct netns_core. This is a counter
for socket-inuse in the _net_ namespace. The patch will add/sub
counter in the sk_alloc, sk_clone_lock and __sk_free.

This patch will not counter the socket created in kernel.
It's not very useful for userspace to know how many kernel
sockets we created.

The main reasons for doing this are that:

1. When linux calls the 'do_exit' for process to exit, the functions
'exit_task_namespaces' and 'exit_task_work' will be called sequentially.
'exit_task_namespaces' may have destroyed the _net_ namespace, but
'sock_release' called in 'exit_task_work' may use the _net_ namespace
if we counter the socket-inuse in sock_release.

2. socket and sock are in pair. More important, sock holds the _net_
namespace. We counter the socket-inuse in sock, for avoiding holding
_net_ namespace again in socket. It's a easy way to maintain the code.

Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 09:58:14 -05:00
Tonghao Zhang
08fc7f8140 sock: Change the netns_core member name.
Change the member name will make the code more readable.
This patch will be used in next patch.

Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 09:58:14 -05:00
Xin Long
772a58693f sctp: add stream interleave enable members and sockopt
This patch adds intl_enable in asoc and netns, and strm_interleave in
sctp_sock to indicate if stream interleave is enabled and supported.

netns intl_enable would be set via procfs, but that is not added yet
until all stream interleave codes are completely implemented; asoc
intl_enable will be set when doing 4-shakehands.

sp strm_interleave can be set by sockopt SCTP_INTERLEAVING_SUPPORTED
which is also added in this patch. This socket option is defined in
section 4.3.1 of RFC8260.

Note that strm_interleave can only be set by sockopt when both netns
intl_enable and sp frag_interleave are set.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11 11:23:04 -05:00
Stephen Hemminger
6670e15244 tcp: Namespace-ify sysctl_tcp_default_congestion_control
Make default TCP default congestion control to a per namespace
value. This changes default congestion control to a pointer to congestion ops
(rather than implicit as first element of available lsit).

The congestion control setting of new namespaces is inherited
from the current setting of the root namespace.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:09:52 +09:00