commit 97a9063518f198ec0adb2ecb89789de342bb8283 upstream.
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
retracted its window to zero, tcp_retransmit_timer() can
retransmit a packet every two jiffies (2 ms for HZ=1000),
for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
The fix is to make sure tcp_rtx_probe0_timed_out() takes
icsk->icsk_user_timeout into account.
Before blamed commit, the socket would not timeout after
icsk->icsk_user_timeout, but would use standard exponential
backoff for the retransmits.
Also worth noting that before commit e89688e3e978 ("net: tcp:
fix unexcepted socket die when snd_wnd is 0"), the issue
would last 2 minutes instead of 4.
Fixes: b701a99e43 ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36534d3c54537bf098224a32dc31397793d4594d upstream.
Due to timer wheel implementation, a timer will usually fire
after its schedule.
For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after
inet_csk(sk)->icsk_timeout whenever the timer interrupt
finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e89688e3e97868451a5d05b38a9d2633d6785cd4 upstream.
In tcp_retransmit_timer(), a window shrunk connection will be regarded
as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
right all the time.
The retransmits will become zero-window probes in tcp_retransmit_timer()
if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to
TCP_RTO_MAX sooner or later.
However, the timer can be delayed and be triggered after 122877ms, not
TCP_RTO_MAX, as I tested.
Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
once the RTO come up to TCP_RTO_MAX, and the socket will die.
Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
which is exact the timestamp of the timeout.
However, "tp->rcv_tstamp" can restart from idle, then tp->rcv_tstamp
could already be a long time (minutes or hours) in the past even on the
first RTO. So we double check the timeout with the duration of the
retransmission.
Meanwhile, making "2 * TCP_RTO_MAX" as the timeout to avoid the socket
dying too soon.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAMM096pg@mail.gmail.com/
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d580fbd2db084a5c96ee9c00492236a279d5e0f upstream.
It appears linux-4.14 stable needs a backport of commit
88f8598d0a30 ("tcp: exit if nothing to retransmit on RTO timeout")
Since tcp_rtx_queue_empty() is not in pre 4.15 kernels,
let's refactor tcp_retransmit_timer() to only use tcp_rtx_queue_head()
I will provide to stable teams the squashed patches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ec986ed7bab6801faed1440e8839dcc710331ff ]
Loss recovery undo_retrans bookkeeping had a long-standing bug where a
DSACK from a spurious TLP retransmit packet could cause an erroneous
undo of a fast recovery or RTO recovery that repaired a single
really-lost packet (in a sequence range outside that of the TLP
retransmit). Basically, because the loss recovery state machine didn't
account for the fact that it sent a TLP retransmit, the DSACK for the
TLP retransmit could erroneously be implicitly be interpreted as
corresponding to the normal fast recovery or RTO recovery retransmit
that plugged a real hole, thus resulting in an improper undo.
For example, consider the following buggy scenario where there is a
real packet loss but the congestion control response is improperly
undone because of this bug:
+ send packets P1, P2, P3, P4
+ P1 is really lost
+ send TLP retransmit of P4
+ receive SACK for original P2, P3, P4
+ enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
+ receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
+ receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
The fix: when we initialize undo machinery in tcp_init_undo(), if
there is a TLP retransmit in flight, then increment tp->undo_retrans
so that we make sure that we receive a DSACK corresponding to the TLP
retransmit, as well as DSACKs for all later normal retransmits, before
triggering a loss recovery undo. Note that we also have to move the
line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO
we remember the tp->tlp_high_seq value until tcp_init_undo() and clear
it only afterward.
Also note that the bug dates back to the original 2013 TLP
implementation, commit 6ba8a3b19e ("tcp: Tail loss probe (TLP)").
However, this patch will only compile and work correctly with kernels
that have tp->tlp_retrans, which was added only in v5.8 in 2020 in
commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight").
So we associate this fix with that later commit.
Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Kevin Yang <yyd@google.com>
Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
https://source.android.com/docs/security/bulletin/2023-09-01
* tag 'ASB-2023-09-05_4.19-stable' of https://android.googlesource.com/kernel/common:
Linux 4.19.294
Revert "ARM: ep93xx: fix missing-prototype warnings"
Revert "MIPS: Alchemy: fix dbdma2"
Linux 4.19.293
dma-buf/sw_sync: Avoid recursive lock during fence signal
clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
scsi: core: raid_class: Remove raid_component_add()
scsi: snic: Fix double free in snic_tgt_create()
irqchip/mips-gic: Don't touch vl_map if a local interrupt is not routable
rtnetlink: Reject negative ifindexes in RTM_NEWLINK
netfilter: nf_queue: fix socket leak
sched/rt: pick_next_rt_entity(): check list_entry
mmc: block: Fix in_flight[issue_type] value error
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus
media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
batman-adv: Fix batadv_v_ogm_aggr_send memory leak
batman-adv: Fix TT global entry leak when client roamed back
batman-adv: Do not get eth header before batadv_check_management_packet
batman-adv: Don't increase MTU when set by user
batman-adv: Trigger events for auto adjusted MTU
nfsd: Fix race to FREE_STATEID and cl_revoked
ibmveth: Use dcbf rather than dcbfl
ipvs: fix racy memcpy in proc_do_sync_threshold
ipvs: Improve robustness to the ipvs sysctl
bonding: fix macvlan over alb bond support
net: remove bond_slave_has_mac_rcu()
net/sched: fix a qdisc modification with ambiguous command request
igb: Avoid starting unnecessary workqueues
dccp: annotate data-races in dccp_poll()
sock: annotate data-races around prot->memory_pressure
tracing: Fix memleak due to race between current_tracer and trace
drm/amd/display: check TG is non-null before checking if enabled
drm/amd/display: do not wait for mpc idle if tg is disabled
regmap: Account for register length in SMBus I/O limits
dm integrity: reduce vmalloc space footprint on 32-bit architectures
dm integrity: increase RECALC_SECTORS to improve recalculate speed
powerpc: Fail build if using recordmcount with binutils v2.37
powerpc: remove leftover code of old GCC version checks
powerpc/32: add stack protector support
fbdev: fix potential OOB read in fast_imageblit()
fbdev: Fix sys_imageblit() for arbitrary image widths
fbdev: Improve performance of sys_imageblit()
tty: serial: fsl_lpuart: add earlycon for imx8ulp platform
Revert "tty: serial: fsl_lpuart: drop earlycon entry for i.MX8QXP"
MIPS: cpu-features: Use boot_cpu_type for CPU type based features
MIPS: cpu-features: Enable octeon_cache by cpu_type
fs: dlm: fix mismatch of plock results from userspace
fs: dlm: use dlm_plock_info for do_unlock_close
fs: dlm: change plock interrupted message to debug again
fs: dlm: add pid to debug log
dlm: replace usage of found with dedicated list iterator variable
dlm: improve plock logging if interrupted
PCI: acpiphp: Reassign resources on bridge if necessary
net: phy: broadcom: stub c45 read/write for 54810
net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled
virtio-net: set queues after driver_ok
af_unix: Fix null-ptr-deref in unix_stream_sendpage().
netfilter: set default timeout to 3 secs for sctp shutdown send and recv state
test_firmware: prevent race conditions by a correct implementation of locking
mmc: wbsd: fix double mmc_free_host() in wbsd_init()
cifs: Release folio lock on fscache read hit.
ALSA: usb-audio: Add support for Mythware XA001AU capture and playback interfaces.
serial: 8250: Fix oops for port->pm on uart_change_pm()
ASoC: meson: axg-tdm-formatter: fix channel slot allocation
ASoC: rt5665: add missed regulator_bulk_disable
net: do not allow gso_size to be set to GSO_BY_FRAGS
sock: Fix misuse of sk_under_memory_pressure()
i40e: fix misleading debug logs
team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
netfilter: nft_dynset: disallow object maps
selftests: mirror_gre_changes: Tighten up the TTL test match
xfrm: add NULL check in xfrm_update_ae_params
ip_vti: fix potential slab-use-after-free in decode_session6
ip6_vti: fix slab-use-after-free in decode_session6
xfrm: fix slab-use-after-free in decode_session6
xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c
net: af_key: fix sadb_x_filter validation
net: xfrm: Fix xfrm_address_filter OOB read
btrfs: fix BUG_ON condition in btrfs_cancel_balance
powerpc/rtas_flash: allow user copy to flash block cache objects
fbdev: mmp: fix value check in mmphw_probe()
virtio-mmio: don't break lifecycle of vm_dev
virtio-mmio: Use to_virtio_mmio_device() to simply code
virtio-mmio: convert to devm_platform_ioremap_resource
nfsd: Remove incorrect check in nfsd4_validate_stateid
nfsd4: kill warnings on testing stateids with mismatched clientids
block: fix signed int overflow in Amiga partition support
mmc: sunxi: fix deferred probing
mmc: bcm2835: fix deferred probing
mmc: Remove dev_err() usage after platform_get_irq()
mmc: tmio: move tmio_mmc_set_clock() to platform hook
mmc: tmio: replace tmio_mmc_clk_stop() calls with tmio_mmc_set_clock()
mmc: meson-gx: remove redundant mmc_request_done() call from irq context
mmc: meson-gx: remove useless lock
USB: dwc3: qcom: fix NULL-deref on suspend
usb: dwc3: qcom: Add helper functions to enable,disable wake irqs
irqchip/mips-gic: Use raw spinlock for gic_lock
irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
powerpc/64s/radix: Fix soft dirty tracking
powerpc: Move page table dump files in a dedicated subdirectory
powerpc/mm: dump block address translation on book3s/32
powerpc/mm: dump segment registers on book3s/32
powerpc/mm: Move pgtable_t into platform headers
powerpc/mm: move platform specific mmu-xxx.h in platform directories
iio: addac: stx104: Fix race condition when converting analog-to-digital
iio: addac: stx104: Fix race condition for stx104_write_raw()
iio: adc: stx104: Implement and utilize register structures
iio: adc: stx104: Utilize iomap interface
iio: add addac subdirectory
IMA: allow/fix UML builds
drm/amdgpu: Fix potential fence use-after-free v2
Bluetooth: L2CAP: Fix use-after-free
pcmcia: rsrc_nonstatic: Fix memory leak in nonstatic_release_resource_db()
gfs2: Fix possible data races in gfs2_show_options()
media: platform: mediatek: vpu: fix NULL ptr dereference
media: v4l2-mem2mem: add lock to protect parameter num_rdy
FS: JFS: Check for read-only mounted filesystem in txBegin
FS: JFS: Fix null-ptr-deref Read in txBegin
MIPS: dec: prom: Address -Warray-bounds warning
fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLev
udf: Fix uninitialized array access for some pathnames
HID: add quirk for 03f0:464a HP Elite Presenter Mouse
quota: fix warning in dqgrab()
quota: Properly disable quotas when add_dquot_ref() fails
ALSA: emu10k1: roll up loops in DSP setup code for Audigy
drm/radeon: Fix integer overflow in radeon_cs_parser_init
selftests: forwarding: tc_flower: Relax success criterion
lib/mpi: Eliminate unused umul_ppmm definitions for MIPS
Revert "posix-timers: Ensure timer ID search-loop limit is valid"
UPSTREAM: media: usb: siano: Fix warning due to null work_func_t function pointer
UPSTREAM: Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
UPSTREAM: net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
UPSTREAM: net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
Linux 4.19.292
sch_netem: fix issues in netem_change() vs get_dist_table()
alpha: remove __init annotation from exported page_is_ram()
scsi: core: Fix possible memory leak if device_add() fails
scsi: snic: Fix possible memory leak if device_add() fails
scsi: 53c700: Check that command slot is not NULL
scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
scsi: core: Fix legacy /proc parsing buffer overflow
netfilter: nf_tables: report use refcount overflow
netfilter: nf_tables: bogus EBUSY when deleting flowtable after flush
btrfs: don't stop integrity writeback too early
ibmvnic: Handle DMA unmapping of login buffs in release functions
wifi: cfg80211: fix sband iftype data lookup for AP_VLAN
IB/hfi1: Fix possible panic during hotplug remove
drivers: net: prevent tun_build_skb() to exceed the packet size limit
dccp: fix data-race around dp->dccps_mss_cache
bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
net/packet: annotate data-races around tp->status
mISDN: Update parameter type of dsp_cmx_send()
drm/nouveau/disp: Revert a NULL check inside nouveau_connector_get_modes
x86: Move gds_ucode_mitigated() declaration to header
x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
usb: dwc3: Properly handle processing of pending events
usb-storage: alauda: Fix uninit-value in alauda_check_media()
binder: fix memory leak in binder_init()
iio: cros_ec: Fix the allocation size for cros_ec_command
nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
radix tree test suite: fix incorrect allocation size for pthreads
drm/nouveau/gr: enable memory loads on helper invocation on all channels
dmaengine: pl330: Return DMA_PAUSED when transaction is paused
ipv6: adjust ndisc_is_useropt() to also return true for PIO
mmc: moxart: read scr register without changing byte order
sparc: fix up arch_cpu_finalize_init() build breakage.
UPSTREAM: net/sched: cls_fw: Fix improper refcount update leads to use-after-free
Linux 4.19.291
drm/edid: fix objtool warning in drm_cvt_modes()
arm64: dts: stratix10: fix incorrect I2C property for SCL signal
drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions
ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
ARM: dts: imx6sll: fixup of operating points
ARM: dts: imx: add usb alias
ARM: dts: imx6sll: Make ssi node name same as other platforms
PM: sleep: wakeirq: fix wake irq arming
PM / wakeirq: support enabling wake-up irq after runtime_suspend called
powerpc/mm/altmap: Fix altmap boundary check
mtd: rawnand: omap_elm: Fix incorrect type in assignment
test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
test_firmware: fix a memory leak with reqs buffer
ext2: Drop fragment support
net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_ready_cb
fs/sysv: Null check to prevent null-ptr-deref bug
USB: zaurus: Add ID for A-300/B-500/C-700
libceph: fix potential hang in ceph_osdc_notify()
scsi: zfcp: Defer fc_rport blocking until after ADISC response
tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
tcp_metrics: annotate data-races around tm->tcpm_net
tcp_metrics: annotate data-races around tm->tcpm_vals[]
tcp_metrics: annotate data-races around tm->tcpm_lock
tcp_metrics: annotate data-races around tm->tcpm_stamp
tcp_metrics: fix addr_same() helper
ip6mr: Fix skb_under_panic in ip6mr_cache_report()
net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
net: add missing data-race annotation for sk_ll_usec
net: add missing data-race annotations around sk->sk_peek_off
net: sched: cls_u32: Fix match key mis-addressing
perf test uprobe_from_different_cu: Skip if there is no gcc
net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
KVM: s390: fix sthyi error handling
word-at-a-time: use the same return type for has_zero regardless of endianness
loop: Select I/O scheduler 'none' from inside add_disk()
perf: Fix function pointer case
net/sched: cls_u32: Fix reference counter leak leading to overflow
ASoC: cs42l51: fix driver to properly autoload with automatic module loading
net/sched: sch_qfq: account for stab overhead in qfq_enqueue
net/sched: cls_fw: Fix improper refcount update leads to use-after-free
drm/client: Fix memory leak in drm_client_target_cloned
dm cache policy smq: ensure IO doesn't prevent cleaner policy progress
ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register
s390/dasd: fix hanging device after quiesce/resume
virtio-net: fix race between set queues and probe
serial: 8250_dw: Preserve original value of DLF register
serial: 8250_dw: split Synopsys DesignWare 8250 common functions
irq-bcm6345-l1: Do not assume a fixed block to cpu mapping
tpm_tis: Explicitly check for error code
btrfs: check for commit error at btrfs_attach_transaction_barrier()
hwmon: (nct7802) Fix for temp6 (PECI1) processed even if PECI1 disabled
staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext()
Documentation: security-bugs.rst: clarify CVE handling
Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
usb: xhci-mtk: set the dma max_seg_size
USB: quirks: add quirk for Focusrite Scarlett
usb: ohci-at91: Fix the unhandle interrupt when resume
usb: dwc3: don't reset device side if dwc3 was configured as host-only
usb: dwc3: pci: skip BYT GPIO lookup table for hardwired phy
Revert "usb: dwc3: core: Enable AutoRetry feature in the controller"
can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED
USB: serial: simple: sort driver entries
USB: serial: simple: add Kaufmann RKS+CAN VCP
USB: serial: option: add Quectel EC200A module support
USB: serial: option: support Quectel EM060K_128
tracing: Fix warning in trace_buffered_event_disable()
ring-buffer: Fix wrong stat of cpu_buffer->read
ata: pata_ns87415: mark ns87560_tf_read static
dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths
block: Fix a source code comment in include/uapi/linux/blkzoned.h
ASoC: fsl_spdif: Silence output on stop
drm/msm: Fix IS_ERR_OR_NULL() vs NULL check in a5xx_submit_in_rb()
RDMA/mlx4: Make check for invalid flags stricter
benet: fix return value check in be_lancer_xmit_workarounds()
net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
net/sched: mqprio: add extack to mqprio_parse_nlattr()
net/sched: mqprio: refactor nlattr parsing to a separate function
platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100
team: reset team's flags when down link is P2P device
bonding: reset bond's flags when down link is P2P device
tcp: Reduce chance of collisions in inet6_hashfn().
ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address
ethernet: atheros: fix return value check in atl1e_tso_csum()
phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()
ext4: fix to check return value of freeze_bdev() in ext4_shutdown()
scsi: qla2xxx: Array index may go out of bound
scsi: qla2xxx: Fix inconsistent format argument type in qla_os.c
ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
ftrace: Store the order of pages allocated in ftrace_page
ftrace: Check if pages were allocated before calling free_pages()
ftrace: Add information on number of page groups allocated
fs: dlm: interrupt posix locks only when process is killed
dlm: rearrange async condition return
dlm: cleanup plock_op vs plock_xop
PCI/ASPM: Avoid link retraining race
PCI/ASPM: Factor out pcie_wait_for_retrain()
PCI/ASPM: Return 0 or -ETIMEDOUT from pcie_retrain_link()
PCI: Rework pcie_retrain_link() wait loop
ext4: Fix reusing stale buffer heads from last failed mounting
ext4: rename journal_dev to s_journal_dev inside ext4_sb_info
btrfs: fix extent buffer leak after tree mod log failure at split_node()
bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent
bcache: remove 'int n' from parameter list of bch_bucket_alloc_set()
bcache: use MAX_CACHES_PER_SET instead of magic number 8 in __bch_bucket_alloc_set
gpio: tps68470: Make tps68470_gpio_output() always set the initial value
tracing/histograms: Return an error if we fail to add histogram to hist_vars list
tcp: annotate data-races around fastopenq.max_qlen
tcp: annotate data-races around tp->notsent_lowat
tcp: annotate data-races around rskq_defer_accept
tcp: annotate data-races around tp->linger2
net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX
netfilter: nf_tables: can't schedule in nft_chain_validate
netfilter: nf_tables: fix spurious set element insertion failure
llc: Don't drop packet from non-root netns.
fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe
Revert "tcp: avoid the lookup process failing to get sk in ehash table"
net:ipv6: check return value of pskb_trim()
net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()
pinctrl: amd: Use amd_pinconf_set() for all config options
fbdev: imxfb: warn about invalid left/right margin
spi: bcm63xx: fix max prepend length
igb: Fix igb_down hung on surprise removal
wifi: iwlwifi: mvm: avoid baid size integer overflow
wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point()
bpf: Address KCSAN report on bpf_lru_list
sched/fair: Don't balance task to its current running CPU
posix-timers: Ensure timer ID search-loop limit is valid
md/raid10: prevent soft lockup while flush writes
md: fix data corruption for raid456 when reshape restart while grow up
nbd: Add the maximum limit of allocated index in nbd_dev_add
debugobjects: Recheck debug_objects_enabled before reporting
ext4: correct inline offset when handling xattrs in inode body
can: bcm: Fix UAF in bcm_proc_show()
fuse: revalidate: don't invalidate if interrupted
perf probe: Add test for regression introduced by switch to die_get_decl_file()
tracing/histograms: Add histograms to hist_vars if they have referenced variables
drm/atomic: Fix potential use-after-free in nonblocking commits
scsi: qla2xxx: Pointer may be dereferenced
scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport()
scsi: qla2xxx: Fix potential NULL pointer dereference
scsi: qla2xxx: Wait for io return on terminate rport
xtensa: ISS: fix call to split_if_spec
ring-buffer: Fix deadloop issue on reading trace_pipe
tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk
tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error
Revert "8250: add support for ASIX devices with a FIFO bug"
meson saradc: fix clock divider mask length
ceph: don't let check_caps skip sending responses for revoke msgs
hwrng: imx-rngc - fix the timeout for init and self check
serial: atmel: don't enable IRQs prematurely
fs: dlm: return positive pid value for F_GETLK
md/raid0: add discard support for the 'original' layout
misc: pci_endpoint_test: Re-init completion for every test
misc: pci_endpoint_test: Free IRQs before removing the device
PCI: rockchip: Use u32 variable to access 32-bit registers
PCI: rockchip: Fix legacy IRQ generation for RK3399 PCIe endpoint core
PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
PCI: rockchip: Write PCI Device ID to correct register
PCI: rockchip: Assert PCI Configuration Enable bit after probe
PCI: qcom: Disable write access to read only registers for IP v2.3.3
PCI: Add function 1 DMA alias quirk for Marvell 88SE9235
PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold
jfs: jfs_dmap: Validate db_l2nbperpage while mounting
ext4: only update i_reserved_data_blocks on successful block allocation
ext4: fix wrong unit use in ext4_mb_clear_bb
perf intel-pt: Fix CYC timestamps after standalone CBR
SUNRPC: Fix UAF in svc_tcp_listen_data_ready()
net: bcmgenet: Ensure MDIO unregistration has clocks enabled
tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
pinctrl: amd: Only use special debounce behavior for GPIO 0
pinctrl: amd: Detect internal GPIO0 debounce handling
pinctrl: amd: Fix mistake in handling clearing pins at startup
net/sched: make psched_mtu() RTNL-less safe
wifi: airo: avoid uninitialized warning in airo_get_rate()
ipv6/addrconf: fix a potential refcount underflow for idev
NTB: ntb_tool: Add check for devm_kcalloc
NTB: ntb_transport: fix possible memory leak while device_register() fails
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
ntb: idt: Fix error handling in idt_pci_driver_init()
udp6: fix udp6_ehashfn() typo
icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev().
vrf: Increment Icmp6InMsgs on the original netdev
net: mvneta: fix txq_map in case of txq_number==1
workqueue: clean up WORK_* constant types, clarify masking
net: lan743x: Don't sleep in atomic context
netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
netfilter: conntrack: Avoid nf_ct_helper_hash uses after free
netfilter: nf_tables: fix scheduling-while-atomic splat
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
netfilter: nf_tables: use net_generic infra for transaction data
netfilter: add helper function to set up the nfnetlink header and use it
netfilter: nftables: add helper function to set the base sequence number
netfilter: nf_tables: add rescheduling points during loop detection walks
netfilter: nf_tables: fix nat hook table deletion
spi: spi-fsl-spi: allow changing bits_per_word while CS is still active
spi: spi-fsl-spi: relax message sanity checking a little
spi: spi-fsl-spi: remove always-true conditional in fsl_spi_do_one_msg
ARM: orion5x: fix d2net gpio initialization
btrfs: fix race when deleting quota root from the dirty cow roots list
jffs2: reduce stack usage in jffs2_build_xattr_subsystem()
integrity: Fix possible multiple allocation in integrity_inode_get()
bcache: Remove unnecessary NULL point check in node allocations
mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M
mmc: core: disable TRIM on Kingston EMMC04G-M627
NFSD: add encoding of op_recall flag for write delegation
ALSA: jack: Fix mutex call in snd_jack_report()
i2c: xiic: Don't try to handle more interrupt events after error
i2c: xiic: Defer xiic_wakeup() and __xiic_start_xfer() in xiic_process()
sh: dma: Fix DMA channel offset calculation
net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
tcp: annotate data races in __tcp_oow_rate_limited()
net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
f2fs: fix error path handling in truncate_dnode()
mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0
spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
Add MODULE_FIRMWARE() for FIRMWARE_TG357766.
sctp: fix potential deadlock on &net->sctp.addr_wq_lock
rtc: st-lpc: Release some resources in st_rtc_probe() in case of error
mfd: stmpe: Only disable the regulators if they are enabled
mfd: intel-lpss: Add missing check for platform_get_resource
KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes
mfd: rt5033: Drop rt5033-battery sub-device
usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe()
extcon: Fix kernel doc of property capability fields to avoid warnings
extcon: Fix kernel doc of property fields to avoid warnings
media: usb: siano: Fix warning due to null work_func_t function pointer
media: videodev2.h: Fix struct v4l2_input tuner index comment
media: usb: Check az6007_read() return value
sh: j2: Use ioremap() to translate device tree address into kernel memory
w1: fix loop in w1_fini()
block: change all __u32 annotations to __be32 in affs_hardblocks.h
USB: serial: option: add LARA-R6 01B PIDs
ARC: define ASM_NL and __ALIGN(_STR) outside #ifdef __ASSEMBLY__ guard
ARCv2: entry: rewrite to enable use of double load/stores LDD/STD
ARCv2: entry: avoid a branch
ARCv2: entry: push out the Z flag unclobber from common EXCEPTION_PROLOGUE
ARCv2: entry: comments about hardware auto-save on taken interrupts
modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}
modpost: fix section mismatch message for R_ARM_ABS32
crypto: nx - fix build warnings when DEBUG_FS is not enabled
hwrng: virtio - Fix race on data_avail and actual data
hwrng: virtio - always add a pending request
hwrng: virtio - don't waste entropy
hwrng: virtio - don't wait on cleanup
hwrng: virtio - add an internal buffer
pinctrl: at91-pio4: check return value of devm_kasprintf()
perf dwarf-aux: Fix off-by-one in die_get_varname()
pinctrl: cherryview: Return correct value if pin in push-pull mode
PCI: Add pci_clear_master() stub for non-CONFIG_PCI
scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer
drm/radeon: fix possible division-by-zero errors
fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe()
arm64: dts: renesas: ulcb-kf: Remove flow control for SCIF1
IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
soc/fsl/qe: fix usb.c build errors
ASoC: es8316: Increment max value for ALC Capture Target Volume control
ARM: ep93xx: fix missing-prototype warnings
drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H
Input: adxl34x - do not hardcode interrupt trigger type
ARM: dts: BCM5301X: Drop "clock-names" from the SPI node
Input: drv260x - sleep between polling GO bit
radeon: avoid double free in ci_dpm_init()
netlink: Add __sock_i_ino() for __netlink_diag_dump().
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
lib/ts_bm: reset initial match offset for every block of text
gtp: Fix use-after-free in __gtp_encap_destroy().
netlink: do not hard code device address lenth in fdb dumps
netlink: fix potential deadlock in netlink_set_err()
wifi: ath9k: convert msecs to jiffies where needed
wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key()
memstick r592: make memstick_debug_get_tpc_name() static
kexec: fix a memory leak in crash_shrink_memory()
watchdog/perf: more properly prevent false positives with turbo modes
watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown
wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes
wifi: ray_cs: Fix an error handling path in ray_probe()
wifi: ray_cs: Drop useless status variable in parse_addr()
wifi: ray_cs: Utilize strnlen() in parse_addr()
wifi: wl3501_cs: Fix an error handling path in wl3501_probe()
wl3501_cs: use eth_hw_addr_set()
net: create netdev->dev_addr assignment helpers
wl3501_cs: Fix misspelling and provide missing documentation
wl3501_cs: Remove unnecessary NULL check
wl3501_cs: Fix a bunch of formatting issues related to function docs
wifi: atmel: Fix an error handling path in atmel_probe()
wifi: orinoco: Fix an error handling path in orinoco_cs_probe()
wifi: orinoco: Fix an error handling path in spectrum_cs_probe()
nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect()
nfc: constify several pointers to u8, char and sk_buff
wifi: mwifiex: Fix the size of a memory allocation in mwifiex_ret_802_11_scan()
samples/bpf: Fix buffer overflow in tcp_basertt
wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation
evm: Complete description of evm_inode_setattr()
ARM: 9303/1: kprobes: avoid missing-declaration warnings
PM: domains: fix integer overflow issues in genpd_parse_state()
clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe
clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
clocksource/drivers: Unify the names to timer-* format
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
md/raid10: fix io loss while replacement replace rdev
md/raid10: fix wrong setting of max_corr_read_errors
md/raid10: fix overflow of md/safe_mode_delay
md/raid10: check slab-out-of-bounds in md_bitmap_get_counter
treewide: Remove uninitialized_var() usage
drm/amdgpu: Validate VM ioctl flags.
scripts/tags.sh: Resolve gtags empty index generation
drm/edid: Fix uninitialized variable in drm_cvt_modes()
fbdev: imsttfb: Fix use after free bug in imsttfb_probe
video: imsttfb: check for ioremap() failures
x86/smp: Use dedicated cache-line for mwait_play_dead()
gfs2: Don't deref jdesc in evict
Linux 4.19.290
x86: fix backwards merge of GDS/SRSO bit
xen/netback: Fix buffer overrun triggered by unusual packet
Documentation/x86: Fix backwards on/off logic about YMM support
x86/xen: Fix secondary processors' FPU initialization
KVM: Add GDS_NO support to KVM
x86/speculation: Add Kconfig option for GDS
x86/speculation: Add force option to GDS mitigation
x86/speculation: Add Gather Data Sampling mitigation
x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
x86/fpu: Mark init functions __init
x86/fpu: Remove cpuinfo argument from init functions
init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
init: Invoke arch_cpu_finalize_init() earlier
init: Remove check_bugs() leftovers
um/cpu: Switch to arch_cpu_finalize_init()
sparc/cpu: Switch to arch_cpu_finalize_init()
sh/cpu: Switch to arch_cpu_finalize_init()
mips/cpu: Switch to arch_cpu_finalize_init()
m68k/cpu: Switch to arch_cpu_finalize_init()
ia64/cpu: Switch to arch_cpu_finalize_init()
ARM: cpu: Switch to arch_cpu_finalize_init()
x86/cpu: Switch to arch_cpu_finalize_init()
init: Provide arch_cpu_finalize_init()
Conflicts:
drivers/mmc/core/block.c
drivers/mmc/host/sdhci-msm.c
drivers/usb/dwc3/core.c
drivers/usb/dwc3/gadget.c
Change-Id: Id2f4d5c8067f8e5eda39c0eaa5e59d54a394b4c7
commit e4dd0d3a2f64b8bd8029ec70f52bdbebd0644408 upstream.
In the real workload, I encountered an issue which could cause the RTO
timer to retransmit the skb per 1ms with linear option enabled. The amount
of lost-retransmitted skbs can go up to 1000+ instantly.
The root cause is that if the icsk_rto happens to be zero in the 6th round
(which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero
due to the changed calculation method in tcp_retransmit_timer() as follows:
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
Above line could be converted to
icsk->icsk_rto = min(0 << 1, TCP_RTO_MAX) = 0
Therefore, the timer expires so quickly without any doubt.
I read through the RFC 6298 and found that the RTO value can be rounded
up to a certain value, in Linux, say TCP_RTO_MIN as default, which is
regarded as the lower bound in this patch as suggested by Eric.
Fixes: 36e31b0af5 ("net: TCP thin linear timeouts")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
https://source.android.com/docs/security/bulletin/2022-09-01
CVE-2022-20399
CVE-2022-23960
CVE-2021-4083
CVE-2022-29582
* tag 'ASB-2022-09-05_4.19-stable' of https://android.googlesource.com/kernel/common:
Linux 4.19.255
x86/speculation: Add LFENCE to RSB fill sequence
x86/speculation: Add RSB VM Exit protections
macintosh/adb: fix oob read in do_adb_query() function
ACPI: video: Shortening quirk list by identifying Clevo by board_name only
ACPI: video: Force backlight native for some TongFang devices
scsi: core: Fix race between handling STS_RESOURCE and completion
mt7601u: add USB device ID for some versions of XiaoDu WiFi Dongle.
ARM: crypto: comment out gcc warning that breaks clang builds
perf symbol: Correct address for bss symbols
netfilter: nf_queue: do not allow packet truncation below transport header offset
sctp: fix sleep in atomic context bug in timer handlers
i40e: Fix interface init with MSI interrupts (no MSI-X)
tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
Documentation: fix sctp_wmem in ip-sysctl.rst
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
tcp: Fix a data-race around sysctl_tcp_autocorking.
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
igmp: Fix data-races around sysctl_igmp_qrv.
net: ping6: Fix memleak in ipv6_renew_options().
tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
scsi: ufs: host: Hold reference returned by of_parse_phandle()
tcp: Fix a data-race around sysctl_tcp_nometrics_save.
tcp: Fix a data-race around sysctl_tcp_frto.
tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
tcp: Fix a data-race around sysctl_tcp_app_win.
tcp: Fix data-races around sysctl_tcp_dsack.
s390/archrandom: prevent CPACF trng invocations in interrupt context
ntfs: fix use-after-free in ntfs_ucsncmp()
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
FROMLIST: binder: fix UAF of ref->proc caused by race condition
Linux 4.19.254
PCI: hv: Fix interrupt mapping for multi-MSI
PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
PCI: hv: Fix multi-MSI to allow more than one MSI vector
net: usb: ax88179_178a needs FLAG_SEND_ZLP
tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
tty: drop tty_schedule_flip()
tty: the rest, stop using tty_schedule_flip()
tty: drivers/tty/, stop using tty_schedule_flip()
serial: mvebu-uart: correctly report configured baudrate value
Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
Bluetooth: SCO: Fix sco_send_frame returning skb->len
Bluetooth: Fix passing NULL to PTR_ERR
Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg
Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg
Bluetooth: Add bt_skb_sendmmsg helper
Bluetooth: Add bt_skb_sendmsg helper
ALSA: memalloc: Align buffer allocations in page size
ima: remove the IMA_TEMPLATE Kconfig option
dlm: fix pending remove if msg allocation fails
HID: add ALWAYS_POLL quirk to lenovo pixart mouse
HID: multitouch: add support for the Smart Tech panel
HID: multitouch: Lenovo X1 Tablet Gen3 trackpoint and buttons
HID: multitouch: simplify the application retrieval
tilcdc: tilcdc_external: fix an incorrect NULL check on list iterator
drm/tilcdc: Remove obsolete crtc_mode_valid() hack
bpf: Make sure mac_header was set before using it
mm/mempolicy: fix uninit-value in mpol_rebind_policy()
Revert "Revert "char/random: silence a lockdep splat with printk()""
tcp: Fix data-races around sysctl_tcp_max_reordering.
tcp: Fix a data-race around sysctl_tcp_rfc1337.
tcp: Fix a data-race around sysctl_tcp_stdurg.
tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
tcp: Fix data-races around sysctl_tcp_recovery.
tcp: Fix a data-race around sysctl_tcp_early_retrans.
be2net: Fix buffer overflow in be_get_module_eeprom
tcp: Fix data-races around sysctl_tcp_fastopen.
tcp: Fix a data-race around sysctl_tcp_tw_reuse.
tcp: Fix a data-race around sysctl_tcp_notsent_lowat.
tcp: Fix data-races around some timeout sysctl knobs.
tcp: Fix data-races around sysctl_tcp_reordering.
igmp: Fix a data-race around sysctl_igmp_max_memberships.
igmp: Fix data-races around sysctl_igmp_llm_reports.
net/tls: Fix race in TLS device down flow
net: stmmac: fix dma queue left shift overflow issue
i2c: cadence: Change large transfer count reset logic to be unconditional
tcp: Fix a data-race around sysctl_tcp_probe_interval.
tcp: Fix a data-race around sysctl_tcp_probe_threshold.
tcp: Fix data-races around sysctl_tcp_mtu_probing.
tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.
ip: Fix a data-race around sysctl_fwmark_reflect.
ip: Fix data-races around sysctl_ip_nonlocal_bind.
ip: Fix data-races around sysctl_ip_fwd_use_pmtu.
perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
pinctrl: ralink: Check for null return of devm_kcalloc
power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe
xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup()
xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
riscv: add as-options for modules with assembly compontents
Revert "cgroup: Use separate src/dst nodes when preloading css_sets for migration"
Linux 4.19.253
can: m_can: m_can_tx_handler(): fix use after free of skb
serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
serial: stm32: Clear prev values before setting RTS delays
serial: 8250: fix return error code in serial8250_request_std_resource()
tty: serial: samsung_tty: set dma burst_size to 1
usb: dwc3: gadget: Fix event pending check
usb: typec: add missing uevent when partner support PD
USB: serial: ftdi_sio: add Belimo device ids
signal handling: don't use BUG_ON() for debugging
ARM: dts: stm32: use the correct clock source for CEC on stm32mp151
x86: Clear .brk area at early boot
irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
ASoC: wm5110: Fix DRE control
ASoC: ops: Fix off by one in range control validation
net: sfp: fix memory leak in sfp_probe()
NFC: nxp-nci: don't print header length mismatch on i2c error
net: tipc: fix possible refcount leak in tipc_sk_create()
platform/x86: hp-wmi: Ignore Sanitization Mode event
cpufreq: pmac32-cpufreq: Fix refcount leak bug
netfilter: br_netfilter: do not skip all hooks with 0 priority
virtio_mmio: Restore guest page size on resume
virtio_mmio: Add missing PM calls to freeze/restore
sfc: fix kernel panic when creating VF
seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
seg6: fix skb checksum evaluation in SRH encapsulation/insertion
sfc: fix use after free when disabling sriov
ipv4: Fix data-races around sysctl_ip_dynaddr.
icmp: Fix a data-race around sysctl_icmp_ratemask.
icmp: Fix a data-race around sysctl_icmp_ratelimit.
ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
icmp: Fix data-races around sysctl.
cipso: Fix data-races around sysctl.
net: Fix data-races around sysctl_mem.
inetpeer: Fix data-races around sysctl.
ASoC: sgtl5000: Fix noise on shutdown/remove
ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle
ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
nilfs2: fix incorrect masking of permission flags for symlinks
cgroup: Use separate src/dst nodes when preloading css_sets for migration
ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
ARM: 9213/1: Print message about disabled Spectre workarounds only once
net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer
tracing/histograms: Fix memory leak problem
xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221
ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
ALSA: hda - Add fixup for Dell Latitidue E5430
Change-Id: I2ea45f6f13cf904894563522f1bb35830dd0bf9e
[ Upstream commit 7c6f2a86ca590d5187a073d987e9599985fb1c7c ]
While reading sysctl_tcp_thin_linear_timeouts, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 36e31b0af5 ("net: TCP thin linear timeouts")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39e24435a776e9de5c6dd188836cf2523547804b ]
While reading these sysctl knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.
- tcp_retries1
- tcp_retries2
- tcp_orphan_retries
- tcp_fin_timeout
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f47d00e077e7d61baf69e46dde3210c886360207 ]
While reading sysctl_tcp_mtu_probing, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 5d424d5a67 ("[TCP]: MTU probing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3256a2d6ab1f71f9a1bd2d7f6f18eb8108c48d17 upstream.
The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.
But flows in SYN_SENT or SYN_RECV state uses a different
RTO base (1 sec instead of 200 ms, unless BPF choses
another value)
This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.
Fixes: a41e8a88b06e ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ae189759cc48cf8b54beebff566e9fd2d4e7d7c upstream.
Previously TCP socket's retrans_stamp is not set if the
retransmission has failed to send. As a result if a socket is
experiencing local issues to retransmit packets, determining when
to abort a socket is complicated w/o knowning the starting time of
the recovery since retrans_stamp may remain zero.
This complication causes sub-optimal behavior that TCP may use the
latest, instead of the first, retransmission time to compute the
elapsed time of a stalling connection due to local issues. Then TCP
may disrecard TCP retries settings and keep retrying until it finally
succeed: not a good idea when the local host is already strained.
The simple fix is to always timestamp the start of a recovery.
It's worth noting that retrans_stamp is also used to compare echo
timestamp values to detect spurious recovery. This patch does
not break that because retrans_stamp is still later than when the
original packet was sent.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9efdda4e3abed13f0903b7b6e4d4c2102019440a upstream.
When a qdisc setup including pacing FQ is dismantled and recreated,
some TCP packets are sent earlier than instructed by TCP stack.
TCP can be fooled when ACK comes back, because the following
operation can return a negative value.
tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
Some paths in TCP stack were not dealing properly with this,
this patch addresses four of them.
Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.
But flows in SYN_SENT or SYN_RECV state uses a different
RTO base (1 sec instead of 200 ms, unless BPF choses
another value).
This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.
Change-Id: I424b7a716ceee8d455701f6470b0688e1fa1dc5a
Fixes: a41e8a88b06e ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Git-Commit: 3256a2d6ab1f71f9a1bd2d7f6f18eb8108c48d17
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Kaustubh Pandey <kapandey@codeaurora.org>
Previously when the sender fails to retransmit a data packet on
timeout due to congestion in the local host (e.g. throttling in
qdisc), it'll retry within an RTO up to 500ms.
In low-RTT networks such as data-centers, RTO is often far
below the default minimum 200ms (and the cap 500ms). Then local
host congestion could trigger a retry storm pouring gas to the
fire. Worse yet, the retry counter (icsk_retransmits) is not
properly updated so the aggressive retry may exceed the system
limit (15 rounds) until the packet finally slips through.
On such rare events, it's wise to retry more conservatively (500ms)
and update the stats properly to reflect these incidents and follow
the system limit. Note that this is consistent with the behavior
when a keep-alive probe is dropped due to local congestion.
Change-Id: I39ca48f733faefcda4e9cc091fb3c860fc9e0708
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Git-Commit: 590d2026d62418bb27de9ca87526e9131c1f48af
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Kaustubh Pandey <kapandey@codeaurora.org>
Previously TCP socket's retrans_stamp is not set if the
retransmission has failed to send. As a result if a socket is
experiencing local issues to retransmit packets, determining when
to abort a socket is complicated w/o knowning the starting time of
the recovery since retrans_stamp may remain zero.
This complication causes sub-optimal behavior that TCP may use the
latest, instead of the first, retransmission time to compute the
elapsed time of a stalling connection due to local issues. Then TCP
may disrecard TCP retries settings and keep retrying until it finally
succeed: not a good idea when the local host is already strained.
The simple fix is to always timestamp the start of a recovery.
It's worth noting that retrans_stamp is also used to compare echo
timestamp values to detect spurious recovery. This patch does
not break that because retrans_stamp is still later than when the
original packet was sent.
Change-Id: I8d13ee5aee5ba84d3c42fb34fbb46da942a4b7d7
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Git-Commit: 7ae189759cc48cf8b54beebff566e9fd2d4e7d7c
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Kaustubh Pandey <kapandey@codeaurora.org>
When a qdisc setup including pacing FQ is dismantled and recreated,
some TCP packets are sent earlier than instructed by TCP stack.
TCP can be fooled when ACK comes back, because the following
operation can return a negative value.
tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
Some paths in TCP stack were not dealing properly with this,
this patch addresses four of them.
Change-Id: Ib7208509a066a9fd8c6b83846bc69fe33a6ae617
Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Git-Commit: 9efdda4e3abed13f0903b7b6e4d4c2102019440a
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Kaustubh Pandey <kapandey@codeaurora.org>
It has been observed that default values for some of key tcp/ip
parameters are affecting the tput/performance of the system. Hence
extending configuration capabilities to TCP/Ip stack through
sysctl interface.
Change-Id: I4287e9103769535f43e0934bac08435a524ee6a4
CRs-Fixed: 507581
Signed-off-by: Ravi Joshi <ravij@codeaurora.org>
Signed-off-by: Ganesh Babu Kumaravel <kganesh@codeaurora.org>
Signed-off-by: Mohit Khanna <mkhannaqca@codeaurora.org>
Signed-off-by: Manjunathappa Prakash <prakashpm@codeaurora.org>
Signed-off-by: Sandeep Singh <sandsing@codeaurora.org>
[ Upstream commit e1561fe2dd69dc5dddd69bd73aa65355bdfb048b ]
Previously the SNMP TCPTIMEOUTS counter has inconsistent accounting:
1. It counts all SYN and SYN-ACK timeouts
2. It counts timeouts in other states except recurring timeouts and
timeouts after fast recovery or disorder state.
Such selective accounting makes analysis difficult and complicated. For
example the monitoring system needs to collect many other SNMP counters
to infer the total amount of timeout events. This patch makes TCPTIMEOUTS
counter simply counts all the retransmit timeout (SYN or data or FIN).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3976535af0cb9fe34a55f2ffb8d7e6b39a2f8188 ]
Previously there is an off-by-one bug on determining when to abort
a stalled window-probing socket. This patch fixes that so it is
consistent with tcp_write_timeout().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 88f8598d0a302a08380eadefd09b9f5cb1c4c428 upstream.
Previously TCP only warns if its RTO timer fires and the
retransmission queue is empty, but it'll cause null pointer
reference later on. It's better to avoid such catastrophic failure
and simply exit with a warning.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a66b10c05ee2d744189e9a2130394b070883d289 ]
Yuchung Cheng and Marek Majkowski independently reported a weird
behavior of TCP_USER_TIMEOUT option when used at connect() time.
When the TCP_USER_TIMEOUT is reached, tcp_write_timeout()
believes the flow should live, and the following condition
in tcp_clamp_rto_to_user_timeout() programs one jiffie timers :
remaining = icsk->icsk_user_timeout - elapsed;
if (remaining <= 0)
return 1; /* user timeout has passed; fire ASAP */
This silly situation ends when the max syn rtx count is reached.
This patch makes sure we honor both TCP_SYNCNT and TCP_USER_TIMEOUT,
avoiding these spurious SYN packets.
Fixes: b701a99e43 ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Reported-by: Marek Majkowski <marek@cloudflare.com>
Cc: Jon Maxwell <jmaxwell37@gmail.com>
Link: https://marc.info/?l=linux-netdev&m=156940118307949&w=2
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Tested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Marek Majkowski <marek@cloudflare.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c5715b8fabfca0ef85903f8bad2189940ed41cc8 ]
Previously upon SYN timeouts the sender recomputes the txhash to
try a different path. However this does not apply on the initial
timeout of SYN-data (active Fast Open). Therefore an active IPv6
Fast Open connection may incur one second RTO penalty to take on
a new path after the second SYN retransmission uses a new flow label.
This patch removes this undesirable behavior so Fast Open changes
the flow label just like the regular connections. This also helps
avoid falsely disabling Fast Open on the sender which triggers
after two consecutive SYN timeouts on Fast Open.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 86de5921a3d5dd246df661e09bdd0a6131b39ae3 ]
Jean-Louis reported a TCP regression and bisected to recent SACK
compression.
After a loss episode (receiver not able to keep up and dropping
packets because its backlog is full), linux TCP stack is sending
a single SACK (DUPACK).
Sender waits a full RTO timer before recovering losses.
While RFC 6675 says in section 5, "Algorithm Details",
(2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
indicating at least three segments have arrived above the current
cumulative acknowledgment point, which is taken to indicate loss
-- go to step (4).
...
(4) Invoke fast retransmit and enter loss recovery as follows:
there are old TCP stacks not implementing this strategy, and
still counting the dupacks before starting fast retransmit.
While these stacks probably perform poorly when receivers implement
LRO/GRO, we should be a little more gentle to them.
This patch makes sure we do not enable SACK compression unless
3 dupacks have been sent since last rcv_nxt update.
Ideally we should even rearm the timer to send one or two
more DUPACK if no more packets are coming, but that will
be work aiming for linux-4.21.
Many thanks to Jean-Louis for bisecting the issue, providing
packet captures and testing this patch.
Fixes: 5d9f4262b7 ("tcp: add SACK compression")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes the following sparse warnings:
net/ipv4/tcp_timer.c:25:5: warning:
symbol 'tcp_retransmit_stamp' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create the tcp_clamp_rto_to_user_timeout() helper routine. To calculate
the correct rto, so that the TCP_USER_TIMEOUT socket option is more
accurate. Taking suggestions and feedback into account from
Eric Dumazet, Neal Cardwell and David Laight. Due to the 1st commit we
can avoid the msecs_to_jiffies() and jiffies_to_msecs() dance.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a seperate helper routine as per Neal Cardwells suggestion. To
be used by the final commit in this series and retransmits_timed_out().
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a preparatory commit. Part of this series that improves the
socket TCP_USER_TIMEOUT option accuracy. Implement Eric Dumazets idea
to convert icsk->icsk_user_timeout from jiffies to msecs. To eliminate
the msecs_to_jiffies() and jiffies_to_msecs() dance in future.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TCP receives an out-of-order packet, it immediately sends
a SACK packet, generating network load but also forcing the
receiver to send 1-MSS pathological packets, increasing its
RTX queue length/depth, and thus processing time.
Wifi networks suffer from this aggressive behavior, but generally
speaking, all these SACK packets add fuel to the fire when networks
are under congestion.
This patch adds a high resolution timer and tp->compressed_ack counter.
Instead of sending a SACK, we program this timer with a small delay,
based on RTT and capped to 1 ms :
delay = min ( 5 % of RTT, 1 ms)
If subsequent SACKs need to be sent while the timer has not yet
expired, we simply increment tp->compressed_ack.
When timer expires, a SACK is sent with the latest information.
Whenever an ACK is sent (if data is sent, or if in-order
data is received) timer is canceled.
Note that tcp_sack_new_ofo_skb() is able to force a SACK to be sent
if the sack blocks need to be shuffled, even if the timer has not
expired.
A new SNMP counter is added in the following patch.
Two other patches add sysctls to allow changing the 1,000,000 and 44
values that this commit hard-coded.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
linux-4.16 got support for softirq based hrtimers.
TCP can switch its pacing hrtimer to this variant, since this
avoids going through a tasklet and some atomic operations.
pacing timer logic looks like other (jiffies based) tcp timers.
v2: use hrtimer_try_to_cancel() in tcp_clear_xmit_timers()
to correctly release reference on socket if needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the connection is aborted, there is no point in
keeping the packets on the write queue until the connection
is closed.
Similar to a27fd7a8ed ('tcp: purge write queue upon RST'),
this is essential for a correct MSG_ZEROCOPY implementation,
because userspace cannot call close(fd) before receiving
zerocopy signals even when the connection is aborted.
Fixes: f214f915e7 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds an optional call to sock_ops BPF program based on whether the
BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_flags.
The BPF program is passed 2 arguments: icsk_retransmits and whether the
RTO has expired.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three sets of overlapping changes, two in the packet scheduler
and one in the meson-gxl PHY driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Only the retransmit timer currently refreshes tcp_mstamp
We should do the same for delayed acks and keepalives.
Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.
Fixes: 385e20706f ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, active Fast Open is paused on a specific
destination IP address if the previous connections to the
IP address have experienced recurring timeouts . But recent
experiments by Microsoft (https://goo.gl/cykmn7) and Mozilla
browsers indicate the isssue is often caused by broken middle-boxes
sitting close to the client. Therefore it is much better user
experience if Fast Open is disabled out-right globally to avoid
experiencing further timeouts on connections toward other
destinations.
This patch changes the destination-IP disablement to global
disablement if a connection experiencing recurring timeouts
or aborts due to timeout. Repeated incidents would still
exponentially increase the pause time, starting from an hour.
This is extremely conservative but an unfortunate compromise to
minimize bad experience due to broken middle-boxes.
Reported-by: Dragana Damjanovic <ddamjanovic@mozilla.com>
Reported-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce one indentation level to make code more readable.
tcp_sync_mss() can be factorized.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Note that sysctl_tcp_thin_dupack was not used, I deleted it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a linear list to store all skbs in write queue has been okay
for quite a while : O(N) is not too bad when N < 500.
Things get messy when N is the order of 100,000 : Modern TCP stacks
want 10Gbit+ of throughput even with 200 ms RTT flows.
40 ns per cache line miss means a full scan can use 4 ms,
blowing away CPU caches.
SACK processing often can use various hints to avoid parsing
whole retransmit queue. But with high packet losses and/or high
reordering, hints no longer work.
Sender has to process thousands of unfriendly SACK, accumulating
a huge socket backlog, burning a cpu and massively dropping packets.
Using an rb-tree for retransmit queue has been avoided for years
because it added complexity and overhead, but now is the time
to be more resistant and say no to quadratic behavior.
1) RTX queue is no longer part of the write queue : already sent skbs
are stored in one rb-tree.
2) Since reaching the head of write queue no longer needs
sk->sk_send_head, we added an union of sk_send_head and tcp_rtx_queue
Tested:
On receiver :
netem on ingress : delay 150ms 200us loss 1
GRO disabled to force stress and SACK storms.
for f in `seq 1 10`
do
./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT|tail -1
done | awk '{print $0} {sum += $0} END {printf "%7u\n",sum}'
Before patch :
323.87
351.48
339.59
338.62
306.72
204.07
304.93
291.88
202.47
176.88
2840
After patch:
1700.83
2207.98
2070.17
1544.26
2114.76
2124.89
1693.14
1080.91
2216.82
1299.94
18053
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.
The TCP conflict was a case of local variables in a function
being removed from both net and net-next.
In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.
Signed-off-by: David S. Miller <davem@davemloft.net>
prequeue is a tcp receive optimization that moves part of rx processing
from bh to process context.
This only works if the socket being processed belongs to a process that
is blocked in recv on that socket.
In practice, this doesn't happen anymore that often because nowadays
servers tend to use an event driven (epoll) model.
Even normal client applications (web browsers) commonly use many tcp
connections in parallel.
This has measureable impact only in netperf (which uses plain recv and
thus allows prequeue use) from host to locally running vm (~4%), however,
there were no changes when using netperf between two physical hosts with
ixgbe interfaces.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the mentioned commit, some of our packetdrill tests became flaky.
TCP_SYNCNT socket option can limit the number of SYN retransmits.
retransmits_timed_out() has to compare times computations based on
local_clock() while timers are based on jiffies. With NTP adjustments
and roundings we can observe 999 ms delay for 1000 ms timers.
We end up sending one extra SYN packet.
Gimmick added in commit 6fa12c8503 ("Revert Backoff [v3]: Calculate
TCP's connection close threshold as a time value") makes no
real sense for TCP_SYN_SENT sockets where no RTO backoff can happen at
all.
Lets use a simpler logic for TCP_SYN_SENT sockets and remove @syn_set
parameter from retransmits_timed_out()
Fixes: 9a568de481 ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>