Commit Graph

239 Commits

Author SHA1 Message Date
Vlad Buslov
5df0c8e7e2 macvlan: Don't propagate promisc change to lower dev in passthru
[ Upstream commit 7e1caeace0418381f36b3aa8403dfd82fc57fc53 ]

Macvlan device in passthru mode sets its lower device promiscuous mode
according to its MACVLAN_FLAG_NOPROMISC flag instead of synchronizing it to
its own promiscuity setting. However, macvlan_change_rx_flags() function
doesn't check the mode before propagating such changes to the lower device
which can cause net_device->promiscuity counter overflow as illustrated by
reproduction example [0] and resulting dmesg log [1]. Fix the issue by
first verifying the mode in macvlan_change_rx_flags() function before
propagating promiscuous mode change to the lower device.

[0]:
ip link add macvlan1 link enp8s0f0 type macvlan mode passthru
ip link set macvlan1 promisc on
ip l set dev macvlan1 up
ip link set macvlan1 promisc off
ip l set dev macvlan1 down
ip l set dev macvlan1 up

[1]:
[ 5156.281724] macvlan1: entered promiscuous mode
[ 5156.285467] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
[ 5156.287639] macvlan1: left promiscuous mode
[ 5156.288339] mlx5_core 0000:08:00.0 enp8s0f0: left promiscuous mode
[ 5156.290907] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
[ 5156.317197] mlx5_core 0000:08:00.0 enp8s0f0: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.

Fixes: efdbd2b30c ("macvlan: Propagate promiscuity setting to lower devices.")
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231114175915.1649154-1-vladbu@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28 16:46:33 +00:00
Eric Dumazet
650137a7c0 macvlan: enforce a consistent minimal mtu
commit b64085b00044bdf3cd1c9825e9ef5b2e0feae91a upstream.

macvlan should enforce a minimal mtu of 68, even at link creation.

This patch avoids the current behavior (which could lead to crashes
in ipv6 stack if the link is brought up)

$ ip link add macvlan1 link eno1 mtu 8 type macvlan  # This should fail !
$ ip link sh dev macvlan1
5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop
    state DOWN mode DEFAULT group default qlen 1000
    link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff
$ ip link set macvlan1 mtu 67
Error: mtu less than device minimum.
$ ip link set macvlan1 mtu 68
$ ip link set macvlan1 mtu 8
Error: mtu less than device minimum.

Fixes: 91572088e3 ("net: use core MTU range checking in core net infra")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 17:40:28 +01:00
Chuang Wang
a81b44d1df net: macvlan: fix memory leaks of macvlan_common_newlink
[ Upstream commit 23569b5652ee8e8e55a12f7835f59af6f3cefc30 ]

kmemleak reports memory leaks in macvlan_common_newlink, as follows:

 ip link add link eth0 name .. type macvlan mode source macaddr add
 <MAC-ADDR>

kmemleak reports:

unreferenced object 0xffff8880109bb140 (size 64):
  comm "ip", pid 284, jiffies 4294986150 (age 430.108s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b8 aa 5a 12 80 88 ff ff  ..........Z.....
    80 1b fa 0d 80 88 ff ff 1e ff ac af c7 c1 6b 6b  ..............kk
  backtrace:
    [<ffffffff813e06a7>] kmem_cache_alloc_trace+0x1c7/0x300
    [<ffffffff81b66025>] macvlan_hash_add_source+0x45/0xc0
    [<ffffffff81b66a67>] macvlan_changelink_sources+0xd7/0x170
    [<ffffffff81b6775c>] macvlan_common_newlink+0x38c/0x5a0
    [<ffffffff81b6797e>] macvlan_newlink+0xe/0x20
    [<ffffffff81d97f8f>] __rtnl_newlink+0x7af/0xa50
    [<ffffffff81d98278>] rtnl_newlink+0x48/0x70
    ...

In the scenario where the macvlan mode is configured as 'source',
macvlan_changelink_sources() will be execured to reconfigure list of
remote source mac addresses, at the same time, if register_netdevice()
return an error, the resource generated by macvlan_changelink_sources()
is not cleaned up.

Using this patch, in the case of an error, it will execute
macvlan_flush_sources() to ensure that the resource is cleaned up.

Fixes: aa5fd0fb77 ("driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.")
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Link: https://lore.kernel.org/r/20221109090735.690500-1-nashuiliang@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-25 17:40:18 +01:00
Alvin Šipraga
b12151989d macvlan: validate setting of multiple remote source MAC addresses
[ Upstream commit 8b61fba503904acae24aeb2bd5569b4d6544d48f ]

Remote source MAC addresses can be set on a 'source mode' macvlan
interface via the IFLA_MACVLAN_MACADDR_DATA attribute. This commit
tightens the validation of these MAC addresses to match the validation
already performed when setting or adding a single MAC address via the
IFLA_MACVLAN_MACADDR attribute.

iproute2 uses IFLA_MACVLAN_MACADDR_DATA for its 'macvlan macaddr set'
command, and IFLA_MACVLAN_MACADDR for its 'macvlan macaddr add' command,
which demonstrates the inconsistent behaviour that this commit
addresses:

 # ip link add link eth0 name macvlan0 type macvlan mode source
 # ip link set link dev macvlan0 type macvlan macaddr add 01:00:00:00:00:00
 RTNETLINK answers: Cannot assign requested address
 # ip link set link dev macvlan0 type macvlan macaddr set 01:00:00:00:00:00
 # ip -d link show macvlan0
 5: macvlan0@eth0: <BROADCAST,MULTICAST,DYNAMIC,UP,LOWER_UP> mtu 1500 ...
     link/ether 2e:ac:fd:2d:69:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
     macvlan mode source remotes (1) 01:00:00:00:00:00 numtxqueues 1 ...

With this change, the 'set' command will (rightly) fail in the same way
as the 'add' command.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03 11:24:26 +02:00
Alexander Sverdlin
133b3d2401 macvlan: Skip loopback packets in RX handler
[ Upstream commit 81f3dc9349ce0bf7b8447f147f45e70f0a5b36a6 ]

Ignore loopback-originatig packets soon enough and don't try to process L2
header where it doesn't exist. The very similar br_handle_frame() in bridge
code performs exactly the same check.

This is an example of such ICMPv6 packet:

skb len=96 headroom=40 headlen=96 tailroom=56
mac=(40,0) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24
dev name=etha01.212 feat=0x0x0000000040005000
skb headroom: 00000000: 00 7c 86 52 84 88 ff ff 00 00 00 00 00 00 08 00
skb headroom: 00000010: 45 00 00 9e 5d 5c 40 00 40 11 33 33 00 00 00 01
skb headroom: 00000020: 02 40 43 80 00 00 86 dd
skb linear:   00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00
skb linear:   00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00
skb linear:   00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d
skb linear:   00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c
skb linear:   00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81
skb linear:   00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: ...
skb tailroom: 00000010: ...
skb tailroom: 00000020: ...
skb tailroom: 00000030: ...

Call Trace, how it happens exactly:
 ...
 macvlan_handle_frame+0x321/0x425 [macvlan]
 ? macvlan_forward_source+0x110/0x110 [macvlan]
 __netif_receive_skb_core+0x545/0xda0
 ? enqueue_task_fair+0xe5/0x8e0
 ? __netif_receive_skb_one_core+0x36/0x70
 __netif_receive_skb_one_core+0x36/0x70
 process_backlog+0x97/0x140
 net_rx_action+0x1eb/0x350
 ? __hrtimer_run_queues+0x136/0x2e0
 __do_softirq+0xe3/0x383
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.4+0x4e/0x50
 netif_rx_ni+0x60/0xd0
 dev_loopback_xmit+0x83/0xf0
 ip6_finish_output2+0x575/0x590 [ipv6]
 ? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
 ? __ip6_make_skb+0x38d/0x680 [ipv6]
 ? ip6_output+0x6c/0x140 [ipv6]
 ip6_output+0x6c/0x140 [ipv6]
 ip6_send_skb+0x1e/0x60 [ipv6]
 rawv6_sendmsg+0xc4b/0xe10 [ipv6]
 ? proc_put_long+0xd0/0xd0
 ? rw_copy_check_uvector+0x4e/0x110
 ? sock_sendmsg+0x36/0x40
 sock_sendmsg+0x36/0x40
 ___sys_sendmsg+0x2b6/0x2d0
 ? proc_dointvec+0x23/0x30
 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
 ? dev_forward_change+0x130/0x130 [ipv6]
 ? _raw_spin_unlock+0x12/0x30
 ? proc_sys_call_handler.isra.14+0x9f/0x110
 ? __call_rcu+0x213/0x510
 ? get_max_files+0x10/0x10
 ? trace_hardirqs_on+0x2c/0xe0
 ? __sys_sendmsg+0x63/0xa0
 __sys_sendmsg+0x63/0xa0
 do_syscall_64+0x6c/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22 09:05:19 +02:00
Taehee Yoo
9d7ea5675e macvlan: fix null dereference in macvlan_device_event()
[ Upstream commit 4dee15b4fd0d61ec6bbd179238191e959d34cf7a ]

In the macvlan_device_event(), the list_first_entry_or_null() is used.
This function could return null pointer if there is no node.
But, the macvlan module doesn't check the null pointer.
So, null-ptr-deref would occur.

      bond0
        |
   +----+-----+
   |          |
macvlan0   macvlan1
   |          |
 dummy0     dummy1

The problem scenario.
If dummy1 is removed,
1. ->dellink() of dummy1 is called.
2. NETDEV_UNREGISTER of dummy1 notification is sent to macvlan module.
3. ->dellink() of macvlan1 is called.
4. NETDEV_UNREGISTER of macvlan1 notification is sent to bond module.
5. __bond_release_one() is called and it internally calls
   dev_set_mac_address().
6. dev_set_mac_address() calls the ->ndo_set_mac_address() of macvlan1,
   which is macvlan_set_mac_address().
7. macvlan_set_mac_address() calls the dev_set_mac_address() with dummy1.
8. NETDEV_CHANGEADDR of dummy1 is sent to macvlan module.
9. In the macvlan_device_event(), it calls list_first_entry_or_null().
At this point, dummy1 and macvlan1 were removed.
So, list_first_entry_or_null() will return NULL.

Test commands:
    ip netns add nst
    ip netns exec nst ip link add bond0 type bond
    for i in {0..10}
    do
        ip netns exec nst ip link add dummy$i type dummy
	ip netns exec nst ip link add macvlan$i link dummy$i \
		type macvlan mode passthru
	ip netns exec nst ip link set macvlan$i master bond0
    done
    ip netns del nst

Splat looks like:
[   40.585687][  T146] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEI
[   40.587249][  T146] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[   40.588342][  T146] CPU: 1 PID: 146 Comm: kworker/u8:2 Not tainted 5.7.0-rc1+ #532
[   40.589299][  T146] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   40.590469][  T146] Workqueue: netns cleanup_net
[   40.591045][  T146] RIP: 0010:macvlan_device_event+0x4e2/0x900 [macvlan]
[   40.591905][  T146] Code: 00 00 00 00 00 fc ff df 80 3c 06 00 0f 85 45 02 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff d2
[   40.594126][  T146] RSP: 0018:ffff88806116f4a0 EFLAGS: 00010246
[   40.594783][  T146] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   40.595653][  T146] RDX: 0000000000000000 RSI: ffff88806547ddd8 RDI: ffff8880540f1360
[   40.596495][  T146] RBP: ffff88804011a808 R08: fffffbfff4fb8421 R09: fffffbfff4fb8421
[   40.597377][  T146] R10: ffffffffa7dc2107 R11: 0000000000000000 R12: 0000000000000008
[   40.598186][  T146] R13: ffff88804011a000 R14: ffff8880540f1000 R15: 1ffff1100c22de9a
[   40.599012][  T146] FS:  0000000000000000(0000) GS:ffff888067800000(0000) knlGS:0000000000000000
[   40.600004][  T146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.600665][  T146] CR2: 00005572d3a807b8 CR3: 000000005fcf4003 CR4: 00000000000606e0
[   40.601485][  T146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   40.602461][  T146] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   40.603443][  T146] Call Trace:
[   40.603871][  T146]  ? nf_tables_dump_setelem+0xa0/0xa0 [nf_tables]
[   40.604587][  T146]  ? macvlan_uninit+0x100/0x100 [macvlan]
[   40.605212][  T146]  ? __module_text_address+0x13/0x140
[   40.605842][  T146]  notifier_call_chain+0x90/0x160
[   40.606477][  T146]  dev_set_mac_address+0x28e/0x3f0
[   40.607117][  T146]  ? netdev_notify_peers+0xc0/0xc0
[   40.607762][  T146]  ? __module_text_address+0x13/0x140
[   40.608440][  T146]  ? notifier_call_chain+0x90/0x160
[   40.609097][  T146]  ? dev_set_mac_address+0x1f0/0x3f0
[   40.609758][  T146]  dev_set_mac_address+0x1f0/0x3f0
[   40.610402][  T146]  ? __local_bh_enable_ip+0xe9/0x1b0
[   40.611071][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
[   40.611823][  T146]  ? netdev_notify_peers+0xc0/0xc0
[   40.612461][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
[   40.613213][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
[   40.613963][  T146]  ? __local_bh_enable_ip+0xe9/0x1b0
[   40.614631][  T146]  ? bond_time_in_interval.isra.31+0x90/0x90 [bonding]
[   40.615484][  T146]  ? __bond_release_one+0x9f0/0x12c0 [bonding]
[   40.616230][  T146]  __bond_release_one+0x9f0/0x12c0 [bonding]
[   40.616949][  T146]  ? bond_enslave+0x47c0/0x47c0 [bonding]
[   40.617642][  T146]  ? lock_downgrade+0x730/0x730
[   40.618218][  T146]  ? check_flags.part.42+0x450/0x450
[   40.618850][  T146]  ? __mutex_unlock_slowpath+0xd0/0x670
[   40.619519][  T146]  ? trace_hardirqs_on+0x30/0x180
[   40.620117][  T146]  ? wait_for_completion+0x250/0x250
[   40.620754][  T146]  bond_netdev_event+0x822/0x970 [bonding]
[   40.621460][  T146]  ? __module_text_address+0x13/0x140
[   40.622097][  T146]  notifier_call_chain+0x90/0x160
[   40.622806][  T146]  rollback_registered_many+0x660/0xcf0
[   40.623522][  T146]  ? netif_set_real_num_tx_queues+0x780/0x780
[   40.624290][  T146]  ? notifier_call_chain+0x90/0x160
[   40.624957][  T146]  ? netdev_upper_dev_unlink+0x114/0x180
[   40.625686][  T146]  ? __netdev_adjacent_dev_unlink_neighbour+0x30/0x30
[   40.626421][  T146]  ? mutex_is_locked+0x13/0x50
[   40.627016][  T146]  ? unregister_netdevice_queue+0xf2/0x240
[   40.627663][  T146]  unregister_netdevice_many.part.134+0x13/0x1b0
[   40.628362][  T146]  default_device_exit_batch+0x2d9/0x390
[   40.628987][  T146]  ? unregister_netdevice_many+0x40/0x40
[   40.629615][  T146]  ? dev_change_net_namespace+0xcb0/0xcb0
[   40.630279][  T146]  ? prepare_to_wait_exclusive+0x2e0/0x2e0
[   40.630943][  T146]  ? ops_exit_list.isra.9+0x97/0x140
[   40.631554][  T146]  cleanup_net+0x441/0x890
[ ... ]

Fixes: e289fd2817 ("macvlan: fix the problem when mac address changes for passthru mode")
Reported-by: syzbot+5035b1f9dc7ea4558d5a@syzkaller.appspotmail.com
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29 16:31:20 +02:00
Mahesh Bandewar
084dce9dc9 macvlan: add cond_resched() during multicast processing
[ Upstream commit ce9a4186f9ac475c415ffd20348176a4ea366670 ]

The Rx bound multicast packets are deferred to a workqueue and
macvlan can also suffer from the same attack that was discovered
by Syzbot for IPvlan. This solution is not as effective as in
IPvlan. IPvlan defers all (Tx and Rx) multicast packet processing
to a workqueue while macvlan does this way only for the Rx. This
fix should address the Rx codition to certain extent.

Tx is still suseptible. Tx multicast processing happens when
.ndo_start_xmit is called, hence we cannot add cond_resched().
However, it's not that severe since the user which is generating
 / flooding will be affected the most.

Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:14:19 +01:00
Eric Dumazet
37cf279748 macvlan: use skb_reset_mac_header() in macvlan_queue_xmit()
[ Upstream commit 1712b2fff8c682d145c7889d2290696647d82dab ]

I missed the fact that macvlan_broadcast() can be used both
in RX and TX.

skb_eth_hdr() makes only sense in TX paths, so we can not
use it blindly in macvlan_broadcast()

Fixes: 96cc4b69581d ("macvlan: do not assume mac_header is set in macvlan_broadcast()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jurgen Van Ham <juvanham@gmail.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23 08:21:34 +01:00
Eric Dumazet
5f3274c53a macvlan: do not assume mac_header is set in macvlan_broadcast()
[ Upstream commit 96cc4b69581db68efc9749ef32e9cf8e0160c509 ]

Use of eth_hdr() in tx path is error prone.

Many drivers call skb_reset_mac_header() before using it,
but others do not.

Commit 6d1ccff627 ("net: reset mac header in dev_start_xmit()")
attempted to fix this generically, but commit d346a3fae3
("packet: introduce PACKET_QDISC_BYPASS socket option") brought
back the macvlan bug.

Lets add a new helper, so that tx paths no longer have
to call skb_reset_mac_header() only to get a pointer
to skb->data.

Hopefully we will be able to revert 6d1ccff627
("net: reset mac header in dev_start_xmit()") and save few cycles
in transmit fast path.

BUG: KASAN: use-after-free in __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline]
BUG: KASAN: use-after-free in mc_hash drivers/net/macvlan.c:251 [inline]
BUG: KASAN: use-after-free in macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277
Read of size 4 at addr ffff8880a4932401 by task syz-executor947/9579

CPU: 0 PID: 9579 Comm: syz-executor947 Not tainted 5.5.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:639
 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:145
 __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline]
 mc_hash drivers/net/macvlan.c:251 [inline]
 macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277
 macvlan_queue_xmit drivers/net/macvlan.c:520 [inline]
 macvlan_start_xmit+0x402/0x77f drivers/net/macvlan.c:559
 __netdev_start_xmit include/linux/netdevice.h:4447 [inline]
 netdev_start_xmit include/linux/netdevice.h:4461 [inline]
 dev_direct_xmit+0x419/0x630 net/core/dev.c:4079
 packet_direct_xmit+0x1a9/0x250 net/packet/af_packet.c:240
 packet_snd net/packet/af_packet.c:2966 [inline]
 packet_sendmsg+0x260d/0x6220 net/packet/af_packet.c:2991
 sock_sendmsg_nosec net/socket.c:639 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:659
 __sys_sendto+0x262/0x380 net/socket.c:1985
 __do_sys_sendto net/socket.c:1997 [inline]
 __se_sys_sendto net/socket.c:1993 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1993
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x442639
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffc13549e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442639
RDX: 000000000000000e RSI: 0000000020000080 RDI: 0000000000000003
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000403bb0 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 9389:
 save_stack+0x23/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc mm/kasan/common.c:513 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527
 __do_kmalloc mm/slab.c:3656 [inline]
 __kmalloc+0x163/0x770 mm/slab.c:3665
 kmalloc include/linux/slab.h:561 [inline]
 tomoyo_realpath_from_path+0xc5/0x660 security/tomoyo/realpath.c:252
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822
 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129
 security_inode_getattr+0xf2/0x150 security/security.c:1222
 vfs_getattr+0x25/0x70 fs/stat.c:115
 vfs_statx_fd+0x71/0xc0 fs/stat.c:145
 vfs_fstat include/linux/fs.h:3265 [inline]
 __do_sys_newfstat+0x9b/0x120 fs/stat.c:378
 __se_sys_newfstat fs/stat.c:375 [inline]
 __x64_sys_newfstat+0x54/0x80 fs/stat.c:375
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 9389:
 save_stack+0x23/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:335 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
 __cache_free mm/slab.c:3426 [inline]
 kfree+0x10a/0x2c0 mm/slab.c:3757
 tomoyo_realpath_from_path+0x1a7/0x660 security/tomoyo/realpath.c:289
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822
 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129
 security_inode_getattr+0xf2/0x150 security/security.c:1222
 vfs_getattr+0x25/0x70 fs/stat.c:115
 vfs_statx_fd+0x71/0xc0 fs/stat.c:145
 vfs_fstat include/linux/fs.h:3265 [inline]
 __do_sys_newfstat+0x9b/0x120 fs/stat.c:378
 __se_sys_newfstat fs/stat.c:375 [inline]
 __x64_sys_newfstat+0x54/0x80 fs/stat.c:375
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a4932000
 which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 1025 bytes inside of
 4096-byte region [ffff8880a4932000, ffff8880a4933000)
The buggy address belongs to the page:
page:ffffea0002924c80 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0
raw: 00fffe0000010200 ffffea0002846208 ffffea00028f3888 ffff8880aa402000
raw: 0000000000000000 ffff8880a4932000 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880a4932300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a4932380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880a4932400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff8880a4932480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a4932500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: b863ceb7dd ("[NET]: Add macvlan driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12 12:17:25 +01:00
Menglong Dong
8a0092e905 macvlan: schedule bc_work even if error
[ Upstream commit 1d7ea55668878bb350979c377fc72509dd6f5b21 ]

While enqueueing a broadcast skb to port->bc_queue, schedule_work()
is called to add port->bc_work, which processes the skbs in
bc_queue, to "events" work queue. If port->bc_queue is full, the
skb will be discarded and schedule_work(&port->bc_work) won't be
called. However, if port->bc_queue is full and port->bc_work is not
running or pending, port->bc_queue will keep full and schedule_work()
won't be called any more, and all broadcast skbs to macvlan will be
discarded. This case can happen:

macvlan_process_broadcast() is the pending function of port->bc_work,
it moves all the skbs in port->bc_queue to the queue "list", and
processes the skbs in "list". During this, new skbs will keep being
added to port->bc_queue in macvlan_broadcast_enqueue(), and
port->bc_queue may already full when macvlan_process_broadcast()
return. This may happen, especially when there are a lot of real-time
threads and the process is preempted.

Fix this by calling schedule_work(&port->bc_work) even if
port->bc_work is full in macvlan_broadcast_enqueue().

Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 09:21:29 +01:00
Matteo Croce
2ef6e0fe38 macvlan: return correct error value
[ Upstream commit 59f997b088d26a774958cb7b17b0763cd82de7ec ]

A MAC address must be unique among all the macvlan devices with the same
lower device. The only exception is the passthru [sic] mode,
which shares the lower device address.

When duplicate addresses are detected, EBUSY is returned when bringing
the interface up:

    # ip link add macvlan0 link eth0 type macvlan
    # read addr </sys/class/net/eth0/address
    # ip link set macvlan0 address $addr
    # ip link set macvlan0 up
    RTNETLINK answers: Device or resource busy

Use correct error code which is EADDRINUSE, and do the check also
earlier, on address change:

    # ip link set macvlan0 address $addr
    RTNETLINK answers: Address already in use

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-21 14:15:20 +01:00
Travis Brown
80fd2d6ca5 macvlan: Change status when lower device goes down
Today macvlan ignores the notification when a lower device goes
administratively down, preventing the lack of connectivity from
bubbling up.

Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN
with NO-CARRIER which should be easy to interpret in userspace.

2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000

Signed-off-by: Suresh Krishnan <skrishnan@arista.com>
Signed-off-by: Travis Brown <travisb@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11 23:07:22 -07:00
Alexander Duyck
eadec877ce net: Add support for subordinate traffic classes to netdev_pick_tx
This change makes it so that we can support the concept of subordinate
device traffic classes to the core networking code. In doing this we can
start pulling out the driver specific bits needed to support selecting a
queue based on an upper device.

The solution at is currently stands is only partially implemented. I have
the start of some XPS bits in here, but I would still need to allow for
configuration of the XPS maps on the queues reserved for the subordinate
devices. For now I am using the reference to the sb_dev XPS map as just a
way to skip the lookup of the lower device XPS map for now as that would
result in the wrong queue being picked.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-09 12:53:58 -07:00
Alexander Duyck
81d4e91cd5 macvlan: Use software path for offloaded local, broadcast, and multicast traffic
This change makes it so that we use a software path for packets that are
going to be locally switched between two macvlan interfaces on the same
device. In addition we resort to software replication of broadcast and
multicast packets instead of offloading that to hardware.

The general idea is that using the device for east/west traffic local to
the system is extremely inefficient. We can only support up to whatever the
PCIe limit is for any given device so this caps us at somewhere around 20G
for devices supported by ixgbe. This is compounded even further when you
take broadcast and multicast into account as a single 10G port can come to
a crawl as a packet is replicated up to 60+ times in some cases. In order
to get away from that I am implementing changes so that we handle
broadcast/multicast replication and east/west local traffic all in
software.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25 08:26:19 -07:00
Alexander Duyck
7d775f6347 macvlan: Rename fwd_priv to accel_priv and add accessor function
This change renames the fwd_priv member to accel_priv as this more
accurately reflects the actual purpose of this value. In addition I am
adding an accessor which will allow us to further abstract this in the
future if needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25 08:26:19 -07:00
Shannon Nelson
13fbcc8dc5 macvlan: filter out unsupported feature flags
Adding a macvlan device on top of a lowerdev that supports
the xfrm offloads fails with a new regression:
  # ip link add link ens1f0 mv0 type macvlan
  RTNETLINK answers: Operation not permitted

Tracing down the failure shows that the macvlan device inherits
the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags
from the lowerdev, but with no dev->xfrmdev_ops API filled
in, it doesn't actually support xfrm.  When the request is
made to add the new macvlan device, the XFRM listener for
NETDEV_REGISTER calls xfrm_api_check() which fails the new
registration because dev->xfrmdev_ops is NULL.

The macvlan creation succeeds when we filter out the ESP
feature flags in macvlan_fix_features(), so let's filter them
out like we're already filtering out ~NETIF_F_NETNS_LOCAL.
When XFRM support is added in the future, we can add the flags
into MACVLAN_FEATURES.

This same problem could crop up in the future with any other
new feature flags, so let's filter out any flags that aren't
defined as supported in macvlan.

Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-11 22:46:16 -04:00
Alexey Kodanev
4e14bf4236 macvlan: fix use-after-free in macvlan_common_newlink()
The following use-after-free was reported by KASan when running
LTP macvtap01 test on 4.16-rc2:

[10642.528443] BUG: KASAN: use-after-free in
               macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
[10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450
...
[10642.963873] Call Trace:
[10642.994352]  dump_stack+0x5c/0x7c
[10643.035325]  print_address_description+0x75/0x290
[10643.092938]  kasan_report+0x28d/0x390
[10643.137971]  ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
[10643.207963]  macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
[10643.275978]  macvtap_newlink+0x171/0x260 [macvtap]
[10643.334532]  rtnl_newlink+0xd4f/0x1300
...
[10646.256176] Allocated by task 18450:
[10646.299964]  kasan_kmalloc+0xa6/0xd0
[10646.343746]  kmem_cache_alloc_trace+0xf1/0x210
[10646.397826]  macvlan_common_newlink+0x6de/0x14a0 [macvlan]
[10646.464386]  macvtap_newlink+0x171/0x260 [macvtap]
[10646.522728]  rtnl_newlink+0xd4f/0x1300
...
[10647.022028] Freed by task 18450:
[10647.061549]  __kasan_slab_free+0x138/0x180
[10647.111468]  kfree+0x9e/0x1c0
[10647.147869]  macvlan_port_destroy+0x3db/0x650 [macvlan]
[10647.211411]  rollback_registered_many+0x5b9/0xb10
[10647.268715]  rollback_registered+0xd9/0x190
[10647.319675]  register_netdevice+0x8eb/0xc70
[10647.370635]  macvlan_common_newlink+0xe58/0x14a0 [macvlan]
[10647.437195]  macvtap_newlink+0x171/0x260 [macvtap]

Commit d02fd6e7d2 ("macvlan: Fix one possible double free") handles
the case when register_netdevice() invokes ndo_uninit() on error and
as a result free the port. But 'macvlan_port_get_rtnl(dev))' check
(returns dev->rx_handler_data), which was added by this commit in order
to prevent double free, is not quite correct:

* for macvlan it always returns NULL because 'lowerdev' is the one that
  was used to register rx handler (port) in macvlan_port_create() as
  well as to unregister it in macvlan_port_destroy().
* for macvtap it always returns a valid pointer because macvtap registers
  its own rx handler before macvlan_common_newlink().

Fixes: d02fd6e7d2 ("macvlan: Fix one possible double free")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23 10:48:10 -05:00
Gao Feng
d02fd6e7d2 macvlan: Fix one possible double free
Because the macvlan_uninit would free the macvlan port, so there is one
double free case in macvlan_common_newlink. When the macvlan port is just
created, then register_netdevice or netdev_upper_dev_link failed and they
would invoke macvlan_uninit. Then it would reach the macvlan_port_destroy
which triggers the double free.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:30:14 -05:00
Alexander Duyck
56fd2b2ca4 macvlan/macvtap: Add support for L2 forwarding offloads with macvtap
This patch reverts earlier commit b13ba1b83f ("macvlan: forbid L2
fowarding offload for macvtap"). The reason for reverting this is because
the original patch no longer fixes what it previously did as the
underlying structure has changed for macvtap. Specifically macvtap
originally pulled packets directly off of the lowerdev. However in commit
6acf54f1cf ("macvtap: Add support of packet capture on macvtap device.")
that code was changed and instead macvtap would listen directly on the
macvtap device itself instead of the lower device. As such, the L2
forwarding offload should now be able to provide a performance advantage of
skipping the checks on the lower dev while not introducing any sort of
regression.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19 12:43:59 +01:00
Alexander Duyck
c8c41ea1bd macvlan: Only update pkt_type if destination MAC address matches
This patch updates the pkt_type to PACKET_HOST only if the destination MAC
address matches on the on the source based macvlan. It didn't make sense to
be updating broadcast, multicast, and non-local destined frames with
PACKET_HOST.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14 18:46:36 -07:00
Alexander Duyck
dd6b9c2c33 macvlan: Only deliver one copy of the frame to the macvlan interface
This patch intoduces a slight adjustment for macvlan to address the fact
that in source mode I was seeing two copies of any packet addressed to the
macvlan interface being delivered where there should have been only one.

The issue appears to be that one copy was delivered based on the source MAC
address and then the second copy was being delivered based on the
destination MAC address. To fix it I am just treating a unicast address
match as though it is not a match since source based macvlan isn't supposed
to be matching based on the destination MAC anyway.

Fixes: 79cf79abce ("macvlan: add source mode")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14 18:46:36 -07:00
David Ahern
42ab19ee90 net: Add extack to upper device linking
Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:39:33 -07:00
Zhang Shengju
07850a4f74 macvlan: code refine to check data before using
This patch checks data first at one place, return if it's null.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 13:46:45 -07:00
Dimitris Michailidis
f21e507701 macvlan: add offload features for encapsulation
Currently macvlan devices do not set their hw_enc_features making
encapsulated Tx packets resort to SW fallbacks. Add encapsulation GSO
offloads to ->features as is done for the other GSOs and set
->hw_enc_features.

Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18 16:06:54 -07:00
David S. Miller
fb652fdfe8 macvlan/macvtap: Remove NETIF_F_UFO advertisement.
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:57 -07:00
David S. Miller
b079115937 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30 12:43:08 -04:00
Matthias Schiffer
a8b8a889e3 net: add netlink_ext_ack argument to rtnl_link_ops.validate
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:22 -04:00
Matthias Schiffer
ad744b223c net: add netlink_ext_ack argument to rtnl_link_ops.changelink
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:22 -04:00
Matthias Schiffer
7a3f4a1851 net: add netlink_ext_ack argument to rtnl_link_ops.newlink
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:21 -04:00
Vlad Yasevich
18c8c54de9 macvlan: Let passthru macvlan correctly restore lower mac address
Passthru macvlans directly change the mac address of the lower
level device.  That's OK, but after the macvlan is deleted,
the lower device is left with changed address and one needs to
reboot to bring back the origina HW addresses.

This scenario is actually quite common with passthru macvtap devices.

This patch attempts to solve this, by storing the mac address
of the lower device in macvlan_port structure and keeping track of
it through the changes.

After this patch, any changes to the lower device mac address
done trough the macvlan device, will be reverted back.  Any
changs done directly to the lower device mac address will be kept.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 11:17:42 -04:00
Vlad Yasevich
43c2d578a0 macvlan: convert port passthru to flags.
Convert the port passthru boolean into flags with accesor functions.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 11:17:42 -04:00
Vlad Yasevich
e696cda7bd macvlan: Fix passthru macvlan mac address inheritance
When a lower device of the passthru macvlan changes it's address,
passthru macvlan is supposed to change it's own address as well.
However, that doesn't happen correctly because the check in
macvlan_addr_busy() will catch the fact that the lower level
(port) mac address is the same as the address we are trying to
assign to the macvlan, and return an error.  As a reasult,
the address of the passthru macvlan device is never changed.

The same thing happens when the user attempts to change the
mac address of the passthru macvlan.

The simple solution appers to be to not check against
the lower device in case of passthru macvlan device, since
the 2 addresses are _supposed_ to be the same.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 11:17:41 -04:00
Vlad Yasevich
e26f43faa0 macvlan: Do not return error when setting the same mac address
The user currently gets an EBUSY error when attempting to set
the mac address on a macvlan device to the same value.

This should really be a no-op as nothing changes.  Catch
the condition and return early.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 11:17:41 -04:00
David S. Miller
0ddead90b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 11:59:32 -04:00
Zhang Shengju
a88e2676a6 macvlan: propagate the mac address change status for lowerdev
The macvlan dev should propagate the return value of mac address change for
lower device in the passthru mode, instead of always return 0.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 14:34:51 -04:00
David S. Miller
cf124db566 net: Fix inconsistent teardown and release of private netdev state.
Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:53:24 -04:00
Vlad Yasevich
70957eaecc macvlan: Fix performance issues with vlan tagged packets
Macvlan always turns on offload features that have sofware
fallback (NETIF_GSO_SOFTWARE).  This allows much higher guest-guest
communications over macvtap.

However, macvtap does not turn on these features for vlan tagged traffic.
As a result, depending on the HW that mactap is configured on, the
performance of guest-guest communication over a vlan is very
inconsistent.  If the HW supports TSO/UFO over vlans, then the
performance will be fine.  If not, the the performance will suffer
greatly since the VM may continue using TSO/UFO, and will force the host
segment the traffic and possibly overlow the macvtap queue.

This patch adds the always on offloads to vlan_features.  This
makes sure that any vlan tagged traffic between 2 guest will not
be segmented needlessly.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:18:11 -04:00
Herbert Xu
f6478218e6 macvlan: Fix device ref leak when purging bc_queue
When a parent macvlan device is destroyed we end up purging its
broadcast queue without dropping the device reference count on
the packet source device.  This causes the source device to linger.

This patch drops that reference count.

Fixes: 260916dfb4 ("macvlan: Fix potential use-after free for...")
Reported-by: Joe Ghalam <Joe.Ghalam@dell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25 10:42:10 -04:00
Sainath Grandhi
6fe3faf867 tap: Abstract type of virtual interface from tap implementation
macvlan object is re-structured to hold tap related elements in a separate
entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with
idr and fetched again on tap_open. Few of the tap functions are modified to
accepted tap_dev as argument. tap_dev object includes callbacks to be used by
underlying virtual interface to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11 20:59:41 -05:00
Mahesh Bandewar
322dc6e067 macvlan: use netdev_is_rx_handler_busy instead of checking specific type
netdev_is_rx_handler_busy() check is a superset of netif_is_ipvlan_port()
check and hence should be preferred.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 12:22:26 -05:00
stephen hemminger
bc1f44709c net: make ndo_get_stats64 a void function
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 17:51:44 -05:00
Gao Feng
a1f5315ce4 driver: macvlan: Remove the rcu member of macvlan_port
When free macvlan_port in macvlan_port_destroy, it is safe to free
directly because netdev_rx_handler_unregister could enforce one
grace period.
So it is unnecessary to use kfree_rcu for macvlan_port.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 13:22:07 -05:00
David S. Miller
0b42f25d2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26 23:42:21 -05:00
Gao Feng
c3891fa254 driver: macvlan: Check if need rollback multicast setting in macvlan_open
When dev_set_promiscuity failed in macvlan_open, it always invokes
dev_set_allmulti without checking if necessary.
Now check the IFF_ALLMULTI flag firstly before rollback the multicast
setting in the error handler.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23 20:18:36 -05:00
Gao Feng
fc51f2b7e5 driver: macvlan: Remove duplicated IFF_UP condition check in macvlan_forward_source
The function macvlan_forward_source_one has already checked the flag
IFF_UP, so needn't check it outside in macvlan_forward_source too.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 11:58:49 -05:00
David S. Miller
bb598c1b8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 10:54:36 -05:00
Gao Feng
d94d02547e driver: macvlan: Replace integer number with bool value
The return value of function macvlan_addr_busy is used as bool value,
so use bool value instead of integer number "1" and "0".

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-14 16:36:00 -05:00
Gao Feng
aa5fd0fb77 driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.
When there is no existing macvlan port in lowdev, one new macvlan port
would be created. But it doesn't be destoried when something failed later.
It casues some memleak.

Now add one flag to indicate if new macvlan port is created.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:14:47 -05:00
Jarod Wilson
91572088e3 net: use core MTU range checking in core net infra
geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some
  closer inspection and testing

macvlan:
- set min/max_mtu

tun:
- set min/max_mtu, remove tun_net_change_mtu

vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function
- This one is also not as straight-forward and could use closer inspection
  and testing from vxlan folks

bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
  is the largest possible size supported

sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

macsec:
- min_mtu = 0, max_mtu = 65535

macvlan:
- min_mtu = 0, max_mtu = 65535

ntb_netdev:
- min_mtu = 0, max_mtu = 65535

veth:
- min_mtu = 68, max_mtu = 65535

8021q:
- min_mtu = 0, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Tom Herbert <tom@herbertland.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jiri Benc <jbenc@redhat.com>
CC: WANG Cong <xiyou.wangcong@gmail.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Pravin B Shelar <pshelar@ovn.org>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:09 -04:00
Sabrina Dubroca
952fcfd08c net: remove type_check from dev_get_nest_level()
The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:

eth0 <--- macvlan0 <--- vlan0 <--- macvlan1

However, this doesn't prevent a warning on a configuration such as:

eth0 <--- macvlan0 <--- vlan0
eth1 <--- vlan1 <--- macvlan1

In this case, all the locks end up with a nesting subclass of 1, so
lockdep thinks that there is still a deadlock:

- in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
  take (vlan_netdev_xmit_lock_key, 1)
- in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
  take (macvlan_netdev_addr_lock_key, 1)

By removing the linktype check in dev_get_nest_level() and always
incrementing the nesting depth, lockdep considers this configuration
valid.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13 15:15:54 -07:00