You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Taking the following interface script as an example:
allow-usbeth0 usbeth0
iface usbeth0 inet manual
ipv6-addrgen off
mtu 1574
gro-offload off
allow-usbeth0 eth0.14
iface eth0.14 inet static
ipv6-addrgen on
mtu 1574
address 10.0.216.5/24
vlan-raw-device usbeth0
vlan-id 14
post-up /sbin/ip route add 0.0.0.0/0 via 10.0.216.1 dev $IFACE
pre-down /sbin/ip route del 0.0.0.0/0 via 10.0.216.1 dev $IFACE
where our systems are configured to disable IPv6 link local addressing by default (in systctl.conf):
# Enable IPv6 networking on all interfaces by default but disable LLA
# except on the loopback interface.
net.ipv6.conf.all.addr_gen_mode=1
net.ipv6.conf.default.addr_gen_mode=1
net.ipv6.conf.lo.addr_gen_mode=0
net.ipv6.conf.all.disable_ipv6=0
net.ipv6.conf.default.disable_ipv6=0
net.ipv6.conf.lo.disable_ipv6=0
we are occasionally seeing that the VLAN interface does not have IPv6 LLA enabled even though ipv6-addrgen on is set.
Note that to reproduce, the following commands are run repeatedly until the condition arises (since the condition does not always occur):
ifdown --allow usbeth0 ; ifup -v --allow usbeth0
After looking into this, the problem is caused by the Netlink RTM_NEWLINK event for the VLAN interface arriving after the cache has been interrogated by get_link_ipv6_addrgen_mode to determine the IPv6 mode, which defaults to 0 (since the cache is missing the information and a KeyError occurs):
def get_link_ipv6_addrgen_mode(self, ifname):
try:
with self._cache_lock:
return self._link_cache[ifname].attributes[Link.IFLA_AF_SPEC].value[socket.AF_INET6][Link.IFLA_INET6_ADDR_GEN_MODE]
except (KeyError, AttributeError):
# default to 0 (eui64)
return 0
except TypeError as e:
return self.__handle_type_error(inspect.currentframe().f_code.co_name, ifname, str(e), return_value=0)
This default is incorrect in our setup. This is just one instance wher the contents of the cache may not reflect reality.
The underlying cause is the implementation choice to not wait for the RTM_NEWLINK event message when creating an interface and instead only waiting for the acknowledge, thereby creating this ambiguity (and race) in the cached link state information; from the comment in tx_nlpacket_get_response_with_error_and_cache_on_ack (in nlcache.py):
# When creating a new link via netlink, we don't always wait for the kernel
# NEWLINK notification to be cached to continue. If our request is ACKed by
# the OS we assume that the link was successfully created. Since we aren't
# waiting for the kernel notification to continue we need to manually fill
# our cache with the packet we just TX'ed. Once the NEWLINK notification
# is received it will simply override the previous entry.
# We need to keep track of those manually cached packets. We set a private
# flag on the objects via the attribute priv_flags
Note that I have tried replacing all uses of the tx_nlpacket_get_response_with_error_and_cache_on_ack method with the (currently unused) tx_nlpacket_get_response_with_error_and_wait_for_cache method and this looks like to have solved the cache coherency issue for us but I am not sure if this is a robust solution (with or without extending the 1 second timeout).
Is there better way to robustly solve this issue ?
The text was updated successfully, but these errors were encountered:
ifupdown2 version: 3.3.0
Taking the following interface script as an example:
where our systems are configured to disable IPv6 link local addressing by default (in
systctl.conf
):we are occasionally seeing that the VLAN interface does not have IPv6 LLA enabled even though
ipv6-addrgen on
is set.Note that to reproduce, the following commands are run repeatedly until the condition arises (since the condition does not always occur):
ifdown --allow usbeth0 ; ifup -v --allow usbeth0
After looking into this, the problem is caused by the Netlink
RTM_NEWLINK
event for the VLAN interface arriving after the cache has been interrogated byget_link_ipv6_addrgen_mode
to determine the IPv6 mode, which defaults to0
(since the cache is missing the information and aKeyError
occurs):This default is incorrect in our setup. This is just one instance wher the contents of the cache may not reflect reality.
The underlying cause is the implementation choice to not wait for the
RTM_NEWLINK
event message when creating an interface and instead only waiting for the acknowledge, thereby creating this ambiguity (and race) in the cached link state information; from the comment intx_nlpacket_get_response_with_error_and_cache_on_ack
(innlcache.py
):Note that I have tried replacing all uses of the
tx_nlpacket_get_response_with_error_and_cache_on_ack
method with the (currently unused)tx_nlpacket_get_response_with_error_and_wait_for_cache
method and this looks like to have solved the cache coherency issue for us but I am not sure if this is a robust solution (with or without extending the 1 second timeout).Is there better way to robustly solve this issue ?
The text was updated successfully, but these errors were encountered: