L3 Tunneling
- Introduction
- Overlay Configuration
- Tunnel Configuration
- Decap-only Tunnels
- Configuration Changes
- Features and Limitations
- Further Resources
Since L3 tunneling is fundamentally a routing technology, the switch where tunnels are to be configured needs to have routing enabled. See Static Routing for more details.
In the abstract, the reason to create an IP-in-IP tunnel is to connect two IP networks separated by another IP network. In the example here, the two domains to be connected are represented by two hosts with arbitrarily chosen addresses 192.168.1.33 and 192.168.2.33, respectively. Each host is connected to a tunnel endpoint. The two endpoints take their addresses from 1.2.3.4/31 (1.2.3.4 and 1.2.3.5); each wraps up the host traffic and delivers it through a tunnel to the other endpoint. The encapsulated traffic travels over a transport network, here addressed 192.168.99.0/24.
In tunneling parlance, the traffic flowing between the two separated IP domains is called overlay traffic, and the network that carries it is the overlay network. The encapsulated traffic, on the other hand, is called underlay traffic, and the network that carries it is the underlay network.
+--------------+         +--------------+
|              |         |              |
|    host1     |         |    host2     |
|              |         |              |
| 192.168.1.33 |         | 192.168.2.33 |
|      +       |         |      +       |
|      |       |         |      |       |
+--------------+         +--------------+
       |                        |
+--------------+         +--------------+
|      |       |         |      |       |
|      +       |         |      +       |    Overlay
| 192.168.1.1  |         | 192.168.2.1  |  - - - - - -
|              |         |              |    Underlay
|   switch1    |         |   switch2    |
|              |         |              |
|   1.2.3.4    |         |   1.2.3.5    |
|      +       |         |      +       |
|      |       |         |      |       |
| 192.168.99.1 |         | 192.168.99.2 |
|      +       |         |      +       |
|     | |      |         |     | |      |
+--------------+         +--------------+
      | |______________________| |
      '--------------------------'
The switch, as a tunneling gateway, naturally handles both overlay and underlay traffic. Both can be in the same VRF (possibly the default one), or each can be in a different VRF. See below for details of each of these configurations.
Currently, mlxsw offloads GRE tunnels, but not all possible configurations are supported. Refer to Features and Limitations for the list of constraints that the tunnel needs to satisfy to be offloaded.
Besides setting up a tunnel device, one also needs a local route matching the tunnel's local address, which is offloaded to decapsulate packets, and possibly one or more routes that direct traffic to the tunnel, which are offloaded to encapsulate packets.
Kernel Version | Change |
---|---|
4.15 | Offload GRE tunnels. |
5.1 | Spectrum-2 support. |
5.16 | Offload GRE6 for Spectrum-2 and above. |
6.2 | Offload GRE6 for Spectrum-1. |
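The table can be turned into a quick check of the running kernel. The following is only a sketch: the function name kernel_at_least is made up for this example, and it compares just the leading major.minor part of the release string reported by uname -r.

```shell
# Succeeds if kernel release $3 (default: the running kernel) is at least
# version $1.$2. Only the leading "major.minor" of the release is compared.
# kernel_at_least is a hypothetical helper name, not part of any tool.
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example, kernel_at_least 5 16 tells whether the kernel is new enough for GRE6 offload on Spectrum-2 and above. The check says nothing about which Spectrum generation is installed.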
First, connect each host to its local overlay network and add a route for traffic destined for the remote overlay network (192.168.2.0/24 from host1, 192.168.1.0/24 from host2):
host1 $ ip link set dev eth0 up
host1 $ ip address add dev eth0 192.168.1.33/24
host1 $ ip route add 192.168.2.0/24 via 192.168.1.1
host2 $ ip link set dev eth0 up
host2 $ ip address add dev eth0 192.168.2.33/24
host2 $ ip route add 192.168.1.0/24 via 192.168.2.1
On each switch, set up the overlay interface accordingly:
sw1 $ ip link set dev sw1p49 up
sw1 $ ip address add dev sw1p49 192.168.1.1/24
sw2 $ ip link set dev sw1p49 up
sw2 $ ip address add dev sw1p49 192.168.2.1/24
You need a GRE module in order to set up GRE tunnels:
sw $ modprobe ip-gre
There are two main ways to set up a GRE tunnel endpoint. If the tunnel is not bound to another device, the underlay is always in the main VRF. If it is bound to a device, the underlay is in the VRF of that device.
The following sections describe how to set up first a simple case, where both overlay and underlay are in the main VRF, and then a general case, where they are possibly separate.
In this configuration, overlay and underlay traffic are both in the main VRF:
   +------------------( switch )-------------------+
   |                                               |
   |     overlay         GRE         transport     |
---|-+ 192.168.1.1     1.2.3.4  +-- 192.168.99.1 +=|===
   |                                               |
   +-----------------------------------------------+
First, set up the tunnel itself:
sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 tos inherit
sw1 $ ip link set dev g up
sw1 $ ip address add dev g 1.2.3.4/32
sw2 $ ip tunnel add name g mode gre local 1.2.3.5 remote 1.2.3.4 tos inherit
sw2 $ ip link set dev g up
sw2 $ ip address add dev g 1.2.3.5/32
Or, if you want to use GRE keys:
sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 tos inherit \
key 123
Or:
sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 tos inherit \
ikey 456 okey 789
Note that the tunnel remote address must be reachable from this node. For example:
sw1 $ ip link set dev sw1p51 up
sw1 $ ip address add dev sw1p51 192.168.99.1/24
sw1 $ ip route add 1.2.3.5/32 via 192.168.99.2
sw2 $ ip link set dev sw1p51 up
sw2 $ ip address add dev sw1p51 192.168.99.2/24
sw2 $ ip route add 1.2.3.4/32 via 192.168.99.1
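To confirm that the remote tunnel address resolves through the transport network, one can ask the kernel which route it would use. The helper below is a minimal sketch that extracts the next hop and egress device from the output of ip route get; the name underlay_path is hypothetical, and the parsing assumes the usual "ADDR via NEXTHOP dev DEV ..." layout of the first output line.

```shell
# Read `ip route get ADDR` output on stdin and print "NEXTHOP DEV".
# NEXTHOP is "-" when the address is directly connected (no "via").
# Fails (exit status 1) when no egress device could be found.
# underlay_path is a hypothetical helper name used for illustration.
underlay_path() {
    awk '
        NR == 1 {
            via = "-"
            for (i = 1; i < NF; i++) {
                if ($i == "via") via = $(i + 1)
                if ($i == "dev") dev = $(i + 1)
            }
        }
        END {
            if (dev == "") exit 1
            print via, dev
        }
    '
}
```

With the configuration above, running ip route get 1.2.3.5 | underlay_path on sw1 would be expected to report 192.168.99.2 as the next hop and sw1p51 as the device.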
At this point, it is possible to direct traffic at the tunnel:
sw1 $ ip route add 192.168.2.0/24 dev g
sw1 $ ip route add 2001:db8:2::/56 dev g
sw2 $ ip route add 192.168.1.0/24 dev g
sw2 $ ip route add 2001:db8:1::/56 dev g
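The main-VRF steps for one switch can be collected into a small helper. The sketch below only prints the commands shown in this section with the addresses filled in; it does not run them. The function name print_gre_setup and its argument order are made up for this example, and the transport port configuration (sw1p51) is deliberately left out.

```shell
# Print (do not run) the main-VRF tunnel commands from this section.
# $1: local tunnel address, $2: remote tunnel address,
# $3: underlay next hop, $4...: remote overlay prefixes to route into the tunnel.
# print_gre_setup is a hypothetical helper name used for illustration.
print_gre_setup() {
    local_addr=$1
    remote_addr=$2
    nexthop=$3
    shift 3
    echo "ip tunnel add name g mode gre local $local_addr remote $remote_addr tos inherit"
    echo "ip link set dev g up"
    echo "ip address add dev g $local_addr/32"
    echo "ip route add $remote_addr/32 via $nexthop"
    for prefix in "$@"; do
        echo "ip route add $prefix dev g"
    done
}
```

For sw1 in this tutorial the call would be print_gre_setup 1.2.3.4 1.2.3.5 192.168.99.2 192.168.2.0/24 2001:db8:2::/56. Reviewing the printed commands before running them keeps the helper safe to use on a live switch.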
To verify that the individual routes have been offloaded:
sw $ ip route show table local dev g
local 1.2.3.4 dev g proto kernel scope host src 1.2.3.4 offload
sw $ ip route show dev g
192.168.2.0/24 scope link offload
sw $ ip -6 route show dev g
2001:db8:2::/56 metric 1024 offload pref medium
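When there are many routes, scanning for the offload flag by eye gets tedious. A minimal sketch of a filter that lists only the routes without the flag follows; not_offloaded is a hypothetical name, and the filter simply looks for the word offload on each line of ip route show output.

```shell
# Read `ip route show` output on stdin and print every route that lacks the
# "offload" flag. The exit status is 0 only when no such route was found
# (which includes the case of empty input).
# not_offloaded is a hypothetical helper name used for illustration.
not_offloaded() {
    ! grep -vw offload
}
```

It can be fed from any of the commands above, for example:

sw $ ip route show dev g | not_offloaded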
A tunnel that is bound to another device has overlay in the VRF where the tunnel is, and underlay where the device that it is bound to is. Typically the underlay would be a different VRF than the one with the GRE netdevice itself, but it does not have to be.
Note: Bind devices are offloaded correctly only when their master is a VRF device. In that case, the bind device is only used to select the VRF to use for underlay traffic. In the main VRF, Linux uses the bind device to select the actual interface through which encapsulated traffic egresses. mlxsw does not recognize that use: it assumes that a bind device always just selects an underlay VRF, even when the bind device is in the main VRF. That is why this tutorial uses a dummy device; it is the only device that makes sense as an anchor to select a VRF.
This is what the set-up looks like:
   +------------------( switch )-------------------+
   |                                               | <-- VRF ol
   |     overlay         GRE                       |
---|-+ 192.168.1.1        ^                        |
   |                      |                        |
   | - - - - - - - - - - -|- - - - - - - - - - - - |
   |                      v                        | <-- VRF ul
   |                    dummy        transport     |
   |                   1.2.3.4  +-- 192.168.99.1 +=|===
   |                                               |
   +-----------------------------------------------+
First, create the VRFs themselves. For more details on that, see Virtual Routing and Forwarding (VRF):
sw $ ip link add name ol type vrf table 10
sw $ ip link set dev ol up
sw $ ip link add name ul type vrf table 20
sw $ ip link set dev ul up
Next, create a dummy device that is used to select the underlay VRF:
sw1 $ ip link add name d type dummy
sw1 $ ip link set dev d master ul
sw1 $ ip link set dev d up
sw1 $ ip address add dev d 1.2.3.4/32
sw2 $ ip link add name d type dummy
sw2 $ ip link set dev d master ul
sw2 $ ip link set dev d up
sw2 $ ip address add dev d 1.2.3.5/32
Now create a tunnel, binding it to the dummy:
sw1 $ ip tunnel add name g mode gre local 1.2.3.4 remote 1.2.3.5 dev d tos inherit
sw1 $ ip link set dev g master ol
sw1 $ ip link set dev g up
sw2 $ ip tunnel add name g mode gre local 1.2.3.5 remote 1.2.3.4 dev d tos inherit
sw2 $ ip link set dev g master ol
sw2 $ ip link set dev g up
You can of course set an input and/or output GRE key as shown in the section on the main VRF.
At this point, it is possible to direct traffic at the tunnel:
sw1 $ ip route add vrf ol 192.168.2.0/24 dev g
sw1 $ ip route add vrf ol 2001:db8:2::/56 dev g
sw2 $ ip route add vrf ol 192.168.1.0/24 dev g
sw2 $ ip route add vrf ol 2001:db8:1::/56 dev g
Also remember to put the ports that connect to the overlay and underlay networks into the right VRFs. For example:
sw $ ip link set dev sw1p49 master ol
sw $ ip link set dev sw1p51 master ul
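With several devices enslaved to two VRFs, it is easy to leave one in the wrong place. The sketch below reads the master of a netdevice from sysfs. It assumes that the kernel exposes a master symlink under /sys/class/net/DEV for enslaved devices; the name master_of and the optional second argument are made up for this example.

```shell
# Print the master device of netdevice $1, or "-" if it has none.
# $2 optionally overrides the sysfs root (default /sys), which is handy
# for trying the function out without real devices.
# master_of is a hypothetical helper name used for illustration.
master_of() {
    link=${2:-/sys}/class/net/$1/master
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "-"
    fi
}
```

In the set-up above, master_of sw1p49 and master_of g would be expected to print ol, while master_of sw1p51 and master_of d would be expected to print ul.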
Tunnel decap is offloaded as soon as there is a local route matching the local address of a tunnel. However, in the slow path, if the decapsulated packets are to be forwarded to hosts, one of the following conditions needs to hold:
- There actually needs to be a corresponding route that would direct traffic from those hosts to the tunnel device (i.e. an encapsulating route)
- Reverse path filtering needs to be disabled:
sysctl -w net.ipv4.conf.all.rp_filter=0
- The decapsulated traffic needs to be IPv6
mlxsw ignores the rp_filter setting and offloads as if it were disabled. This might create a discrepancy between how slow path and fast path packets are processed.
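When checking the slow-path behavior, keep in mind that for source validation the kernel uses the larger of the all and the per-interface rp_filter values, so clearing only the all setting is not enough if the interface itself has a non-zero value. The sketch below computes that effective value. The name effective_rp_filter and the optional directory argument are made up for this example, and it is an assumption of this sketch that the relevant interface is the one on which the decapsulated packets arrive (the tunnel device g in this tutorial).

```shell
# Print the rp_filter value the kernel applies to interface $1: the larger of
# the "all" and the per-interface setting. $2 optionally overrides the
# procfs directory (default /proc/sys/net/ipv4/conf).
# effective_rp_filter is a hypothetical helper name used for illustration.
effective_rp_filter() {
    conf=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf/all/rp_filter") || return 1
    own=$(cat "$conf/$1/rp_filter") || return 1
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
}
```

A result of 0 for effective_rp_filter g means reverse path filtering is really disabled for that interface in the slow path.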
Another way to create a decap-only tunnel is to add the encapsulating routes but set the bind device down. In that scenario, Linux (and mlxsw) does not forward encapsulated traffic, but the existence of the routes satisfies reverse path filtering.
Only tunnels satisfying the following conditions are offloaded:
- Only GRE tunnels
- Both local and remote addresses shall be given (NBMA tunnels and LWT are currently not supported)
- TTL and TOS shall both be inherit (note that in Linux the default TTL value for IPv6 tunnels is 64, unlike IPv4 tunnels where it is inherit by default. TOS inherit is not a default setting in Linux for either tunnel type)
- No two tunnels that share underlay VRF shall share a local address (i.e. dispatch based on tunnel key is not supported)
- Sequence numbers and checksumming shall not be used
The tunnel may have i-key and/or o-key set, and if it has both, the two may differ.
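Some of these conditions can be checked mechanically from the detailed link output. The following is a rough sketch for IPv4 GRE devices only. It assumes that ip -d link show prints the tunnel options in the form "gre remote A local B ttl inherit tos inherit" and lists enabled sequence and checksum options as iseq, oseq, icsum and ocsum; ip6gre output uses different keywords and is not covered. The name gre_offload_problems is hypothetical, and the local address collision rule cannot be checked from a single device, so it is left out.

```shell
# Read `ip -d link show dev TUNNEL` output on stdin and print each reason
# found why the tunnel would not be offloaded; print nothing otherwise.
# Assumes the iproute2 output format described in the text above.
# gre_offload_problems is a hypothetical helper name used for illustration.
gre_offload_problems() {
    awk '
        {
            # Remember every word and the word that follows it.
            for (i = 1; i <= NF; i++) {
                seen[$i] = 1
                if (i < NF) val[$i] = $(i + 1)
            }
        }
        END {
            if (!("gre" in seen)) print "not an IPv4 GRE tunnel"
            if (val["local"] == "" || val["local"] == "any") print "local address not set"
            if (val["remote"] == "" || val["remote"] == "any") print "remote address not set"
            if (val["ttl"] != "inherit") print "TTL is not inherit"
            if (val["tos"] != "inherit") print "TOS is not inherit"
            if (("iseq" in seen) || ("oseq" in seen)) print "sequence numbers in use"
            if (("icsum" in seen) || ("ocsum" in seen)) print "checksumming in use"
        }
    '
}
```

An empty result from ip -d link show dev g | gre_offload_problems does not guarantee that the tunnel is offloaded; it only means none of the listed problems was spotted.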
- Since kernel 6.2, GRE tunnels with IPv6 underlay can be offloaded to Spectrum-1. Each router interface (RIF) representing an ip6gre tunnel consumes two RIF entries.
- Since kernel 5.16, GRE tunnels with IPv6 underlay can be offloaded for Spectrum-2 and above. The type should be ip6gre and both TTL and TOS should be set to inherit. For example, to add a GRE tunnel with IPv6 underlay, run:
sw $ ip link add name g1 type ip6gre local 2001:db8:3::1 remote 2001:db8:3::2 tos inherit ttl inherit
- Since Linux 5.5, the underlay of an unbound GRE device is correctly the main VRF. That means that it is no longer possible to cause a local address collision by moving the GRE netdevice to another VRF.
- If a GRE netdevice is moved to another VRF such that it causes local address collision, both tunnels are unoffloaded. The opposite logic which would notice that a netdevice became eligible for offloading due to configuration changes is currently not implemented. What falls to slow path, stays there.
- Underlay of an unbound GRE device is the same VRF that the GRE is in. This is unlike Linux, where it would be the main VRF. This issue is fixed in Linux 5.5.
- Forming encapsulating routes to two tunnels that have the same local address and underlay VRF leads to invocation of the abort mechanism (see Static Routing)
- Nothing is offloaded until an encapsulating route is added (i.e. the decap-only flow is not supported)
- Changes to configuration done after the tunnel is offloaded are not reflected. This can be circumvented by removing and re-adding all encapsulating routes at once (not one at a time).
- State of bound device (up/down) is not reflected
- Underlay of an unbound GRE device is the same VRF that the GRE is in. This is unlike Linux, where it would be the main VRF. This issue is fixed in Linux 5.5.
- man ip-tunnel
- https://www.deepspace6.net/docs/iproute2tunnel-en.html