OVS
- Overview
- Installation
- Starting the daemon
- Enabling TC-flower
- OVS bridges
- Offloading flows
- What can be offloaded?
Open vSwitch (OVS) is a fully-featured software switch implementation. The majority of the switching logic resides in userspace (ovs-vswitchd) and has been ported to various environments. Porting OVS to a given platform has a small set of minimal requirements, mainly the ability to receive ingress packets in userspace, transmit packets from userspace and query the interface status. However, relying on these minimal requirements alone would force the switch to process the entire dataplane in userspace, which would have a horrendous effect on overall performance.
To mitigate this, a port of OVS to a platform usually includes a kernel portion that allows early processing and switching in the kernel. This prevents most of the kernel <-> userspace packet traversal. The kernel portion (dpif) maintains a flow table which consists of exact-match flows and their associated actions. On a miss, the dpif passes the packet to userspace, where the daemon processes it further; this may result in a new flow being inserted into the kernel dpif flow table.
Notice that the OVS Linux kernel infrastructure is entirely flow-based: it does not utilize any of the regular L2/L3 constructs within the network stack. As a result, by default, using OVS on top of our switch would result in very poor performance. The dpif might reduce the need for ingress packets to travel all the way to userspace, but they would still need to be trapped to the CPU, processed by the OS and then (likely) egressed by sending them back to the device.
Recently, OVS gained the ability to offload flows using tc-flower; that is, whenever a given match-action rule can be handled by tc-flower, OVS installs it there instead of using the dpif datapath. If the relevant port is capable of offloading said tc-flower rule, then the rule is offloaded to the device. This allows us to leverage OVS on top of the switch: tc-offloaded flows are handled entirely by the device, and ingress packets matching them are not trapped to the CPU.
On modern distributions, OVS comes as a package and can be installed using that distribution's package manager. E.g., for Fedora 26:
$ dnf install openvswitch
Last metadata expiration check: 1:58:59 ago on Thu 21 Dec 2017 11:32:39 AM IST.
Dependencies resolved.
===============================================================================================================================================================================================================
Package Arch Version Repository Size
===============================================================================================================================================================================================================
Installing:
openvswitch x86_64 2.7.3-2.fc26 updates 4.6 M
Transaction Summary
===============================================================================================================================================================================================================
Install 1 Package
However, note that a packaged installation does not necessarily include the support needed for tc-flower offloading; that was only added in OVS 2.8.0, whereas the Fedora 26 package shown above is 2.7.3. It is possible to install the latest OVS from sources. Please follow the OVS installation guidelines for doing so.
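Because tc-flower offload support depends on the OVS version, it can help to check the installed version before continuing. The following is a minimal sketch and not part of the original procedure: the helper names `ovs_supports_tc_offload` and `ovs_version_from_output` are hypothetical, GNU `sort -V` is assumed to be available, and the version is assumed to be the last field of the first line printed by `ovs-vsctl --version`.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): succeeds if the given version
# string is at least 2.8.0, the release that added tc-flower offload.
# Assumes GNU coreutils `sort -V` for version ordering.
ovs_supports_tc_offload() {
    min="2.8.0"
    # The smaller of the two versions sorts first; if that one is $min,
    # the given version is greater than or equal to $min.
    lowest=$(printf '%s\n%s\n' "$min" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

# Hypothetical helper: reads `ovs-vsctl --version` output on stdin and
# prints the version, assuming the first line ends with it, e.g.
# "ovs-vsctl (Open vSwitch) 2.7.3".
ovs_version_from_output() {
    awk 'NR == 1 { print $NF }'
}

# Possible usage on a machine with OVS installed:
#   ver=$(ovs-vsctl --version | ovs_version_from_output)
#   ovs_supports_tc_offload "$ver" || echo "OVS $ver lacks tc-flower offload"
```

With the Fedora 26 package shown above (2.7.3), the check would fail, which matches the need to build from sources.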
If OVS was installed from a package, the daemon can be controlled as a service:
$ systemctl start openvswitch
If OVS was installed from sources, a few more steps are required to start it. An OVSDB database must be configured for use by OVS. Assuming this DB is local (which might be wrong for actual deployments, but is easy to experiment with), the following should be done once after installation:
$ ovsdb-tool create
This creates the local database. Afterwards, to start the required daemons, run:
$ ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
    --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
    --private-key=db:Open_vSwitch,SSL,private_key \
    --certificate=db:Open_vSwitch,SSL,certificate \
    --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert \
    --pidfile --detach
$ ovs-vsctl --no-wait init
$ ovs-vswitchd --pidfile --detach
By default, OVS does not use TC-flower for its datapath flows. To enable it, run:
$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
Note that this enables OVS to use TC-flower, but it does not mean that the resulting rules have to be offloaded to hardware; when enabled, OVS allows TC rules that are not HW-offloaded to exist. If you want only the HW-offloaded rules to use the TC infrastructure, and the rest to use the regular dpif datapath, change the default policy by running:
$ ovs-vsctl set Open_vSwitch . other_config:tc-policy=skip_sw
Note that the configuration is persistent: once set, it is retained after stopping and restarting the daemons.
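Since the settings live in the database, they can be read back to confirm what is currently stored. The following is a minimal sketch under stated assumptions: `other_config_value` is a hypothetical helper, and it assumes that `ovs-vsctl get Open_vSwitch . other_config` prints the map on one line in a form such as `{hw-offload="true", tc-policy=skip_sw}`.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): reads the other_config map on
# stdin and prints the value stored under the key given as $1.
# Assumed input form: {hw-offload="true", tc-policy=skip_sw}
# Prints nothing if the key is not present.
other_config_value() {
    # Drop braces, quotes and spaces, put each key=value on its own
    # line, then select the requested key.
    tr -d '{}" ' | tr ',' '\n' |
        awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Possible usage on a machine with OVS running:
#   ovs-vsctl get Open_vSwitch . other_config | other_config_value hw-offload
#   ovs-vsctl get Open_vSwitch . other_config | other_config_value tc-policy
```

Values that themselves contain commas or equals signs are not handled by this sketch; the two keys used here do not need that.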
The various ports we are interested in should be added to an OVS bridge. In this example, consider the following topology where the two peers are trying to pass basic ping traffic:
/-------------------\      /---------------\
|                   |      |               |
|      Peer I       |      |    Switch     |
|                   |      |               |
|    Interface A    |      |               |
|  192.168.10.1/24  |  --  |  enp3s0np49   |
| e4:1d:2d:ca:c9:7a |      |               |
|                   |      |               |
\-------------------/      |               |
                           |               |
/-------------------\      |               |
|                   |      |               |
|      Peer II      |      |               |
|                   |      |               |
|    Interface B    |      |               |
|  192.168.10.2/24  |  --  |  enp3s0np51   |
| e4:1d:2d:ca:c9:66 |      |               |
|                   |      |               |
\-------------------/      \---------------/
The two interfaces enp3s0np49 and enp3s0np51 are added to a new OVS bridge called br-int, then set to UP state by running:
$ ovs-vsctl add-br br-int
$ ovs-vsctl add-port br-int enp3s0np49
$ ovs-vsctl add-port br-int enp3s0np51
$ ip link set dev enp3s0np49 up
$ ip link set dev enp3s0np51 up
There are various ways to view the configured bridge topology, e.g.:
$ ovs-vsctl show
    Bridge br-int
        Port "enp3s0np49"
            tag: 1
            Interface "enp3s0np49"
        Port br-int
            Interface br-int
                type: internal
        Port "enp3s0np51"
            tag: 1
            Interface "enp3s0np51"
$ ovs-dpctl show
system@ovs-system:
    lookups: hit:120 missed:122 lost:0
    flows: 0
    masks: hit:318 total:0 hit/pkt:1.31
    port 0: ovs-system (internal)
    port 1: br-int (internal)
    port 2: enp3s0np49
    port 4: enp3s0np51
While ping is running, the datapath flows can be seen by running:
$ ovs-dpctl dump-flows
in_port(4),eth(src=e4:1d:2d:ca:c9:66,dst=e4:1d:2d:ca:c9:7a),eth_type(0x0806), packets:2, bytes:110, used:8.240s, actions:2
in_port(4),eth(src=e4:1d:2d:ca:c9:66,dst=e4:1d:2d:ca:c9:7a),eth_type(0x0800), packets:325, bytes:33150, used:0.051s, actions:2
in_port(2),eth(src=e4:1d:2d:ca:c9:7a,dst=e4:1d:2d:ca:c9:66),eth_type(0x0806), packets:4, bytes:220, used:8.240s, actions:4
in_port(2),eth(src=e4:1d:2d:ca:c9:7a,dst=e4:1d:2d:ca:c9:66),eth_type(0x0800), packets:325, bytes:33150, used:0.050s, actions:4
We can see ARP (eth_type 0x0806) and ICMP (carried in IPv4, eth_type 0x0800) packets traveling from both peers.
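The dump-flows lines above are dense, so a condensed view can make the per-direction traffic easier to read. The following is a rough sketch, assuming the line format shown in this example; `summarize_dp_flows` is a hypothetical helper name, and fields that are absent from a line are printed as "-".

```shell
#!/bin/sh
# Hypothetical helper: reads `ovs-dpctl dump-flows` output on stdin and
# prints, for each flow: in_port, eth_type, packet count and actions.
summarize_dp_flows() {
    awk '
    NF == 0 { next }
    {
        port = etype = pkts = act = "-"
        # "in_port(" is 8 characters long, the closing ")" is 1 more.
        if (match($0, /in_port\([0-9]+\)/))
            port = substr($0, RSTART + 8, RLENGTH - 9)
        # "eth_type(" is 9 characters long, the closing ")" is 1 more.
        if (match($0, /eth_type\(0x[0-9a-fA-F]+\)/))
            etype = substr($0, RSTART + 9, RLENGTH - 10)
        if (match($0, /packets:[0-9]+/))
            pkts = substr($0, RSTART + 8, RLENGTH - 8)
        if (match($0, /actions:.*$/))
            act = substr($0, RSTART + 8, RLENGTH - 8)
        print port, etype, pkts, act
    }'
}

# Possible usage on the switch:
#   ovs-dpctl dump-flows | summarize_dp_flows
```

For the first flow listed above, this would print `4 0x0806 2 2`, i.e. two ARP packets received on datapath port 4 and forwarded to port 2.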
After enabling TC-flower offload, we can see the filters offloaded as ingress redirections of packets between the two ports:
$ tc filter show dev enp3s0np51 ingress
filter protocol 802.1Q pref 1 flower chain 0
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:7a
  src_mac e4:1d:2d:ca:c9:66
  eth_type arp
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np49) stolen
        index 4 ref 1 bind 1
filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:7a
  src_mac e4:1d:2d:ca:c9:66
  eth_type ipv4
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np49) stolen
        index 2 ref 1 bind 1
$ tc filter show dev enp3s0np49 ingress
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:66
  src_mac e4:1d:2d:ca:c9:7a
  eth_type arp
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np51) stolen
        index 3 ref 1 bind 1
filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:66
  src_mac e4:1d:2d:ca:c9:7a
  eth_type ipv4
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np51) stolen
        index 1 ref 1 bind 1
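The `in_hw` flag in the output above is what indicates that a filter was actually offloaded to the device. As a quick check, the flags can be counted; the following is a small sketch in which `count_hw_filters` is a hypothetical helper, and it assumes that tc marks filters that were not offloaded with `not_in_hw`.

```shell
#!/bin/sh
# Hypothetical helper: reads `tc filter show` output on stdin and
# counts how many filters are flagged in_hw versus not_in_hw.
count_hw_filters() {
    awk '
    {
        # Compare whole fields so that "not_in_hw" is never counted
        # as "in_hw".
        for (i = 1; i <= NF; i++) {
            if ($i == "in_hw") in_hw++
            else if ($i == "not_in_hw") not_in_hw++
        }
    }
    END { printf "in_hw=%d not_in_hw=%d\n", in_hw, not_in_hw }'
}

# Possible usage on the switch:
#   tc filter show dev enp3s0np51 ingress | count_hw_filters
```

A non-zero `not_in_hw` count would mean that some matching traffic is still handled in software.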
Explicit rules can be offloaded by adding them with ovs-ofctl. Continuing the previous example, assume the following flow is added:
$ ovs-ofctl add-flow br-int "ip,nw_dst=192.168.10.2 actions=drop"
The immediate effect is that the ping traffic stops. Checking the current flows and the offloaded tc filters, we can see that the drop action has been offloaded:
$ ovs-dpctl dump-flows
in_port(2),eth_type(0x0800),ipv4(dst=192.168.10.2), packets:38, bytes:3876, used:0.530s, actions:drop
$ tc filter show dev enp3s0np49 ingress
filter protocol arp pref 2 flower chain 0
filter protocol ip pref 3 flower chain 0
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.10.2
  in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1
Unrelated traffic can still pass. E.g., if Peer II starts sending ARPs, we see that IP traffic destined to 192.168.10.2 is still dropped while ARPs pass, and the offloaded filters reflect that:
$ ovs-dpctl dump-flows
in_port(4),eth(src=e4:1d:2d:ca:c9:66,dst=e4:1d:2d:ca:c9:7a),eth_type(0x0806), packets:296, bytes:16280, used:0.271s, actions:2
in_port(2),eth(src=e4:1d:2d:ca:c9:7a,dst=e4:1d:2d:ca:c9:66),eth_type(0x0806), packets:576, bytes:31680, used:0.270s, actions:4
in_port(2),eth_type(0x0800),ipv4(dst=192.168.10.2), packets:596, bytes:60792, used:0.270s, actions:drop
$ tc filter show dev enp3s0np49 ingress
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:66
  src_mac e4:1d:2d:ca:c9:7a
  eth_type arp
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np51) stolen
        index 1 ref 1 bind 1
filter protocol ip pref 3 flower chain 0
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.10.2
  in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1
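When several filters coexist, as in the output above, a one-line-per-filter view makes it easier to see which rule redirects and which one drops. The following is a sketch only: `list_tc_filters` is a hypothetical helper, it assumes the tc output layout shown in this example, and it recognizes just the two action kinds that appear here (mirred redirect and gact).

```shell
#!/bin/sh
# Hypothetical helper: reads `tc filter show` output on stdin and
# prints one line per installed filter:
#   <pref> <protocol> <in_hw|not_in_hw|unknown> <action>
list_tc_filters() {
    awk '
    function emit() {
        if (have) print pref, proto, state, action
        have = 0
    }
    $1 == "filter" {
        emit()
        # Only header lines that carry a handle describe an installed
        # filter; the others are per-priority headers.
        for (i = 1; i <= NF; i++) {
            if ($i == "protocol") proto = $(i + 1)
            if ($i == "pref") pref = $(i + 1)
            if ($i == "handle") { have = 1; state = "unknown"; action = "-" }
        }
        next
    }
    $1 == "in_hw" || $1 == "not_in_hw" { state = $1 }
    $1 == "action" {
        # Examples of the lines handled here:
        #   action order 1: mirred (Egress Redirect to device DEV) stolen
        #   action order 1: gact action drop
        action = $4
        if ($4 == "mirred") {
            dev = $9
            sub(/\)$/, "", dev)
            action = "mirred:" dev
        } else if ($4 == "gact") {
            action = "gact:" $6
        }
    }
    END { emit() }'
}

# Possible usage on the switch:
#   tc filter show dev enp3s0np49 ingress | list_tc_filters
```

For the last listing above, this would print `2 arp in_hw mirred:enp3s0np51` followed by `4 ip in_hw gact:drop`.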
For both the supported keys and the supported actions, we can only offload the intersection of what is supported by the driver and by OVS. At the moment, any of the driver's supported keys can be part of an offloaded match rule, while of the driver's supported actions only redirection and dropping of packets can be offloaded.