Queues Management
Traffic control in Linux is managed by the TC subsystem. Documentation can be found here, here and in the TC man page.
A qdisc command is of the following structure:
$ tc <flags> qdisc <command> dev <dev name> [root |parent <parent ID>] [handle <handle id>] <qdisc type> [<qdiscs params>]
A handle ID is built from two 16-bit numbers in the form MAJOR:MINOR. Any qdisc created by a user has its minor set to 0 and can be referred to as MAJOR:. If a handle ID is not specified, a new one is allocated.
The parent ID identifies the qdisc's location. A qdisc can be set as a root qdisc or as a child of another qdisc. To set a qdisc as child X of a qdisc with handle Y, set its parent ID to Y:X.
The operation of linking two qdiscs is called grafting.
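The handle layout described above can be sketched in shell. The function names handle_to_u32 and child_parent_id are hypothetical helpers, not part of tc. The sketch assumes the handle always contains a colon, that MAJOR and MINOR are written in hexadecimal (as tc parses them), and that MAJOR occupies the upper 16 bits of the combined value.

```shell
# Pack a handle of the form MAJOR:MINOR into a single 32-bit value.
# Assumption: MAJOR is the upper 16 bits, MINOR the lower 16 bits,
# and both parts are hexadecimal. An empty part (as in "3:") means 0.
handle_to_u32() {
    local major="${1%%:*}"
    local minor="${1#*:}"
    printf '0x%08x\n' $(( (0x${major:-0} << 16) | 0x${minor:-0} ))
}

# Build the parent ID that makes a qdisc child X of the qdisc with handle Y.
child_parent_id() {
    local y="$1" x="$2"
    printf '%s:%s\n' "$y" "$x"
}
```

For example, handle_to_u32 3: covers the user-created case where the minor is 0, and child_parent_id 3 1 yields the parent ID for the first child of qdisc 3:.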
To see the qdisc configured on a port, run the show command. Adding the -s flag displays the statistics as well:
$ tc -s qdisc show dev sw1p1
The offloaded flag denotes whether the qdisc is offloaded or not.
The basic statistics are:
- Sent - Packets and bytes count.
- dropped - The number of packets that were dropped by this queue.
- overlimits - Qdisc specific.
- requeues - Qdisc specific.
- backlog - The queue's size in bytes and packets. Only the bytes count reflects the hardware status.
Some qdiscs might include extra statistics. If the qdisc is offloaded, the statistics will reflect both the hardware and software statistics.
For full documentation see the TC man page.
The PRIO scheduler sends traffic to its child qdiscs, called bands,
according to a mapping from packet priority to band number. The priority
of routed packets is derived from the packet's DSCP value according to
the mapping specified in the PRIO man page.
PRIO enforces strict priority between its bands. Packets will be sent from band X only if there are no packets enqueued in bands 0..(X-1).
For a full description please refer to the Further Resources section.
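The strict priority rule can be illustrated with a small shell sketch. The function next_band is a hypothetical helper that takes the number of enqueued packets per band (band 0 first) and prints the band that would be served next. It models the rule only, not the kernel or hardware implementation.

```shell
# Print the lowest-numbered band that has packets enqueued.
# Arguments: backlog of band 0, backlog of band 1, ...
# Returns non-zero if all bands are empty.
next_band() {
    local band=0
    local backlog
    for backlog in "$@"; do
        if [ "$backlog" -gt 0 ]; then
            echo "$band"
            return 0
        fi
        band=$((band + 1))
    done
    return 1
}
```

With backlogs of 0 0 5 2, band 2 is served, and band 3 has to wait until band 2 is empty.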
PRIO parameters:
- bands (optional) - Number of child queues. Offloading is supported for up to 8 bands; setting a higher number results in the qdisc not being offloaded.
- priomap (optional) - Mapping of a packet's priority to a band.
Offload is only supported for priorities 0-7. Higher priorities will be ignored.
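As a rough illustration of how a priomap is read, the sketch below looks up the band for a given priority. The function band_for_priority is a hypothetical helper. It assumes the priomap is passed as a list in which entry N (counting from 0) is the band for priority N.

```shell
# Print the band that a priority maps to.
# Arguments: the priority, followed by the priomap entries.
band_for_priority() {
    local prio="$1"
    shift
    # After the shift, "$@" holds the priomap and "$#" is its length.
    if [ "$prio" -lt 0 ] || [ "$prio" -ge "$#" ]; then
        echo "priority $prio is not covered by the priomap" >&2
        return 1
    fi
    # Skip the first $prio entries so that $1 is the entry for this priority.
    shift "$prio"
    echo "$1"
}
```

Calling band_for_priority 3 0 1 2 3 4 5 6 7 returns the band for priority 3 under an identity mapping of the first eight priorities.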
Note: PRIO is only supported on top of physical ports and only as a root qdisc.
Note: Configuring both DCB and PRIO on the same port is not supported.
Example:
$ tc qdisc replace dev sw1p1 root handle 3: prio bands 8 priomap 0 1 2 3 4 5 6 7
The show command will show the current configuration, including the full priomap.
$ tc -s qdisc show dev sw1p1
qdisc prio 3: root refcnt 2 offloaded bands 8 priomap 0 1 2 3 4 5 6 7 1 1 1 1 1 1 1 1
Sent 30510403042 bytes 20289261 pkt (dropped 5199870, overlimits 0 requeues 0)
backlog 222720b 0p requeues 0
Children of PRIO are not displayed unless they are configured with a qdisc. Adding the word invisible to the show command will show these children, but not their offloaded counter queue.
The statistics represent the sum of the statistics of all the bands.
When using PRIO, lower priority bands can be starved as long as there
are packets enqueued in higher priority bands. In this situation,
packets might be dropped due to timeouts. These drops are not counted as
PRIO drops.
To see the backlog of each PRIO band, use the following ethtool command:
$ ethtool -S sw1p1 | grep tc_transmit_queue_tc
tc_transmit_queue_tc_0: 0
tc_transmit_queue_tc_1: 0
tc_transmit_queue_tc_2: 0
tc_transmit_queue_tc_3: 0
tc_transmit_queue_tc_4: 0
tc_transmit_queue_tc_5: 0
tc_transmit_queue_tc_6: 0
tc_transmit_queue_tc_7: 0
Note: These traffic classes correspond to PRIO bands according to the following formula: TC = 7 - PRIO band.
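The formula above can be wrapped in a small helper that returns the name of the ethtool counter for a given PRIO band. The function band_to_tc_counter is a hypothetical name. It assumes the counter names follow the tc_transmit_queue_tc_N pattern shown in the ethtool output and that bands are in the range 0-7.

```shell
# Print the ethtool counter name holding the backlog of a PRIO band,
# using TC = 7 - PRIO band.
band_to_tc_counter() {
    local band="$1"
    if [ "$band" -lt 0 ] || [ "$band" -gt 7 ]; then
        echo "band must be in the range 0-7" >&2
        return 1
    fi
    echo "tc_transmit_queue_tc_$((7 - band))"
}
```

The result can be used as the pattern for grep when filtering the ethtool -S output, so that the highest priority band (band 0) is looked up under traffic class 7.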
RED (Random Early Detection) is a queueing discipline designed for congestion control.
It drops packets based on statistical probabilities. The probability to
drop a packet is zero until the queue's average size reaches the minimum
limit. From there, the probability will rise linearly until it reaches
the maximum probability for dropping, when the queue's average size
reaches the maximum limit. When the queue's average size is above the
maximum, the probability to drop a packet is 1 (See figure below).
These drops are called early drops.
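The drop probability curve described above can be sketched as a shell function. The name red_drop_probability and its argument order are hypothetical. The sketch only models the curve as described in the text (zero up to the minimum, linear up to the maximum probability at the maximum limit, 1 above it) and is not the kernel's or the hardware's exact calculation.

```shell
# Print the early drop probability for a given average queue size.
# Arguments: average queue size, minimum limit, maximum limit,
# maximum probability (all sizes in the same unit, e.g. bytes).
red_drop_probability() {
    awk -v avg="$1" -v min="$2" -v max="$3" -v pmax="$4" 'BEGIN {
        if (avg <= min)
            p = 0                                   # below the minimum: never drop
        else if (avg <= max)
            p = pmax * (avg - min) / (max - min)    # linear rise up to pmax
        else
            p = 1                                   # above the maximum: always drop
        printf "%.4f\n", p
    }'
}
```

For instance, with a minimum of 500000, a maximum of 1500000 and a maximum probability of 0.1, an average queue size halfway between the limits gives half of the maximum probability.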
The RED qdisc has the option to mark packets with the ECN flag
instead of dropping them.
This mode only affects packets which indicate that their hosts honor ECN.
When set, the queue's size might reach its maximum size. In this case,
packets that cannot be enqueued will be tail dropped.
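The interaction between ECN mode and dropping can be summarized with the following sketch. The function red_action and its on/yes/no argument values are hypothetical. It assumes it is called only for a packet that RED has already selected according to the drop probability, or that arrives when the queue is full.

```shell
# Print what happens to a packet selected by RED.
# Arguments: ECN mode ("on" or "off"), whether the packet indicates
# ECN support ("yes" or "no"), whether the queue is full ("yes" or "no").
red_action() {
    local ecn_mode="$1" ect="$2" queue_full="$3"
    if [ "$queue_full" = "yes" ]; then
        # The packet cannot be enqueued at all.
        echo "tail_drop"
    elif [ "$ecn_mode" = "on" ] && [ "$ect" = "yes" ]; then
        # ECN mode only affects packets whose hosts honor ECN.
        echo "mark"
    else
        echo "early_drop"
    fi
}
```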
RED parameters:
- limit - Hard limit for the queue's size. Not offloaded.
- avpkt - Average queue size calculation parameter. Not offloaded. 1000 is recommended.
- min (optional) - The minimum limit.
- max (optional) - The maximum limit.
- probability (optional) - The probability to drop a packet when the average queue size is at the maximum limit.
- burst (optional) - Allowed burst size. Not offloaded.
- bandwidth (optional) - Used for average queue size calculation (and not to enforce the queue's bandwidth). Not offloaded.
- ecn (optional) - If set, indicates ECN mode is on.
- hard-drop - If set, when the queue's average size is above the maximum limit, packets will be dropped even if they are ECN enabled and ECN mode is on. Not offloaded.
When offloading RED it is recommended to specify min, max and probability and not rely on the default values.
Note: RED is only supported on top of physical ports and only as a root qdisc.
Note: Configuring both DCB and RED on the same port is not supported.
Example:
$ tc qdisc add dev sw1p1 root handle 4: red limit 1000000 avpkt 10000 probability 0.1 min 500000 max 1500000
The show command will show the current configuration including RED's statistics.
$ tc -s qdisc show dev sw1p1
qdisc red 4: root refcnt 2 limit 1000000b min 500000b max 15782272b offload
Sent 9962 bytes 29 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
marked 0 early 0 pdrop 0 other 0
Notes about RED statistics:
- dropped - A packet can be dropped either by early drop or tail drop.
- overlimits - The number of packets that were early dropped or ECN marked.
- marked - The number of packets that were ECN marked.
- early - The number of packets that were early dropped.
- pdrop - The number of packets that were tail dropped.
- other - The number of packets that were dropped for other reasons. Not in use.
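To pull a single RED counter out of the statistics line, a small awk filter is enough. The function red_stat is a hypothetical helper. It assumes the line has the "marked N early N pdrop N other N" layout shown in the show output above and reads it from standard input.

```shell
# Print the value that follows the given counter name on the
# RED statistics line read from standard input.
red_stat() {
    awk -v key="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}
```

Since overlimits counts packets that were either early dropped or ECN marked, the sum of the marked and early values extracted this way can be compared against the overlimits counter.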
Further Resources
- man tc
- man tc-red
- man tc-prio
- Linux Advanced Routing & Traffic Control HOWTO
- Traffic Control HOWTO