Precision Time Protocol
- Introduction
- Configuring PTP
- PTP Management
- Synchronizing Real Time Clock
- Low-level Details
- Issues and Interactions
- Further Reading
Precision Time Protocol (commonly abbreviated as PTP) is a protocol designed to synchronize real-time clocks in the nodes of a distributed system that communicate over a network. On a local network, it can commonly achieve sub-microsecond accuracy. It is standardized as IEEE 1588. The original version was IEEE 1588-2002, and linuxptp implements the revised IEEE 1588-2008 (PTP version 2).
The nodes in a PTP network (which are called clocks) are organized into a master-slave hierarchy. Slaves synchronize to their masters, which can in turn be slaves to other masters. The master-slave relationship is determined by running a so-called Best Master Clock (BMC) algorithm on each clock. Because the relationships in the network are determined dynamically, the PTP network can resolve cycles and prune the general interconnect graph down to a tree of masters and slaves.
The root of this tree is the ultimate master that all clocks synchronize to, possibly indirectly. It is called the grandmaster. The grandmaster's own time would typically be synchronized by GPS.
Interior nodes in this network are called boundary clocks (BC). They have one slave port (leading to a master) and one or more master ports (leading to slave clocks).
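The core of the BMC algorithm is a comparison of the attributes that each candidate grandmaster announces. The Python sketch below models only that data set comparison. It simplifies by ignoring the topology tie-breakers, such as stepsRemoved, which apply when the same grandmaster is reachable through several ports. `AnnouncedClock` and `best_master` are hypothetical names and are not part of linuxptp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnouncedClock:
    """Attributes a clock announces for itself (field names as printed by pmc)."""
    priority1: int
    clockClass: int
    clockAccuracy: int
    offsetScaledLogVariance: int
    priority2: int
    clockIdentity: str  # e.g. "7cfe90.fffe.f5a351"

    def rank(self):
        # Attributes are compared in this order and the lowest tuple wins.
        # The identity is compared as a number and acts as the final tie-breaker.
        return (
            self.priority1,
            self.clockClass,
            self.clockAccuracy,
            self.offsetScaledLogVariance,
            self.priority2,
            int(self.clockIdentity.replace(".", ""), 16),
        )


def best_master(candidates):
    """Pick the best candidate using only the data set comparison."""
    return min(candidates, key=AnnouncedClock.rank)
```

Because priority1 is compared first, lowering it (as done later with `SET PRIORITY1`) lets a clock win regardless of its class and accuracy.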
On Linux, a prominent package that includes a PTP agent and other related tools is linuxptp. The following text assumes that this package is installed.
Kernel Version | Feature |
---|---|
5.3 | Support PTP in master, slave or BC role on Spectrum-1. |
5.4 | ethtool counters for garbage-collected packets and timestamps. |
6.0 | Spectrum-2 support. |
The linuxptp daemon that implements the PTP protocol is called ptp4l.
ptp4l can use different transport protocols to communicate with other clocks: Ethernet, UDP over IPv4 or UDP over IPv6. If UDP is used, the ports that ptp4l runs on need to be router ports, i.e., have IP addresses assigned. For example, to configure for UDP over IPv4:
# ip address add dev swp1 192.0.2.1/28
# ip address add dev swp2 192.0.2.17/28
# ip address add dev swp3 192.0.2.33/28
In simple situations, all that is necessary to enable PTP on a switch is to run ptp4l, passing one or more -i options with the interfaces to use:
# ptp4l -i swp1 -i swp2 -i swp3
By default, ptp4l uses UDP over IPv4 as the transport protocol. The command line options -2, -4 and -6 can be used to select, respectively, Ethernet, UDP over IPv4 and UDP over IPv6 as the transport protocol.
More detailed configuration can be done through a configuration file. A file with a couple of common options set might look like this:
[global]
time_stamping hardware
logSyncInterval 0
logMinDelayReqInterval 1
summary_interval 0
step_threshold 1.0
tx_timestamp_timeout 30
network_transport UDPv4
[swp1]
[swp2]
[swp3]
The global section holds two kinds of options: options that are global in nature and influence the behavior of the daemon itself, and options that are common to all interfaces unless overridden explicitly. The example also shows three sections with names matching the ports configured above. All of them are empty, but each of them serves as a cue for ptp4l to run PTP on that port.
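When the port list changes often, the file can be generated instead of edited by hand. The following is a minimal Python sketch. `render_ptp4l_conf` is a hypothetical helper, and the sketch assumes that any options given per port are ones ptp4l accepts in a port section.

```python
def render_ptp4l_conf(global_opts, ports):
    """Render ptp4l configuration text.

    global_opts: {option: value} for the [global] section.
    ports: {port name: {option: value}}. An empty dict still emits the
    section header, which is what tells ptp4l to run PTP on that port.
    """
    lines = ["[global]"]
    lines += [f"{name} {value}" for name, value in global_opts.items()]
    for port, opts in ports.items():
        lines.append(f"[{port}]")
        # Per-port options listed here override the [global] value for this port.
        lines += [f"{name} {value}" for name, value in opts.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ptp4l_conf(
        {"time_stamping": "hardware", "tx_timestamp_timeout": 30,
         "network_transport": "UDPv4"},
        {"swp1": {}, "swp2": {}, "swp3": {}},
    ), end="")
```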
Of particular interest is the option tx_timestamp_timeout. It configures how long (in milliseconds) ptp4l will wait for the timestamp of a sent packet. The ptp4l default is 1ms, which is often enough unless the system is busy. On a system with high CPU load, or one that needs to handle many PTP packets, it may be necessary to set the timeout higher. The value of 30ms shown above should be enough for the PTP traffic volume that the Spectrum system is designed to handle.
The option step_threshold configures the offset, in seconds, above which ptp4l corrects the clock in one step instead of by tweaking the clock frequency.
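The decision that step_threshold controls can be sketched in a few lines of Python. This is a simplified model: it ignores what happens at servo startup, and `correction_method` is an illustrative name rather than a linuxptp function. A threshold of 0.0 is treated as "never step", which matches the option's documented meaning.

```python
def correction_method(offset_ns, step_threshold_s):
    """Return how a servo honouring step_threshold corrects a measured offset.

    offset_ns: measured offset from the master, in nanoseconds.
    step_threshold_s: the step_threshold option, in seconds (0.0 = never step).
    """
    if step_threshold_s > 0.0 and abs(offset_ns) > step_threshold_s * 1e9:
        return "step"          # jump the clock to the right time at once
    return "adjust frequency"  # run the clock faster or slower until it converges
```

With the configuration above (step_threshold 1.0), an offset of 2 seconds would be stepped, while a 500ns offset would be corrected by frequency adjustment.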
ptp4l does not use a configuration file by default, but one can be passed with the -f command line option. When the configuration file includes the interfaces to run on, they no longer need to be listed on the command line. ptp4l just needs to be told the location of that file:
# ptp4l -f /etc/ptp4l.conf
On systems that use systemd, the ptp4l service file reads the environment file /etc/sysconfig/ptp4l to configure the options that ptp4l is started with. The default options contain -i eth0, which is typically wrong on a switch. Because all of the configuration, including the interfaces to run on, can be contained in /etc/ptp4l.conf, -f is actually the only necessary option:
# cat /etc/sysconfig/ptp4l
OPTIONS="-f /etc/ptp4l.conf"
If the ptp4l environment file is updated this way, and /etc/ptp4l.conf contains a suitable configuration, ptp4l can be started by systemd:
# systemctl start ptp4l
If ptp4l should be started by default during system boot, the service needs to be enabled:
# systemctl enable ptp4l
To use the PTP support in mlxsw, make sure that your kernel has PTP_1588_CLOCK and NET_PTP_CLASSIFY enabled.
IEEE 1588 standardizes a mechanism for introspection and configuration of individual PTP clocks. This mechanism uses PTP management messages. The linuxptp suite contains a tool named pmc (PTP management client), which can be used to send and receive PTP management messages.
As an example, a GET CURRENT_DATA_SET command can be sent to retrieve the current status of each clock:
# pmc -u 'GET CURRENT_DATA_SET'
sending: GET CURRENT_DATA_SET
7cfe90.fffe.f5a351-0 seq 0 RESPONSE MANAGEMENT CURRENT_DATA_SET
stepsRemoved 1
offsetFromMaster -5.0
meanPathDelay 738.0
The value stepsRemoved shows how far a given clock is from the grandmaster, in PTP hops. offsetFromMaster shows the estimate of how far this system's clock is from its master clock, in nanoseconds. meanPathDelay is the estimated one-way delay of the path to the master, also in nanoseconds.
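Both estimates come from the end-to-end delay request-response exchange, which produces four timestamps. In two-step mode, t1 reaches the slave in the Follow_Up message and t4 in the Delay_Resp. The Python sketch below shows the textbook calculation. It assumes a symmetric path and ignores the correction field as well as ptp4l's own filtering. The function name is illustrative, and the example timestamps were chosen to reproduce the -5.0 and 738.0 shown above.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Return (offset_from_master, mean_path_delay) from one E2E exchange.

    t1: master sends Sync (master time)
    t2: slave receives Sync (slave time)
    t3: slave sends Delay_Req (slave time)
    t4: master receives Delay_Req (master time)
    """
    master_to_slave = t2 - t1  # path delay + offset
    slave_to_master = t4 - t3  # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    return offset, mean_path_delay
```

For example, `offset_and_delay(0, 733, 1000, 1743)` describes a slave running 5ns behind its master over a 738ns path, and yields `(-5.0, 738.0)`.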
Similarly to GET CURRENT_DATA_SET, GET DEFAULT_DATA_SET can be used to find out how each clock is configured:
# pmc -u 'GET DEFAULT_DATA_SET'
sending: GET DEFAULT_DATA_SET
7cfe90.fffe.f5a351-0 seq 0 RESPONSE MANAGEMENT DEFAULT_DATA_SET
twoStepFlag 1
slaveOnly 0
numberPorts 3
priority1 100
clockClass 248
clockAccuracy 0xfe
offsetScaledLogVariance 0xffff
priority2 128
clockIdentity 7cfe90.fffe.f5a351
domainNumber 0
Some values can also be changed by using the SET command:
# pmc -u 'SET PRIORITY1 128'
sending: SET PRIORITY1
7cfe90.fffe.f5a351-0 seq 0 RESPONSE MANAGEMENT PRIORITY1
priority1 128
All the commands supported by pmc are described in its manual page. They can also be listed by issuing pmc -u help.
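For scripting, the text that pmc prints can be turned into a dictionary. The following is a minimal Python sketch. It assumes the layout shown in this page: a response header line containing "RESPONSE MANAGEMENT", followed by "name value" lines. `parse_pmc` is a hypothetical helper, not part of linuxptp.

```python
def parse_pmc(output):
    """Parse pmc text output into {port identity: {field: value string}}.

    A response starts with a header line such as
    "7cfe90.fffe.f5a351-0 seq 0 RESPONSE MANAGEMENT CURRENT_DATA_SET";
    the "name value" lines after it are its fields.
    """
    responses = {}
    fields = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("sending:"):
            continue
        if "RESPONSE MANAGEMENT" in line:
            # The first token of the header is the responding port identity.
            fields = responses.setdefault(line.split()[0], {})
            continue
        parts = line.split(None, 1)
        if fields is not None and len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return responses
```

The input could come from, e.g., `subprocess.run(["pmc", "-u", "-b", "0", "GET CURRENT_DATA_SET"], capture_output=True, text=True).stdout`. Values are kept as strings, so use `float()` on fields such as offsetFromMaster.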
The message GET PORT_STATS_NP can be used to obtain counters of the number of messages received or transmitted on each port:
# pmc -u -b0 'GET PORT_STATS_NP'
sending: GET PORT_STATS_NP
7cfe90.fffe.f5a351-1 seq 0 RESPONSE MANAGEMENT PORT_STATS_NP
portIdentity 7cfe90.fffe.f5a351-1
rx_Sync 80746
rx_Delay_Req 0
rx_Pdelay_Req 0
rx_Pdelay_Resp 0
rx_Follow_Up 80746
rx_Delay_Resp 80542
rx_Pdelay_Resp_Follow_Up 0
rx_Announce 80747
rx_Signaling 0
rx_Management 0
tx_Sync 1
tx_Delay_Req 80542
tx_Pdelay_Req 0
tx_Pdelay_Resp 0
tx_Follow_Up 1
tx_Delay_Resp 0
tx_Pdelay_Resp_Follow_Up 0
tx_Announce 2
tx_Signaling 0
tx_Management 0
[...]
This will show statistics for all the ports. To get just one of them, use the TARGET command:
# pmc -u -b0 'TARGET 7cfe90.fffe.f5a351-2' 'GET PORT_STATS_NP'
sending: GET PORT_STATS_NP
7cfe90.fffe.f5a351-2 seq 0 RESPONSE MANAGEMENT PORT_STATS_NP
portIdentity 7cfe90.fffe.f5a351-2
rx_Sync 0
rx_Delay_Req 88293
[...]
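On a point-to-point link using the default two-step, end-to-end delay mechanism, some of these counters should move in lockstep: every Sync is followed by a Follow_Up, and every Delay_Req should get a Delay_Resp. A gap between them can hint at problems such as missing transmit timestamps. The Python sketch below flags ports where the counters drift apart. The pairing rule is an assumption of this sketch: on a segment shared by several slaves, multicast Delay_Resp messages would break it. The names are hypothetical.

```python
import re

HEADER = re.compile(r"^\s*(\S+-\d+) seq \d+ RESPONSE MANAGEMENT PORT_STATS_NP")
COUNTER = re.compile(r"^\s*([rt]x_\w+)\s+(\d+)\s*$")

# Counters that should move in lockstep on a two-step, end-to-end link.
PAIRS = [
    ("rx_Sync", "rx_Follow_Up"),
    ("tx_Sync", "tx_Follow_Up"),
    ("tx_Delay_Req", "rx_Delay_Resp"),
    ("rx_Delay_Req", "tx_Delay_Resp"),
]


def port_stats(output):
    """Map each port identity in PORT_STATS_NP output to its counters."""
    stats, current = {}, None
    for line in output.splitlines():
        if m := HEADER.match(line):
            current = stats.setdefault(m.group(1), {})
        elif current is not None and (m := COUNTER.match(line)):
            current[m.group(1)] = int(m.group(2))
    return stats


def unbalanced(stats, slack=1):
    """Return {port: [pairs]} whose counters differ by more than slack."""
    report = {}
    for port, counters in stats.items():
        off = [(a, b) for a, b in PAIRS
               if a in counters and b in counters
               and abs(counters[a] - counters[b]) > slack]
        if off:
            report[port] = off
    return report
```

The slack of one message allows for an exchange that is in flight when the counters are read.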
Note: PTP message statistics and support for PORT_STATS_NP should be available in the next version of linuxptp after 2.0. Until then, the two patches can be applied by hand.
Another tool that the linuxptp package contains is phc2sys, for synchronizing several system clocks. It is suitable for updating the Linux real-time clock (what date shows) from a ptp4l-synchronized hardware clock.
# phc2sys -a -r -m
The -a option instructs phc2sys to connect to a running ptp4l daemon to determine which hardware clocks are synchronized. -r additionally selects the Linux real-time clock. The real-time clock is not considered a time source unless -r is given twice, so this causes the real-time clock to be synchronized to the hardware clock. -m prints messages to standard output.
Linux distributions typically contain a systemd service that runs phc2sys in the above mode. You can therefore start or enable phc2sys similarly to ptp4l:
# systemctl enable phc2sys
# systemctl start phc2sys
# systemctl enable --now phc2sys # both of the above at the same time
When running phc2sys, make sure that systemd-timesyncd is disabled. Otherwise, it will have its own ideas about how to synchronize the real-time clock.
When PTP is supported on a given Spectrum switch, each front panel port announces the existence of a hardware clock. (All of them expose the same clock.) ethtool can be used to check this:
# ethtool -T swp1
[...]
PTP Hardware Clock: 2
[...]
We can verify that ptp2 is indeed the right clock:
# cat /sys/class/ptp/ptp2/clock_name
mlxsw_sp_clock
The device that represents the PHC associated with the above port is /dev/ptp2. Either that file or the port name itself can be used to access and tweak the hardware clock:
# phc_ctl swp1 get
phc_ctl[17852.520]: clock time is 1562947079.124677110 or Fri Jul 12 18:57:59 2019
# phc_ctl swp1 set 2000000000
phc_ctl[17990.104]: set clock time to 2000000000.000000000 or Wed May 18 06:33:20 2033
By adjusting the clock frequency, one can make the clock tick slower or faster. In the following example, the clock frequency is increased by 10% (100M parts per billion). The clock indeed accumulates an extra second over 10 seconds of real time:
# phc_ctl swp1 set 0 freq 100000000 wait 10 get
phc_ctl[18370.946]: set clock time to 0.000000000 or Thu Jan 1 02:00:00 1970
phc_ctl[18370.946]: adjusted clock frequency offset to 100000000.000000ppb
phc_ctl[18380.946]: process slept for 10.000000 seconds
phc_ctl[18380.946]: clock time is 11.001129199 or Thu Jan 1 02:00:11 1970
The user would not normally manipulate the clock by hand like this, instead leaving this to the PTP daemon. But it may be useful in order to verify low-level operation of the hardware clock.
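The arithmetic behind the frequency example is simple: a frequency offset in parts per billion translates directly into time gained or lost. The following small Python sketch uses illustrative function names.

```python
def extra_time(elapsed_s, freq_ppb):
    """Time gained (positive ppb) or lost (negative ppb) over elapsed_s seconds."""
    return elapsed_s * freq_ppb / 1e9


def freq_for_offset(offset_s, interval_s):
    """Frequency offset, in ppb, that absorbs offset_s over interval_s seconds."""
    return offset_s * 1e9 / interval_s
```

With 100,000,000 ppb over 10s, `extra_time` gives exactly 1s. The additional ~1.1ms in the phc_ctl output above presumably reflects time that phc_ctl spent outside the sleep, scaled by the same factor of 1.1.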
PTP operation relies heavily on the ability of the hardware to timestamp the packets accurately as they enter and leave the switch. Seeing the operation of egress timestamping is tricky, because the timestamps are only available on a specially-opened socket. However it is easy enough to test the operation of ingress timestamping.
To inspect and configure timestamping of PTP packets, use the tool hwstamp_ctl from the linuxptp suite:
# hwstamp_ctl -i swp6
current settings:
tx_type 0
rx_filter 0
Let us enable ingress timestamping of all packets (that is the 1 in the following command):
# hwstamp_ctl -i swp6 -r 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 0
rx_filter 1
Now, to determine which packets are HW-timestamped, set the clock to the past and run tcpdump to capture:
# phc_ctl swp6 set 1000000000
phc_ctl[429855.465]: set clock time to 1000000000.000000000 or Sun Sep 9 04:46:40 2001
# tcpdump -j adapter_unsynced -tttt -i swp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on swp6, link-type EN10MB (Ethernet), capture size 262144 bytes
2019-08-05 16:33:08.221491 LLDP, length 59
2019-08-05 16:33:08.225781 LLDP, length 59
[...]
Now use mausezahn to send broken PTP Sync packets to that port. They can be sent either from another machine, or from the same switch if the two ports are connected by a loopback cable:
# mausezahn swp6 -A 192.0.2.1 -B 224.0.1.129 -c 1 -a own -b bc -t udp \
sp=319,dp=319,p=00:02:$(yes 00 | head -n 32 | tr '\n' ':')
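To see why these count as broken Sync packets, the payload can be decoded against the PTPv2 common header. In that header, the low nibble of byte 0 is messageType (0 means Sync), the low nibble of byte 1 is versionPTP, and bytes 2-3 hold messageLength. The payload therefore looks like a version 2 Sync message, which is apparently enough for it to be timestamped. However, it consists of only the 34-byte header (a complete Sync also carries a 10-byte origin timestamp), and it claims a length of 0. The function names in this Python sketch are hypothetical.

```python
def mausezahn_payload():
    """Rebuild the UDP payload from the mausezahn command: 00:02 then 32 zero bytes."""
    return bytes([0x00, 0x02]) + bytes(32)


def decode_ptp_header(payload):
    """Decode a few fields of the 34-byte PTPv2 common header."""
    if len(payload) < 34:
        raise ValueError("shorter than the 34-byte PTPv2 common header")
    return {
        "transportSpecific": payload[0] >> 4,
        "messageType": payload[0] & 0x0F,  # 0 = Sync, an event message
        "versionPTP": payload[1] & 0x0F,
        "messageLength": int.from_bytes(payload[2:4], "big"),
    }
```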
On the tcpdump terminal you should see something like this:
2019-08-05 16:35:38.443538 LLDP, length 59
2019-08-05 16:35:38.447916 LLDP, length 59
2001-09-09 04:49:48.021207 IP 192.0.2.1.ptp-event > ptp-primary.mcast.net.ptp-event: UDP, length 34
The timestamps from 2019 are the software ones. The packets could be from another control protocol, or just data plane traffic that was trapped to the CPU. The timestamps from 2001 are from the HW clock.
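When the capture is long, the two kinds of timestamps can be separated mechanically. The Python sketch below assumes that the PHC was set far into the past, as above, so that anything stamped before a cutoff date came from the hardware clock. `split_by_clock` is a hypothetical helper.

```python
from datetime import datetime


def split_by_clock(lines, cutoff=datetime(2010, 1, 1)):
    """Split `tcpdump -tttt` lines into (hardware, software) stamped packets.

    Assumes the PHC was set far into the past (to 2001 above), so any packet
    stamped before `cutoff` was timestamped by the hardware clock.
    """
    hardware, software = [], []
    for line in lines:
        try:
            # -tttt prints "YYYY-MM-DD HH:MM:SS.ffffff" at the start of each line.
            stamp = datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # banner or other non-packet line
        (hardware if stamp < cutoff else software).append(line)
    return hardware, software
```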
On Spectrum-2 and newer ASICs, the user cannot enable timestamping in only one direction (ingress or egress). For example, the following configuration enables only ingress timestamping and is therefore not supported:
# hwstamp_ctl -i swp6 -r 12 -t 0
current settings:
tx_type 0
rx_filter 0
SIOCSHWTSTAMP failed: Invalid argument
The Spectrum-2 and newer ASICs act as a transparent clock between front panel ports and the reference plane, which lies at the host interface ("CPU port"). When timestamping is enabled on a port, this also enables adjustment of the correction field in timestamped packets. However, it only makes sense to enable the transparent clock globally. Therefore when timestamping is enabled on one of the ports, PTP packets from all ports will have their correction field updated. Correspondingly, PTP packets from all ports will also get the timestamp attached.
On Spectrum-2 and newer ASICs, mlxsw permits timestamping of any PTP event packets. Because the setting is global, a request to timestamp any PTP event causes all PTP event packets to be timestamped. Correspondingly, HWTSTAMP_FILTER_SOME (the value 2) is returned through the API:
# hwstamp_ctl -i swp6 -r 4 -t 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 1
rx_filter 2
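The numbers passed to hwstamp_ctl in this section are values of the kernel's hwtstamp_tx_types and hwtstamp_rx_filters enumerations from `<linux/net_tstamp.h>`. For example, 12 is HWTSTAMP_FILTER_PTP_V2_EVENT and 4 is HWTSTAMP_FILTER_PTP_V1_L4_SYNC. The Python sketch below decodes them. It also models the Spectrum-2 rules described above. That model is derived from this text, not from the mlxsw source, and deliberately refuses requests the text does not cover.

```python
# enum hwtstamp_tx_types and enum hwtstamp_rx_filters from <linux/net_tstamp.h>
TX_TYPES = {0: "HWTSTAMP_TX_OFF", 1: "HWTSTAMP_TX_ON",
            2: "HWTSTAMP_TX_ONESTEP_SYNC", 3: "HWTSTAMP_TX_ONESTEP_P2P"}
RX_FILTERS = {
    0: "HWTSTAMP_FILTER_NONE", 1: "HWTSTAMP_FILTER_ALL", 2: "HWTSTAMP_FILTER_SOME",
    3: "HWTSTAMP_FILTER_PTP_V1_L4_EVENT", 4: "HWTSTAMP_FILTER_PTP_V1_L4_SYNC",
    5: "HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ", 6: "HWTSTAMP_FILTER_PTP_V2_L4_EVENT",
    7: "HWTSTAMP_FILTER_PTP_V2_L4_SYNC", 8: "HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ",
    9: "HWTSTAMP_FILTER_PTP_V2_L2_EVENT", 10: "HWTSTAMP_FILTER_PTP_V2_L2_SYNC",
    11: "HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ", 12: "HWTSTAMP_FILTER_PTP_V2_EVENT",
    13: "HWTSTAMP_FILTER_PTP_V2_SYNC", 14: "HWTSTAMP_FILTER_PTP_V2_DELAY_REQ",
    15: "HWTSTAMP_FILTER_NTP_ALL",
}
PTP_FILTERS = range(3, 15)  # all the PTP_V1_* and PTP_V2_* filters


def spectrum2_request(tx_type, rx_filter):
    """Model the Spectrum-2 rules above; return the (tx_type, rx_filter) reported back."""
    if tx_type not in (0, 1):
        raise NotImplementedError(f"{TX_TYPES.get(tx_type)} is not covered by this model")
    if (tx_type != 0) != (rx_filter != 0):
        # Only one direction requested: hwstamp_ctl reports "Invalid argument".
        raise ValueError("ingress and egress timestamping must be enabled together")
    if rx_filter == 0:
        return tx_type, 0
    if rx_filter not in PTP_FILTERS:
        raise NotImplementedError(f"{RX_FILTERS.get(rx_filter)} is not covered by this model")
    # Any PTP event filter turns on timestamping of all PTP event packets.
    return tx_type, 2  # HWTSTAMP_FILTER_SOME
```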
When operating a multicast router, the PIM rendezvous point (RP) mask may cover the multicast address used by PTP (224.0.1.129). In that case, the multicast router will eventually set up routes for this address. If the master's messages have TTL>1, they will not be terminated on the boundary clock, but will instead be propagated to the slaves. The slaves will thus see the master's messages directly, evaluate that master as the best master clock, and attempt to synchronize directly to it. Because all this traffic goes through the slow path, the path delay is unpredictable, and the slave will be unable to synchronize reasonably close to the master clock.
The issue can be corrected in any of the following ways:
- Configure a PIM RP mask that does not cover the PTP multicast address.
- Set up iptables rules that prevent propagation of PTP packets:
  # iptables -I FORWARD -p udp -m udp --dport 319 -j DROP
  # iptables -I FORWARD -p udp -m udp --dport 320 -j DROP
- Have the master send packets such that they have a TTL of 1 when they arrive at the switch. Such packets are not forwarded. In ptp4l, the TTL of sent packets is configured by the udp_ttl option.
To deliver egress timestamps to ptp4l, Linux loops back the original packet, with the timestamps attached, to the error queue of the listening socket. If that socket is already overwhelmed with ingress traffic, there may not be space to enqueue the timestamped egress traffic. ptp4l will not notice a missed ingress packet. However, it needs to wait for the timestamps of the packets that it sent (see tx_timestamp_timeout above), so it will notice, and complain, when a timestamp is not delivered.
Unfortunately, ptp4l does not allow setting the socket receive buffer size. If this happens, one needs to increase the default value before starting ptp4l, and then revert it again afterwards. For example:
# sysctl -w net.core.rmem_max=4194304 # 4MB
# sysctl -w net.core.rmem_default=4194304
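Applying and reverting the sysctls can be wrapped so that the old values are always restored. The following Python sketch assumes root privileges. It also assumes that ptp4l has created its sockets before the block exits: a socket takes its buffer size from the default when it is created, so reverting afterwards does not shrink it. `temporary_sysctls` is a hypothetical helper.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def temporary_sysctls(values, root="/proc/sys"):
    """Temporarily set sysctls such as {"net.core.rmem_default": 4194304}.

    The previous values are written back when the block exits, even on error.
    `root` is only a parameter so the helper can be tried on a scratch directory.
    """
    saved = {}
    try:
        for name, value in values.items():
            path = Path(root, *name.split("."))
            saved[path] = path.read_text().strip()
            path.write_text(f"{value}\n")
        yield
    finally:
        for path, old in saved.items():
            path.write_text(f"{old}\n")
```

For example, the two sysctls above could be passed as `{"net.core.rmem_max": 4194304, "net.core.rmem_default": 4194304}`, with ptp4l started inside the `with` block.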
When timestamping is enabled on a port with a speed of 25Gbps or lower, mlxsw activates a PTP shaper. The shaper decreases the bandwidth by about 4-6% to reduce jitter and improve the predictability of timestamping. It is turned off again when PTP timestamping is disabled on the port, or when the port speed increases above 25Gbps.
On Spectrum-1, timestamps are delivered to the driver separately from the packets, and the driver has to match the two before passing the timestamped packet on to the kernel. Due to this separation it is possible that one of the pieces never arrives.
The driver runs a regular garbage collection (GC) process that cleans up unmatched packets and timestamps once they are about a second old. To provide visibility into the GC events, the mlxsw driver publishes the following four artificial counters, on Spectrum-1 only:
# ethtool -S swp1 | grep ptp_
ptp_rx_gcd_packets: 0
ptp_rx_gcd_timestamps: 0
ptp_tx_gcd_packets: 0
ptp_tx_gcd_timestamps: 0
Packets are garbage-collected if the driver got the packet but did not get the corresponding timestamp. Conversely, timestamps are garbage-collected when the corresponding packet was dropped.
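The counters only ever grow, so what matters is whether they increase while PTP is running. The Python sketch below extracts the ptp_ counters from two `ethtool -S` snapshots and reports the growth. The helper names are hypothetical. The input could be, e.g., `subprocess.run(["ethtool", "-S", "swp1"], capture_output=True, text=True).stdout`, captured a few seconds apart.

```python
def ptp_gc_counters(ethtool_output):
    """Return the ptp_* counters from `ethtool -S` output as {name: int}."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name.startswith("ptp_") and value.strip().isdigit():
            counters[name] = int(value)
    return counters


def grown(before, after):
    """Counters that increased between two snapshots, with the increase."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] > before.get(name, 0)}
```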
One reason for these losses is host interface pressure. Currently, the PTP policer is hardcoded at 24Kpps. This budget needs to cover all incoming PTP event packets, plus the timestamp events themselves, for both ingress and egress. Although one timestamp event can carry up to four timestamps, the vast majority of them will arrive one by one. Each ingress PTP event packet therefore typically costs two units of the budget (the packet and its timestamp event), so 12Kpps of PTP event ingress will all but saturate the host interface policer.
On top of that, even below the policer limit, some timestamps will get lost simply because a burst of PTP traffic hits a single port and overflows the HW timestamp queue for that port.
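The policer budget arithmetic can be sketched as follows. This sketch makes two assumptions. First, the policer counts each trapped ingress event packet and each timestamp event once. Second, ingress and egress timestamps are batched separately, which the text above does not specify. The names are illustrative.

```python
import math

POLICER_PPS = 24_000  # Spectrum-1 PTP policer rate quoted above


def host_load_pps(rx_event_pps, tx_event_pps, timestamps_per_event=1):
    """Estimate what the PTP policer has to admit per second.

    Every ingress PTP event packet is trapped to the CPU, and timestamps
    (ingress and egress) arrive in timestamp events carrying 1 to 4 each.
    """
    if not 1 <= timestamps_per_event <= 4:
        raise ValueError("a timestamp event carries 1 to 4 timestamps")
    ts_events = (math.ceil(rx_event_pps / timestamps_per_event)
                 + math.ceil(tx_event_pps / timestamps_per_event))
    return rx_event_pps + ts_events
```

With one timestamp per event, 12Kpps of ingress PTP events alone already amounts to the full 24Kpps. Perfect batching by four would lower that to 15Kpps, but as noted above, batching is rare in practice.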
The garbage-collected packets are simply delivered without a timestamp. They are not thrown away unless the original port went away, for example due to a split. Some PTP stacks might still be able to use these packets in some way, but linuxptp will simply complain:
ptp4l[6000.268]: rms 19 max 46 freq -1484 +/- 26 delay 24 +/- 2
ptp4l[6004.269]: rms 13 max 25 freq -1476 +/- 19 delay 25 +/- 1
ptp4l[6008.271]: rms 12 max 28 freq -1461 +/- 18 delay 26 +/- 1
ptp4l[6011.029]: port 3: received DELAY_REQ without timestamp
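When watching for such problems over a longer run, the ptp4l log can be scanned for the periodic summaries and the missing-timestamp complaints. The Python sketch below assumes the summary format shown above, with an optional delay part, and tolerates extra padding between fields. The helper names are hypothetical.

```python
import re

SUMMARY = re.compile(
    r"rms\s+(?P<rms>\d+)\s+max\s+(?P<max>\d+)\s+"
    r"freq\s+(?P<freq>[-+]?\d+)\s+\+/-\s+(?P<freq_dev>\d+)"
    r"(?:\s+delay\s+(?P<delay>-?\d+)\s+\+/-\s+(?P<delay_dev>\d+))?"
)


def parse_summaries(log_lines):
    """Yield {field: int} for each ptp4l summary line; other lines are skipped."""
    for line in log_lines:
        m = SUMMARY.search(line)
        if m:
            yield {k: int(v) for k, v in m.groupdict().items() if v is not None}


def missing_timestamp_warnings(log_lines):
    """Lines where ptp4l reports a message that arrived without a timestamp."""
    return [line for line in log_lines if "without timestamp" in line]
```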
- man ptp4l
- man phc2sys
- man pmc
- Linux kernel timestamping documentation
- Fedora PTP documentation
- Slides for a PTP tutorial