Chrony - spikes in offset

nephilim

Regular Contributor
Hi,

I hope it's OK to post this here, as I believe it is not an ntpMerlin specific issue (please correct me if I am wrong)

I have an AiMesh setup with an AX-86U Pro as main router running chrony. One of the network clients is a Raspberry Pi with attached Adafruit GPS/PPS module. This client runs a chrony instance, which is using the GPS as refclock and a handful of public NTP servers with the "noselect" option for monitoring purposes. The chrony instance on the AX-86U uses the Raspberry as single source (and the same "noselect"ed public servers for monitoring) and acts as time server for my local network.

The issue is that even though the Raspi's time offset is usually less than 1μs, the offset on the AX-86U occasionally jumps to 300-500ms, requiring large corrections of the clock. The Raspi is connected directly to one of the router's ethernet ports. As far as I can tell no other NTP service is running. The ntp entware package is not installed. In the GUI the "Enable local ntp server" box is not checked and there is no entry in the NTP server fields. There is an ntp-symlink pointing to busybox but I have no idea if and where this is configured.

Here is a screenshot of the Raspi's source time offset and some diagnostic info

1687702607740.png


Code:
# chronyc -m sources sourcestats tracking
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? GPS                           0   4   377    12    +43ms[  +43ms] +/-  100ms
#* PPS                           0   4   377     9  +1197ns[+1282ns] +/-  634ns
^? zeit.fu-berlin.de             1  10   377   287  -2570us[-2570us] +/-   56ms
^? ptbtime1.ptb.de               1  10   377   34m  -2986us[-2982us] +/- 9605us
^? ptbtime2.ptb.de               1  10   377   972  -3281us[-3281us] +/-   10ms
^? ptbtime3.ptb.de               1  10   377   979  -3861us[-3861us] +/- 9732us
^? ns.tu-berlin.de               2  10   377   760  -2796us[-2797us] +/-   12ms
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
GPS                        11   4   160    +70.967    503.715    +59ms    18ms
PPS                        63  39   994     +0.000      0.001     +0ns   794ns
zeit.fu-berlin.de          16   9  258m     +0.042      0.091  -1868us   423us
ptbtime1.ptb.de            10   5  155m     +0.067      0.256  -1547us   451us
ptbtime2.ptb.de            23  14  396m     +0.118      0.086  -1452us   794us
ptbtime3.ptb.de             6   4   86m     -0.046      0.731  -2396us   280us
ns.tu-berlin.de            11   8  189m     -0.042      0.226  -2118us   517us
Reference ID    : 50505300 (PPS)
Stratum         : 1
Ref time (UTC)  : Sun Jun 25 14:17:04 2023
System time     : 0.000000005 seconds fast of NTP time
Last offset     : +0.000000086 seconds
RMS offset      : 0.000000074 seconds
Frequency       : 0.684 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.001 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000010801 seconds
Update interval : 16.0 seconds
Leap status     : Normal

And this is the AX-86U's interpretation of that input

7 days with severe spikes

1687702817765.png


Last 24h without an extreme spike but still quite large values.

1687702878136.png


Diagnostic info
Code:
# chronyc -m sources  sourcestats tracking
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.1.105                 1  -1   377     7    +70us[  +76us] +/-  123us
^? zeit.fu-berlin.de             1   6   377    35  -4203us[ -786ms] +/-   58ms
^? ptbtime1.ptb.de               1   6   377    44  -2085us[ -783ms] +/- 8489us
^? ptbtime2.ptb.de               1   6   377    39  -3051us[ -784ms] +/- 9530us
^? ptbtime3.ptb.de               1   6   377    37  -3013us[ -784ms] +/- 9361us
^? ns.tu-berlin.de               2   6   377    51   +778ms[-3362us] +/-   14ms
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.1.105              16   8    19     +0.056      1.637   +291ns  6080ns
zeit.fu-berlin.de          55  24   58m  -2575.711    109.491   +313ms   231ms
ptbtime1.ptb.de            63  28   66m  -2646.124     88.932   +154ms   231ms
ptbtime2.ptb.de            58  26   61m  -2624.447    102.515   +250ms   239ms
ptbtime3.ptb.de            64  28   67m  -2596.221     86.995   +297ms   235ms
ns.tu-berlin.de            50  21   52m  -2570.033    124.756   +319ms   233ms
Reference ID    : C0A80169 (192.168.1.105)
Stratum         : 2
Ref time (UTC)  : Sun Jun 25 14:22:33 2023
System time     : 0.000000000 seconds slow of NTP time
Last offset     : +0.000006530 seconds
RMS offset      : 0.052360762 seconds
Frequency       : 15.524 ppm slow
Residual freq   : +0.056 ppm
Skew            : 1.888 ppm
Root delay      : 0.000136754 seconds
Root dispersion : 0.000075826 seconds
Update interval : 3.8 seconds
Leap status     : Normal
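
One way to read the "Last sample" column above: as I understand the chronyc documentation, the value in square brackets is the offset as actually measured, and the value to its left is that measurement adjusted for any corrections applied to the local clock since. The difference between the two therefore hints at how far the router's clock was moved after each sample was taken. A minimal sketch that computes this difference per source, assuming the column layout shown above (the function name `sample_shift` is made up for illustration):

```shell
# Print "<source> <adjusted - measured, in seconds>" for each line of
# `chronyc sources` output read from stdin.
sample_shift() {
    awk '
    # Convert a chronyc value such as "+70us" or " -786ms" to seconds.
    function to_sec(s,   n, u) {
        gsub(/[ \t]/, "", s)
        u = s
        sub(/^[-+0-9.]+/, "", u)      # keep only the unit suffix
        n = s + 0                     # numeric prefix
        if (u == "ns") return n * 1e-9
        if (u == "us") return n * 1e-6
        if (u == "ms") return n * 1e-3
        return n                      # plain seconds
    }
    # Match the "adjusted[measured]" pair, e.g. "-4203us[ -786ms]".
    match($0, /[-+][0-9.]+[a-z]+\[[^]]*\]/) {
        pair = substr($0, RSTART, RLENGTH)
        i = index(pair, "[")
        adj  = to_sec(substr(pair, 1, i - 1))
        meas = to_sec(substr(pair, i + 1, length(pair) - i - 1))
        printf "%s %+.3f\n", $2, adj - meas
    }'
}
```

On the router this could be run as `chronyc sources | sample_shift`. Fed the output above, every public server comes out at roughly +0.78 s, while 192.168.1.105 stays near zero.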

What's even more confusing to me is that I am already averaging 15 readings from the source, hence I expect the Raspi's time signal to be pretty smooth. This is the chrony config
Code:
server 192.168.1.105 minpoll -7 maxpoll 0 filter 15

server zeit.fu-berlin.de noselect
server ptbtime1.ptb.de noselect
server ptbtime2.ptb.de noselect
server ptbtime3.ptb.de noselect
server times.tubit.tu-berlin.de noselect

minsources 1
maxdrift 100

driftfile /opt/var/lib/chrony/drift

dumponexit
dumpdir /opt/var/lib/chrony
pidfile /opt/var/run/chrony/chronyd.pid
ntsdumpdir /opt/var/lib/chrony

makestep 1.0 3
rtcsync

allow 192.168.0.0/16
deny 192.168.1.104
deny 192.168.1.105

broadcast 60 192.168.1.255

logchange 0.5

sched_priority 1
lock_all

bindaddress 192.168.1.1

Is the behavior of the router's clock as expected? If not how can it be improved?
 
Most of those figures are in nanoseconds (1/1,000,000,000 s) or fractions of a millisecond (1/1,000 s), both of which are massively tighter than the tolerances our networks and the internet need or use.
If it's not causing an issue, don't fix it!
 
The router doesn't have a clock at all.

I don't know whether what you're testing or using is proper or not.
 
Most of those figures are in nanoseconds (1/1,000,000,000 s) or fractions of a millisecond (1/1,000 s), both of which are massively tighter than the tolerances our networks and the internet need or use.
If it's not causing an issue, don't fix it!
I would agree but I still feel uncomfortable with occasional corrections of more than half a second (as reported in the syslog).
 
The router doesn't have a clock at all.

I don't know whether what you're testing or using is proper or not.
OK, then I would like to understand what other services other than chrony (deliberately set up) and ntpd (not installed) could affect the system time.
 
You also have to think about how the reported delay is measured. If another interrupt hits while the measurement is being taken, which takes precedence? If there is a real problem and we can see it in the graph, that's one thing. Fixing something only because it shows up in a graph, when nothing is actually going wrong, is something entirely different!
 
I have an AiMesh setup with an AX-86U Pro as main router running chrony. One of the network clients is a Raspberry Pi with attached Adafruit GPS/PPS module. This client runs a chrony instance, which is using the GPS as refclock and a handful of public NTP servers with the "noselect" option for monitoring purposes. The chrony instance on the AX-86U uses the Raspberry as single source (and the same "noselect"ed public servers for monitoring) and acts as time server for my local network.

Use the Pi as your time reference for the LAN - it's a stratum 1 with the GPS/PPS source...

Couple of tips...

You don't need GPS as a time source, PPS is preferred there, and GPS doesn't add the leap seconds, so it's off from UTC and the rest of the NTP universe unless you nudge it over with an offset.

For NTP sources, consider google or cloudflare public NTP - don't use them together, and if you're using the ntp pools, don't use google's public NTP...

chrony is great - I use it on some of my hosts, but for my ntp server, I do prefer the ntpsec packages... and there, they have ntpviz, which is pretty nice, as it makes all sorts of pretty charts on a daily/weekly basis...

As a side note (this isn't my time source, but from a client...)

Code:
sfx@blue~$ ntpq -p -c rv
     remote           refid      st t when poll reach   delay   offset   jitter
===============================================================================
*time1.google.co .GOOG.           1 u  336 1024  377  37.7387   3.1374   0.4645
+time2.google.co .GOOG.           1 u  312 1024  377  79.3514  -5.9875   0.7168
+time3.google.co .GOOG.           1 u  649 1024  377  38.8309   3.5538   0.2820
+time4.google.co .GOOG.           1 u  981 1024  377  82.1881  -5.0123   0.5388
associd=0 status=0018 leap_none, sync_unspec, 1 event, no_sys_peer,
leap=00, stratum=2, precision=-22, rootdelay=37.739, rootdisp=33.369,
refid=216.239.35.0, reftime=e8436551.d0176f38 2023-06-26T01:16:01.812Z,
tc=10, peer=17767, offset=0.865579, frequency=16.603806, sys_jitter=8.647657,
clk_jitter=0.251437, clock=e84366a2.8d8b207d 2023-06-26T01:21:38.552Z,
processor="x86_64", system="Linux/5.19.0-45-generic",
version="ntpd ntpsec-1.2.1", clk_wander=0.073263, tai=37,
leapsec="2017-01-01T00:00Z", expire="2023-12-28T00:00Z", mintc=0


peer-offsets.png
 
Use the Pi as your time reference for the LAN - it's a stratum 1 with the GPS/PPS source...
I had thought about this and will do so, if I am unable to tame the AX86U.

You don't need GPS as a time source, PPS is preferred there, and GPS doesn't add the leap seconds, so it's off from UTC and the rest of the NTP universe unless you nudge it over with an offset.
That's the way I have set up chrony on the Pi (see first block of chronyc output in my first posting).

I still struggle to accept that whilst the Pi's source clock is stable and has tiny variations in offset, the AX86U is jumping up and down with chrony reporting
Code:
Jun 27 13:08:50 chronyd[4289]: System clock wrong by 0.596941 seconds
several times a day.
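
To put a number on "several times a day", the messages can be counted straight from the syslog. This is a rough sketch that assumes the message format shown above; the function name `clock_steps` and the log path in the usage note are assumptions, not something confirmed for this firmware:

```shell
# Count chronyd "System clock wrong by" messages in a log file and
# report the largest absolute correction.
clock_steps() {
    grep 'System clock wrong by' "$1" | awk '
        {
            # The correction is the field right after the word "by".
            v = 0
            for (i = 1; i < NF; i++) if ($i == "by") v = $(i + 1)
            if (v < 0) v = -v
            n++
            if (v > max) max = v
        }
        END { printf "corrections=%d max_abs=%.6f\n", n + 0, max + 0 }
    '
}
```

Usage would be something like `clock_steps /tmp/syslog.log`, with the path adjusted to wherever the router keeps its log. Comparing the timestamps of these lines against other syslog entries may show whether the jumps coincide with some other activity.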

That's why I was asking if there could be something else manipulating the system time. I would like to ask again: There are symlinks "ntp" and "ntpd" in /usr/sbin pointing to busybox. How can I check if these are called? Where is their configuration?
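
A symlink to busybox only means the applet is available, not that it is running. One simple check is to look for the daemon in the process list. The sketch below filters `ps` output for common time-daemon names; the function name `time_daemons` and the list of names are assumptions for illustration, and it says nothing about where the firmware keeps any configuration:

```shell
# Filter a process listing (read from stdin) for lines whose command
# is a time daemon, either bare ("chronyd") or by path ("/usr/sbin/ntpd").
time_daemons() {
    grep -E '(^|[ /])(ntpd|ntp|ntpdate|chronyd)( |$)'
}
```

On the router this could be run as `ps w | time_daemons`. If only chronyd shows up, nothing else from that list is running at that moment, although a short-lived process started by a scheduled job could still be missed.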
 
Just wanted to add a bit more perspective...

The Pi is being disciplined by the GPS module, so its clock should be pretty stable overall...

It's normal to see some variance, key thing is that chrony and ntp/ntpsec should correct themselves over time...

local-offset.png
local-jitter.png
 
