I really do appreciate the interest. Weirdly, I did get some different behaviour from these devices after cold booting them, which I suspect means a warm reboot doesn't actually reset the baseband radio entirely. So there ya go, there's some truth to doing a complete cold boot.
So I tried some more simple experiments. It turns out what the RT-AC86U churns on isn't just ARP, it's anything broadcast. It also isn't dropping these packets: they sit somewhere after the AC86U's kernel for a while... and then eventually end up in the air.
I have a daemon that produces broadcast traffic, a lot of it: oscd. I use it for other nerdy stuff, but for this case I quickly cannibalized it to emit broadcast UDP packets containing the current time, with a destination port of 6667, and then ran tcpdump on a lot of things around the house at the same time. There's probably a more sockety and pure python way to do this, but I'm far from a native python speaker, so forgive the horribleness.
I also installed tcpdump on the RT-AC86U via entware (after going through so many USB sticks that were either A) dead or B) tiny or C) filled with gum and pocket lint). Anyway here's what I found:
With the broadcast generator below running on a wireless client on the network (verified to be connected to the main RT-AC86U):
Python:
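import socket
from datetime import datetime
from time import sleep

from pythonosc import udp_client

# Send to the subnet broadcast address on UDP port 6667.
client = udp_client.SimpleUDPClient('192.168.1.255', 6667)
# Broadcast has to be enabled on the underlying socket before sending.
client._sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

while True:
    # The payload is the departure time, so any capture shows the delay.
    client.send_message('time', datetime.now().strftime("%H:%M:%S"))
    sleep(0.2)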
5 packets a second should do it!
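For what it's worth, the "more sockety" version might look something like the sketch below. This is not what I actually ran: it skips python-osc entirely, so the payload is just the bare time string with no OSC framing, and the names make_payload, open_broadcast_socket and run_generator are made up for illustration. The address, port and interval are the ones from the script above.

```python
import socket
import time
from datetime import datetime

BROADCAST_ADDR = "192.168.1.255"  # subnet broadcast address used in the post
PORT = 6667                       # same destination port as the OSC version


def make_payload(now=None):
    """Return the departure time as ASCII bytes, e.g. b'17:52:11'."""
    if now is None:
        now = datetime.now()
    return now.strftime("%H:%M:%S").encode("ascii")


def open_broadcast_socket():
    """Create a UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def run_generator(interval=0.2, count=None):
    """Send one packet every `interval` seconds; count=None means forever."""
    sock = open_broadcast_socket()
    sent = 0
    try:
        while count is None or sent < count:
            sock.sendto(make_payload(), (BROADCAST_ADDR, PORT))
            sent += 1
            time.sleep(interval)
    finally:
        sock.close()
    return sent

# To start sending (hypothetical usage): run_generator()
```

The same tcpdump filter should still match, since only the payload changes, not the source, destination or port.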
Running tcpdump with lines like: tcpdump -i eth6 -A src 192.168.1.115 and dst port 6667
I get dump output with arrival timestamps and easily readable contents that have departure timestamps!
Great, I just re-invented most of the functionality of ping, just with UDP and broadcast. Woohoo! There's probably a much better way to do this!
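If tcpdump isn't available on a client, a small listener could do the same comparison live. The sketch below is an assumption-laden alternative, not something from my setup: delay_seconds and listen are hypothetical names, it assumes the sender and receiver clocks are reasonably in sync, and since the payload only carries whole seconds the result can be off by up to a second.

```python
import re
import socket
from datetime import datetime, timedelta

PORT = 6667
# The OSC packet carries the time as an ASCII HH:MM:SS string.
TIME_RE = re.compile(rb"(\d{2}):(\d{2}):(\d{2})")


def delay_seconds(payload, arrival):
    """Seconds between the time embedded in payload and the arrival time."""
    match = TIME_RE.search(payload)
    if match is None:
        return None
    hour, minute, second = (int(g) for g in match.groups())
    try:
        departure = arrival.replace(hour=hour, minute=minute,
                                    second=second, microsecond=0)
    except ValueError:
        return None  # digits that are not a valid time of day
    delta = arrival - departure
    # A packet sent just before midnight and received after it.
    if delta < timedelta(hours=-12):
        delta += timedelta(days=1)
    return delta.total_seconds()


def listen(port=PORT):
    """Print the apparent delay of every broadcast packet received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # all interfaces, so broadcasts are received
    while True:
        payload, (src, _) = sock.recvfrom(2048)
        arrival = datetime.now()
        delay = delay_seconds(payload, arrival)
        if delay is not None:
            print(f"{arrival:%H:%M:%S.%f} from {src}: {delay:.3f}s behind")

# To start listening (hypothetical usage): listen()
```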
A client connected by cat-5 to the other RT-AC86U AiMesh node received the broadcast packets with normal network latency.
A client connected by cat-5 to the primary RT-AC86U received the broadcast packets with normal network latency.
The primary RT-AC86U itself saw the broadcast packets arrive and re-broadcast them out on all interfaces at normal latency.
Sample good tcpdump output:
Code:
17:52:11.639453 IP sidecar.clockmaker.home.49136 > 192.168.1.255.ircd: UDP, length 24
E..4U<@.@.`....z......... ..time....,s..17:52:11....
17:52:11.689891 IP sidecar.clockmaker.home.49136 > 192.168.1.255.ircd: UDP, length 24
E..4UH@.@.`....z......... ..time....,s..17:52:11....
17:52:11.740420 IP sidecar.clockmaker.home.49136 > 192.168.1.255.ircd: UDP, length 24
E..4UM@.@.`....z......... ..time....,s..17:52:11....
6667 is evidently ircd, huh, who knew?
Sample terrible tcpdump output:
Code:
19:06:26.413801 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!L@.@......s.....1... ..time....,s..19:05:48....
19:06:26.413801 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!Q@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721130 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!T@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721131 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!W@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721131 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!Y@.@......s.....1... ..time....,s..19:05:49....
19:06:26.721131 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24
E..4!]@.@......s.....1... ..time....,s..19:05:49....
(I mean yeah, 37s is terrible, this is a LAN! 37s in WAN routing terms is... like 120 times around the planet!)
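Eyeballing the two timestamps gets old fast, so here's a rough sketch of how the delay could be pulled out of saved tcpdump -A output like the dumps above. The function name delays is made up, it assumes each header line is followed by its payload line, and it assumes the clocks on the sending and capturing machines agree. The embedded time has no fractional seconds, so every figure is good to about a second at best.

```python
import re
from datetime import datetime, timedelta

# Header line: starts with the arrival timestamp, e.g. "19:06:26.413801 ...".
ARRIVAL_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) ")
# Payload line: the OSC address "time" followed later by HH:MM:SS.
DEPARTURE_RE = re.compile(r"time.*?(\d{2}:\d{2}:\d{2})")


def delays(lines):
    """Return arrival-minus-departure in seconds for each packet found."""
    results = []
    arrival = None
    for line in lines:
        match = ARRIVAL_RE.match(line)
        if match:
            arrival = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            continue
        match = DEPARTURE_RE.search(line)
        if match and arrival is not None:
            departure = datetime.strptime(match.group(1), "%H:%M:%S")
            delta = arrival - departure
            # A packet sent just before midnight and captured after it.
            if delta < timedelta(hours=-12):
                delta += timedelta(days=1)
            results.append(delta.total_seconds())
            arrival = None
    return results


SAMPLE = [
    "19:06:26.413801 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24",
    "E..4!L@.@......s.....1... ..time....,s..19:05:48....",
    "19:06:26.413801 wlp0s20f3 B IP burrito.clockmaker.home.58417 > 192.168.1.255.6667: UDP, length 24",
    "E..4!Q@.@......s.....1... ..time....,s..19:05:49....",
]

for seconds in delays(SAMPLE):
    print(f"{seconds:.2f}s")
```

Feeding it a whole capture file (for example `delays(open("capture.txt"))`, with a hypothetical file name) would give a list that is easy to average or plot.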
A client connected by wireless... got every packet, eventually, but with varying amounts of delay. I watched the dump for quite some time, left it running while watching TV, doing the laundry, making a sandwich, etc. The latency to any client connected directly to the RT-AC86U's wireless was all over the place. For hours at a time it was at line rate, then the delay slowly drifted up to four minutes! FOUR MINUTES! That means the packets were sitting in a buffer forever! That's just nuts! Meanwhile anything unicast stayed at line rate. The delay averaged around two minutes.
I had initially thought, maybe there's a rate limit in here somewhere without a drop... but after eliminating ALL other broadcast traffic on the network aside from my generator and listener, I got the same wacky variable behaviour. There's no triggering traffic threshold I can determine.
Thinking maybe this was related to the backhaul methodology, I connected the two RT-AC86Us directly with cat-5, reconfigured them for 1GbE backhaul, and was still able to replicate the problem immediately.
I've been up and down the iptables and ebtables on this thing and can't find any part of it that would account for a post-routing delay that long. It's definitely nothing before that, as the tcpdump shows the packets exiting at line rate. Unless there's some part of iptables I'm missing (and please, please someone tell me I'm missing something basic), the broadcast packets are getting stuck in the baseband radio buffer for... a while, and that's a big binary blob driver that we can't poke, right?
I took an old Netgear R6400, made it a dumb AP running a different ESSID and hung it off one of the RT-AC86Us so I could put most of the gadgets around the house on the old Netgear and they could... find each other without minutes of latency in the discovery. This doesn't help the people with laptops who are connected to the RT-AC86U trying to find the house file server.