Router trouble RT-AX88U

ciprian.trofin

Occasional Visitor
My router is Asus RT-AX88U, firmware version 3004.388.6.

The 8-port switch is used as follows:
- port 1: desktop
- port 2: printer
- port 3: NAS
- port 4: powerline adapter
- port 5: Raspberry Pi running PiHole

DHCP is enabled on the router. PiHole works only as DNS (its DHCP server is disabled).

The router has WiFi hotspots on both the 2.4 GHz and 5 GHz bands.

The powerline network works as follows:
- one end is static, connected by Ethernet cable to router switch port 4
- the other end is WiFi-enabled (2.4 GHz only) and serves a hotspot with the same SSID/credentials as the router. DHCP is disabled on this device; devices connected to the powerline hotspot get their IP from the router.
The idea of the powerline setup is to extend 2.4 GHz coverage.

FWIW, the powerline ends connect to each other at 200 Mbps.

The issue is the following: after working for a while (anywhere from a few hours to 3-4 days after a reboot), devices in range of the powerline hotspot fail to get an IP address. The system log shows many consecutive DHCPDISCOVER / DHCPOFFER messages that are not followed by a DHCPREQUEST.

Code:
Mar  4 19:40:21 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:40:21 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:40:26 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:40:26 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:40:31 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:40:31 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:40:35 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:40:35 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:40:44 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:40:44 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:40:59 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:40:59 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:41:04 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:41:04 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:41:07 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:41:07 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:41:09 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:41:09 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:41:13 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:41:13 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:41:22 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:41:22 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:41:39 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:41:39 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx

Moving into the router's hotspot range results in success:
Code:
Mar  4 19:42:11 dnsmasq-dhcp[1129]: DHCPDISCOVER(br0) 30:09:c0:xx:xx:xx
Mar  4 19:42:11 dnsmasq-dhcp[1129]: DHCPOFFER(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:42:11 dnsmasq-dhcp[1129]: DHCPREQUEST(br0) 192.168.76.10 30:09:c0:xx:xx:xx
Mar  4 19:42:11 dnsmasq-dhcp[1129]: DHCPACK(br0) 192.168.76.10 30:09:c0:xx:xx:xx
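To see at a glance which clients get stuck at the OFFER stage, the dnsmasq lines can be tallied per MAC address. A minimal sketch, assuming the dnsmasq log format shown above (the client MAC is the first field with six colon-separated groups after the DHCP message type); the function name `dhcp_summary` is hypothetical:

```sh
#!/bin/sh
# Sketch: tally dnsmasq DHCP messages per client MAC from syslog lines on
# stdin and flag clients that were offered a lease but never sent a
# DHCPREQUEST. Assumes the dnsmasq log format shown in this thread.
dhcp_summary() {
    awk '
    {
        type = ""; mac = ""
        for (i = 1; i <= NF; i++) {
            if (type == "" && $i ~ /^DHCP(DISCOVER|OFFER|REQUEST|ACK)\(/) {
                type = $i
                sub(/\(.*/, "", type)                    # DHCPOFFER(br0) -> DHCPOFFER
            } else if (type != "" && mac == "") {
                tmp = $i
                if (gsub(/:/, ":", tmp) == 5) mac = $i   # six colon-separated groups
            }
        }
        if (type == "" || mac == "") next
        seen[mac] = 1
        count[mac, type]++
    }
    END {
        for (mac in seen) {
            d = count[mac, "DHCPDISCOVER"] + 0
            o = count[mac, "DHCPOFFER"] + 0
            r = count[mac, "DHCPREQUEST"] + 0
            a = count[mac, "DHCPACK"] + 0
            status = (o > 0 && r == 0) ? "offer-not-requested" : "ok"
            printf "%s discover=%d offer=%d request=%d ack=%d %s\n", mac, d, o, r, a, status
        }
    }' | sort
}
```

On the router it could be fed with something like `grep dnsmasq-dhcp /tmp/syslog.log | dhcp_summary`; the syslog path is an assumption and may differ between firmware versions.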

Resetting (power cycling) the powerline devices does not solve the problem. Rebooting the router does, but the problem comes back later.

I had the same problem with my previous router, an N66U (running the latest Merlin FW). I upgraded mostly for the additional LAN ports. I am aware of the port 5-8 troubles, but I have had none.

Any advice?
 
The issues with Ports 5-8 are random and obscure. This may be another variation.

To test properly, temporarily replace the PLAs with a cable. My guess is that the issue will recur.
 
That's assuming it's not a combined PLA with WiFi, in which case temporarily using an Ethernet cable may not be possible, but it's the first step I'd try.

It does sound like an old problem I had about 4 years ago on 384.something: the same thing in the logs, with clients not sending a request after an offer. Neither upgrading the firmware nor full factory resets solved it; anywhere from 2-3 days to 2 weeks later, it would come back.

I think restarting the dnsmasq service every week from a cron job kept it at bay. I tried every new release (with factory resets), but it persisted until about 386.x, then magically disappeared. Not much help, I know, sorry!
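For reference, a weekly dnsmasq restart like the one described above could be registered with Merlin's `cru` helper. A minimal sketch, assuming JFFS custom scripts are enabled and this goes into `/jffs/scripts/services-start` (cru jobs do not survive a reboot, so they have to be re-added at boot); the job ID `weekly_dnsmasq` and the Monday 04:00 schedule are arbitrary choices:

```sh
#!/bin/sh
# /jffs/scripts/services-start (sketch; make it executable with chmod a+rx)
# Re-registers a weekly dnsmasq restart at every boot.
JOB_ID="weekly_dnsmasq"            # arbitrary, must be unique among cru jobs
CRON_SPEC="0 4 * * 1"              # minute hour day month weekday: Mondays at 04:00
CRON_CMD="service restart_dnsmasq"

# Only call cru where it exists (i.e. on the router itself).
if command -v cru >/dev/null 2>&1; then
    cru a "$JOB_ID" "$CRON_SPEC $CRON_CMD"
fi
```

`cru l` lists the registered jobs, and `cru d weekly_dnsmasq` removes this one again.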
 
Power gremlins? When was the power supply last changed? Is it the OE unit or an aftermarket one? The symptom is pretty non-specific (works sometimes, sometimes not). I've had these kinds of odd failures fixed by a new PSU, even when the current one still 'worked'.
 
Thank you for your kind advice.
I will investigate as much as I can according to your suggestions, although I lack some of the hardware... :(


To me, it looks like the client does not receive the DHCPOFFER message from the router. Is there any way to check whether the DHCPOFFER is sent out of the correct interface?

To explain: the router is in one room and the WPA is in another (WPA = Wireless Power Adapter, see below).
To allow roaming, I set up the WPA hotspot the same as the router hotspot (on a different channel, of course). When I move between rooms with my phone, it connects to the hotspot with the strongest signal and sends a DHCPDISCOVER broadcast; the router responds with a DHCPOFFER... and that's all. If I'm not mistaken, the DHCPOFFER is unicast, i.e. it is sent only to the requesting device, which means it leaves through a different router interface depending on the hotspot:
1. if the hotspot is the router - the interface is wireless
2. if the hotspot is the WPA - the interface is a wired port on router
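One way to check which interface the router believes a client is behind is to look at the bridge's learned MAC table (`brctl showmacs br0`). A minimal sketch of a lookup helper, assuming the usual `brctl showmacs` column layout (port number first, MAC address second); the function name `mac_port` is hypothetical. `brctl showstp br0` lists each member interface with its port number in parentheses, so the two outputs can be matched up. Depending on how the switch ports are exposed to Linux, several physical LAN ports may sit behind one bridge port, so treat the result as a hint rather than proof:

```sh
#!/bin/sh
# Sketch: print the bridge port number on which a MAC address was learned,
# reading `brctl showmacs <bridge>` output on stdin.
# Exit status is 0 if the MAC was found, 1 otherwise.
mac_port() {
    awk -v want="$1" '
        NR > 1 && tolower($2) == tolower(want) { print $1; found = 1 }   # NR > 1 skips the header
        END { exit (found ? 0 : 1) }'
}
```

Usage on the router: `brctl showmacs br0 | mac_port 30:09:c0:xx:xx:xx` (with the real MAC), once while the phone is on the router hotspot and once while it is on the WPA hotspot.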


The powerline adapter consists of 2 units:
- TL-PA4010 (TpLink Power Adapter, basically a Layer 2 unit, connected by Cat5 to the router and plugged into a power outlet)
- TL-WPA4220 (TpLink Wireless Power Adapter, the other end of the TL-PA4010; it has an IP address assigned)
The powerline connection between the adapters is stable (I ping the WPA unit from cron every 10 minutes and there are no interruptions).
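A cron-driven reachability check like the one mentioned above could look roughly like this. A minimal sketch: `probe_host` is a hypothetical helper, the address passed to it would be whatever the TL-WPA4220 was given, and `PING_CMD` exists only so the ping can be swapped out for testing. Failures are written to syslog with `logger`, so they line up with the router's other log entries:

```sh
#!/bin/sh
# Sketch: ping a host a few times and report "up" or "down".
# On failure, also write a line to syslog (tag "wpa-probe") if logger exists.
probe_host() {
    host=$1
    # PING_CMD is word-split on purpose so it can hold a command plus options.
    if ${PING_CMD:-ping -c 3 -W 2} "$host" >/dev/null 2>&1; then
        echo up
    else
        logger -t wpa-probe "no reply from $host" 2>/dev/null || true
        echo down
    fi
}
```

Saved as a script that calls `probe_host` with the WPA's address, it could be scheduled every 10 minutes with something like `cru a wpa_probe "*/10 * * * * /jffs/scripts/wpa-probe.sh"` (the script path is hypothetical).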
 
For a while, I had no issues. After my previous post, I installed the scMerlin scripts so I can restart the dnsmasq service.

Today it showed again: when in range of Wireless Power Adapter, my device (phone) does not get an IP address; when in range of router, it does.

I tried (without restarting the router):
- restart dnsmasq service on router (via scMerlin scripts) - no improvement
- restart wifi service on router (via scMerlin scripts) - no improvement
- power cycle the Wireless Power Adapter - no improvement
- power cycle the Power Adapter (the one connected to router) - no improvement
- change the router port where the Power Adapter is connected - IT WORKED!

I find it interesting that power cycling the Power Adapter does not solve my issue, but changing the switch (router) port does.

It also looks like my hunch about the router sending the DHCPOFFER to the wrong interface is plausible.

Is there any way to reset (in software) the association between device MAC addresses and router interfaces?
 
So far, the only software solution I found that works is rebooting the router, from the command line or via cru.

Is it possible to reset (in software) only one port of the switch ?
 
The router has been stable for 3 weeks so far. One caveat: the router is power cycled every week by a "digital timer" power outlet. The outlet cuts the power on Monday morning and restores it after 5 minutes. So far, this works better than a software restart (cron reboot).
 
Update: I gave up restarting the router every week (about 3 weeks ago). I had an incident where the issue showed up a few days after a restart.

However, the current uptime is > 10 days, and I changed 2 settings.

1. I disabled beamforming (2.4 GHz and 5 GHz).

2. I (properly) enabled STP. The powerline adapter is a Layer 2 device, and I thought enabling Spanning Tree Protocol was the way to go. STP was enabled in the GUI (LAN - Switch Control), but the setting does not seem to do anything (I posted about it here). So, basically, I added
Code:
brctl stp br0 on
in the init-start script.
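Whether the command actually took effect can be read back from `brctl show`, whose "STP enabled" column reports yes/no per bridge. A minimal sketch that extracts that column; `stp_state` is a hypothetical helper. One thing possibly worth checking: Merlin runs init-start very early in the boot, before services are started, so br0 may not exist yet at that point; running the same `brctl stp br0 on` later (e.g. from services-start) and reading the state back would rule that out:

```sh
#!/bin/sh
# Sketch: print the "STP enabled" value (yes/no) for one bridge,
# reading `brctl show` output on stdin. Exit status 1 if the bridge is absent.
stp_state() {
    awk -v br="$1" '
        NR > 1 && $1 == br { print $3; found = 1 }   # columns: name, id, STP enabled, interfaces
        END { exit (found ? 0 : 1) }'
}
```

Usage on the router: `brctl show | stp_state br0`. Continuation lines that list further member interfaces start with the interface name, so they are not mistaken for the bridge row.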
 
Update: I had to reboot the router (18 days uptime).

The LAN was OK, but the WAN (PPPoE) was disconnected. The log shows lots of entries like this:
Code:
Timeout waiting for PADO packets
about 1 minute apart.

Nothing worked except a reboot (restarting the interface via scMerlin, manually disconnecting / reconnecting the WAN). I checked the forums and there are related posts, but no solution.

I foresee ChkWAN in my future...
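As an illustration only (this is not ChkWAN's actual logic), the stuck-PPPoE pattern in these logs can be detected by counting how many pppd discovery timeouts in a row were logged since pppd last logged anything else. `pado_stuck` and its default threshold of 3 are hypothetical:

```sh
#!/bin/sh
# Sketch: print "stuck" if the last N or more pppd messages on stdin are
# PPPoE discovery timeouts (PADO/PADS), "ok" otherwise.
# Non-pppd lines (kernel messages etc.) are ignored.
pado_stuck() {
    awk -v t="${1:-3}" '
        /pppd\[[0-9]+\]:/ {
            if ($0 ~ /Timeout waiting for PAD[OS] packets/) n++   # another failed discovery
            else n = 0                                             # any other pppd message resets the run
        }
        END { if (n >= t + 0) print "stuck"; else print "ok" }'
}
```

For example, `tail -n 50 /tmp/syslog.log | pado_stuck 3` (the syslog path is an assumption); what to do when it prints "stuck" (restart the WAN, reboot) is left to the surrounding script.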
 
Update: I had to reboot the router (28 days uptime).

The WAN stopped working and the router became unresponsive (no answer to ping, no GUI). However, it looks like the DHCP service was OK.
 
Did you disable Universal Beamforming for both 2.4G and 5G?
I had success disabling it; previously, the router had issues every 72 hrs.
The router has been up and working with no issues for over 35 days now, *knock on wood*.
 
Universal Beamforming is disabled on both bands.
Current uptime: 2 days. I don't remember an uptime longer than 30 days.

The last time I manually rebooted the router was about a month ago, maybe longer. ChkWAN takes care of reboots. I am quite happy, actually.
 
Well, after a while ChkWAN became inadequate (no shade to the author; I suspect the router or my local network has some other issue). The router kept freezing.
Towards the end of January I went the "factory defaults" way. I also moved the DNS and DHCP servers to a Raspberry Pi 3B+ box (which also runs Tailscale). It did not improve the situation: the router kept freezing every 3-4 days.

Today, after a 6 day uptime, it froze again. The log shows:
Code:
Feb  9 11:44:30 kernel: FPM Pool 0: invalid token 0x0014c000 freed
Feb  9 11:44:30 kernel: FPM Pool 0: ISR timer is enabled. There could be multiple occurrences of the reported issue
Feb  9 11:44:30 kernel: dhd_pktid_to_native: invalid addr detected: pktptr ffffffc02f115a00 next_p           (null) prev_p           (null) pkttype 2
[ about 30 entries, for 1-2 seconds ]
Feb  9 11:44:31 kernel: dhd_pktid_to_native: invalid addr detected: pktptr ffffffc02aa26000 next_p 0000001a00000111 prev_p dc3f7c3e06000001 pkttype 3
Feb  9 11:44:31 kernel: dhd_pktid_to_native: invalid addr detected: pktptr ffffffc02f115a00 next_p           (null) prev_p           (null) pkttype 2
Feb  9 11:44:35 kernel: FPM Pool 0: invalid token 0x00115000 freed
Feb  9 11:44:35 kernel: FPM Pool 0: ISR timer is enabled. There could be multiple occurrences of the reported issue
Feb  9 11:44:36 kernel: FPM Pool 1: invalid token 0x2003e000 freed
Feb  9 11:44:36 kernel: FPM Pool 1: ISR timer is enabled. There could be multiple occurrences of the reported issue
Feb  9 11:47:30 kernel: FPM Pool 1: invalid token 0x2000e000 freed
Feb  9 11:47:30 kernel: FPM Pool 1: ISR timer is enabled. There could be multiple occurrences of the reported issue
Feb  9 11:51:06 pppd[12315]: No response to 10 echo-requests
Feb  9 11:51:06 pppd[12315]: Serial link appears to be disconnected.
Feb  9 11:51:06 pppd[12315]: Connect time 5177.0 minutes.
Feb  9 11:51:06 pppd[12315]: Sent xxxx bytes, received xxxx bytes.
Feb  9 11:51:07 WAN(0)_Connection: Fail to connect with some issues.
Feb  9 11:51:09 pppd[12315]: Connection terminated.
Feb  9 11:51:09 pppd[12315]: Sent PADT
Feb  9 11:51:54 pppd[12315]: Timeout waiting for PADO packets
Feb  9 11:53:09 pppd[12315]: Timeout waiting for PADO packets
Feb  9 11:53:47 kernel: FPM Pool 1: invalid token 0x20045000 freed
Feb  9 11:53:47 kernel: FPM Pool 1: ISR timer is enabled. There could be multiple occurrences of the reported issue
Feb  9 11:54:22 pppd[12315]: Timeout waiting for PADS packets
Feb  9 11:54:57 pppd[12315]: Timeout waiting for PADO packets
Feb  9 11:56:12 pppd[12315]: Timeout waiting for PADO packets
Feb  9 11:56:15 kernel: FPM Pool 1: invalid token 0x2004e000 freed
Feb  9 11:56:15 kernel: FPM Pool 1: ISR timer is enabled. There could be multiple occurrences of the reported issue
Feb  9 11:56:50 pppd[12315]: Timeout waiting for PADS packets
Feb  9 11:57:25 pppd[12315]: Timeout waiting for PADO packets
The router is frozen (no ping / SSH / WebGUI response) and the internet is down, but the devices on the network are reachable (pings, web services).

I reviewed older logs, and there too the FPM Pool messages are followed by a freeze.
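To check that pattern across older logs more systematically, one can extract the time of the first FPM Pool error and of the first pppd link failure that follows it. A minimal sketch; `fpm_to_wan_drop` is a hypothetical helper, it matches only the message texts shown in the log above, and it assumes both events fall on the same day (no handling of midnight rollover):

```sh
#!/bin/sh
# Sketch: report the first "FPM Pool ... invalid token" kernel message and the
# first pppd link failure after it, plus the gap in seconds, from syslog on stdin.
fpm_to_wan_drop() {
    awk '
        function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        !fpm && /kernel: FPM Pool [0-9]+: invalid token/ { fpm = $3 }   # $3 is HH:MM:SS
        fpm && !drop && /pppd\[[0-9]+\]: (No response to|Serial link appears to be disconnected)/ { drop = $3 }
        END {
            if (!fpm) { print "no FPM Pool errors"; exit }
            if (!drop) { print "FPM Pool error at " fpm ", no WAN drop after it"; exit }
            printf "FPM Pool error at %s, WAN drop at %s (%d s later)\n", fpm, drop, secs(drop) - secs(fpm)
        }'
}
```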
 
This almost year-long troubleshooting of intermittent issues should tell you one thing: something is not working properly there and you need to make some changes. A router model with known switch issues, custom firmware on it, custom scripts on top, PLA adapters, external DNS/DHCP... Try simplifying things, find the device causing the issue, and replace it with one that works well.
 
I am very aware I might have a dud, but I am trying to narrow down the problem. I am not prepared to throw the router away, and it's out of warranty :)

I moved DHCP off the router because, during previous crashes, the log showed that it was unable to provide the DHCP service.
Currently, there are no custom scripts running.
The PLA adapter might be the culprit, but I have a "hint" that it's not the PLA (see post #6).

I had FPM Pool messages before, but never the dhd_pktid_to_native kind.

Currently, I'm experimenting with WiFi settings: I turn off both radios from 5:00 to 5:15 AM (via WebGUI). There is a reason, and if it works, I'll tell you why I did it :)

The thermals are OK (the router is v1.1). Another option is to replace the power supply.
 
Well, that experiment also failed.

Next step: downgrade the firmware to Merlin 384.19, then reset to factory defaults.
Interesting: STP is enabled on 384.19, and LAN - Switch Control works (it does not in 388).
 
Maybe it is time to abandon the Merlin firmware and use Asus. Downgrading to the old Merlin exposes you to security issues.
 
So far... mixed results. The router didn't crash, but after about 3 days of uptime there were huge ping times and I had to reboot it.
My network is shown below.
The huge ping times (about 2 s) happened when my phone (on 2.4G) was in the yellow area and the target was the router or some device in the same yellow area. Moving my phone to the blue area made the ping times "normal" (tens of ms), with the same target in the yellow area. I'm not a networking expert, but I thought that WiFi devices in the yellow area communicate without their packets going to the router and back.

my_network.png
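To compare the two areas with numbers rather than impressions, the replies from a ping run can be summarized. A minimal sketch; `rtt_stats` is a hypothetical helper that relies only on the `time=<ms>` field that both BusyBox ping and Linux iputils ping print on each reply line:

```sh
#!/bin/sh
# Sketch: summarize round-trip times from ping output on stdin
# (reply count, average and maximum, in ms).
rtt_stats() {
    awk '
        match($0, /time=[0-9.]+/) {
            v = substr($0, RSTART + 5, RLENGTH - 5) + 0   # number after "time="
            n++; sum += v
            if (v > max) max = v
        }
        END {
            if (n == 0) { print "no replies"; exit }
            printf "replies=%d avg=%.1f ms max=%.1f ms\n", n, sum / n, max
        }'
}
```

For example, `ping -c 20 192.168.76.1 | rtt_stats`, run once from each area against the same target.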


I found a way to replicate some of the symptoms I saw during crashes: disabling ARP on the router with
Code:
ip link set arp off dev br0
makes the router inaccessible, but communication between devices connected to the router works just fine.
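If a later freeze shows the same symptoms, it may be worth checking whether ARP is actually disabled on br0 at that moment or something else is going on. `ip link set ... arp off` sets the interface's NOARP flag, which `ip link show br0` prints in its flag list, and `ip link set dev br0 arp on` turns ARP back on. Since the router is unreachable during a freeze, such a check would have to run from a scheduled job that writes to the log. A minimal sketch that reads the flag; `arp_state` is a hypothetical helper:

```sh
#!/bin/sh
# Sketch: print "off" if the interface has the NOARP flag set, "on" otherwise,
# reading the first line of `ip link show <dev>` output on stdin.
arp_state() {
    awk 'NR == 1 { if ($0 ~ /[<,]NOARP[,>]/) print "off"; else print "on"; exit }'
}
```

Usage on the router: `ip link show br0 | arp_state`.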
 
If you haven't already, try changing the SSID and credentials of the WiFi-enabled PLA endpoint to something completely different from what the main router broadcasts.
 
