Asus AC3200 w/ Merlin 384.13 High CPU / Memory leak?

f00dl3

Not sure if this is a known issue, but for a long time now (probably a year) the Merlin firmware has had an issue on the AC3200 where it drops DHCP around the 29th or 30th day of the month, every month. The only way to bring the network back up is to reboot the router. The router isn't even accessible via SSH from the Internet, while the desktop is, because the desktop is the only device that doesn't lose DHCP for some reason (though websites are inaccessible from it, and all WiFi devices on the Asus WiFi networks are unreachable).

I've implemented a workaround of sorts. I have a Google Fiber box, and I bought a smart plug, connected to the Fiber Box WiFi, that controls power to the Asus router. When the router does this, I SSH into a Raspberry Pi on the Fiber Box WiFi and switch the smart plug off and on to power-cycle the Asus router. This isn't an ideal solution, though, as the security on the Fiber Box WiFi seems a bit lax. (I also bought smart plugs for the desktop and one of my WiFi cameras.)

It's rather interesting because I have this setup:

Fiber Box
  -> 1 Raspberry Pi (WiFi)
  -> 3 WiFi smart plugs (1 controls router, 1 controls desktop, 1 controls outdoor IP camera)
  -> Asus AC3200 router (wired)
       -> Desktop (wired)
       -> 3 IP cameras (WiFi)
       -> 1 Raspberry Pi (WiFi)
       -> Canon Pixma MP620b (WiFi)
       -> Smart TV (WiFi)


Still, would be nice if the Asus AC3200 didn't drop DHCP once a month...

I'm not sure whether it has something to do with the fact that I have static IPs assigned to all my devices on the AC3200. Maybe this is part of the cause? I need it set up this way for the Java programs on my desktop that gather data over the network via SNMP, SSH, or RTSP streams (cameras, system metrics, etc.).
 
Sorry, I don't understand what you mean by "drops the DHCP". DHCP isn't dropped by anything. Are you talking about DHCP clients or a server? If it's a server, which one: the one on the Asus or the one on the Fibre box?
 
So here are the exact symptoms:

It always happens on the 29th/30th day of the month. Note that my average monthly data use is 2.8-3.2 TB, so this may be right as it hits 3 TB / 3000 GB. Not sure if that's a correlation.

- The Google Fiber box WiFi network is NOT impacted; there is no loss of Internet connectivity for anything on the Fiber box WiFi / wired networks. I can still SSH into things on the Fiber Box WiFi from my phone just fine.

- Established connections appear to be unaffected. The desktop remains accessible via SSH. A phone -> desktop SSH tunnel stays up until the connection is broken on the phone end (4G to 3G switch, signal loss, etc.), after which it is inaccessible.
- The Asus AC3200 router becomes inaccessible over SSH from the Internet. SSH connection attempts just hang; I don't get a "network unreachable" message or a connection timeout.
- All devices lose WiFi connectivity.
- If I have a Raspberry Pi on the Asus set up to accept external (Internet) SSH connections, I can't establish a connection to it after the event happens until the router is rebooted.
- The desktop loses its Internet connection.
- The desktop can't communicate with any device on the network, wired or WiFi.

Again, the only way out of this situation is to reboot the Asus AC3200 router; then all connections are restored.

I think it's a DHCP issue because, if I recall correctly, DHCP leases renew once a month or every 30 days or so. I could be wrong.
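For reference, a DHCP client normally doesn't wait until its lease is about to expire before renewing. Under the default timers in RFC 2131, it starts renewing with the original server at 50% of the lease time (T1) and falls back to rebinding with any server at 87.5% (T2); a server can override both with DHCP options 58 and 59. A minimal sketch of that arithmetic, where the 24-hour lease is a hypothetical value, not what the Fiber box actually hands out:

```python
from datetime import datetime, timedelta


def dhcp_timers(lease_start: datetime, lease_seconds: int):
    """Return (T1, T2, expiry) using RFC 2131's default timer ratios.

    Assumes the server did not send explicit T1/T2 values (options 58/59).
    """
    lease = timedelta(seconds=lease_seconds)
    t1 = lease_start + lease * 0.5      # RENEWING: unicast DHCPREQUEST to the leasing server
    t2 = lease_start + lease * 0.875    # REBINDING: broadcast DHCPREQUEST to any server
    expiry = lease_start + lease        # address must be given up if still not renewed
    return t1, t2, expiry


# Hypothetical 24-hour lease obtained at midnight on 1 Oct 2019.
t1, t2, expiry = dhcp_timers(datetime(2019, 10, 1), 24 * 3600)
print(t1, t2, expiry)  # 2019-10-01 12:00:00 2019-10-01 21:00:00 2019-10-02 00:00:00
```

With a real lease, the lease length and remaining time shown on the router's WAN status page would replace the hypothetical values.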
 
TBH it doesn't sound like a DHCP issue, but you can check the time remaining for the Asus' DHCP lease by logging into the router and clicking on the WAN "globe". Then look at the WAN status information. - This is assuming that your Asus' WAN interface is configured as a DHCP client.

Look in the Asus' syslog for error messages at the time the problem occurs. If the problem is caused by a failed lease renewal you'll probably see the dreaded "ISP's DHCP did not function properly" message. In your case, as you control the upstream DHCP server (the Fibre Box) you can simply reconfigure the Asus' WAN interface to be static and avoid the DHCP problem altogether.
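A quick way to pull the relevant lines out of a copy of the syslog is to filter for the messages mentioned in this thread. A minimal sketch; the marker strings come from the messages quoted here, the sample lines are illustrative only (not taken from the attached log), and reading the log from `/tmp/syslog.log` on the router is an assumption about where Asuswrt-Merlin keeps it, so adjust if yours differs:

```python
from typing import Iterable, List

# Message fragments quoted in this thread; add others as needed.
MARKERS = (
    "ISP's DHCP did not function properly",
    "WAN was restored",
)


def wan_events(lines: Iterable[str]) -> List[str]:
    """Return the syslog lines that contain any of the WAN/DHCP markers."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in MARKERS)]


# Illustrative lines only, not taken from the attached log.
sample = [
    "WAN_Connection: ISP's DHCP did not function properly.\n",
    "some unrelated message\n",
    "WAN_Connection: WAN was restored.\n",
]
for event in wan_events(sample):
    print(event)
```

After copying the log off the router, passing the open file object to `wan_events()` works the same way, since a file iterates line by line.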
 
Ok, so here's the syslog. What's interesting is that it does show that error, but right before it, the router appears to go through some weird reboot where the time and date are off by a lot. Unless that's when I forced the reboot.
 


There's nothing particularly unusual in that log file. It just shows the normal boot messages. It shows "5th May" because that is the default date and time the router starts with until it can establish a connection with the internet, at which point it corrects the date and time.

So at boot you do normally see that DHCP error message once but it can be ignored in this case as it's just temporary until the WAN interface comes up a few seconds later. At which time you see the "WAN_Connection: WAN was restored" message.

So I suppose the question is.... why was the router rebooted at 05:29?
 

Because that's when I woke up, noticed the event had happened at around 12:30 AM, and used the smart plug to reboot the router via SSH to the Raspberry Pi on the Fiber Box WiFi network.
 
Well there's nothing in the log file after 00:28:38 until your reboot. I'm guessing there normally would be.

So it appears the router has just "frozen". The fact that some devices are still accessible from the internet could possibly be explained by them having existing conntrack entries and, through hardware acceleration, bypassing the router's CPU. ...Just a theory.

Without any error messages to go on, diagnosing the problem is going to be very difficult.

P.S. Even though there's no indication this is DHCP-related, you might as well configure the Asus' WAN interface as static. I see it currently has an address of 192.168.100.200. Is this a reservation you've set up on the fibre box? If so, just leave it in place and use 192.168.100.200 as the static address on the Asus.

P.P.S. What IP address and netmask are you using for the Asus' LAN?
 

It's interesting you mention that. According to the Net-SNMP monitoring I run, there are some issues occurring that could cause a lockup.

First, memory use gradually increases, almost like there is a memory leak. Right after a reboot the router uses about 125 of its 256 MB of RAM, but right before the crash it was using about 240 MB.

Second, the CPU usage goes wonky after a while, with each core sometimes reporting over SNMP that it's more than 100% in use. This is a dual-core router and I'm polling each CPU separately, so it's odd that SNMP would report a CPU as over 100% used.
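As a cross-check on the SNMP numbers, per-core utilization can be computed directly from two samples of `/proc/stat` taken a few seconds apart (for example over SSH). Computed this way it can't exceed 100%, so a large disagreement would suggest the SNMP side is misreporting. A minimal sketch, assuming the standard Linux `/proc/stat` column layout; the counter values below are made up for illustration:

```python
from typing import Dict, List


def parse_proc_stat(text: str) -> Dict[str, List[int]]:
    """Map each per-core line ('cpu0', 'cpu1', ...) to its jiffy counters."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines (intr, ctxt, ...).
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cores[fields[0]] = [int(v) for v in fields[1:]]
    return cores


def per_core_busy(before: str, after: str) -> Dict[str, float]:
    """Busy percentage per core between two /proc/stat samples.

    Columns are user, nice, system, idle, iowait, irq, softirq, ...;
    idle and iowait both count as not busy here.
    """
    b, a = parse_proc_stat(before), parse_proc_stat(after)
    busy = {}
    for core, old in b.items():
        delta = [new - prev for new, prev in zip(a[core], old)]
        total = sum(delta)
        idle = delta[3] + (delta[4] if len(delta) > 4 else 0)
        busy[core] = 100.0 * (total - idle) / total if total else 0.0
    return busy


# Made-up counters for a dual-core box, two samples apart.
before = "cpu  200 0 100 700 0 0 0\ncpu0 100 0 50 350 0 0 0\ncpu1 100 0 50 350 0 0 0\n"
after = "cpu  270 0 130 800 0 0 0\ncpu0 160 0 70 370 0 0 0\ncpu1 110 0 60 430 0 0 0\n"
print(per_core_busy(before, after))  # {'cpu0': 80.0, 'cpu1': 20.0}
```

Counting iowait as idle is a design choice; given the high `io` figures in the `top` snapshots in this thread, moving `delta[4]` into the busy share may be more informative here.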

1) Yes the IP is static 192.168.100.200
2) 192.168.1.x is the IP range for Asus LAN, while Fiber box has 192.168.100.x. Netmask for Asus LAN is 255.255.255.0

astump@astump-Desktop:~/src/codex/Java$ ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::e478:cde3:f29e:717a prefixlen 64 scopeid 0x20<link>
        ether 08:62:66:c8:c8:7c txqueuelen 1000 (Ethernet)
        RX packets 325624572 bytes 402721087336 (402.7 GB)
        RX errors 0 dropped 21 overruns 0 frame 0
        TX packets 182624476 bytes 40694120963 (40.6 GB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 20 memory 0xdf200000-df220000
 

Attachments: mRouterCPU.png, mRouterMemory.png

One mitigation is to configure the RT-AC3200 to reboot on a schedule, say every week or so (Administration - System - Basic Config). I've set mine to reboot regularly because it has let me down on occasion, once in particular during a trip, locking me out of my whole home automation/surveillance setup.

On a recent trip, I set up a second router connected to the cell network and controlling one of those smart plugs you also use, so I could turn the router off and on remotely in case even the reboot scheduler failed. And then guess what failed? The second router and its cell connection.
 
I think it has to do with the memory increase/leak. There were many posts about it some months ago, but things mostly settled down with 384.12.
 
So what can I do to help? It's doing it again right now: 125% CPU use on core 1, 105% on core 2, and 235 / 256 MB of RAM in use. It's on the latest firmware. What logs can I provide? I'm very familiar with Linux, so just ask for what you need and I can do it.

Here is top output via SSH:

Mem: 234604K used, 20832K free, 1992K shrd, 1184K buff, 4072K cached
CPU: 1.2% usr 39.9% sys 0.0% nic 23.1% idle 33.6% io 0.0% irq 1.9% sirq
Load average: 4.17 3.55 3.38 5/112 30095
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
256 1 f00dl3 S 6416 2.5 0 0.0 httpds -s -i br0 -p 8443
347 1 f00dl3 S 6068 2.3 1 0.0 /usr/sbin/smbd -D -s /etc/smb.c
345 1 f00dl3 D 5904 2.3 1 0.9 nmbd -D -s /etc/smb.conf
1705 1 f00dl3 S 5900 2.3 0 0.4 networkmap
257 1 f00dl3 S 5764 2.2 1 0.0 httpd -i br0
346 345 f00dl3 S 5592 2.1 0 0.0 nmbd -D -s /etc/smb.conf
229 1 f00dl3 S 5204 2.0 0 1.5 nt_center
238 229 f00dl3 S 5204 2.0 0 0.0 nt_center
239 238 f00dl3 S 5204 2.0 1 0.0 nt_center
280 1 f00dl3 S 5080 1.9 1 0.4 mastiff
408 406 f00dl3 S 5080 1.9 1 0.3 mastiff
406 280 f00dl3 S 5080 1.9 0 0.0 mastiff
407 406 f00dl3 S 5080 1.9 1 0.0 mastiff
227 226 f00dl3 S 5060 1.9 0 1.5 nt_monitor
240 226 f00dl3 S 5060 1.9 0 1.4 nt_monitor
203 1 f00dl3 S 5060 1.9 1 0.2 nt_monitor
226 203 f00dl3 S 5060 1.9 0 0.0 nt_monitor
330 1 f00dl3 D 4976 1.9 1 1.8 snmpd -c /tmp/snmpd.conf
210 205 f00dl3 S 4936 1.9 1 0.0 /sbin/netool
205 1 f00dl3 S 4936 1.9 0 0.0 /sbin/netool
211 210 f00dl3 S 4936 1.9 0 0.0 /sbin/netool
261 1 f00dl3 S 4932 1.9 1 1.9 watchdog
1 0 f00dl3 S 4924 1.9 1 0.0 /sbin/preinit
196 1 f00dl3 D 4920 1.9 1 1.0 /sbin/wanduck
331 1 f00dl3 S 4916 1.9 0 0.4 pctime
1708 1 f00dl3 S 4916 1.9 1 0.3 usbled
678 1 f00dl3 S 4916 1.9 1 0.0 bwdpi_wred_alive
329 1 f00dl3 S 4916 1.9 1 0.0 hour_monitor
572 1 f00dl3 S 4916 1.9 1 0.0 disk_monitor
222 1 f00dl3 S 4916 1.9 1 0.0 wpsaide
328 1 f00dl3 S 4916 1.9 1 0.0 bwdpi_check
109 1 f00dl3 S 4912 1.9 0 0.0 console
248 1 nobody D 3236 1.2 1 2.5 dnsmasq --log-async
260 1 f00dl3 S 3164 1.2 1 0.0 sysstate
265 1 f00dl3 S 2880 1.1 1 0.0 rstats
 
Here is the top snapshot right as it lost communication with the router/when the SSH session crashed / router froze:

Mem: 234892K used, 20544K free, 1992K shrd, 908K buff, 4300K cached
CPU: 0.9% usr 58.9% sys 0.0% nic 12.8% idle 24.7% io 0.0% irq 2.5% sirq
Load average: 6.31 4.97 4.44 3/114 30213
 
And here is a clean view right after resetting it via smartplug.

Things of note: 125 MB of free RAM right after the reboot (it was only 20.5 MB free when it hung)
Load average 0.16 (it was 6.31 when it froze)

Mem: 130020K used, 125416K free, 1700K shrd, 1820K buff, 18796K cached
CPU: 0.7% usr 0.9% sys 0.0% nic 94.8% idle 0.0% io 0.0% irq 3.3% sirq
Load average: 0.16 0.20 0.14 1/110 1072
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
224 1 f00dl3 S 5200 2.0 1 1.0 nt_center
203 1 f00dl3 S 2060 0.8 1 0.4 protect_srv
222 221 f00dl3 S 5048 1.9 0 0.1 nt_monitor
1071 1066 f00dl3 R 1424 0.5 0 0.1 top
3 2 f00dl3 SW 0 0.0 0 0.1 [ksoftirqd/0]
346 1 f00dl3 S 6068 2.3 1 0.0 /usr/sbin/smbd -D -s /etc/smb.c
344 1 f00dl3 S 5904 2.3 1 0.0 nmbd -D -s /etc/smb.conf
276 1 f00dl3 S 5900 2.3 0 0.0 networkmap --bootwait
256 1 f00dl3 S 5764 2.2 0 0.0 httpd -i br0
255 1 f00dl3 S 5764 2.2 1 0.0 httpds -s -i br0 -p 8443
345 344 f00dl3 S 5592 2.1 0 0.0 nmbd -D -s /etc/smb.conf
232 231 f00dl3 S 5200 2.0 0 0.0 nt_center
231 224 f00dl3 S 5200 2.0 1 0.0 nt_center
279 1 f00dl3 S 5080 1.9 0 0.0 mastiff
407 405 f00dl3 S 5080 1.9 1 0.0 mastiff
405 279 f00dl3 S 5080 1.9 0 0.0 mastiff
406 405 f00dl3 S 5080 1.9 0 0.0 mastiff
239 221 f00dl3 S 5048 1.9 1 0.0 nt_monitor
202 1 f00dl3 S 5048 1.9 1 0.0 nt_monitor
221 202 f00dl3 S 5048 1.9 1 0.0 nt_monitor
327 1 f00dl3 S 4972 1.9 0 0.0 snmpd -c /tmp/snmpd.conf
204 1 f00dl3 S 4936 1.9 1 0.0 /sbin/netool
211 204 f00dl3 S 4936 1.9 0 0.0 /sbin/netool
212 211 f00dl3 S 4936 1.9 1 0.0 /sbin/netool
1 0 f00dl3 S 4924 1.9 1 0.0 /sbin/preinit
195 1 f00dl3 S 4920 1.9 0 0.0 /sbin/wanduck
260 1 f00dl3 S 4916 1.9 1 0.0 watchdog
678 1 f00dl3 S 4916 1.9 1 0.0 bwdpi_wred_alive
559 1 f00dl3 S 4916 1.9 1 0.0 disk_monitor
328 1 f00dl3 S 4916 1.9 0 0.0 bwdpi_check
225 1 f00dl3 S 4916 1.9 1 0.0 wpsaide
504 1 f00dl3 S 4916 1.9 1 0.0 usbled
329 1 f00dl3 S 4916 1.9 0 0.0 hour_monitor
330 1 f00dl3 S 4916 1.9 0 0.0 pctime
109 1 f00dl3 S 4912 1.9 0 0.0 console
 
Haven't seen a problem like that on any firmware version since I bought my AC3200 in 2015.

NO-LEAK.png
 
So I really do think it's directly tied to heavy data use. I have jobs that download raw weather data every 30 minutes from the National Weather Service NOMADS servers. Four times per day these jobs pull large chunks of data (22-26 GB). On an average day I'm downloading about 100 GB of data, again about 2850-3100 GB/month. I'm literally watching the System Info page shown in the image above right now, and I've seen free memory drop from 128 to 121.58 MB. My SNMP database graphs reflect that as well. It appears to gradually eat up memory, with bursts of faster consumption that coincide with the CPU spikes, which are directly related to when I'm pulling 60+ MB/sec / 360 Mbps+ on Google Fiber for those ~25 GB downloads 4x per day....
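The volumes in that post are roughly self-consistent: the four large pulls alone account for most of the daily total, and each one keeps the link busy for several minutes at the quoted rate. A quick back-of-the-envelope check, using only the figures above and assuming a 30-day month and decimal units (1 GB = 1000 MB):

```python
BURSTS_PER_DAY = 4
BURST_GB = (22, 26)      # low and high size of each large pull, per the post
DAYS_PER_MONTH = 30      # assumption
RATE_MB_PER_S = 60       # download rate quoted for a burst

daily_gb = [BURSTS_PER_DAY * gb for gb in BURST_GB]
monthly_gb = [d * DAYS_PER_MONTH for d in daily_gb]
# Decimal units assumed: 1 GB = 1000 MB.
burst_minutes = [gb * 1000 / RATE_MB_PER_S / 60 for gb in BURST_GB]

print(daily_gb)                               # [88, 104]
print(monthly_gb)                             # [2640, 3120]
print([round(m, 1) for m in burst_minutes])   # [6.1, 7.2]
```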

It appears something is causing the router to leak memory in proportion to data consumption. Maybe logging is being done in RAM?
 

Attachments: mRouterMemory.png, mRouterCPU.png

I'm really shocked (or maybe not) that there is no action on this whatsoever. It's almost like the developer of the Merlin firmware is ignoring that this is a problem, or is too scared to fork his firmware off the official Asus firmware for fear that it may actually fix stuff Asus can't. I would have thought that a custom firmware meant the developer would actually put effort into making sure it's better than the OEM firmware, but maybe not? Does the developer of the Merlin firmware even watch this forum? Can anyone reach out to him? Can he contact me so we can troubleshoot this issue in the firmware together? I know a lot about Linux and can provide some insights. I'm going to pull the source from GitHub now (https://github.com/RMerl/asuswrt-merlin.ng) to see if I can find what's going on myself.

Anyway, it's happening again. I can't even get into the router stats page; it just sits there spinning...

top from the SSH session:



MEM: 234720K used, 20716K free, 2196K shrd, 1572K buff, 5392K cached
CPU: 1.1% usr 59.4% sys 0.0% nic 14.2% idle 24.0% io 0.0% irq 0.9% sirq
Load average: 11.83 7.93 5.81 4/112 1414
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
27 2 f00dl3 SW 0 0.0 0 28.1 [mtdblock3]
260 1 f00dl3 D 4936 1.9 1 3.3 watchdog
247 1 nobody D 3236 1.2 1 2.2 dnsmasq --log-async
327 1 f00dl3 D 4976 1.9 1 2.0 snmpd -c /tmp/snmpd.conf *Filtered
228 1 f00dl3 S 5208 2.0 1 1.8 nt_center
226 225 f00dl3 R 5060 1.9 1 1.6 nt_monitor
255 1 f00dl3 D 6584 2.5 1 1.5 httpds -s -i br0 -p 8443
504 1 f00dl3 S 4916 1.9 1 1.5 usbled
344 1 f00dl3 R 5904 2.3 0 1.4 nmbd -D -s /etc/smb.conf
1413 1401 f00dl3 R 1424 0.5 1 1.2 top
264 1 f00dl3 S 2880 1.1 1 1.0 rstats
203 1 f00dl3 S 2060 0.8 1 1.0 protect_srv - What is this?
xxx 1 x S xxx 1.9 1 0.9 /sbin/wanduck
1398 217 f00dl3 S 1220 0.4 1 0.8 dropbear -p xxxxx -s -j -k
259 1 f00dl3 S 3164 1.2 1 0.7 sysstate
595 1 f00dl3 S 2040 0.8 0 0.7 dcd *Filtered - unsure what this is?
219 1 f00dl3 S 1328 0.5 1 0.7 /bin/eapd
600 596 f00dl3 S 2040 0.8 1 0.5 dcd *Filtered - unsure what this is?
238 1 f00dl3 S 2228 0.8 1 0.4 /usr/sbin/wlceventd
384 382 f00dl3 S 5080 1.9 0 0.3 mastiff
330 1 f00dl3 D 4916 1.9 1 0.1 pctime
382 277 f00dl3 S 5080 1.9 0 0.1 mastiff
507 505 f00dl3 S 2488 0.9 0 0.1 u2ec *Filtered - unsure what this is?
274 1 f00dl3 S 5904 2.3 0 0.0 networkmap --bootwait
383 382 f00dl3 S 5080 1.9 0 0.0 mastiff *Filtered - unsure what this is?
 

Attachments: Screenshot from 2019-10-17 16-21-45.png

I too think there is a memory leak in the Asus code (this isn't Merlin-related; I've tried stock firmware as well). I have about 30 smart devices, and I couldn't stay connected with an AC88. Memory starts out at a reasonable level, maxes out in less than a day, and then a freeze occurs. I upgraded to an AX88 and it does the same thing; it just takes longer than a day because it has more memory to burn. I reboot every night. I've brought this up in the past, but it's like everything in IT: it's the user's fault, so I just accept it at this point. I do hope you can find something. I've had many Asus products over the years, and I've always been impressed by the hardware and totally flabbergasted by the poor-quality code (we become their alpha/beta testers).
 
I have an AX88U and can say with confidence that Linux is still in control of the memory. Everything is fine: the memory usage you see on the network map page is not a representation of used memory. It is instead a record of the memory that has been used to date and either set aside as cache or flushed. Go to www.linuxatemyram.com and read up on how your router manages its memory. Your so-called leak that travels from one router model to the next is not a memory leak, but rather a configuration or device issue. Common culprits are importing saved configs of some type, plain misconfiguration, and not resetting to defaults and then slowly adding settings back one by one to find the breaking point. This is a one-page thread, and not many people have the problems mentioned above; like I said, this is specific to you and your configuration or practices.
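To apply the linuxatemyram reasoning to the `top` snapshots posted earlier in the thread: BusyBox `top` counts buffers and page cache inside its "used" figure (used + free is the same 255436K total in every snapshot), so subtracting them gives a rough idea of how much memory isn't simply reclaimable cache. A minimal sketch, assuming the `Mem:` line format shown in those snapshots (values in KiB); it doesn't try to account for shared memory or RAM-backed filesystems, so treat the result as approximate. The figures are copied from the frozen-state and post-reboot snapshots:

```python
import re
from typing import Dict


def parse_mem_line(line: str) -> Dict[str, int]:
    """Extract the KiB figures from a BusyBox top 'Mem:' line."""
    return {key: int(val) for val, key in re.findall(r"(\d+)K (\w+)", line)}


def used_excluding_cache(line: str) -> int:
    """'used' minus buffers and page cache, in KiB (a rough approximation)."""
    m = parse_mem_line(line)
    return m["used"] - m["buff"] - m["cached"]


frozen = "Mem: 234720K used, 20716K free, 2196K shrd, 1572K buff, 5392K cached"
fresh = "Mem: 130020K used, 125416K free, 1700K shrd, 1820K buff, 18796K cached"
print(used_excluding_cache(frozen))  # 227756
print(used_excluding_cache(fresh))   # 109404
```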
 
A lot of the ASUS firmware is open sourced, but the juicy parts are still closed source (wireless and Trend Micro components for example). If you find a bug in the source, let us know!
 
