Stuck commands

Ranger802004 · Feb 28, 2023

Martinski said:
I didn't run or use your script for "production." I simply tried to run it on my router to see how it works as a "proof of concept" (which I'm very familiar with as a professional s/w dev. myself). The point was that the current version of the script goes into an infinite loop because some NVRAM vars are set to empty strings or not set at all, which are not uncommon scenarios on ASUS routers. A "proof of concept" demonstration should take care of common scenarios like empty values; it doesn't have to be completely foolproof, but it shouldn't go into an infinite loop either.

Look, I get it. Nobody likes criticism, and some people are more averse to it than others even when it's constructive, as my feedback was meant to be. In one way or another, we're here to learn and if you are, I can offer some advice. If not, I can certainly move on - I got no skin in this game.

I actually don't mind criticism at all, and this may just be how you write, but you do come across as offering a little more than criticism. You are welcome to edit the script to check for whatever non-null values you prefer to test with. Again, it's just a conceptual script.

j911 · Sep 14, 2023

I seem to be experiencing the symptoms mentioned here. The AC86U in question is abroad and I use a VPN connection to log in to it. Wi-Fi is unstable, the client list is not updating, connections are dropping, etc. Could these all be symptoms of the issues mentioned here? Any help is appreciated.

ColinTaylor · Sep 14, 2023

j911 said:
I seem to be experiencing the symptoms mentioned here. The AC86U in question is abroad and I use a VPN connection to log in to it. Wi-Fi is unstable, the client list is not updating, connections are dropping, etc. Could these all be symptoms of the issues mentioned here? Any help is appreciated.

Probably not. See post #1. SSH into the router and run top or ps to see if there are any processes that look like they shouldn't be there.

bibikalka · Jan 26, 2024

Ran the script for a day; below is the grep output from the script's trace files. Lots of 'nvram get' commands get stuck!

Code:

admin@RT-AC86U-9988:/tmp/mnt/ac86u/entware/var/log/Trace# grep nvram *.txt
StuckProcCmds_00001_06276.TRC.txt:2024-01-25 12:05:21  1119  1111 admin    S     3104  0.7   0  0.0 nvram get productid [KILLED]
StuckProcCmds_00001_06276.TRC.txt:2024-01-25 12:05:04  1142     1 admin    S     3104  0.7   0  0.0 nvram get odmpid [KILLED]
StuckProcCmds_00001_06276.TRC.txt:2024-01-25 12:04:50  1142     1 admin    S     3104  0.7   0  0.0 nvram get odmpid
StuckProcCmds_00001_06276.TRC.txt:2024-01-25 12:04:50  1119  1111 admin    S     3104  0.7   0  0.0 nvram get productid
StuckProcCmds_00002_04205.TRC.txt:2024-01-25 19:51:18  1741  1728 admin    S     3104  0.7   1  0.0 nvram get ntp_ready [KILLED]
StuckProcCmds_00002_04205.TRC.txt:2024-01-25 19:51:04  1741  1728 admin    S     3104  0.7   1  0.0 nvram get ntp_ready
StuckProcCmds_00003_05066.TRC.txt:2024-01-25 21:12:19  1142  1136 admin    S     3104  0.7   0  0.0 nvram get http_username [KILLED]
StuckProcCmds_00003_05066.TRC.txt:2024-01-25 21:12:04  1142  1136 admin    S     3104  0.7   0  0.0 nvram get http_username
StuckProcCmds_00004_06143.TRC.txt:2024-01-25 23:18:18  1138  1107 admin    S     3104  0.7   0  0.0 nvram get productid [KILLED]
StuckProcCmds_00004_06143.TRC.txt:2024-01-25 23:18:04  1138  1107 admin    S     3104  0.7   0  0.0 nvram get productid
StuckProcCmds_00005_03983.TRC.txt:2024-01-26 00:42:19  2364  2363 admin    S     3104  0.7   1  0.0 nvram get http_username [KILLED]
StuckProcCmds_00005_03983.TRC.txt:2024-01-26 00:42:04  2364  2363 admin    S     3104  0.7   1  0.0 nvram get http_username
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:07:06 99925 99924 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:06:51 99925 99924 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:06:51 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:06:34 98559 98558 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:06:19 98559 98558 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:06:19 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:06:01 97088 97087 admin    S N   2972  0.7   1  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:05:46 97088 97087 admin    S N   2972  0.7   1  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:05:46 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:05:29 95141 95140 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:05:14 95141 95140 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:05:14 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:04:56 93762 93761 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:04:41 93762 93761 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:04:41 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:04:24 92286 92285 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:04:09 92286 92285 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:04:09 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:03:51 90917 90916 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:03:36 90917 90916 admin    S N   2972  0.7   0  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:03:36 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:03:19 87729 87728 admin    S N   2972  0.7   1  0.0 nvram get custom_clientlist [KILLED]
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:03:04 87729 87728 admin    S N   2972  0.7   1  0.0 nvram get custom_clientlist
StuckProcCmds_00006_05344.TRC.txt:2024-01-26 03:03:04 87093 86301 admin    S     2972  0.7   1  0.0 nvram get vpn_server_custom
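
To see at a glance which variables are affected, the [KILLED] lines in a listing like the one above can be tallied per nvram variable. This is only a minimal sketch and not part of the Kill Stuck Proc Cmds script: the function name count_killed is made up for illustration, and it assumes the trace lines keep the column layout shown above (command at the end, followed by [KILLED]).

```shell
#!/bin/sh
# Tally [KILLED] "nvram get" entries per variable from trace lines on stdin.
# Assumption: each killed line ends with "nvram get <variable> [KILLED]".
count_killed() {
    awk '$NF == "[KILLED]" && $(NF-3) == "nvram" && $(NF-2) == "get" {
             # The variable name is the field just before [KILLED].
             count[$(NF-1)]++
         }
         END { for (v in count) print count[v], v }' |
        sort -k1,1nr -k2,2   # highest count first, then by name
}

# Possible usage on the router (directory taken from the prompt above):
#   cat /tmp/mnt/ac86u/entware/var/log/Trace/*.txt | count_killed
```

Lines without the [KILLED] suffix are ignored, so each killed command is counted once rather than once per sighting.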

MarkusI · Mar 28, 2025

ColinTaylor said:
Looking at the source code, AFAICT the only routers that have their own prebuilt module are RT-AX88U, XT12, GT-AX11000, RT-AX56U, RT-AX58U, RT-AX68U, RT-AX86U, GT-AX6000, GT-AXE11000 and RT-AC68U_V4. If I'm reading it right all the other routers share a different common module. Given how popular the RT-AC68U is I wonder why we're not getting reports of stuck processes on that model.

Yes I believe that's the case.

Hi,
since this post is from 2022, I wonder if the issue is really one from the past / old routers...
Or in other words - if I get myself a new BE-version, can I assume that the device won't suffer from the same issues?

MarkusI · Mar 28, 2025

ColinTaylor said:
@dave14305 If you have the time would you mind running the following script on your RT-AC86U. It should print out a list of active netlink socket numbers that don't have a matching pid. My router is very minimal and doesn't run things like AiProtection so I'm curious to see if you have a lot more mismatched netlink sockets than I do (6 x 2 = 12).

Code:

#!/bin/sh
cat /proc/net/netlink | sort -nk3 | \
awk '
BEGIN {
    print "\nPrint netlink sockets for which there is no process with the same number\n"
    getline pid_max < "/proc/sys/kernel/pid_max"
}
{
    if ( $2 == "31" ) {
        if ( $3 < pid_max )
            system("kill -0 " $3 " 2>/dev/null || echo \"Process " $3 " not found\"")
        else {
            orig_pid = $3 - pid_max - 2
            system("kill -0 " orig_pid " 2>/dev/null || echo \"Process associated with " $3 " not found (" orig_pid ")\"")
        }
    }
}
END {
    print "\nA number in brackets is a *guess* at an associated process\n"
}
'

EDIT: Removed some unnecessary experimental code from script (just in case it confuses people).
No comments on the quality of my coding please.

P.S. Not that this achieves anything other than satisfying my curiosity.

I wonder if it would be somehow possible to write a script or base it on the quoted one which identifies orphans and cleans them up so that nvram requests won't get stuck.

I read a lot about the issue but am struggling to really understand how things hang together and how to work around them, if possible at all.

I'm helping 100%-no-nerd-friends on their farm with an AiMesh setup and unfortunately, we have some AC86U routers there.
Those regularly run into that nvram condition.
My friends have 4 AC86U and it would be quite an investment to completely replace them...

So I'm trying to find a workaround.

I did the dumbass approach to check every 5 minutes via 'nvram get cfg_device_list' whether it hangs and if so, reboot the router... not considering that executing the nvram call every 5 minutes could actually worsen the issue

...

So I'm looking for a different approach to either regularly clean up so that it does not lock up in the first place or identify lockups and reboot without increasing their probability.

I'm a bit stuck here and any help is very much appreciated...
Also, I have to admit that I don't understand the details of what the quoted script does and why and if I could somehow use it...

Thx
Markus

PS: echo 4194304 > /proc/sys/kernel/pid_max is already in place.
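
One way to probe for a hang without leaving the probe itself stuck is to run it under a time limit. Below is a minimal sketch in plain sh. run_with_timeout is a hypothetical helper name, and the exit code 124 merely mirrors the convention used by GNU timeout. Whether probing at all makes hangs more likely, as suspected above, is not something this sketch answers.

```shell
#!/bin/sh
# Run a command, but give up after a number of seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_timeout() {
    secs=$1
    shift
    "$@" >/dev/null 2>&1 &
    cmd_pid=$!
    # Poll once per second while the background command is still alive.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$secs" -le 0 ]; then
            kill -9 "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        secs=$((secs - 1))
    done
    # The command finished by itself: collect and pass on its exit status.
    rc=0
    wait "$cmd_pid" || rc=$?
    return "$rc"
}

# Possible usage on the router (the 10-second limit is an arbitrary choice):
#   run_with_timeout 10 nvram get cfg_device_list || echo "probe failed or timed out"
```

Because the helper kills its own child on timeout, a hung probe does not pile up as yet another stuck process between checks.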

ColinTaylor · Mar 28, 2025

MarkusI said:
I'm helping 100%-no-nerd-friends on their farm with an AiMesh setup and unfortunately, we have some AX86U routers there.
Those regularly run into that nvram condition.

The RT-AX86U doesn't suffer from this problem.

MarkusI · Mar 29, 2025

ColinTaylor said:
The RT-AX86U doesn't suffer from this problem.

Sorry, a typo... it's 4 AC86U...

Is it only the AC86U which suffers from the issue?
I ask because you wrote in one of your threads that others (incl. the AX86U) also share the same prebuilt modules ('Looking at the source code, AFAICT the only routers that have their own prebuilt module are RT-AX88U, XT12, GT-AX11000, RT-AX56U, RT-AX58U, RT-AX68U, RT-AX86U, GT-AX6000, GT-AXE11000 and RT-AC68U_V4. ')...

alan6854321 · Mar 29, 2025

MarkusI said:
Is it only the AC86U which suffers from the issue?

I see these hangs on a RT-AX88U Pro.

MarkusI · Mar 29, 2025

alan6854321 said:
I see these hangs on a RT-AX88U Pro.

with the result that the router at some point stops working (at least partly) / needs a reboot, like the AC86U?

alan6854321 · Mar 29, 2025

MarkusI said:
with the result that the router at some point stops working (at least partly) / needs a reboot, like the AC86U?

No, pretty much carries on regardless.

The hangs I see are in my backup script that runs overnight, the
"nvram save /tmp/mnt/NAS/public/BACKUPS/xxxxx" command hangs.

It just sits there forever if I don't do anything.
The backup still runs fine the next day.

The only time I've had issues with the router failing is when dnsmasq was restarted by the watchdog. It hung on each attempt and as a result no new clients could connect to the router. That needed a reboot to sort it.

Viktor Jaep · Mar 29, 2025

Have you guys tried running this little doozie to help clear up stuck processes courtesy of @Martinski?

- Kill Stuck Proc Cmds - Source - Dev: @Martinski

MarkusI · Mar 29, 2025

Viktor Jaep said:
Have you guys tried running this little doozie to help clear up stuck processes courtesy of @Martinski?

- Kill Stuck Proc Cmds - Source - Dev: @Martinski

Oh thanks!!! I missed that one! Will check it out asap!!!

MarkusI · Mar 29, 2025

@Viktor Jaep ...
The script does not seem to work (for me)...

I did the following:
1. I ran 'nvram get cfg_device_list' successfully
2. I provoked a hang (seems to be reproducible when restarting AiMesh nodes while frantically reloading the AiMesh page in the main node admin)
3. Confirmed that 'nvram get cfg_device_list' now hangs when executed from shell

When this happens, after logging in to the admin ui, the ui loads forever until getting a timeout.

The script from this post reports

Code:

Print netlink sockets for which there is no process with the same number

Process 1323 not found
Process 1343 not found
Process 1346 not found
Process 1349 not found
Process 1520 not found
Process 1816 not found
Process 32771 not found
Process associated with 4195629 not found (1323)
Process associated with 4195649 not found (1343)
Process associated with 4195652 not found (1346)
Process associated with 4195655 not found (1349)
Process associated with 4195826 not found (1520)
Process associated with 4196122 not found (1816)

A number in brackets is a *guess* at an associated process

while the Kill Stuck Proc Cmds script tells me:

Code:

FOUND: [0]

What am I missing?

Thanks
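
The bracketed guesses in the listing above follow directly from the arithmetic in the quoted netlink script: a socket number at or above pid_max is mapped back with socket - pid_max - 2. Here is a small sketch of just that step, using guess_pid as an illustrative name and the pid_max value of 4194304 mentioned earlier in the thread.

```shell
#!/bin/sh
# Reproduce the netlink script's guess at the PID behind a socket number.
guess_pid() {
    sock=$1
    pid_max=$2
    if [ "$sock" -lt "$pid_max" ]; then
        # Below pid_max the socket number is taken to be the PID itself.
        echo "$sock"
    else
        # Otherwise apply the same offset the quoted script uses.
        echo $((sock - pid_max - 2))
    fi
}

guess_pid 4195629 4194304   # prints 1323, matching the listing
guess_pid 1323 4194304      # prints 1323 (already a plain PID)
```

Note that the netlink script only reports these numbers: it checks for a process with kill -0 and prints a message, but it does not kill or clean up anything.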

Viktor Jaep · Mar 29, 2025

MarkusI said:
@Viktor Jaep ...
The script does not seem to work (for me)...

I did the following:
1. I ran 'nvram get cfg_device_list' successfully
2. I provoked a hang (seems to be reproducible when restarting AiMesh nodes while frantically reloading the AiMesh page in the main node admin)
3. Confirmed that 'nvram get cfg_device_list' now hangs when executed from shell

When this happens, after logging in to the admin ui, the ui loads forever until getting a timeout.

The script from this post reports

Code:

Print netlink sockets for which there is no process with the same number

Process 1323 not found
Process 1343 not found
Process 1346 not found
Process 1349 not found
Process 1520 not found
Process 1816 not found
Process 32771 not found
Process associated with 4195629 not found (1323)
Process associated with 4195649 not found (1343)
Process associated with 4195652 not found (1346)
Process associated with 4195655 not found (1349)
Process associated with 4195826 not found (1520)
Process associated with 4196122 not found (1816)

A number in brackets is a *guess* at an associated process

while the Kill Stuck Proc Cmds script tells me:

Code:

FOUND: [0]

What am I missing?

Thanks

You run it 2x in a row. It's meant to run every 5 mins from cron. See if that does the trick?
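
A plausible reading of the "run it twice" behaviour is that a command only counts as stuck once it has been seen in two consecutive checks, so the first run merely records candidates. The following is a sketch of that idea only, not Martinski's actual code: still_stuck, the snapshot format (one "PID command" per line) and the job name and script path in the cru comment are all assumptions.

```shell
#!/bin/sh
# Print the entries that appear in both the previous and the current snapshot.
# Each snapshot line is assumed to look like "1142 nvram get odmpid".
still_stuck() {
    prev=$1
    cur=$2
    # No previous snapshot (first run): nothing can be confirmed as stuck yet.
    [ -s "$prev" ] || return 0
    # -F fixed strings, -x whole-line match, -f read patterns from a file.
    grep -Fxf "$prev" "$cur" || true
}

# Scheduling a check every 5 minutes on Asuswrt-Merlin might look like this
# (job name and script path are placeholders):
#   cru a KillStuckProcs "*/5 * * * * /jffs/scripts/your_script.sh"
```

Matching on PID and command together avoids flagging a fresh process that happens to run the same nvram command as one seen earlier.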

MarkusI · Mar 29, 2025

Viktor Jaep said:
You run it 2x in a row. It's meant to run every 5 mins from cron. See if that does the trick?

Jepp, you're right.
When I see my command hang and run the script twice, the hanging nvram command gets killed.
I have now installed the script as a cron job as recommended and will observe over the next few days whether that makes a difference.
Thanks for that

The only thing which still bothers me now is that the admin UI keeps hanging in these cases, no matter what.
Also when I do a 'service restart_httpd', it does not make a difference.
The only way to help seems to be a reboot...
I have to admit, though, that I tested that before installing the cron.

In other words: when my nvram command hangs, the admin UI gets timeouts, too... the script does kill my nvram command but does not fix the ui.
Also, the script reports zero hanging commands.

Can you make any sense of this?
Thanks, Markus

Viktor Jaep · Mar 29, 2025

MarkusI said:
Jepp, you're right.
When I see my command hang and run the script twice, the hanging nvram command gets killed.
I have now installed the script as a cron job as recommended and will observe over the next few days whether that makes a difference.
Thanks for that

The only thing which still bothers me now is that the admin UI keeps hanging in these cases, no matter what.
Also when I do a 'service restart_httpd', it does not make a difference.
The only way to help seems to be a reboot...
I have to admit, though, that I tested that before installing the cron.

In other words: when my nvram command hangs, the admin UI gets timeouts, too... the script does kill my nvram command but does not fix the ui.
Also, the script reports zero hanging commands.

Can you make any sense of this?
Thanks, Markus

Not quite sure why that would be happening... but perhaps increasing that pid_max value might help it occur less frequently? See instructions below:

Asus AC86U - nvram show stops working after few hours causing router not accessible via WebUI

Hi, I came across a strange problem with my AC86U running latest merlin firmware (but it was happening on older as well). Basically router becomes unaccessible via WebUI after few hours after reboot. Just login screen shows up and after putting in login details it doesn't continue any further...

www.snbforums.com

MarkusI · Mar 30, 2025

Thanks!
That is already in place... I will observe what the cron now does for reliability... maybe everything is fine now..

MarkusI · Apr 11, 2025

So, after about two weeks, I can safely say that the cron script does not prevent the main router from hanging.
In most cases, the UI gets timeouts; in some, it loses connections to all AiMesh nodes. SSHing always works.
After a reboot, all nodes again reconnect and everything is back to normal.
We decided to now replace it with an AX88U Pro and hope that this solves the issue.

Viktor Jaep · Apr 11, 2025

MarkusI said:
So, after about two weeks, I can safely say that the cron script does not prevent the main router from hanging.
In most cases, the UI gets timeouts; in some, it loses connections to all AiMesh nodes. SSHing always works.
After a reboot, all nodes again reconnect and everything is back to normal.
We decided to now replace it with an AX88U Pro and hope that this solves the issue.

Yeah, it's no good to try to keep a router functional with a workaround. Good luck with your new router!
