My experience with the RT-AC86U

I don't expect to be able to get a replacement.

In any case, I think it's reasonable for Asus to investigate this. It's in their own interest.
Maybe it's a batch of faulty NVRAM chips. They could take this up with their supplier and take measures to prevent it from happening again.
Or it could be bad CPU design. What if there's a flaw in the NVRAM read instructions?
Or it could be a flaw in their implementation: power supply, EM interference, etc.
Given what they've done with the thermal design* of this router, this may seem minor. Yet it's not negligible, because if the root cause is not understood, the problem may recur.

*This is a big red flag. A company that releases such products worries me (unless it's intentional to shorten the products' life and make the users buy more frequently, in which case it's not incompetence but policy).

@SomeWhereOverTheRainBow: no, I did not do a factory reset.
That would be one of the troubleshooting steps Asus might require you to perform.
 
I got the hint. ;)
Factory reset, reconfigured everything from scratch. Problem with Samba gone.

For the record, in case anyone ends up in the same situation: these zombie processes hogging the CPU are not spawned by the router itself; the Windows host somehow initiates this mess. The problem is that even after the Windows host is shut down, the CPU-consuming processes on the router stay (although no new ones get created). Why and how it happens, I have no idea. I might eventually drop Samba altogether, but it's just very handy on a network with different hosts (Windows, Linux, Android).

The nvram get issue is still here to haunt me. :)
 
Yeah, it is pretty bad when you try to RMA a bricked router that cannot get into recovery mode and the technician insists on reading the same script telling you to go to the factory reset page.
 
Okay, when you are locked out, what happens when you send the kill signal to make the process "resume"?
Here's what happens.

Ran the script; it locked up at iteration 645:
[screenshot]

According to htop, it choked on "nvram get vpn_client2_state":
[screenshot]

I SIGKILL it, and the script keeps on trucking...

[screenshot]

Note how it said "Killed" and the VPN state value is missing.

[screenshot]

Then it eventually died again at iteration 4147.
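If you'd rather not SIGKILL the stuck call by hand, one option is to give each call a watchdog. A minimal sketch in POSIX sh follows; `run_with_timeout` is a hypothetical helper (nothing in this thread says the firmware ships one), and the timeout you pass is your own guess at what counts as "stuck".

```shell
#!/bin/sh
# Hypothetical helper: run a command, but send it SIGKILL if it is still
# running after SECS seconds.
# Returns the command's own exit status, or 137 (128 + 9) if it was killed.
# Usage: run_with_timeout SECS COMMAND [ARGS...]
run_with_timeout() {
  secs=$1
  shift
  "$@" &
  cmd_pid=$!
  # Watchdog in the background. Its output goes to /dev/null so that a
  # leftover "sleep" never holds open the pipe of a $(...) capture.
  ( sleep "$secs"; kill -KILL "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
  dog_pid=$!
  rc=0
  wait "$cmd_pid" || rc=$?
  # The command is done (normally or killed): stop the watchdog.
  kill "$dog_pid" 2>/dev/null || true
  wait "$dog_pid" 2>/dev/null || true
  return "$rc"
}
```

For example, `state=$(run_with_timeout 5 /bin/nvram get vpn_client2_state)` would give up after 5 seconds and leave `state` empty. This only cleans up after a hang and does nothing about its cause; in rare cases the watchdog could also fire just as the command exits and hit a recycled PID, so treat it as a diagnostic aid.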
 
@Viktor Jaep So we need to trap the suspend signal, "STP" I believe. It appears the router suspends the process at a certain point so the loop doesn't run forever and eat up resources. This behavior can be stopped if we add the suspend signal to the rest of our trap.
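Before trapping anything, the "suspended" theory can be checked directly: a stopped process shows state T, while one blocked waiting on something typically shows S (sleeping) or D (uninterruptible sleep). The sketch below is a hypothetical helper, `proc_state`, that reads that state from `/proc/<pid>/stat`; it assumes a Linux `/proc` (htop reads its data from there, so the router has one).

```shell
#!/bin/sh
# Hypothetical helper: print the one-letter scheduler state of a PID, read
# from /proc/<pid>/stat (Linux only). Common values: R running,
# S sleeping, D uninterruptible sleep, T stopped, Z zombie.
# Returns 1 if the PID does not exist.
proc_state() {
  stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
  # Field 2 is "(command name)" and may contain spaces, so skip past the
  # last ") " before taking the next field, which is the state.
  rest=${stat##*") "}
  echo "${rest%% *}"
}
```

For example, `proc_state 1234`, with 1234 replaced by the PID htop shows for the stuck `nvram get`.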
 
Like this?

Code:
#!/bin/sh
trap '' HUP INT QUIT ABRT TERM TSTP
(i="0"
while true; do
  i="$(( i + 1 ))"
  for nv in 1 2 3 4 5; do
  unset "state${nv}";
  eval "state${nv}"="$(/bin/nvram get vpn_client${nv}_state)";
  done
  clear
  echo "$state1" "$state2" "$state3" "$state4" "$state5"
  echo "$i"
done) > /tmp/mynvramerror.log 2>&1 &


exit 0
 
Like this?

Code:
trap '' HUP INT QUIT ABRT TERM STP
I believe the signal name is "TSTP", not "STP".
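For reference, TSTP is the terminal stop signal (the one Ctrl+Z sends); there is no signal named STP. SIGSTOP and SIGKILL cannot be caught or ignored, so no `trap` line can block them. The sketch below uses hypothetical helper names: `valid_trap_name` checks whether the shell accepts a signal name, and `demo_ignore_term` shows an ignored TERM leaving a job running while KILL still ends it.

```shell
#!/bin/sh
# Hypothetical helpers for checking trap behaviour in a POSIX shell.

# Succeeds if this shell's trap builtin accepts NAME as a signal name.
# Runs in a subshell because some shells (e.g. dash) abort on a bad trap.
valid_trap_name() {
  ( trap '' "$1" ) 2>/dev/null
}

# Start a job that ignores SIGTERM, then show that SIGTERM leaves it
# running while SIGKILL (which cannot be caught or ignored) still ends it.
demo_ignore_term() {
  ( trap '' TERM; sleep 5 ) >/dev/null 2>&1 &
  job=$!
  sleep 1                  # crude: give the job time to install its trap
  kill -TERM "$job"
  sleep 1
  if kill -0 "$job" 2>/dev/null; then echo "TERM ignored"; fi
  kill -KILL "$job"
  rc=0
  wait "$job" || rc=$?
  echo "exit status: $rc"  # 137 = 128 + 9 (SIGKILL)
}
```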
 
As @dave14305 alluded to in his post on the thread here, the same issue also happens fairly regularly with the /usr/sbin/wl command, so I don't think the root cause is a hardware flaw or a failing NVRAM chip. At this point, I agree with @RMerlin that it looks more like a deadlock: two (or more) competing threads don't release their locks/mutexes/semaphores appropriately and at the right time, so they end up waiting on each other forever.

I got very curious about this problem last Sunday, so I ended up writing a script that looks for "nvram" and "wl" commands that appear to be "stuck" and then captures the tree path to their root parent process. I was trying to see whether the same parent processes show up when the hangs occur. In my case, the same pair shows up most frequently: YazFi & conn_diag. I think that's probably because they both call the "wl" command frequently, and YazFi calls 'nvram get' as well. BTW, while going through the logs generated by the script, I found that the "cru l" command can get "stuck" as well, because it calls "nvram get http_username" to get the filename containing the list of cron jobs (e.g. /var/spool/cron/crontabs/{http_username}).
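The script itself wasn't posted in this thread, so the following is only a guess at how the "tree path to the root parent" step could be done: follow the PPid: links in `/proc/<pid>/status` until reaching the top of the tree. `proc_ancestry` is a made-up name; it assumes a Linux `/proc` and that `sed` is available.

```shell
#!/bin/sh
# Hypothetical helper (not taken from CheckStuckProcCmds.sh): print the
# chain of processes from PID up to the root of the process tree, one
# "PID NAME" line per level. Linux-only: reads the Name: and PPid: lines
# of /proc/<pid>/status.
proc_ancestry() {
  pid=$1
  # Stop at PPid 0 (top of the tree) or when a process has already exited.
  while [ "${pid:-0}" -gt 0 ] 2>/dev/null && [ -r "/proc/$pid/status" ]; do
    name=$(sed -n 's/^Name:[[:space:]]*//p' "/proc/$pid/status")
    ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$pid/status")
    echo "$pid $name"
    pid=$ppid
  done
}
```

For example, `proc_ancestry 1234` prints the stuck process (PID 1234 is a placeholder) first and each of its ancestors after it.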

Today, I added code to the script to kill the "stuck" processes if they are still found on the 2nd round of the search when the script is set up to run as a cron job. I have it set up to run every 5 minutes since I don't get a lot of occurrences (about 3 a day on average). But I'd imagine that folks running many 3rd-party add-ons that call nvram and/or wl commands frequently may see the bug much more often. This script is not a solution at all, but at least it will eliminate all those "stuck" processes that would otherwise stay around until the next reboot.
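The "kill it if it is still there on the 2nd round" rule can be sketched roughly as follows. This is hypothetical and simplified, not the actual CheckStuckProcCmds.sh: the real script runs from cron and compares rounds 5 minutes apart, while this version just sleeps between two scans in a single run. It treats "same PID, matching command line on both scans" as stuck, which could misfire if a PID were reused between scans; all function names are made up, and it assumes a Linux `/proc`.

```shell
#!/bin/sh
# Hypothetical two-pass "stuck command" check (NOT the real
# CheckStuckProcCmds.sh). Linux-only: reads /proc/<pid>/cmdline.

# Print the PIDs whose command line starts with PREFIX, with or without a
# leading directory (so "nvram get" also matches "/bin/nvram get ...").
scan_cmd_pids() {
  for dir in /proc/[0-9]*; do
    # cmdline is NUL-separated; turn it into one space-separated string.
    cmd=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
    case "$cmd" in
      "$1"* | */"$1"*) echo "${dir#/proc/}" ;;
    esac
  done
}

# Print each PID that appears in both whitespace-separated lists.
common_pids() {
  for a in $1; do
    for b in $2; do
      if [ "$a" = "$b" ]; then echo "$a"; fi
    done
  done
}

# Scan twice, INTERVAL seconds apart (default 5), and report every PID that
# matched PREFIX on both scans. If ACTION is "kill", also SIGKILL it.
# Usage: check_stuck PREFIX [INTERVAL] [ACTION]
check_stuck() {
  first=$(scan_cmd_pids "$1")
  sleep "${2:-5}"
  second=$(scan_cmd_pids "$1")
  for p in $(common_pids "$first" "$second"); do
    echo "stuck: $p"
    if [ "${3:-report}" = kill ]; then
      kill -KILL "$p" 2>/dev/null || true
    fi
  done
}
```

For example, `check_stuck "nvram get" 60` only reports, while `check_stuck "wl " 60 kill` would also SIGKILL any `wl` call still present after a minute.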

Here is the script if you want to try it. It was initially meant to be a diagnostic tool, so it's still a bit "raw" and it has not been polished with a round of refactoring.

Type ./CheckStuckProcCmds.sh -help to get a usage description.
 
Welp... even with TSTP, I'm seeing similar behavior after sigkilling the stuck nvram get call:
[screenshots]
What about
Code:
#!/bin/sh 
#trap '' HUP INT QUIT ABRT TERM TSTP 
(i="0" 
while true; do 
  i="$(( i + 1 ))";
  for nv in 1 2 3 4 5; do 
    unset "state${nv}"; 
    eval "state${nv}"="$(/bin/nvram get vpn_client${nv}_state)" & kill $!;
    wait
  done 
  clear 
  echo "$state1" "$state2" "$state3" "$state4" "$state5" 
  echo "$i" 
done) > /tmp/mynvramerror.log 2>&1 & 
exit 0
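One detail about the `eval ... & kill $!` line: a command followed by `&` runs in a subshell, so the `state${nv}` assignment made inside it never reaches the loop's shell, and `kill $!` then signals that background job right away. The small POSIX sh demo below (hypothetical function name) shows the first part.

```shell
#!/bin/sh
# Compare an eval'd assignment run in the background ("&") with the same
# assignment run in the foreground. The background copy runs in a subshell,
# so its variable disappears when that subshell exits.
demo_bg_assignment() {
  unset state1
  eval "state1=up" &
  wait                                   # let the background job finish
  echo "background: '${state1-}'"        # still unset in this shell
  eval "state1=up"
  echo "foreground: '${state1-}'"        # now set
}
```

Running `demo_bg_assignment` prints `background: ''` followed by `foreground: 'up'`.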
 
Well, it looks like the forum didn't like the script file as an attachment, even with a *.TXT file extension. I can put it in Pastebin if you're interested in trying the script.
 
Please do! Thanks for your work on this! :)
 
When the files in /jffs/.sys/diag_db/ stop updating, is that a sign that something is stuck? There are two files in this directory, and they get rotated every day around 8 am. Mine usually stop updating a few days after a reboot.
 
Didn't like the "What about" version of the loop...

nvrampoc.sh: line 12: syntax error: unexpected ")" (expecting "done")
 
@Martinski: This is all good info but I have a question.
If the stuck nvram get commands have nothing to do with the hardware, how come this only happens on the AC86U?

I let the same loop run on an AC68U last night and it got beyond 870,000 iterations before I finally decided to stop it.
 
Yeah, I had to revise the post; make sure you have the revised version.
Wow, whatever you did there sent my /sbin/init off the charts and spiked utilization to the max. The log was filled with these... something ain't right:

[screenshot]
 
When the files in /jffs/.sys/diag_db/ stop updating, is that a sign that something is stuck? There are two files in this directory, and they get rotated every day around 8 am. Mine usually stop updating a few days after a reboot.
This is normal. Mine usually stop updating 24-48 hours after a reboot.
 
Dial it back a minute and try

Code:
#!/bin/sh 
#trap '' HUP INT QUIT ABRT TERM TSTP 
(i="0" 
while true; do 
  i="$(( i + 1 ))";
  for nv in 1 2 3 4 5; do 
    unset "state${nv}"; 
    eval "state${nv}"="$(/bin/nvram get vpn_client${nv}_state)";
    wait
  done 
  clear 
  echo "$state1" "$state2" "$state3" "$state4" "$state5" 
  echo "$i" 
done) > /tmp/mynvramerror.log 2>&1 & 
exit 0
 
