My experience with the RT-AC86U

the "nvram get" quirk
Oh, sh**! So there's more than the poor cooling.

Can we make a test script that runs some nvram get command with timeout in a loop and logs the timeout when it happens? To see how frequent it is.
My router is 20 days old. I could run this and see if it's a matter of wear and tear of the nvram or the VRMs or comes defective out of the box. It would be very weird if it's a software bug because, like you say in your thread, it's a very basic operation. Memory starts having glitches like this when overheated or underpowered or worn.
'nvram get' lookup causes script hang | SmallNetBuilder Forums (snbforums.com)
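
A minimal sketch of the kind of test loop asked for above, not a tested tool. It assumes a timeout utility with GNU-style syntax ("timeout SECONDS CMD") and exit code 124 on expiry, for example from Entware; a BusyBox timeout applet may behave differently. The probe function name and the log path are made up for illustration.

```shell
#!/bin/sh
# Sketch: run a command repeatedly under a time limit and log every timeout.
# Assumption: `timeout SECONDS CMD...` exits with 124 when it kills CMD.

LOG="${LOG:-/tmp/nvram_timeouts.log}"   # hypothetical log path

# probe COUNT SECONDS CMD [ARGS...]
# Runs CMD up to COUNT times and prints how many runs timed out.
probe() {
    count=$1; limit=$2; shift 2
    n=0; hung=0
    while [ "$n" -lt "$count" ]; do
        n=$(( n + 1 ))
        rc=0
        timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 124 ]; then
            hung=$(( hung + 1 ))
            echo "$(date '+%F %T') iteration $n: '$*' timed out after ${limit}s" >> "$LOG"
        fi
    done
    echo "$hung"
}

# Example (on the router): probe 20000 2 nvram get vpn_client1_state
```

The log then shows at which iterations the command had to be killed, which gives a rough idea of how frequent the hang is.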
 
I dismantled both of mine just a few weeks ago to replace all the thermal pads. I bought an assortment of 1.0, 1.5, 2.0 and 2.5 mm thermal pads. It was kind of a hack job, but the results on one of them (the first one I did, the remote node that sits in a hot shed) were very good.

I repeated the process on my main router and the CPU temps went up a couple degrees, though the radio temps are way down. I guess I messed something up.

If (when) I redo it, I'd like to switch to a copper shim, but precise thickness seems critical: with the heatsink screwed down, a shim that is too thick could crush the delicate, unprotected silicon. I haven't found a measurement I'm comfortable with. I know the 2.5 mm silicone gel pad on the CPU compresses quite a bit, so would that mean a 2.0 or 1.5 mm shim is the right thickness?

I've had these for years. I've had good luck and been happy with them; I'm just trying to keep them alive as long as possible.
 
I could try to get the precise size of the gaps to the modders' benefit but ~0.1 mm variance won't be critical. Just don't tighten the screws that hold the heatsink all the way down. Of course, to make our lives harder, these screws are very short.

Btw, that's the reason PC CPU heatsinks have been designed with spring couplings for decades. You have screws or a lever but the right pressure on the chip is ensured by the spring.
 
I just made a proof of concept that shows how a "nvram get" command will hang my script on my AC86U... I didn't get beyond 2245 on my counter (which is literally a minute or so after running this):

1655136747533.png

1655136880836.png


And as predicted... the stuck nvram get is visible in htop... when I kill it, the script continues to run. And now that I'm looking at this, I'm wondering if all these other "nvram get" commands sitting here are also items that have gotten stuck, and never cleared out. <sigh>

Code:
#!/bin/sh

i=0

while true; do

    i=$(( $i + 1 ))

    state1=$(nvram get vpn_client1_state)
    state2=$(nvram get vpn_client2_state)
    state3=$(nvram get vpn_client3_state)
    state4=$(nvram get vpn_client4_state)
    state5=$(nvram get vpn_client5_state)

    clear
    echo $state1 $state2 $state3 $state4 $state5
    echo $i

done

exit 0

@dave14305 @SomeWhereOverTheRainBow - thought you'd be interested as well.
 
What happens if you double quote all those?
See if it happens with

Code:
#!/bin/sh 
i="0"
while true; do 
  i="$(( i + 1 ))"
  state1="$(nvram get vpn_client1_state)" 
  state2="$(nvram get vpn_client2_state)"
  state3="$(nvram get vpn_client3_state)"
  state4="$(nvram get vpn_client4_state)"
  state5="$(nvram get vpn_client5_state)"
  clear
  echo "$state1" "$state2" "$state3" "$state4" "$state5"
  echo "$i"
done 

exit 0
 
@Viktor Jaep
Can I put a few ms of sleep in this loop before I run it, to avoid flooding the CPU?
If that's possible and reasonable. Or at least put in a limit of i < 20000.

Also, do I need to install htop too, or will top do?
 
What happens if you double quote all those?
It works as-is up to a certain point... why would double quoting help it get beyond that?
Listen... this is a stress test. I'm not going to play nice and give it timeouts and limits. This needs to hit hard and fast to demonstrate failure.

Sleep only works in 1 sec increments unfortunately.

Top should work, but htop gives you more functionality to filter, kill etc.
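
For anyone without htop, a rough sketch of doing the same from the shell. It assumes BusyBox-style ps output with the PID in the first column; the function name is made up for illustration. It lists every "nvram get" process, not only stuck ones, so check twice a few seconds apart before killing anything.

```shell
#!/bin/sh
# Sketch: find "nvram get" processes from the shell instead of htop.
# Assumption: ps prints the PID as the first column (BusyBox style).

# nvram_get_pids: read ps output on stdin and print the PID of every
# line whose command contains "nvram get".
nvram_get_pids() {
    awk '/nvram get/ && !/awk/ { print $1 }'
}

# Usage on the router (commented out, since the last step kills processes):
#   ps w | nvram_get_pids             # list candidates
#   sleep 5; ps w | nvram_get_pids    # PIDs still listed are likely stuck
#   kill <pid>                        # releases the script waiting on it
```

A healthy nvram get should finish almost immediately, so a PID that shows up in both listings is a good candidate for a stuck call.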
 
I'll give it a whirl... Thanks for the suggestion!
 
Yes, sleep didn't accept 0.1 or 0.01, hence my question about whether milliseconds are possible.

Anyway, I put in a limit of 20,000 iterations - no issue.
50,000 - no issue.
100,000 - no issue.
I don't really wish to go higher than that.

Here's the code if you want to check it:
Bash:
#!/bin/sh
i=0
while [ $i -le 100000 ]; do
  i=$(( i + 1 ))
  state1="$(nvram get vpn_client1_state)"
  state2="$(nvram get vpn_client2_state)"
  state3="$(nvram get vpn_client3_state)"
  state4="$(nvram get vpn_client4_state)"
  state5="$(nvram get vpn_client5_state)"
  clear
  echo "$state1" "$state2" "$state3" "$state4" "$state5"
  echo "$i"
#sleep 0.01
done
exit 0
It prints
Code:
0 0 0 0 0
100001

Yet, in htop I can see a command
nvram get odmpid
sitting there idle and it doesn't go away.
 
You can use usleep if you want to use microseconds.

I don't have this router, but when I looked into this a bit before my gut feeling was that this is an IO interrupt issue. So my initial suspicions would lie with any add-on scripts that perform high intensity IO. Of course that doesn't excuse the behaviour but might help to understand and therefore mitigate the cause.
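
A small sketch of the usleep suggestion above, with a fallback for systems where usleep is not installed. usleep takes microseconds, so 1000 is one millisecond. The pause_ms name is made up for illustration.

```shell
#!/bin/sh
# pause_ms MS: pause for roughly MS milliseconds.
# Uses usleep (microseconds) when it is available; otherwise rounds up to
# whole seconds, because plain sleep may only accept integers.
pause_ms() {
    if command -v usleep >/dev/null 2>&1; then
        usleep $(( $1 * 1000 ))
    else
        sleep $(( ($1 + 999) / 1000 ))
    fi
}

# Example inside the test loop: pause_ms 1
```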
 
All right.
I'll let the loop run for a few hours with usleep 1000 and see if it fails.

P.S. No need to wait for hours. Failed at 4897.
Next run failed at 3959.
Third run failed at 3053.
4th: 2798.
5th:
Code:
0 0 0 0 0
3115
Nicely stuck, visible in htop. :)

Removed usleep, rerun it: stopped at 5.
Did it overheat or what now? Can't get past 4000 any more.
Something's clearly wrong.
 

Attachment: AC86UStucknvram.png
Has to be one of the addons that parse that variable (odmpid). nvram would hang on that because of the code logic some of the devs use:

Code:
[ -z "$(nvram get odmpid)" ] && ROUTER_MODEL="$(nvram get productid)" || ROUTER_MODEL="$(nvram get odmpid)"
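
The one-liner above can call nvram get odmpid twice. A sketch of the same lookup that reads it only once, which does not fix the hang but reduces how often the call is made. The get_router_model function name is made up for illustration.

```shell
#!/bin/sh
# get_router_model: read odmpid once and fall back to productid
# only when odmpid is empty.
get_router_model() {
    model="$(nvram get odmpid)"
    [ -n "$model" ] || model="$(nvram get productid)"
    echo "$model"
}

# Example: ROUTER_MODEL="$(get_router_model)"
```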
 
You can use usleep if you want to use microseconds.
OMG... why have I never heard of usleep! THANK YOU!

I was able to get it to run past 10,000 on one of my attempts... but it varies. In my scripts, where these calls happen infrequently and on a small scale, the hang normally takes hours or days to rear its ugly head, so I'm glad we can now show it happening in mere minutes.

What happens if you double quote all those?
See if it happens with
So basically the same behavior. Just ran it, and it got to 1761.
 
What happens if you run it like this?


Code:
#!/bin/sh 
(i="0"
while true; do 
  i="$(( i + 1 ))"
  state1="$(nvram get vpn_client1_state)" 
  state2="$(nvram get vpn_client2_state)"
  state3="$(nvram get vpn_client3_state)"
  state4="$(nvram get vpn_client4_state)"
  state5="$(nvram get vpn_client5_state)"
  clear
  echo "$state1" "$state2" "$state3" "$state4" "$state5"
  echo "$i"
done)&

exit 0
 
Nice try... but it got stuck:

1655147201397.png


:)
 
In the meantime, I logged into an AC68U router that I have access to (nothing attached to it, nothing custom there, besides running the same Merlin 386.5_2 firmware) and submitted the same loop.
It's already past 120 K iterations and counting.

Shall I unplug all USB media from the AC86U and see if that makes any difference?
 
@Viktor Jaep try this one but check the error log when it gets stuck.
Code:
#!/bin/sh 
(i="0"
while true; do 
  i="$(( i + 1 ))"
  state1="$(nvram get vpn_client1_state)" 
  state2="$(nvram get vpn_client2_state)"
  state3="$(nvram get vpn_client3_state)"
  state4="$(nvram get vpn_client4_state)"
  state5="$(nvram get vpn_client5_state)"
  clear
  echo "$state1" "$state2" "$state3" "$state4" "$state5"
  echo "$i"
done)&>/tmp/mynvramerror.log &

exit 0
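
A side note on the redirect: "&>" is a bash-style shorthand, and a strictly POSIX sh reads it as "&" followed by ">", so whether it works depends on how the router's shell was built. A sketch of the portable spelling, wrapped in a helper whose name (start_logged) is made up for illustration:

```shell
#!/bin/sh
# start_logged LOGFILE CMD [ARGS...]
# Start CMD in the background with stdout and stderr both written to
# LOGFILE, then print the PID of the background job.
start_logged() {
    logfile=$1; shift
    "$@" > "$logfile" 2>&1 &
    echo $!
}

# Example: start_logged /tmp/mynvramerror.log sh /jffs/scripts/nvramtest.sh
# (the script path is hypothetical)
```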
 
Personally, I moved all non-dynamic nvram values (i.e. model, firmware version) into my script config files. For anything dynamic, you may want to look at creating watchdogs for those calls if you're concerned about hung processes on AC86U models. That would also remove the Entware dependency for timeout.

Code:
( nvram get vpn_client2_state ) & command_pid=$!
( sleep 2 && kill -HUP "$command_pid" 2> /dev/null && printf "NOTICE - Killed hung nvram get process\n" ) & watcher_pid=$!
wait "$command_pid" && kill -HUP "$watcher_pid" 2> /dev/null
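
One way the watchdog idea above might be wrapped into a function that also hands back the value, as a sketch only. The function name and scratch file path are made up for illustration, it uses the default TERM signal rather than HUP, and it has not been tried against a real hung nvram get.

```shell
#!/bin/sh
# Sketch: run a command under a watchdog and still capture its output.
# Uses only sleep, kill and wait, so no external timeout is needed.

# get_with_watchdog SECONDS CMD [ARGS...]
# Prints CMD's stdout. Returns CMD's exit status, which is non-zero
# when the watchdog had to kill it.
get_with_watchdog() {
    limit=$1; shift
    tmp="${TMPDIR:-/tmp}/nvram_wd.$$"    # hypothetical scratch file

    "$@" > "$tmp" 2>/dev/null &
    cmd_pid=$!

    # Watcher: kill the command if it is still running after $limit seconds.
    # Its output goes to /dev/null so a leftover sleep cannot hold open the
    # pipe of a surrounding $(...) capture.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watcher_pid=$!

    rc=0
    wait "$cmd_pid" || rc=$?

    # The command is done, so stop the watcher. Its sleep may linger until
    # the limit expires, which is harmless but adds up in a tight loop.
    kill "$watcher_pid" 2>/dev/null || :
    wait "$watcher_pid" 2>/dev/null || :

    cat "$tmp"
    rm -f "$tmp"
    return "$rc"
}

# Example:
#   state=$(get_with_watchdog 2 nvram get vpn_client2_state) || echo "nvram get hung"
```

Because the scratch file name is based on the shell's PID, this sketch is only safe for one call at a time per script.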
 
That is actually not a bad option.
 
Here's the log from where it hung at 8877... nothing to go on. Again, really good thinking!

1655148010604.png
 
