What's new
  • SNBForums Code of Conduct

    SNBForums is a community for everyone, no matter what their level of experience.

    Please be tolerant and patient of others, especially newcomers. We are all here to share and learn!

    The rules are simple: Be patient, be nice, be helpful or be gone!

My experience with the RT-AC86U

Is the AC86U that does not hit the issue referring to my test 2 days ago? If yes, then now I keep hitting the issue without fail. :eek:
I just went back and checked. Yes, I meant your report. :)

A note that should be taken with considerable doubt:
The first time I ran the test script, I was able to get to 20,000 and then to 50,000 iterations without hanging up. I thought my device didn't have the problem and then it started happening steadily.
The reason for the doubt is that I copied and pasted the script from a Windows machine that added an extra \r (CR) character at the bottom. So I guess I was running a corrupted script, as visible on one of the early logs.
In summary, this problem developed by accumulating a certain number of operations (unlikely) or it was there from the beginning but my initial testing was flawed (more likely). One more fact in support of the problem being there from the beginning - I already had a stuck nvram get command visible in htop while I was thinking I was ok.
 
Last edited:
1. Alternative solution without timeout
@SomeWhereOverTheRainBow I'll test your code.
Would you please explain to me what this part actually does?
Bash:
run_cmd () {
    to=$1; shift
    $@ & local child=$! start=0
     while kill -0 $child 2>/dev/null; do
        read -t 1
        start=$((start+1))
        if [ $start -ge $to ]; then
            kill -s 9 $child 2>/dev/null
            break
        fi
    done
}

2. The current wrapper script with timeout
This is still work in progress. It lacks protection from running when timeout is not available. It's installed as an addon on external USB device. Quite unsafe given it would be managing such a core operation as nvram get / set. Would it be possible to install timeout on the jffs partition to make this a bit safer?
The $PATH has to be checked. I have a line exporting /opt/bin:/opt/sbin but that needs refinement because I ended up having these paths added more than once.
I also wish to have a counter how many times the override has been invoked, so I can send to the system log an error message with a number on it (for statistics). Thus we can get a more accurate measure of how often the nvram command is used and how often the error condition happens.

3. AF_NETLINK suspicion
As for the netlink library, presumably setting nl_pid wrongly: this doesn't match very well with empirical observation.
Why is it that >2,700 iterations in a row of the test loop (with 5 nvram get calls inside) work fine? Mind you, these are actually >13,500 successful nvram operations before 1 fails. Unless it's a buffer that runs out or something of this nature, I would expect the nvram reads to fail much more regularly. This could as well be a faulty nvram controller or a bug in the CPU.
We also have a report of an AC86U that doesn't exhibit the faulty behavior. We haven't independently verified it but it deserves attention. - Update, this report has now been recalled. At this point we have no reason to doubt that all AC86U routers have this fault.

4. The "ditch the AC86U and buy AX86U" solution
Well, I don't like it. You know the saying: trick me once - shame on you, trick me twice - shame on me. I don't feel confident to go an shovel more money into a company that has failed me before I see genuine effort to fix the problem.
As I've mentioned, I had a couple of cheap TP-Link routers (sub 50 EUR) that worked flawlessly with custom firmware and a load of addons for around 5 years. I don't like the idea of having to ditch a 120 EUR device + shipping costs less than a month of unboxing it. Asking for a replacement of the same model? Difficult and likely useless. They are out of stock at the place I ordered it, that's quite a bit of shipping costs on me and I have no guarantee that the potential replacement unit won't have the same defect. Quite the opposite - it will probably have it. On top of that, I'll have to deal with the heat dissipation problem all over again.
What about all the other users experiencing the faulty behavior? I don't want to hurt the Asus business but let's look for a better solution first before throwing away devices.

5. Reporting the problem
I believe this problem should be adequately reported to Asus. I have to give them the benefit of the doubt that they were not aware of it and would take efforts to address it. I'm somewhat discouraged by the thermal design - I mean, how could they not have seen that? They clearly went for the thick thermal pads between the ICs and the heatsink - they were completely aware of the big gaps. That's indication of low standards on designing a premium device.
I would still try. I'd report both the poor heat dissipation and the nvram bug. If I contact regular customer service, I'll be most probably dismissed. So how shall I contact them? Could I possibly write a notification letter and submit it via @RMerlin ?

Edited the whole post, reason: refinement.
To respond to your ask, that function should simulate a timeout, like timeout while the process either hangs in the back ground or finishes on time. It will kill it if it hangs
 
I am also wondering how his script works. This is all I see when I run it:
Code:
admin@RT-AC86U-DBA8:/jffs/scripts/custom# sh -x zz.sh
+ cp /bin/nvram /tmp/_nvram
+ cat
+ chmod +x /tmp/nvram
+ mount -o bind /tmp/nvram /bin/nvram

Is the AC86U that does not hit the issue referring to my test 2 days ago? If yes, then now I keep hitting the issue without fail. :eek:

Update:
Received feedback from my local Asus support. They will forward this to R&D.
View attachment 41902
That is because you won't see it in action until you use the actual nvram get variables. Not the script that writes the child script.
 
Now, before I do something remarkably stupid - files written to /tmp go to RAM, right?
I didn't manage to set global variables to count the nvram override usage (and local vars are reset each time), so I created 2 files in /tmp. This should be in RAM but the delay introduced by reading / writing to these files got me suspicious.
I've already made wrong assumptions based on Linux but the router is something different.
 
Now, before I do something remarkably stupid - files written to /tmp go to RAM, right?
Correct.

Side note: Strictly speaking /tmp is stored in tempfs which is virtual memory (RAM+swap). So theoretically if your router was actively using all of its RAM you could get paging occurring. But in reality this is very unlikely unless you've completely overloaded your router with memory hungry applications.
 
Correct.

Side note: Strictly speaking /tmp is stored in tempfs which is virtual memory (RAM+swap). So theoretically if your router was actively using all of its RAM you could get paging occurring. But in reality this is very unlikely unless you've completely overloaded your router with memory hungry applications.
I think this is one of the most commonly over looked aspects of the router. While the router may be able to run certain processes, we commonly over look the fact that the router has a finite amount of memory for operating router task. When we start throwing other processes in the mix, this finite gap of memory only gets smaller. Given enough extra processes, this gap will dwindle to the point the router has to cease in performing even its own task.
 
Thanks, ColinTaylor.

Now back to testing.
@SomeWhereOverTheRainBow, the last script I took from your comments on the previous page takes 10 sec for 6 iterations of the test loop. To get to around 2700 where it usually hangs, it will take approx. 75 min. In the meantime it consumes 100% of both cores and makes the router so unresponsive that even the PuTTY console lags badly.
This one:
Bash:
#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

run_cmd () {
    to=$1; shift
    $@ & local child=$! start=0
     while kill -0 $child 2>/dev/null; do
        read -t 1
        start=$((start+1))
        if [ $start -ge $to ]; then
            kill -s 9 $child 2>/dev/null
            break
        fi
    done
}

# make the new function accessible

# clear rc variable, may be unnecessary
rc=""

run_cmd 1 /tmp/_nvram $@
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   #echo "$rc" >> /tmp/nvram.log
   sleep 1
   logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)"
   run_cmd 1 /tmp/_nvram $@
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $@, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /usr/sbin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram
Can it be optimized?
The current script with timeout, incl. keeping counters of the iterations takes 3 min to get to 3000.

Here is my latest version with even bigger performance penalty because of the ugly way to keep count of usage and errors. I also suspect there's a bug in the usage count because it auto-resets at some point above 32 K.
Edit: Yes, this thing definitely can't count beyond 32K, it starts all over. How can I use big integers instead?
Bash:
#!/bin/sh

#if [ ! -f "/opt/bin/timeout" ]; then
#   echo "timeout function not found"
#   echo "install by running"
#   echo "opkg install coreutils-timeout"
#   exit 1
#fi

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/nvramuse" ]; then
   echo 0 > /tmp/nvramuse
fi
usecount=$(cat /tmp/nvramuse)
usecount=$(( usecount + 1 ))
echo $usecount > /tmp/nvramuse

# execute the nvram command
timeout 1 /tmp/_nvram $@
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   # give the device time to recover from the read failure
   usleep 1000

   #$(echo "$rc") >> /tmp/nvram.log
   # keep count of session errors
   errcount=0
   if [ ! -f "/tmp/nvramerr" ]; then
      echo 0 > /tmp/nvramerr
   else
      errcount=$(cat /tmp/nvramerr)
   fi
   errcount=$(( errcount + 1 ))
   echo $errcount > /tmp/nvramerr

   # report the error
   logger -t "nvram-override" "Error detected at use count: $usecount, error count: $errcount"
   logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)"

   # try nvram again
   timeout 1 /tmp/_nvram $@
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $@, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /usr/sbin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

The system output it produces looks like this:
Jun 16 16:48:27 nvram-override: Error detected at use count: 24278, error count: 5
Jun 16 16:48:27 nvram-override: Couldn't execute nvram get vpn_client2_state, exit status: 124 (124=timeout)
Jun 16 16:48:27 nvram-override: Retried executing nvram get vpn_client2_state, exit status: 0
Jun 16 16:49:52 nvram-override: Error detected at use count: 31436, error count: 6
Jun 16 16:49:52 nvram-override: Couldn't execute nvram get vpn_client5_state, exit status: 124 (124=timeout)
Jun 16 16:49:52 nvram-override: Retried executing nvram get vpn_client5_state, exit status: 0
Jun 16 16:49:54 nvram-override: Error detected at use count: 31543, error count: 7
Jun 16 16:49:54 nvram-override: Couldn't execute nvram get vpn_client2_state, exit status: 124 (124=timeout)
Jun 16 16:49:54 nvram-override: Retried executing nvram get vpn_client2_state, exit status: 0
 
Last edited:
Thanks, ColinTaylor.

Now back to testing.
@SomeWhereOverTheRainBow, the last script I took from your comments on the previous page takes 10 sec for 6 iterations of the test loop. To get to around 2700 where it usually hangs, it will take approx. 75 min. In the meantime it consumes 100% of both cores and makes the router so unresponsive that even the PuTTY console lags badly.
This one:
Bash:
#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

run_cmd () {
    to=$1; shift
    $@ & local child=$! start=0
     while kill -0 $child 2>/dev/null; do
        read -t 1
        start=$((start+1))
        if [ $start -ge $to ]; then
            kill -s 9 $child 2>/dev/null
            break
        fi
    done
}

# make the new function accessible

# clear rc variable, may be unnecessary
rc=""

run_cmd 1 /tmp/_nvram $@
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   #echo "$rc" >> /tmp/nvram.log
   sleep 1
   logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)"
   run_cmd 1 /tmp/_nvram $@
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $@, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /usr/sbin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram
Can it be optimized?
The current script with timeout, incl. keeping counters of the iterations takes 3 min to get to 3000.

Here is my latest version with even bigger performance penalty because of the ugly way to keep count of usage and errors. I also suspect there's a bug in the usage count because it autoresets at some point above 32 K.
Bash:
#!/bin/sh

#if [ ! -f "/opt/bin/timeout" ]; then
#   echo "timeout function not found"
#   echo "install by running"
#   echo "opkg install coreutils-timeout"
#   exit 1
#fi

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/nvramuse" ]; then
   echo 0 > /tmp/nvramuse
fi
usecount=$(cat /tmp/nvramuse)
usecount=$(( usecount + 1 ))
echo $usecount > /tmp/nvramuse

# execute the nvram command
timeout 1 /tmp/_nvram $@
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   # give the device time to recover from the read failure
   usleep 1000

   #$(echo "$rc") >> /tmp/nvram.log
   # keep count of session errors
   if [ ! -f "/tmp/nvramerr" ]; then
      echo 0 > /tmp/nvramerr
      errcount=0
   else
      errcount=$(cat /tmp/nvramerr) 
   fi
   errcount=$(( errcount + 1 ))
   echo $errcount > /tmp/nvramerr

   # report the error
   logger -t "nvram-override" "Error detected at use count: $usecount, error count: $errcount"
   logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)"

   # try nvram again
   timeout 1 /tmp/_nvram $@
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $@, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /usr/sbin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

The system output it produces looks like this:
Yes it can be made quicker just take out the read -t 1 (which takes one second to run) and replace it with a command that doesn't take very long to run like maybe a silent grep -q or something. I would do the same with your usleep which is an entware package I believe? Or does that come built into the router?
 
Yes it can be made quicker just take out the read -t 1 (which takes one second to run) and replace it with a command that doesn't take very long to run like maybe a silent grep -q or something. I would do the same with your usleep which is an entware package I believe? Or does that come built into the router?
Ok.
Replaced read -t 1 with usleep 1000. Completed in 5 min.
usleep was there, I haven't installed it separately.

The loop did not hang but left empty values. Log attached. I'm trying to debug why there was no retry on the failed commands.

Issue found: whenever the nvram command is killed, exit code is still "0", so the IF condition is never true. Can we do something about it?
 

Attachments

Last edited:
Ok.
Replaced read -t 1 with usleep 1000. Completed in 5 min.
usleep was there, I haven't installed it separately.

The loop did not hang but left empty values. Log attached. I'm trying to debug why there was no retry on the failed commands.
Does usleep work when entware is not mounted
 
@Oracle this one should allow for a proper retry.

Code:
#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

run_cmd () {
    local to
    local start
    local child
    to=$1; shift
    $@ & local child=$! start=0
     while kill -0 $child 2>/dev/null; do
        usleep 1000
        start=$((start+1))
        if [ $start -gt $to ]; then
            kill -s 9 $child 2>/dev/null
            return 1
        fi
    done
}

# make the new function accessible

run_cmd 1 /tmp/_nvram $@
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   #echo "$rc" >> /tmp/nvram.log
   unset rc
   usleep 1000
   logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)"
   run_cmd 1 /tmp/_nvram $@
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $@, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /usr/sbin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram
 
Last edited:
No.
This line messes up the output:
[ $start -ge $to ] && return $child || return 0

What if we just put something to the effect of rc=1 between kill -s 9 $child 2>/dev/null and break? Although that doesn't look right even to my rookie head. return 1, maybe?
 
No.
This line messes up the output:
[ $start -ge $to ] && return $child || return 0

What if we just put something to the effect of rc=1 between kill -s 9 $child 2>/dev/null and break? Although that doesn't look right even to my rookie head. return 1, maybe?
It should work because rc=? Is only catching the return status, and by your retry segment , the return value should then be caught as the PID of the failed process would not be equal to zero. It should return 0 for success if we did not fail
 
Last edited:
No.
This line messes up the output:
[ $start -ge $to ] && return $child || return 0

What if we just put something to the effect of rc=1 between kill -s 9 $child 2>/dev/null and break? Although that doesn't look right even to my rookie head. return 1, maybe?
I revised it to return 1 instead of return $child. See if that works, you have to have a segment that will return success when it is successful, otherwise you will always retry after success.
 
Last edited:
All right. I read your last few comments again. :)
rc is indeed set to something different from 0 but the output in the test script is messed up. Let me try to explain better.

1. With the initial version, output was ok but there wasn't a second attempt to re-run nvram get and pick the correct value. It just came out with empty values. There was nothing in the system log.
0 0 0 0 0 1813
0 0 0 0 0 1814
0 0 0 1815 Empty value found
0 0 0 0 0 1816
0 0 0 0 0 1817

2. After the first mod, there was a lot in the system log but the output of the test loop changed to something unrecognizable.
Expected this:
0 0 0 0 0 1
0 0 0 0 0 2
0 0 0 0 0 3
0 0 0 0 0 4
Got this:
0
0 0
0 0
0 0
0 0
0 1
0
0 0
0 0
0 0
0 0
0 2
0
0 0
0 0
0 0
0 0
0 3
0
0 0
0 0
0 0
0 0
0 4

The system log got this:
Jun 16 18:39:03 nvram-override: Couldn't execute nvram get vpn_client1_state, exit status: 193 (124=timeout)
Jun 16 18:39:03 nvram-override: Retried executing nvram get vpn_client1_state, exit status: 209
Jun 16 18:39:03 nvram-override: Couldn't execute nvram get vpn_client2_state, exit status: 218 (124=timeout)
Jun 16 18:39:03 nvram-override: Retried executing nvram get vpn_client2_state, exit status: 228
Jun 16 18:39:04 nvram-override: Couldn't execute nvram get vpn_client3_state, exit status: 237 (124=timeout)
Jun 16 18:39:04 nvram-override: Retried executing nvram get vpn_client3_state, exit status: 250
Jun 16 18:39:04 nvram-override: Couldn't execute nvram get vpn_client4_state, exit status: 3 (124=timeout)
Jun 16 18:39:04 nvram-override: Retried executing nvram get vpn_client4_state, exit status: 13
Jun 16 18:39:04 nvram-override: Couldn't execute nvram get vpn_client5_state, exit status: 21 (124=timeout)
Jun 16 18:39:04 nvram-override: Retried executing nvram get vpn_client5_state, exit status: 32

3. After replacing $child with return 1, the output was still messed up and the system log got this:
Jun 16 19:05:21 nvram-override: Couldn't execute nvram get vpn_client2_state, exit status: 1 (124=timeout)
Jun 16 19:05:21 nvram-override: Retried executing nvram get vpn_client2_state, exit status: 1
Jun 16 19:05:21 nvram-override: Couldn't execute nvram get vpn_client3_state, exit status: 1 (124=timeout)
Jun 16 19:05:21 nvram-override: Retried executing nvram get vpn_client3_state, exit status: 1
Jun 16 19:05:21 nvram-override: Couldn't execute nvram get vpn_client4_state, exit status: 1 (124=timeout)
Jun 16 19:05:21 nvram-override: Retried executing nvram get vpn_client4_state, exit status: 1
Jun 16 19:05:21 nvram-override: Couldn't execute nvram get vpn_client5_state, exit status: 1 (124=timeout)
Jun 16 19:05:21 nvram-override: Retried executing nvram get vpn_client5_state, exit status: 1
 
All right. I read your last few comments again. :)
rc is indeed set to something different from 0 but the output in the test script is messed up. Let me try to explain better.

1. With the initial version, output was ok but there wasn't a second attempt to re-run nvram get and pick the correct value. It just came out with empty values. There was nothing in the system log.


2. After the first mod, there was a lot in the system log but the output of the test loop changed to something unrecognizable.
Expected this:

Got this:


The system log got this:


3. After replacing $child with return 1, the output was still messed up and the system log got this:
Try the new revised. I localized the variables of runcmd since we are using it more than once, the variables were probably not properly clearing out on return.
 
No. :)
The test output is still messed. I figured it out, it receives 0 0 instead of 0 because nvram get runs twice.
The system log changed a bit - there's an occasional 0 in there somehow, the rest is rc=1.
Jun 16 19:31:57 nvram-override: Retried executing nvram get vpn_client5_state, exit status: 1
Jun 16 19:31:57 nvram-override: Couldn't execute nvram get vpn_client1_state, exit status: (124=timeout)
Jun 16 19:31:57 nvram-override: Retried executing nvram get vpn_client1_state, exit status: 1
Jun 16 19:31:57 nvram-override: Couldn't execute nvram get vpn_client2_state, exit status: (124=timeout)
Jun 16 19:31:57 nvram-override: Retried executing nvram get vpn_client2_state, exit status: 0
Jun 16 19:31:57 nvram-override: Couldn't execute nvram get vpn_client3_state, exit status: (124=timeout)
Jun 16 19:31:57 nvram-override: Retried executing nvram get vpn_client3_state, exit status: 1
Jun 16 19:31:57 nvram-override: Couldn't execute nvram get vpn_client4_state, exit status: (124=timeout)
Jun 16 19:31:57 nvram-override: Retried executing nvram get vpn_client4_state, exit status: 1
Jun 16 19:31:57 nvram-override: Couldn't execute nvram get vpn_client5_state, exit status: (124=timeout)
Jun 16 19:31:57 nvram-override: Retried executing nvram get vpn_client5_state, exit status: 1
 

Similar threads

Latest threads

Support SNBForums w/ Amazon

If you'd like to support SNBForums, just use this link and buy anything on Amazon. Thanks!

Sign Up For SNBForums Daily Digest

Get an update of what's new every day delivered to your mailbox. Sign up here!

Staff online

Back
Top