My experience with the RT-AC86U

SomeWhereOverTheRainBow · Jun 16, 2022

Oracle said:
This version produced 6 retries within 3:35 min. Visibly faster (2x) and no empty values.

But notice that none of the retries ever went past the first retry, which is good. I would test the intervals at higher lengths such as 50, 75, and 100; you should see a lot fewer retries. The cool thing is that all the touch command does is update the file's date and time stamp. This doesn't actually halt the already sensitive router system the way usleep does.

Oracle · Jun 16, 2022

Yes, this is also much lighter on the CPU compared to using the timeout addon. Utilization goes to 60-70% but not 100% during the stress test. This version can be implemented directly as a band aid, without dependencies on Entware. We can get some more feedback and then I can publish it on the first page of this thread.

Actually, the timeout version takes 30% more CPU (i.e., almost all of it) and completes 30% faster. So I guess these are more or less the limitations of the platform.

SomeWhereOverTheRainBow · Jun 16, 2022

Oracle said:
Yes, this is also much lighter on the CPU compared to using the timeout addon. Utilization goes to 60-70% but not 100% during the stress test. This version can be implemented directly as a band aid, without dependencies on Entware. We can get some more feedback and then I can publish it on the first page of this thread.

Actually, the timeout version takes 30% more CPU (i.e., almost all of it) and completes 30% faster. So I guess these are more or less the limitations of the platform.

I really do hope it winds up helping someone on here, although my intention was just to bloviate (gotta love @dave14305's vernacular).

Oracle · Jun 17, 2022

Now we have a few lines of code that are not in use any more. Can we clean up?
I'm also planning to reintroduce optional counters, so I can collect some usage statistics before I pin this workaround to the first page.

Which reminds me that I didn't have a good solution for counting use iterations. I resorted to keeping a nvramuse file in memory because I couldn't think of a better way to set a system-wide variable.
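
A minimal sketch of the file-backed counter described above, wrapped in a helper. The name bump_counter and its optional path argument are assumptions for illustration only; the wrappers in this thread inline the same steps against /tmp/nvramuse. The read-increment-write sequence is not atomic, so the count is only reliable while the caller holds the wrapper's lock.

```shell
#!/bin/sh
# bump_counter [FILE]: increment the integer stored in FILE and print the new value.
# Hypothetical helper; FILE defaults to /tmp/nvramuse, the path used in this thread.
bump_counter() {
    file="${1:-/tmp/nvramuse}"
    count=0
    # start from 0 when the file is missing or empty
    if [ -s "$file" ]; then
        count=$(cat "$file")
    fi
    count=$((count + 1))
    echo "$count" > "$file"
    echo "$count"
}
```

Calling bump_counter /tmp/nvramuse from inside the locked section would replace the four counter lines in the wrapper.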

Oh, one more important thing. We had reports of the same bug happening to wl commands. Shall I clone this code to make a similar wl override?

Oracle · Jun 17, 2022

Update:
I have a hypothesis, based on observation.

This problem appears statistically cyclical. I see a somewhat stable number of successful reads between fails (peak probability for hang up to occur) and dispersion (some random fails). My statistic is distorted because of the stress-testing but there's clearly a pattern with some deviation.
It could be the number of runs or it could be a time interval driving the pattern.

So it looks like the nvram get defect is induced not by another nvram command locking the access (we already tried serializing it to no avail) but by a separate event. As if something else is using the nvram and blocking it at regular intervals or interfering with the read command itself.

Could it be wl and nvram commands conflicting with each other or could it be a third event?
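
One way to check whether the failures really are cyclical is to look at the number of calls between consecutive errors. The sketch below assumes the log lines have the form the wrappers in this thread write ("Error detected at use count: N, ..."); the function name failure_gaps is made up for this example, and the log location on the router (/tmp/syslog.log is a common one on Asuswrt-Merlin) is an assumption.

```shell
#!/bin/sh
# failure_gaps: read syslog lines on stdin and print, for every logged error
# after the first, how many calls were counted since the previous error.
failure_gaps() {
    awk '
        /nvram-override: Error detected at use count:/ {
            # keep only the number that follows "use count: "
            n = $0
            sub(/.*use count: /, "", n)
            sub(/[^0-9].*/, "", n)
            if (seen) print n - prev
            prev = n
            seen = 1
        }
    '
}
```

Usage would be something like failure_gaps < /tmp/syslog.log. A roughly constant gap would point at a call-count pattern; repeated retries of the same call show up as gaps of 0.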

Oracle · Jun 17, 2022

I wonder if it's really necessary to run this many nvram commands:

Jun 17 16:45:00 nvram-override: Executed nvram get odmpid, use count: 140556, exit status: 0
Jun 17 16:45:00 nvram-override: Executed nvram get productid, use count: 140557, exit status: 0
Jun 17 16:45:00 nvram-override: Executed nvram get ntp_ready, use count: 140558, exit status: 0
Jun 17 16:45:00 nvram-override: Executed nvram get http_username, use count: 140559, exit status: 0
Jun 17 16:45:00 nvram-override: Executed nvram get http_username, use count: 140560, exit status: 0
Jun 17 16:45:00 nvram-override: Executed nvram get http_username, use count: 140561, exit status: 0
Jun 17 16:45:00 nvram-override: Executed nvram get http_username, use count: 140562, exit status: 0
Jun 17 16:45:05 nvram-override: Executed nvram get http_username, use count: 140573, exit status: 0
Jun 17 16:45:05 nvram-override: Executed nvram get http_username, use count: 140574, exit status: 0
Jun 17 16:45:05 nvram-override: Executed nvram get http_username, use count: 140575, exit status: 0
Jun 17 16:45:05 nvram-override: Executed nvram get http_username, use count: 140576, exit status: 0
Jun 17 16:50:00 nvram-override: Executed nvram get odmpid, use count: 140577, exit status: 0
Jun 17 16:50:01 nvram-override: Executed nvram get productid, use count: 140578, exit status: 0
Jun 17 16:50:01 nvram-override: Executed nvram get ntp_ready, use count: 140579, exit status: 0
Jun 17 16:50:01 nvram-override: Executed nvram get http_username, use count: 140580, exit status: 0
Jun 17 16:50:01 nvram-override: Executed nvram get http_username, use count: 140581, exit status: 0
Jun 17 16:50:01 nvram-override: Executed nvram get http_username, use count: 140582, exit status: 0
Jun 17 16:50:01 nvram-override: Executed nvram get http_username, use count: 140583, exit status: 0
Jun 17 16:50:05 nvram-override: Executed nvram get http_username, use count: 140584, exit status: 0
Jun 17 16:50:05 nvram-override: Executed nvram get http_username, use count: 140585, exit status: 0
Jun 17 16:50:05 nvram-override: Executed nvram get http_username, use count: 140586, exit status: 0
Jun 17 16:50:05 nvram-override: Executed nvram get http_username, use count: 140587, exit status: 0

I understand it's a relatively cheap operation and developers wouldn't worry too much about it but why so many in the same second, esp. this one:
nvram get http_username
I guess this is the dn-vnstat, running every 5 min?
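
To see which variables account for most of the calls, the "Executed nvram get ..." lines can be tallied per variable name. This is only a sketch: nvram_call_tally is a hypothetical name, and it relies on the log format produced by the wrappers in this thread.

```shell
#!/bin/sh
# nvram_call_tally: read syslog lines on stdin and print "COUNT NAME" for each
# variable read via "nvram get", most frequent first.
nvram_call_tally() {
    awk '
        /nvram-override: Executed nvram get / {
            key = $0
            sub(/.*Executed nvram get /, "", key)   # drop everything before the name
            sub(/[ ,].*/, "", key)                  # drop everything after the name
            count[key]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

Usage would be nvram_call_tally < /tmp/syslog.log, where the log path is again an assumption.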

Oracle · Jun 17, 2022

Anyway, here's the code for the wrappers I'm currently using. They do serialization, count usage and send errors to the system log. Pay attention, these are the old versions that use timeout.
The newer version from @SomeWhereOverTheRainBow is available and tested, it just needs cleanup and doesn't have usage statistics yet.

nvram wrapper

Bash:

#!/bin/sh
# ATTENTION: Install the timeout addon before running this!!!
# opkg install coreutils-timeout

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

# required for serialization when reentry is possible
LOCK="/tmp/$(basename "$0").lock"
acquire_lock() { until mkdir "$LOCK" >/dev/null 2>&1; do sleep 2; done; }
release_lock() { rmdir "$LOCK" >/dev/null 2>&1; }
#usleep 500
# one instance at a time
acquire_lock

# catch premature exit and cleanup
trap 'release_lock; exit 1' SIGHUP SIGINT SIGTERM

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/nvramuse" ]; then
   echo 0 > /tmp/nvramuse
fi
usecount=$(cat /tmp/nvramuse)
usecount=$(( usecount + 1 ))
echo $usecount > /tmp/nvramuse

# execute the nvram command
timeout 1 /tmp/_nvram "$@"
rc=$?

# debug messages, disable for normal use
#echo "$rc" >> /tmp/nvram.log
logger -t "nvram-override" "Executed nvram $@, use count: $usecount, exit status: $rc"

MAXCOUNT="3"
i="1"
while [ "$rc" != "0" ] && [ "$i" -le "$MAXCOUNT" ]; do
   # give the device time to recover from the read failure
   usleep 1000

   #$(echo "$rc") >> /tmp/nvram.log
   # keep count of session errors
   errcount=0
   if [ ! -f "/tmp/nvramerr" ]; then
      echo 0 > /tmp/nvramerr
   else
      errcount=$(cat /tmp/nvramerr)
   fi
   errcount=$(( errcount + 1 ))
   echo $errcount > /tmp/nvramerr

   # report the error
   logger -t "nvram-override" "Error detected at use count: $usecount, error count: $errcount"
   logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)"

   # try nvram again
   timeout 1 /tmp/_nvram "$@"
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $@, attempt ${i}/${MAXCOUNT}, exit status: $rc"
   i="$((i+1))"
done

# any concurrent instance(s) may now run
release_lock

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /bin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

wl wrapper

Bash:

#!/bin/sh
# ATTENTION: Install the timeout addon before running this!!!
# opkg install coreutils-timeout

# copy original wl executable to /tmp
cp /usr/sbin/wl /tmp/_wl

# create wl wrapper that calls original wl executable in /tmp
cat << 'EOF' > /tmp/wl
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

# required for serialization when reentry is possible
LOCK="/tmp/$(basename "$0").lock"
acquire_lock() { until mkdir "$LOCK" >/dev/null 2>&1; do sleep 2; done; }
release_lock() { rmdir "$LOCK" >/dev/null 2>&1; }
#usleep 500
# one instance at a time
acquire_lock

# catch premature exit and cleanup
trap 'release_lock; exit 1' SIGHUP SIGINT SIGTERM

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/wluse" ]; then
   echo 0 > /tmp/wluse
fi
usecount=$(cat /tmp/wluse)
usecount=$(( usecount + 1 ))
echo $usecount > /tmp/wluse

# execute the wl command
timeout 1 /tmp/_wl "$@"
rc=$?

# debug messages, disable for normal use
#echo "$rc" >> /tmp/wl.log
logger -t "wl-override" "Executed wl $@, use count: $usecount, exit status: $rc"

MAXCOUNT="3"
i="1"
while [ "$rc" != "0" ] && [ "$i" -le "$MAXCOUNT" ]; do
   # give the device time to recover from the read failure
   usleep 1000

   #$(echo "$rc") >> /tmp/wl.log
   # keep count of session errors
   errcount=0
   if [ ! -f "/tmp/wlerr" ]; then
      echo 0 > /tmp/wlerr
   else
      errcount=$(cat /tmp/wlerr)
   fi
   errcount=$(( errcount + 1 ))
   echo $errcount > /tmp/wlerr

   # report the error
   logger -t "wl-override" "Error detected at use count: $usecount, error count: $errcount"
   logger -t "wl-override" "Couldn't execute wl $@, exit status: $rc (124=timeout)"

   # try wl again
   timeout 1 /tmp/_wl "$@"
   rc=$?
   logger -t "wl-override" "Retried executing wl $@, attempt ${i}/${MAXCOUNT}, exit status: $rc"
   i="$((i+1))"
done

# any concurrent instance(s) may now run
release_lock

exit $rc
EOF
chmod +x /tmp/wl

# replace wl in /usr/sbin w/ wl wrapper in /tmp
mount -o bind /tmp/wl /usr/sbin/wl

Usage
The quoted code can be saved into shell scripts, for example nvram.sh and wl.sh.
Enable the wrappers by executing the scripts.
Disable by running:
umount /bin/nvram - to deactivate nvram wrapper
umount /usr/sbin/wl - to deactivate wl wrapper
There is a dependency on an Entware addon - timeout. For a safer alternative version, follow the thread.
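
One thing to watch for: if an install script is run a second time while the wrapper is already bind-mounted, the cp step would copy the wrapper (not the original binary) to /tmp/_nvram, and the wrapper would end up calling itself. A possible guard, sketched under the assumption that an active bind mount appears as a mount point in /proc/mounts; the function name is_wrapped and its optional second argument are made up for this example.

```shell
#!/bin/sh
# is_wrapped TARGET [MOUNTS_FILE]: succeed if TARGET is listed as a mount point.
# MOUNTS_FILE defaults to /proc/mounts; the parameter exists to allow testing.
is_wrapped() {
    target="$1"
    mounts="${2:-/proc/mounts}"
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts"
}
```

An install script could then start with a line such as: is_wrapped /bin/nvram && { echo "nvram wrapper already active"; exit 0; }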

SomeWhereOverTheRainBow · Jun 17, 2022

nvram wrapper

Bash:

#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode
# required for serialization when reentry is possible
LOCK="/tmp/$(basename "$0").lock"
acquire_lock() { until mkdir "$LOCK" &>/dev/null; do touch /tmp/nvram; done; }
release_lock() { rmdir "$LOCK" &>/dev/null; }

# one instance at a time
acquire_lock

# catch premature exit and cleanup
trap 'release_lock; exit 1' SIGHUP SIGINT SIGTERM

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/nvramuse" ]; then
   echo 0 > /tmp/nvramuse
fi
usecount=$(cat /tmp/nvramuse)
usecount=$((usecount + 1 ))
echo $usecount > /tmp/nvramuse

#INTERVAL="100"
MAXCOUNT="3"
run_cmd () {
    local to
    local start
    local child
    # the polling budget grows with the attempt number: to = attempt * INTERVAL
    to="$1"
    to="$((to*INTERVAL))"; shift
    # start the real command in the background and remember its PID
    "$@" &
    child="$!"
    start=0
    touch /tmp/nvram
    # poll while the child is still alive and the budget is not used up
    while kill -0 "$child" 2>/dev/null && [ "$start" -le "$to" ]; do
        # each touch is one cheap polling step; later attempts get more steps before the kill
        touch /tmp/nvram
        start="$((start+1))"
        if [ "$start" -gt "$to" ]; then
            # budget exhausted: kill the hung child and report failure
            kill -s 9 "$child" 2>/dev/null
            wait "$child"
            return 1
        fi
    done
    return 0
}

for INTERVAL in 25 50 75 100; do
# make the new function accessible, on the first run we want to exit right away if successful.
i="1"
if { run_cmd "$i" /tmp/_nvram "$@"; }; then
  rc="0"
else
  rc="1"
fi
logger -t "nvram-override" "Executed nvram $@ (at $INTERVAL), use count: $usecount, exit status: $rc"
# here we add an interval check and allow up to 3 retries.
while [ "$i" -le "$MAXCOUNT" ] && [ "$rc" != "0" ]; do
  touch /tmp/nvram
  if { run_cmd "$i" /tmp/_nvram "$@"; }; then
    rc="0"
  else
    rc="1"
    errcount=0
    if [ ! -f "/tmp/nvramerr" ]; then
       echo 0 > /tmp/nvramerr
    else
       errcount=$(cat /tmp/nvramerr)
    fi
    errcount=$((errcount + 1 ))
    echo $errcount > /tmp/nvramerr
    logger -t "nvram-override" "Error detected at use count: $usecount (at $INTERVAL), error count: $errcount"
    logger -t "nvram-override" "Couldn't execute nvram $* (at $INTERVAL), exit status: $rc (1=killed after exceeding the polling budget)"
  fi
  logger -t "nvram-override" "Retried executing nvram $@ (at $INTERVAL), attempt ${i}/${MAXCOUNT}, exit status: $rc"
  i="$((i+1))"
done
[ "$rc" = "0" ] && break || logger -t "nvram-override" "NVRAM remained locked too long; continuing anyway."
done
# any concurrent instance(s) may now run
release_lock
exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /bin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

Adapted for your pleasure....

SomeWhereOverTheRainBow · Jun 17, 2022

wl wrapper

Bash:

#!/bin/sh

# copy original wl executable to /tmp
cp /usr/sbin/wl /tmp/_wl

# create wl wrapper that calls original wl executable in /tmp
cat << 'EOF' > /tmp/wl
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode
# required for serialization when reentry is possible
LOCK="/tmp/$(basename "$0").lock"
acquire_lock() { until mkdir "$LOCK" &>/dev/null; do touch /tmp/wl; done; }
release_lock() { rmdir "$LOCK" &>/dev/null; }

# one instance at a time
acquire_lock

# catch premature exit and cleanup
trap 'release_lock; exit 1' SIGHUP SIGINT SIGTERM

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/wluse" ]; then
   echo 0 > /tmp/wluse
fi
usecount=$(cat /tmp/wluse)
usecount=$((usecount + 1 ))
echo $usecount > /tmp/wluse

#INTERVAL="100"
MAXCOUNT="3"
run_cmd () {
    local to
    local start
    local child
    # the polling budget grows with the attempt number: to = attempt * INTERVAL
    to="$1"
    to="$((to*INTERVAL))"; shift
    # start the real command in the background and remember its PID
    "$@" &
    child="$!"
    start=0
    touch /tmp/wl
    # poll while the child is still alive and the budget is not used up
    while kill -0 "$child" 2>/dev/null && [ "$start" -le "$to" ]; do
        # each touch is one cheap polling step; later attempts get more steps before the kill
        touch /tmp/wl
        start="$((start+1))"
        if [ "$start" -gt "$to" ]; then
            # budget exhausted: kill the hung child and report failure
            kill -s 9 "$child" 2>/dev/null
            wait "$child"
            return 1
        fi
    done
    return 0
}

for INTERVAL in 25 50 75 100; do
# make the new function accessible, on the first run we want to exit right away if successful.
i="1"
if { run_cmd "$i" /tmp/_wl "$@"; }; then
  rc="0"
else
  rc="1"
fi
logger -t "wl-override" "Executed wl $@ (at $INTERVAL), use count: $usecount, exit status: $rc"
# here we add an interval check and allow up to 3 retries.
while [ "$i" -le "$MAXCOUNT" ] && [ "$rc" != "0" ]; do
  touch /tmp/wl
  if { run_cmd "$i" /tmp/_wl "$@"; }; then
    rc="0"
  else
    rc="1"
    errcount=0
    if [ ! -f "/tmp/wlerr" ]; then
       echo 0 > /tmp/wlerr
    else
       errcount=$(cat /tmp/wlerr)
    fi
    errcount=$((errcount + 1 ))
    echo $errcount > /tmp/wlerr
    logger -t "wl-override" "Error detected at use count: $usecount (at $INTERVAL), error count: $errcount"
    logger -t "wl-override" "Couldn't execute wl $* (at $INTERVAL), exit status: $rc (1=killed after exceeding the polling budget)"
  fi
  logger -t "wl-override" "Retried executing wl $@ (at $INTERVAL), attempt ${i}/${MAXCOUNT}, exit status: $rc"
  i="$((i+1))"
done
[ "$rc" = "0" ] && break || logger -t "wl-override" "wl remained locked too long; continuing anyway."
done
# any concurrent instance(s) may now run
release_lock

exit $rc
EOF
chmod +x /tmp/wl

# replace wl in /usr/sbin w/ wl wrapper in /tmp
mount -o bind /tmp/wl /usr/sbin/wl

Oracle · Jun 17, 2022

Do we need this line any more (and the entire loop)?
for INTERVAL in 25 50 75 100; do...

As time goes on and I see how regular this bug is, I'm leaning more and more towards the suspicion these 2 functions are ok but there is something else interfering (blocking them).
The reason we see many hung up wl and nvram commands is because they are very, very frequently used. I counted >250 wl commands per hour (4 every minute and then some).

SomeWhereOverTheRainBow · Jun 17, 2022

Oracle said:
Do we need this line any more (and the entire loop)?
for INTERVAL in 25 50 75 100; do...

As time goes on and I see how regular this bug is, I'm leaning more and more towards the suspicion these 2 functions are ok but there is something else interfering (blocking them).
The reason we see many hung up wl and nvram commands is because they are very, very frequently used. I counted >250 wl commands per hour (4 every minute and then some).

The interval is actually there if we need a longer time before we kill the process.

A for loop is perfect for this because it keeps you from having to optimize the interval.

Give it a try with your nvram and see what the logs are like now that all the statistics have been added.

In most instances you will see that the call always completes at interval 25, but there may be times when operations are so rough on the router that you will be glad the longer intervals are there.
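
To make the escalation concrete, here is a small sketch that prints the polling budget run_cmd ends up with for each interval and attempt number, mirroring the line to=$((i*INTERVAL)) in the wrapper. The budget is a count of touch iterations, not a time in seconds or milliseconds; how long one iteration takes depends on the router's load. The function name poll_budgets is made up for this example.

```shell
#!/bin/sh
# poll_budgets INTERVAL...: for each interval, print the number of polling
# iterations allowed before the child is killed, for attempt numbers 1 to 3.
poll_budgets() {
    for interval in "$@"; do
        for i in 1 2 3; do
            echo "interval=$interval attempt=$i budget=$((i * interval))"
        done
    done
}

poll_budgets 25 50 75 100
```

With the intervals from the wrapper, the smallest budget is 25 iterations and the largest is 300.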

Using the very wise vernacular of @dave14305 , I am done bloviating. You are welcome to adjust it however you like. I was just adapting it to include your statistic modifications.

Oracle · Jun 17, 2022

In fact out of ~50 failed read attempts, 2 had to go via 2 retries to reach rc=0. This is extreme but not totally useless. With 1 failure per approx. 4000 calls, 2 out of 50 failed means roughly 2 double retries out of 200 K actual nvram calls.
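
The estimate above is simple multiplication, shown here with shell arithmetic. The inputs are the approximate figures from the post; the function name is hypothetical.

```shell
#!/bin/sh
# estimate_total_calls FAILED_READS CALLS_PER_FAILURE: print the implied number of calls.
estimate_total_calls() {
    echo "$(( $1 * $2 ))"
}

# ~50 failed reads at ~1 failure per 4000 calls
estimate_total_calls 50 4000
```

This prints 200000, which matches the "200 K" figure.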

When Viktor pointed me to this bug I thought that was totally insignificant (given how rare it seemed). Then I saw 2 stuck scripts just after a day of router uptime. Unless we've got something else wrong, this will improve the stability of the router until Asus fix the root cause (if they ever do it).

I am grateful to all the gurus who helped here!

SomeWhereOverTheRainBow · Jun 17, 2022

Oracle said:
In fact out of ~50 failed read attempts, 2 had to go via 2 retries to reach rc=0. This is extreme but not totally useless. With 1 failure per approx. 4000 calls, 2 out of 50 failed means roughly 2 double retries out of 200 K actual nvram calls.

When Viktor pointed me to this bug I thought that was totally insignificant (given how rare it seemed). Then I saw 2 stuck scripts just after a day of router uptime. Unless we've got something else wrong, this will improve the stability of the router until Asus fix the root cause (if they ever do it).

I am grateful to all the gurus who helped here!

Hey Oracle, you are welcome. If you decide to fire up my current one with your statistics plugged in, feel free to share what they look like. I remember what yours looked like with the timeout version; it was quite interesting. I want to see how many retries there are and whatnot.

SomeWhereOverTheRainBow · Jun 17, 2022

@Oracle , I just had to fix a brace on my test, but it should be good now. Revised it.

Oracle · Jun 17, 2022

It's fine.
3000 test loop iterations took 5:08 min.
Caught 1 fish, 1 retry.
No missing output values, nothing stuck.

For anyone who wants to use these overrides beyond initial testing, please comment out this line:
logger -t "nvram-override" "Executed nvram $@, use count: $usecount, exit status: $rc"

I used it for measurement and debugging; it prints every single nvram call and can flood your system log.
Same for the wl version.
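
Instead of commenting the line in and out, the per-call logging could be put behind a switch. This is a sketch only: NVRAM_DEBUG and debug_log are hypothetical names that do not appear in the posted wrappers, and the variable would have to be set near the top of the wrapper itself, since the wrapper is called by many different processes.

```shell
#!/bin/sh
# debug_log MESSAGE...: send MESSAGE to syslog only when NVRAM_DEBUG is 1.
# Both names are assumptions for this example.
debug_log() {
    if [ "${NVRAM_DEBUG:-0}" = "1" ]; then
        logger -t "nvram-override" "$*"
    fi
    return 0
}
```

The "Executed nvram ..." line would then be written as debug_log "Executed nvram $*, use count: $usecount, exit status: $rc", while the error messages keep calling logger directly.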

SomeWhereOverTheRainBow · Jun 17, 2022

@Oracle

I added (at $INTERVAL) into the loggers so you can investigate if it is worth having the one loop.

Here you can investigate at lower intervals like 5, 10, and 20.

Bash:

#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode
# required for serialization when reentry is possible
LOCK="/tmp/$(basename "$0").lock"
acquire_lock() { until mkdir "$LOCK" &>/dev/null; do touch /tmp/nvram; done; }
release_lock() { rmdir "$LOCK" &>/dev/null; }

# one instance at a time
acquire_lock

# catch premature exit and cleanup
trap 'release_lock; exit 1' SIGHUP SIGINT SIGTERM

# make the new function accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable
rc=""

# keep count of total session usage
if [ ! -f "/tmp/nvramuse" ]; then
   echo 0 > /tmp/nvramuse
fi
usecount=$(cat /tmp/nvramuse)
usecount=$((usecount + 1 ))
echo $usecount > /tmp/nvramuse

#INTERVAL="100"
MAXCOUNT="3"
run_cmd () {
    local to
    local start
    local child
    # the polling budget grows with the attempt number: to = attempt * INTERVAL
    to="$1"
    to="$((to*INTERVAL))"; shift
    # start the real command in the background and remember its PID
    "$@" &
    child="$!"
    start=0
    touch /tmp/nvram
    # poll while the child is still alive and the budget is not used up
    while kill -0 "$child" 2>/dev/null && [ "$start" -le "$to" ]; do
        # each touch is one cheap polling step; later attempts get more steps before the kill
        touch /tmp/nvram
        start="$((start+1))"
        if [ "$start" -gt "$to" ]; then
            # budget exhausted: kill the hung child and report failure
            kill -s 9 "$child" 2>/dev/null
            wait "$child"
            return 1
        fi
    done
    return 0
}

for INTERVAL in 5 10 20; do
# make the new function accessible, on the first run we want to exit right away if successful.
i="1"
if { run_cmd "$i" /tmp/_nvram "$@"; }; then
  rc="0"
else
  rc="1"
fi
logger -t "nvram-override" "Executed nvram $@ (at $INTERVAL), use count: $usecount, exit status: $rc"
# here we add an interval check and allow up to 3 retries.
while [ "$i" -le "$MAXCOUNT" ] && [ "$rc" != "0" ]; do
  touch /tmp/nvram
  if { run_cmd "$i" /tmp/_nvram "$@"; }; then
    rc="0"
  else
    rc="1"
    errcount=0
    if [ ! -f "/tmp/nvramerr" ]; then
       echo 0 > /tmp/nvramerr
    else
       errcount=$(cat /tmp/nvramerr)
    fi
    errcount=$((errcount + 1 ))
    echo $errcount > /tmp/nvramerr
    logger -t "nvram-override" "Error detected at use count: $usecount (at $INTERVAL), error count: $errcount"
    logger -t "nvram-override" "Couldn't execute nvram $* (at $INTERVAL), exit status: $rc (1=killed after exceeding the polling budget)"
  fi
  logger -t "nvram-override" "Retried executing nvram $@ (at $INTERVAL), attempt ${i}/${MAXCOUNT}, exit status: $rc"
  i="$((i+1))"
done
[ "$rc" = "0" ] && break || logger -t "nvram-override" "NVRAM remained locked too long; continuing anyway."
done
# any concurrent instance(s) may now run
release_lock
exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /bin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

I drastically dropped the interval to see how it impacts performance.

Oracle · Jun 17, 2022

@SomeWhereOverTheRainBow I guess I'll answer tomorrow about the intervals.

Meantime the guesthouse is open for feedback from other users.

Does this work for you, did you try using the wrappers, did you face any difficulty?

SomeWhereOverTheRainBow · Jun 17, 2022

@Oracle I also just replaced the "sleep 2" inside the lock commands with touch commands.

SomeWhereOverTheRainBow · Jun 17, 2022

Oracle said:
@SomeWhereOverTheRainBow I guess I'll answer tomorrow about the intervals.

Meantime the guesthouse is open for feedback from other users.
Does this work for you, did you try using the wrappers, did you face any difficulty?

@Oracle I cannot "truly" test it to the extent you can. The only difficulty I faced was that I had to "force failures" to make the script work, since I do not possess an RT-AC86U with this issue.

I imagine they will have a fix down the road, but a true fix might come from those doing diagnostics and submitting their findings. In this regard, @dave14305 and @ColinTaylor are off to a great start. I just hope someone shares their diagnostic works with Asus down the road.

Oracle · Jun 18, 2022

Here's some log output from running the test loop on tighter timings.
It reported 5 retries, while the average is 2-3.

Jun 19 00:37:56 nvram-override: Executed nvram get vpn_client1_state (at 5), use count: 51, exit status: 0
Jun 19 00:37:56 nvram-override: Executed nvram get vpn_client2_state (at 5), use count: 52, exit status: 0
Jun 19 00:37:56 nvram-override: Executed nvram get vpn_client3_state (at 5), use count: 53, exit status: 0
Jun 19 00:37:57 nvram-override: Executed nvram get vpn_client4_state (at 5), use count: 54, exit status: 0
Jun 19 00:37:57 nvram-override: Executed nvram get vpn_client5_state (at 5), use count: 55, exit status: 0
Jun 19 00:41:13 nvram-override: Retried executing nvram get vpn_client3_state (at 5), attempt 1/3, exit status: 0
Jun 19 00:42:25 nvram-override: Retried executing nvram get vpn_client5_state (at 5), attempt 1/3, exit status: 0
Jun 19 00:42:59 nvram-override: Retried executing nvram get vpn_client5_state (at 5), attempt 1/3, exit status: 0
Jun 19 00:43:34 nvram-override: Retried executing nvram get vpn_client4_state (at 5), attempt 1/3, exit status: 0
Jun 19 00:43:35 nvram-override: Retried executing nvram get vpn_client5_state (at 5), attempt 1/3, exit status: 0

Running time appears unaffected: 5:07 min per 3000 loop iterations.

I switched to these versions because the other one using timeout falls apart if I unplug the USB drive.
Experienced it during the shutdown process. Unmounted the USB device and before executing sync;halt a few seconds later I was flooded with errors.
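
For anyone who stays on the timeout-based wrappers, one possible mitigation is to check that timeout is still reachable before using it, so the wrapper degrades to a plain call when Entware's storage is gone. This is a sketch, not something tested on the router: run_with_optional_timeout is a hypothetical name, and the fallback branch has no protection against a hung call.

```shell
#!/bin/sh
# run_with_optional_timeout SECONDS CMD...: run CMD under timeout when the
# timeout command can be found on PATH, otherwise run CMD directly.
run_with_optional_timeout() {
    secs="$1"; shift
    if command -v timeout >/dev/null 2>&1; then
        timeout "$secs" "$@"
    else
        # no timeout available (e.g. USB drive unmounted): unguarded call
        "$@"
    fi
}
```

In the wrapper, timeout 1 /tmp/_nvram "$@" would become run_with_optional_timeout 1 /tmp/_nvram "$@".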
