My experience with the RT-AC86U

@dave14305 Can you try strace'ing a variable that's been moved to jffs? Notice how it doesn't use netlink but normal file I/O, with locking. Just curious at what point it fails, or maybe it doesn't.
Code:
strace -r nvram get dhcp_staticlist
Haven’t seen it fail yet. So netlink is probably the common thread between nvram and wl hangs.
Code:
root# strace -ry nvram get dhcp_staticlist
     0.000000 execve("/bin/nvram", ["nvram", "get", "dhcp_staticlist"], 0x7fdcb5d0b8 /* 18 vars */ <unfinished ...>
     0.000442 [ Process PID=5055 runs in 32 bit mode. ]
strace: WARNING: Proper structure decoding for this personality is not supported, please consider building strace with mpers support enabled.
     0.000028 <... execve resumed>)     = 0
     0.000069 brk(NULL)                 = 0x77e000
     0.000054 uname({sysname="Linux", nodename="router", ...}) = 0
     0.000094 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf70ff000
     0.000055 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
     0.000125 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3</tmp/etc/ld.so.cache>
     0.000218 fstat64(3</tmp/etc/ld.so.cache>, 0xfff5b970) = 0
     0.000060 mmap2(NULL, 10528, PROT_READ, MAP_PRIVATE, 3</tmp/etc/ld.so.cache>, 0) = 0xf70fc000
     0.000055 close(3</tmp/etc/ld.so.cache>) = 0
     0.000062 open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3</lib/libc.so.6>
     0.000058 read(3</lib/libc.so.6>, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0xm\1\0004\0\0\0"..., 512) = 512
     0.000061 fstat64(3</lib/libc.so.6>, 0xfff5b9b0) = 0
     0.000050 mmap2(NULL, 1299824, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3</lib/libc.so.6>, 0) = 0xf6f93000
     0.000056 mprotect(0xf70bb000, 65536, PROT_NONE) = 0
     0.000049 mmap2(0xf70cb000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</lib/libc.so.6>, 0x128000) = 0xf70cb000
     0.000070 mmap2(0xf70ce000, 9584, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf70ce000
     0.000057 close(3</lib/libc.so.6>)  = 0
     0.000065 open("/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3</lib/libgcc_s.so.1>
     0.000058 read(3</lib/libgcc_s.so.1>, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\254\317\0\0004\0\0\0"..., 512) = 512
     0.000059 fstat64(3</lib/libgcc_s.so.1>, 0xfff5b998) = 0
     0.000049 mmap2(NULL, 182372, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3</lib/libgcc_s.so.1>, 0) = 0xf6f66000
     0.000054 mprotect(0xf6f83000, 61440, PROT_NONE) = 0
     0.000044 mmap2(0xf6f92000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</lib/libgcc_s.so.1>, 0x1c000) = 0xf6f92000
     0.000068 close(3</lib/libgcc_s.so.1>) = 0
     0.000058 open("/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3</lib/libdl.so.2>
     0.000057 read(3</lib/libdl.so.2>, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0(\t\0\0004\0\0\0"..., 512) = 512
     0.000288 fstat64(3</lib/libdl.so.2>, 0xfff5b980) = 0
     0.000054 mmap2(NULL, 73912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3</lib/libdl.so.2>, 0) = 0xf6f53000
     0.000054 mprotect(0xf6f55000, 61440, PROT_NONE) = 0
     0.000044 mmap2(0xf6f64000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</lib/libdl.so.2>, 0x1000) = 0xf6f64000
     0.000078 close(3</lib/libdl.so.2>) = 0
     0.000052 open("/lib/libnvram.so", O_RDONLY|O_CLOEXEC) = 3</lib/libnvram.so>
     0.000059 read(3</lib/libnvram.so>, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\08\t\0\0004\0\0\0"..., 512) = 512
     0.000058 fstat64(3</lib/libnvram.so>, 0xfff5b968) = 0
     0.000048 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf70fb000
     0.000045 mmap2(NULL, 69528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3</lib/libnvram.so>, 0) = 0xf6f42000
     0.000055 mprotect(0xf6f43000, 61440, PROT_NONE) = 0
     0.000044 mmap2(0xf6f52000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</lib/libnvram.so>, 0) = 0xf6f52000
     0.000065 close(3</lib/libnvram.so>) = 0
     0.000052 open("/usr/lib/libshared.so", O_RDONLY|O_CLOEXEC) = 3</usr/lib/libshared.so>
     0.000061 read(3</usr/lib/libshared.so>, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\214X\1\0004\0\0\0"..., 512) = 512
     0.000059 fstat64(3</usr/lib/libshared.so>, 0xfff5b950) = 0
     0.000049 mmap2(NULL, 646528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3</usr/lib/libshared.so>, 0) = 0xf6ea4000
     0.000054 mprotect(0xf6f11000, 65536, PROT_NONE) = 0
     0.000045 mmap2(0xf6f21000, 45056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</usr/lib/libshared.so>, 0x6d000) = 0xf6f21000
     0.000068 mmap2(0xf6f2c000, 89472, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf6f2c000
     0.000058 close(3</usr/lib/libshared.so>) = 0
     0.000058 open("/lib/libwlcsm.so", O_RDONLY|O_CLOEXEC) = 3</lib/libwlcsm.so>
     0.000059 read(3</lib/libwlcsm.so>, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\200\34\0\0004\0\0\0"..., 512) = 512
     0.000059 fstat64(3</lib/libwlcsm.so>, 0xfff5b938) = 0
     0.000049 mmap2(NULL, 90092, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3</lib/libwlcsm.so>, 0) = 0xf6e8e000
     0.000053 mprotect(0xf6e93000, 61440, PROT_NONE) = 0
     0.000043 mmap2(0xf6ea2000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</lib/libwlcsm.so>, 0x4000) = 0xf6ea2000
     0.000076 close(3</lib/libwlcsm.so>) = 0
     0.000062 open("/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3</lib/libpthread.so.0>
     0.000059 read(3</lib/libpthread.so.0>, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\\I\0\0004\0\0\0"..., 512) = 512
     0.000059 fstat64(3</lib/libpthread.so.0>, 0xfff5b770) = 0
     0.000049 mmap2(NULL, 164436, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3</lib/libpthread.so.0>, 0) = 0xf6e65000
     0.000055 mprotect(0xf6e7a000, 65536, PROT_NONE) = 0
     0.000052 mmap2(0xf6e8a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3</lib/libpthread.so.0>, 0x15000) = 0xf6e8a000
     0.000072 mmap2(0xf6e8c000, 4692, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf6e8c000
     0.000058 close(3</lib/libpthread.so.0>) = 0
     0.000083 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf70fa000
     0.000053 set_tls(0xf70fa780)       = 0
     0.000196 mprotect(0xf70cb000, 8192, PROT_READ) = 0
     0.000086 mprotect(0xf6e8a000, 4096, PROT_READ) = 0
     0.000364 mprotect(0xf6f64000, 4096, PROT_READ) = 0
     0.000085 mprotect(0xf7100000, 4096, PROT_READ) = 0
     0.000049 munmap(0xf70fc000, 10528) = 0
     0.000049 set_tid_address(0xf70fa328) = 5055
     0.000034 set_robust_list(0xf70fa330, 12) = 0
     0.000050 rt_sigaction(SIGRTMIN, {sa_handler=0xf6e69278, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0xf6fbfb10}, NULL, 8) = 0
     0.000062 rt_sigaction(SIGRT_1, {sa_handler=0xf6e6935c, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0xf6fbfb10}, NULL, 8) = 0
     0.000053 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
     0.000054 ugetrlimit(RLIMIT_STACK, {rlim_cur=2048*1024, rlim_max=RLIM_INFINITY}) = 0
     0.000202 mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf6e44000
     0.000079 stat64("/jffs", 0xfff5bcc0) = 0
     0.000049 stat64("/jffs/nvram_war", 0xfff5bcc0) = 0
     0.000051 open("/var/nvram.lock", O_WRONLY|O_CREAT, 0644) = 3</var/nvram.lock>
     0.000070 flock(3</var/nvram.lock>, LOCK_EX) = 0
     0.000095 stat64("/jffs/nvram", 0xfff5acb0) = 0
     0.000045 stat64("/jffs/nvram/dhcp_staticlist", 0xfff5acb0) = 0
     0.000052 open("/jffs/nvram/dhcp_staticlist", O_RDONLY) = 4</jffs/nvram/dhcp_staticlist>
     0.000072 read(4</jffs/nvram/dhcp_staticlist>, "<00:9C>192.168.1.5>>", 16383) = 32
     0.000059 close(4</jffs/nvram/dhcp_staticlist>) = 0
     0.000049 close(3</var/nvram.lock>) = 0
     0.000059 fstat64(1</dev/pts/0>, 0xfff5bcf0) = 0
     0.000051 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf70fe000
     0.000047 write(1</dev/pts/0>, "<00:9C>192.168.1.5>>"..., 33<00:9C>192.168.1.5>>
) = 33
     0.000061 munmap(0xf6e44000, 135168) = 0
     0.000079 exit_group(0)             = ?
     0.000227 +++ exited with 0 +++
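
The tail of the trace shows the pattern for a variable that lives in jffs: open /var/nvram.lock, take an exclusive flock on it, read the file under /jffs/nvram, then close everything. No netlink socket is involved. A rough shell equivalent is sketched below. The helper name read_locked is made up for illustration, and the sketch assumes a flock utility is available on the system (that has not been checked on the router).

```shell
#!/bin/sh
# Hypothetical helper: read a file while holding an exclusive lock,
# mirroring the open / flock(LOCK_EX) / read / close sequence in the trace.
# Usage: read_locked LOCKFILE DATAFILE
read_locked() {
    lock=$1; data=$2
    (
        # fd 9 is opened on the lock file by the redirection below
        flock -x 9 || exit 1
        cat "$data"
    ) 9>>"$lock"    # the lock is released when the subshell exits and fd 9 closes
}

# On the router, with the paths seen in the trace:
# read_locked /var/nvram.lock /jffs/nvram/dhcp_staticlist
```

Because this path is plain file I/O, a hang here would point at jffs or the lock file rather than at netlink.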
 
Why can't I collect any diagnostic messages from the nvram wrapper?
logger doesn't work whatever I try.
What does set -x do?

The test loop script is fine, it captures any blank output and writes a message to the log. I'm attaching an example. Search for the word "value" and you'll see the problematic lines.

Update:
Time to answer myself.
In the console, whenever the nvram get command gets stuck and is terminated by the timeout, the output is empty.
In the wrapper, rc=$? becomes "124". This is why my if statement never worked: $rc isn't empty. Time to revise the wrapper code again. I'm afraid for the time being I don't know how to make this right.
Basically I mixed up the exit status and the output. :)
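
The two things can be captured separately, which makes the mix-up easier to avoid. A minimal sketch, assuming the coreutils timeout (the one from opkg install coreutils-timeout, which exits with 124 when the time limit is hit). Here sleep 5 is only a stand-in for a hung nvram get:

```shell
#!/bin/sh
# "sleep 5" stands in for a hung "nvram get"; it is not the real command.
out=$(timeout 1 sleep 5)   # $out holds whatever the command printed
rc=$?                      # $rc holds the exit status of timeout

if [ "$rc" -eq 124 ]; then
    echo "timed out (exit status $rc), output is empty: '$out'"
elif [ "$rc" -ne 0 ]; then
    echo "failed with exit status $rc"
elif [ -z "$out" ]; then
    echo "succeeded, but the value itself is empty"
else
    echo "value: $out"
fi
```

Testing $rc answers "did the command finish?", while testing $out answers "did it print anything?". An nvram variable can legitimately be empty, so the two checks are not interchangeable.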
 

Please don't laugh at me for silly mistakes - this is way outside my comfort zone.
Here's an alternative version of the wrapper script that uses timeout. There is a slight performance penalty, but the benefit is no more stuck nvram get commands, at least as far as I can see in htop, and the failed command's output is available after an automatic retry.

Bash:
#!/bin/sh
# before running this, make sure to install timeout
# opkg install coreutils-timeout

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

# make timeout (installed under /opt) accessible
#export PATH=/opt/bin:/opt/sbin:$PATH

# clear rc variable, may be unnecessary
rc=""

# "$@" is quoted so that arguments containing spaces survive intact
timeout 1 /tmp/_nvram "$@"
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   #echo "$rc" >> /tmp/nvram.log
   usleep 1000
   logger -t "nvram-override" "Couldn't execute nvram $*, exit status: $rc (124=timeout)"
   timeout 1 /tmp/_nvram "$@"
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $*, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /bin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

I had to rename the attachment to txt for the forum to accept it.
 

Also, we must consider that Entware is required to run this. It would be nice if the timeout could be applied without Entware packages, but I guess that is wishful thinking.
 
@Oracle, try this:

Code:
#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

run_cmd () {
    cmd="$3"; nv="$2"; timeout="$1";
    # grep -c counts matching lines; the posted "grep -q ... | wc -l" always yielded 0
    [ "$(ps -T | grep -c "$2 $3")" -le "1" ] && timeout="$1" || timeout="10"
    (
        eval "$nv" "$cmd" &
        child=$!
        trap -- "" SIGTERM
        (
                sleep $timeout
                kill $child 2> /dev/null
        ) &
        wait $child
    )
}

# make the new function accessible

# clear rc variable, may be unnecessary
rc=""

run_cmd 1 /tmp/_nvram $@
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   #echo "$rc" >> /tmp/nvram.log
   usleep 1000
   logger -t "nvram-override" "Couldn't execute nvram $@, exit status: $rc (124=timeout)"
   run_cmd 1 /tmp/_nvram $@
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $@, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /bin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram

Scratch that, it leaves you not knowing the command's return status.
 
@chongnt

here is one that might work.

Code:
#!/bin/sh

# copy original nvram executable to /tmp
cp /bin/nvram /tmp/_nvram

# create nvram wrapper that calls original nvram executable in /tmp
cat << 'EOF' > /tmp/nvram
#!/bin/sh
#set -x # comment/uncomment to disable/enable debug mode

# run_cmd SECONDS COMMAND [ARGS...]
# run COMMAND in the background and kill it if it is still alive after SECONDS
run_cmd () {
    to=$1; shift
    "$@" & local child=$! start=0
    while kill -0 $child 2>/dev/null; do
        sleep 1      # was "read -t 1", which returns at once when stdin is closed
        start=$((start+1))
        if [ $start -ge $to ]; then
            kill -s 9 $child 2>/dev/null
            break
        fi
    done
    wait $child      # hand the command's exit status back to the caller
}

# clear rc variable, may be unnecessary
rc=""

run_cmd 1 /tmp/_nvram "$@"
rc=$?

#echo "$rc" >> /tmp/nvram.log

if [ "$rc" != "0" ]; then
   #echo "$rc" >> /tmp/nvram.log
   sleep 1
   logger -t "nvram-override" "Couldn't execute nvram $*, exit status: $rc (137=killed after timeout)"
   run_cmd 1 /tmp/_nvram "$@"
   rc=$?
   logger -t "nvram-override" "Retried executing nvram $*, exit status: $rc"
fi

exit $rc
EOF
chmod +x /tmp/nvram

# replace nvram in /bin w/ nvram wrapper in /tmp
mount -o bind /tmp/nvram /bin/nvram
 
Somewhere in this thread, amid all the bloviating, is the core issue of nvram hanging on a kernel netlink call, and the suggestion by @csj97229 that setting the nl_pid is problematic (at least in other applications that have experienced a similar issue).
 
Yeah, I am still somewhat baffled that this type of behavior is happening on some RT-AC86U units. The wrapper concept is a band-aid, if it can actually work. I would definitely consider replacing the router at this point until the issue is resolved.
 

I could understand your concerns if there was something else the end-user could do about it, or expect a fix from the developer. But presumably that's NOT the case. So as @SomeWhereOverTheRainBow suggests, that leaves few other options than to buy a new router. Under such circumstances, I can see the continued interest in a workaround (as bad as it may be).

As I stated previously, I've never been able to reproduce this problem on my AC68U, so it's looking more and more like this is a hardware-specific issue.

Frankly, I've seen enough bad things w/ this AC86U over the years that I'd never buy one. I know many users have no issues, but compared to something like the AX86U, which seems far more reliable (at least anecdotally), I would just avoid the AC86U completely. But I understand that's NOT going to resolve the issue for those currently committed to it.
 
Last I recall, @Viktor Jaep had already tried using timeout (at my suggestion). At least while experimenting. But as a long-term solution, while it will allow the script to continue, the call to nvram still fails. Or do we know the call only hangs *after* the value is successfully gotten or set?
That's correct... The timeout command has been the only saving grace in this whole thing that allows my script to run indefinitely.

So what's your opinion on moving forward with all this? Worth submitting a bug report to Asus over this? Anyone have any good avenues to get this to the right people?
 


I frankly don't know where we stand at this moment w/ the changes to my script made by @Oracle to retry using timeout until it succeeded. I saw some concerns over using Entware to gain access to the timeout executable. I'd be willing to consider the same, but again, if @Oracle is making progress, then I don't want to reinvent the wheel.
 

It would be nice to have a solution that doesn't require Entware, simply because addons sometimes rely on checking nvram well before Entware is active.

I attempted that in my earlier post, with a run_cmd function that offers similar results to timeout. It seems to serve up the nvram get adequately when I try it out, but I am not sure how well it holds up, as it has not been tested under the extreme conditions where nvram get breaks.
 
I just gave your script another run today, and unfortunately I am now consistently hitting the issue. Sometimes it even gets stuck below 200 counts. One difference is that two days ago, when I ran this script successfully, some processes were already stuck, for example /usr/sbin/wl and conn_diag. I have dropped a message to my local Asus support. I will wait a day or two until other processes get stuck, then run this again to see if it makes any difference.

I am now running your script. It takes a sec for each nvram get command to run.

Code:
+ true
+ i=176
+ nvram get vpn_client1_state
+ state1=2
+ nvram get vpn_client2_state
+ state2=0
+ nvram get vpn_client3_state
+ state3=2
+ nvram get vpn_client4_state
+ state4=0
+ nvram get vpn_client5_state
+ state5=0
+ clear
+ echo 2 0 2 0 0
2 0 2 0 0
+ echo 176
176
... snipped...
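
Going by the set -x output above, the test loop looks roughly like the sketch below. This is a reconstruction from the trace, not the original script. The function name nvram_test_loop, the iteration limit, and the count of empty reads are additions for illustration, and the clear call seen in the trace is left out.

```shell
#!/bin/sh
# Reconstruction of the test loop from the trace; not the original script.
# Usage: nvram_test_loop MAX CMD    (MAX=0 means run until interrupted)
nvram_test_loop() {
    max=$1; nv=$2
    i=0; fails=0
    while true; do
        i=$((i+1))
        states=""
        for n in 1 2 3 4 5; do
            s=$("$nv" get "vpn_client${n}_state")
            # an empty value is what a hung or killed nvram get looks like
            [ -z "$s" ] && fails=$((fails+1))
            states="$states${states:+ }$s"
        done
        echo "$states"
        echo "$i"
        if [ "$max" -gt 0 ] && [ "$i" -ge "$max" ]; then break; fi
    done
    echo "iterations=$i empty_reads=$fails"
}

# On the router: nvram_test_loop 0 nvram
```

Each iteration makes five nvram get calls, so the one-second wait per call reported above adds up to about five seconds per count.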
 
1. Alternative solution without timeout
@SomeWhereOverTheRainBow I'll test your code.
Would you please explain to me what this part actually does?
Bash:
run_cmd () {
    to=$1; shift
    $@ & local child=$! start=0
     while kill -0 $child 2>/dev/null; do
        read -t 1
        start=$((start+1))
        if [ $start -ge $to ]; then
            kill -s 9 $child 2>/dev/null
            break
        fi
    done
}
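
A commented reading of that function, as a sketch rather than an authoritative answer. Two lines differ from the posted code and are marked CHANGED: sleep 1 replaces read -t 1, and a final wait is added so the caller can see the command's exit status. sleep 5 stands in for a hung nvram call.

```shell
#!/bin/sh
# Annotated variant of run_cmd; lines marked CHANGED are not in the posted code.
run_cmd() {
    to=$1; shift     # first argument: time limit in seconds; the rest is the command
    "$@" &           # start the command in the background
    child=$!         # remember its PID
    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID still exists
    while kill -0 "$child" 2>/dev/null; do
        sleep 1      # CHANGED: the post uses "read -t 1" as a one-second pause
        elapsed=$((elapsed+1))
        if [ "$elapsed" -ge "$to" ]; then
            kill -s 9 "$child" 2>/dev/null   # time is up: send SIGKILL
            break
        fi
    done
    wait "$child"    # CHANGED: return the command's own exit status
}

run_cmd 1 sleep 5    # sleep 5 stands in for a hung nvram call
echo "exit status: $?"
```

With the added wait, a command killed this way typically reports 137 (128 + 9) rather than the 124 that timeout uses. The loop only checks once per second, so even a command that finishes instantly costs about one second.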

2. The current wrapper script with timeout
This is still a work in progress. It lacks protection against running when timeout is not available. timeout itself is installed as an add-on on an external USB device, which is quite unsafe given that the wrapper would be managing such core operations as nvram get / set. Would it be possible to install timeout on the jffs partition to make this a bit safer?
The $PATH has to be checked. I have a line exporting /opt/bin:/opt/sbin, but that needs refinement because I ended up having these paths added more than once.
I also wish to have a counter of how many times the override has been invoked, so I can send an error message with a number on it to the system log (for statistics). That way we can get a more accurate measure of how often the nvram command is used and how often the error condition happens.
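
The three wishes above (a guard for a missing timeout, no duplicate PATH entries, an invocation counter) could be sketched along these lines. All names here (path_prepend, bump_count, COUNT_FILE and its default location) are hypothetical, and the counter is not safe against two wrappers updating it at the same instant.

```shell
#!/bin/sh
# All helper names and the counter file location are hypothetical.

# Prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# Increment a counter kept in a file and print the new value.
COUNT_FILE=${COUNT_FILE:-/tmp/nvram-override.count}
bump_count() {
    n=$(cat "$COUNT_FILE" 2>/dev/null)
    n=$(( ${n:-0} + 1 ))
    echo "$n" > "$COUNT_FILE"
    echo "$n"
}

path_prepend /opt/sbin
path_prepend /opt/bin
export PATH

# Only wrap the call when timeout can actually be found.
if command -v timeout >/dev/null 2>&1; then
    echo "timeout found at $(command -v timeout)"
else
    echo "timeout not found; the wrapper should call the original nvram directly"
fi
```

Inside the wrapper, the value printed by bump_count could be appended to the logger message so that each failure in the system log carries a running number.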

3. AF_NETLINK suspicion
As for the netlink library presumably setting nl_pid wrongly: this doesn't match very well with empirical observation.
Why is it that >2,700 iterations in a row of the test loop (with 5 nvram get calls inside) work fine? Mind you, these are actually >13,500 successful nvram operations before 1 fails. Unless it's a buffer that runs out or something of this nature, I would expect the nvram reads to fail much more regularly. This could just as well be a faulty nvram controller or a bug in the CPU.
We also have a report of an AC86U that doesn't exhibit the faulty behavior. We haven't independently verified it, but it deserves attention. Update: this report has now been withdrawn. At this point we have no reason to doubt that all AC86U routers have this fault.

4. The "ditch the AC86U and buy AX86U" solution
Well, I don't like it. You know the saying: trick me once - shame on you, trick me twice - shame on me. I don't feel confident shoveling more money into a company that has failed me before I see a genuine effort to fix the problem.
As I've mentioned, I had a couple of cheap TP-Link routers (sub 50 EUR) that worked flawlessly with custom firmware and a load of addons for around 5 years. I don't like the idea of having to ditch a 120 EUR device + shipping costs less than a month after unboxing it. Asking for a replacement of the same model? Difficult and likely useless. They are out of stock at the place I ordered it from, the shipping costs would be on me, and I have no guarantee that the replacement unit won't have the same defect. Quite the opposite - it will probably have it. On top of that, I'll have to deal with the heat dissipation problem all over again.
What about all the other users experiencing the faulty behavior? I don't want to hurt the Asus business but let's look for a better solution first before throwing away devices.

5. Reporting the problem
I believe this problem should be adequately reported to Asus. I have to give them the benefit of the doubt that they were not aware of it and would make an effort to address it. I'm somewhat discouraged by the thermal design - I mean, how could they not have seen that? They clearly went for the thick thermal pads between the ICs and the heatsink - they were completely aware of the big gaps. That's an indication of low standards in designing a premium device.
I would still try. I'd report both the poor heat dissipation and the nvram bug. If I contact regular customer service, I'll be most probably dismissed. So how shall I contact them? Could I possibly write a notification letter and submit it via @RMerlin ?

Edited the whole post, reason: refinement.
 
I am also wondering how his script works. This is all I see when I run it:
Code:
admin@RT-AC86U-DBA8:/jffs/scripts/custom# sh -x zz.sh
+ cp /bin/nvram /tmp/_nvram
+ cat
+ chmod +x /tmp/nvram
+ mount -o bind /tmp/nvram /bin/nvram

Is the AC86U that does not hit the issue referring to my test 2 days ago? If yes, then now I keep hitting the issue without fail. :eek:

Update:
Received feedback from my local Asus support. They will forward this to R&D.
 