...
Initially, this error showed up right in the middle of a routine VPN reset (done via script):
...
If I'm reading this right... it may have been hung on this?
Code:
FOUND_12372: [0][ 5930 5920 ViktorJp S 4008 0.3 0 0.0 nvram get vpn_client4_state]
After the tool ran, the netlink socket error was "killed", and the script resumed!
Code:
_wlcsm_create_nl_socket:268: pid:5940 binding netlink socket error!!!
Killed
Based on the output you posted, you had 4 distinct "nvram" cmd processes stuck at that point. Here is the breakdown (a quick way to list any such hung calls yourself is sketched after the breakdown):
1) "nvram get vpn_client4_state" call during execution of the "vpnmon-r3.sh -reset" script.
Code:
5930 5920 ViktorJp S 4008 0.3 0 0.0 nvram get vpn_client4_state
5920 13444 ViktorJp S 3508 0.3 1 0.0 sh /jffs/scripts/vpnmon-r3.sh -reset
13444 8962 ViktorJp S 3508 0.3 0 0.0 sh /jffs/scripts/vpnmon-r3.sh -reset
8962 7593 ViktorJp S 3328 0.3 0 0.0 -sh
7593 2915 ViktorJp S 3704 0.3 2 0.0 dropbear -p 192.168.50.1:22 -j -k
2915 1 ViktorJp S 3576 0.3 3 0.0 dropbear -p 192.168.50.1:22 -j -k
2) "
nvram get http_username" call while trying to add a cron job (Diversion_UpdateBL) related to Diversion.
Code:
5925 5924 ViktorJp S 4008 0.3 2 0.0 nvram get http_username
5924 1 ViktorJp S 3328 0.3 0 0.0 {cru} /bin/sh /usr/sbin/cru a Diversion_UpdateBL 00 2 * * Fri /bin/sh /opt/share/diversion/file/update-bl.div reset
3) "
nvram get http_username" call while trying to delete a cron job (Diversion_LocalBackup) related to Diversion.
Code:
2691 2679 ViktorJp S 4008 0.3 0 0.0 nvram get http_username
2679 1 ViktorJp S 3328 0.3 2 0.0 {cru} /bin/sh /usr/sbin/cru d Diversion_LocalBackup
4) "
nvram get http_enable" call during execution of "unbound_manager.sh vpn=1" script.
Code:
2689 2582 ViktorJp S 4008 0.3 1 0.0 nvram get http_enable
2582 2348 ViktorJp S 3872 0.3 1 0.0 {unbound_manager} /bin/sh /jffs/addons/unbound/unbound_manager.sh vpn=1
2348 1 ViktorJp S 3328 0.3 1 0.0 {unbound_DNS_via} /bin/sh /jffs/addons/unbound/unbound_DNS_via_OVPN.sh 1 start
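If you want to check for these hung calls yourself at any time, a minimal, purely illustrative sketch using only the BusyBox tools already on the router (plain "ps" and "grep"; nothing here is specific to the "CheckStuckProcCmds.sh" script) would be:
Code:
# List any "nvram" commands still running; the [n] trick keeps grep itself out of the list.
# Entries that keep the same PID across repeated runs are the stuck ones.
ps w | grep '[n]vram '

# Show the full list so you can follow the PPID column up the chain
# (e.g. 5930 -> 5920 -> 13444 ...) to see which script issued the call.
ps w
The exact columns shown by "ps w" depend on the BusyBox build, but the PID/PPID pair is what matters for tracing the parent scripts.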
These stuck commands seem to indicate that the problem found in some older AC-class routers (e.g. the RT-AC86U) is not yet completely fixed in some newer AX-class routers. It may take much longer to show up, perhaps depending on how many nvram calls are made within a very short time, or it could be something else entirely; but the problem is still there, and your current setup, with a number of add-on scripts running concurrently, probably increases the chances of it happening.
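If you're curious how many concurrent nvram calls your add-ons actually generate, a rough, purely illustrative counter like the one below can be left running for a minute (the 60-second window and 1-second sampling are arbitrary choices, not anything the script itself uses):
Code:
#!/bin/sh
# Sample the process list once a second for 60 seconds and report
# the highest number of simultaneous "nvram" commands seen.
max=0
i=0
while [ "$i" -lt 60 ]
do
    n="$(ps w | grep -c '[n]vram ')"
    [ "$n" -gt "$max" ] && max="$n"
    i=$((i + 1))
    sleep 1
done
echo "Peak simultaneous nvram calls seen: $max"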
There should be log & trace files created by the "CheckStuckProcCmds.sh" script that have time stamps and may show further details.
The log file (CheckStuckProcCmds.LOG) is located in the "/opt/var/log" directory if you have Entware installed; otherwise, check in the "/tmp/var/tmp" directory.
The trace file may be located in the "/opt/var/log/Trace" or the "/tmp/var/tmp" directory. The trace file name has the following pattern:
StuckProcCmds_{INDEX}_{PID}.TRC.txt
e.g. "StuckProcCmds_00001_18729.TRC.txt"
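Similarly, to list any trace files matching that pattern (again, just the two directories mentioned above; nothing else assumed):
Code:
# Newest files first; "2>/dev/null" hides the error if a directory doesn't exist.
ls -lt /opt/var/log/Trace/StuckProcCmds_*.TRC.txt 2>/dev/null
ls -lt /tmp/var/tmp/StuckProcCmds_*.TRC.txt 2>/dev/null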
I'd recommend running the "CheckStuckProcCmds.sh" script manually a couple more times to make sure that all 4 stuck commands have been killed and are no longer there.
HTH.