Provider disconnects / VPN does not reconnect

It is not specific to client1.
Hmmmmm, so post #13
Code:
0: from all lookup local
9990: from all fwmark 0x8000/0x8000 lookup main
9991: from all fwmark 0x7000/0x7000 lookup ovpnc4
9992: from all fwmark 0x3000/0x3000 lookup ovpnc5
9993: from all fwmark 0x1000/0x1000 lookup ovpnc1
9994: from all fwmark 0x2000/0x2000 lookup ovpnc2
9995: from all fwmark 0x4000/0x4000 lookup ovpnc3
10201: from 192.168.1.1 lookup main
10301: from 192.168.1.22 lookup ovpnc2
32766: from all lookup main
32767: from all lookup default
shows rule 10301 for VPN Client 2 is still there compared to the missing VPN Client 1 rules 10001-10108 in post #15
Code:
0: from all lookup local
9990: from all fwmark 0x8000/0x8000 lookup main
9991: from all fwmark 0x7000/0x7000 lookup ovpnc4
9992: from all fwmark 0x3000/0x3000 lookup ovpnc5
9993: from all fwmark 0x1000/0x1000 lookup ovpnc1
9994: from all fwmark 0x2000/0x2000 lookup ovpnc2
9995: from all fwmark 0x4000/0x4000 lookup ovpnc3
10001: from 192.168.1.1 lookup main
10002: from 192.168.1.192/28 to 192.168.0.254 lookup main
10003: from 192.168.1.192/28 to 185.246.211.0/24 lookup main
10004: from 192.168.1.192/28 to 193.200.164.0/24 lookup main
10005: from 192.168.1.192/28 to 212.8.242.0/23 lookup main
10006: from 192.168.1.192/28 to 185.59.222.0/24 lookup main
10007: from 192.168.1.192/28 to 84.17.60.0/23 lookup main
10008: from 192.168.1.192/28 to 185.132.176.0/22 lookup main
10009: from 192.168.1.192/28 to 51.77.0.0/16 lookup main
10010: from 192.168.1.192/28 to 145.239.0.0/16 lookup main
10011: from 192.168.1.192/28 to 54.38.0.0/16 lookup main
10012: from 192.168.1.192/28 to 185.172.88.0/22 lookup main
10101: from all to 74.125.0.0/16 lookup ovpnc1
10102: from all to 64.233.160.0/19 lookup ovpnc1
10103: from all to 66.102.0.0/20 lookup ovpnc1
10104: from all to 66.249.64.0/19 lookup ovpnc1
10105: from all to 72.14.192.0/18 lookup ovpnc1
10106: from all to 209.85.128.0/17 lookup ovpnc1
10107: from all to 216.239.32.0/19 lookup ovpnc1
10108: from 192.168.1.192/28 lookup ovpnc1
10201: from 192.168.1.1 lookup main
10301: from 192.168.1.22 lookup ovpnc2
32766: from all lookup main
32767: from all lookup default
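Comparing two listings like these by eye is error-prone, so here is a minimal sketch that prints the rules present in one saved `ip rule` listing but missing from a later one. The function name and the file paths in the usage comment are hypothetical; only `ip rule` itself comes from the thread.

```shell
#!/bin/sh
# Print every rule line that is present in the "before" listing but absent
# from the "after" listing. Both arguments are files holding `ip rule` output.
missing_rules() {
    before=$1
    after=$2
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
    grep -Fxv -f "$after" "$before"
}

# Hypothetical usage on the router:
#   ip rule > /tmp/rules.before      # while everything is healthy
#   ... wait for the disconnect ...
#   ip rule > /tmp/rules.after
#   missing_rules /tmp/rules.before /tmp/rules.after
```

Lines are compared as whole strings, so a rule that reappears under a different priority number is also reported as missing.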
 
You are right, but in that example it was client 1 that went down; client 2 was still up, so the policy routing rules for client 2 did not change.
 
OK, thanks for the clarification.

So this rules out a global wipe of the RPDB rules, but affects any VPN Client that experiences the 'SIGUSR1[soft,ping-restart]' event.

In that case I suggest you change the Logging debug level to 4
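Once more detail is being logged, it may help to count how often each client hits the 'SIGUSR1[soft,ping-restart]' event. The sketch below assumes the system log is at /tmp/syslog.log and that OpenVPN client lines are tagged "ovpn-client<N>["; both are assumptions about the firmware, so adjust them if your log looks different. The function name is hypothetical.

```shell
#!/bin/sh
# Count 'SIGUSR1[soft,ping-restart]' events for one VPN client in a log file.
# $1 = client number, $2 = log file (assumed default: /tmp/syslog.log)
count_ping_restarts() {
    client=$1
    logfile=${2:-/tmp/syslog.log}
    # Assumes log lines are tagged "ovpn-client<N>[";
    # -F keeps the square brackets literal instead of a regex class.
    grep -F "ovpn-client${client}[" "$logfile" | grep -cF 'SIGUSR1[soft,ping-restart]'
}

# Hypothetical usage:
#   count_ping_restarts 1
#   count_ping_restarts 2 /tmp/syslog.log
```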

 
I have just played a bit with what you recommended and found some unexpected VPN_Failover behavior.

My setup is:
- vpnclient 1 is running and is monitored by a 10-minute cron VPN_Failover task (new PID every 10 min) ([1 ignore=2,3,4,5 once multiconfig force curlrate=500000]). No other VPN_Failover task is launched for client 1 (nothing in vpnclient1-up).
- vpnclient 2 is running and a VPN_Failover task (PID 16074) is launched through the vpnclient2-up script ([2 multiconfig interval=600 delay=240 ignore=1,3,4,5 pingonly=1.1.1.1]). No other VPN_Failover task is launched for client 2 (nothing in crontab).

At 1h50m43s I launched a manual VPN_Failover task (PID 1170) to kill client 1 and see what happens ([1 ignore=2,3,4,5 once multiconfig force curlrate=50M]).

I then noticed 2 unexpected behaviors:
- VPN client 2 got killed by the same manual VPN_Failover task (PID 1170) even though the ignore=2,3,4,5 option was set.
- The PID 1170 VPN_Failover task entered a permanent loop: even after a successful reconnect, VPN_Failover re-checked the connection after 30 seconds despite the "once" option, which disconnected client 1 again (because of the unachievable 50M threshold).
I then manually killed PID 1170 to break the loop. Meanwhile I checked ip rule and it was OK after client 1 reconnected.
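For reference, the behaviour expected from ignore=2,3,4,5 is a simple membership test: a client whose number is in the list should never be touched. A minimal sketch of such a test follows; it is an illustration only, not the code VPN_Failover.sh actually uses, and the function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if client number $1 appears in the comma-separated list $2.
is_ignored() {
    # Wrapping both sides in commas stops "1" from matching inside "11".
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical usage:
#   is_ignored 2 "2,3,4,5" && echo "leave client 2 alone"
```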
 

Many thanks for the feedback.:)

Can't believe that the 'multiple clients will be killed' bug existed :oops::oops:, but perhaps most users only have a single VPN Client ACTIVE at any given moment.

By design, the script (even with the 'once' directive) will tenaciously keep restarting the VPN Client until it is UP and satisfies the minimum throughput criteria as set by the 'curlrate=' directive before honouring the 'once' directive if specified.

So I have pushed v1.21 VPN_Failover.sh and added the 'nocurlrestart' directive, so if 'once' is also specified, the monitoring will terminate if the switched/restarted VPN Client is UP without reapplying the absurd 'curlrate=' threshold.
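The interaction between 'once', 'curlrate=' and 'nocurlrestart' described above can be sketched as a small decision function. This is only a reading of the description in this post, not an extract from VPN_Failover.sh; the function name, the argument order and the 1/0 flags are all hypothetical, and the rate and threshold are assumed to be plain numbers in the same unit.

```shell
#!/bin/sh
# Decide what a monitoring pass should do next: "stop", "continue" or "restart".
# All parameter names are hypothetical, not taken from VPN_Failover.sh:
#   $1 once (1/0)   $2 nocurlrestart (1/0)   $3 client is UP (1/0)
#   $4 measured rate   $5 curlrate threshold (same unit as $4)
next_action() {
    once=$1 nocurl=$2 up=$3 rate=$4 min=$5
    # A client that is down is always restarted.
    if [ "$up" -ne 1 ]; then
        echo restart; return
    fi
    # With both 'once' and 'nocurlrestart', an UP client is enough to finish.
    if [ "$once" -eq 1 ] && [ "$nocurl" -eq 1 ]; then
        echo stop; return
    fi
    # Otherwise the throughput threshold must be met before 'once' is honoured.
    if [ "$rate" -ge "$min" ]; then
        if [ "$once" -eq 1 ]; then echo stop; else echo continue; fi
    else
        echo restart
    fi
}
```

Without 'nocurlrestart', an UP client measured below an unreachable threshold always yields "restart", which matches the loop reported earlier in the thread.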

NOTE: Use the 'noswitch' directive if you need to test various 'curlrate=' values to find an appropriate value.

Video v1.21 VPN_Failover test

P.S. Any luck in identifying when/how the RPDB rules mysteriously go AWOL?
 
I just have disconnections. but no issues with rules. Good.
The bad thing is that I now have 5 VPN_Failover daemons running! It looks like they are not killed by the commands below (from the vpnclient route-pre-down script):
Code:
#!/bin/sh
VPN_ID=${dev:4:1}
VPNFAILOVER="/tmp/vpnclient${VPN_ID}-VPNFailover"
# Also rely on the VPN_Failover.sh to test for the existence of the VPNFailover semaphore BEFORE it attempts a restart!
if [ -z "$(grep "NOKILL" "$VPNFAILOVER")" ];then
PID=$(cat "$VPNFAILOVER")
# Only signal a real PID; skip an empty value or the NOKILL marker
[ -n "$PID" ] && [ "$PID" != "NOKILL" ] && kill "$PID"
rm "$VPNFAILOVER"
logger -st "($(basename "$0"))" $$ "VPN Failover Monitor self-destruct requested....." "$VPNFAILOVER" "RC="$? # RC=1 means file was already deleted
fi
 
I just have disconnections. but no issues with rules. Good.
Weird that it suddenly fixed itself? :confused: - therefore no need for VPN Failover monitoring ;) and its rogue PIDs! :oops:
The bad thing is I have now 12 VPN_Failover daemons running! Looks like they are not killed by the vpnclient route-pre-down script.
Hmm strange indeed. :confused:o_O

However, as can be seen in the video, there are three instances of the 'VPN_Failover.sh' script running, one for each of the three ACTIVE VPN Client connections, and each was created by its parent 'vpnclientX-up' script.

When the 'VPN_Failover.sh' script initialises, it checks to see if there is an instance of the script already running for the nominated VPN Client.
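A startup check of this kind is commonly built on a PID file plus `kill -0`. The sketch below shows that general pattern against a semaphore file such as /tmp/vpnclient1-VPNFailover; it is an assumption about how such a check could look, not the script's actual code, and the function name is hypothetical. A file holding "NOKILL" instead of a PID is treated here as "no known live PID".

```shell
#!/bin/sh
# Return 0 if the semaphore file for a client names a process that is still alive.
# $1 = path to the semaphore file, e.g. /tmp/vpnclient1-VPNFailover
monitor_running() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    # Only accept a purely numeric PID (the file may instead hold "NOKILL").
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 delivers nothing; it only tests that the process exists.
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical usage:
#   monitor_running /tmp/vpnclient1-VPNFailover && echo "already monitored"
```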

As can be seen in the video (around the 00:50 second mark), when I attempted to manually request another monitoring instance for VPN Client 5 it was rejected.

You can then see I had to explicitly use the 'reset 5' command to delete the current VPN Client 5 monitoring instance, to allow me to create the manual monitoring instance with the impossible/unrealistic 'curlrate='.

So in theory the issue isn't that 'vpnclientX-route-pre-down' failed to delete the monitoring instance; the root cause is more likely to be found by answering "why did VPN_Failover.sh fail to regulate itself?"

Can't think why there would be any other external process that is responsible for firing off 5-12 requests for the script, unless there is an issue with the code that is supposed to limit the number of instances.
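One general way such a limit can fail is a race: if two copies start at almost the same moment, both can pass a "is one already running?" check before either has recorded its PID. Whether that is what happens here is unknown. A common guard against it is an atomic lock based on `mkdir`, sketched below with hypothetical function and directory names.

```shell
#!/bin/sh
# Try to become the only monitor for a client. mkdir is atomic: exactly one
# caller succeeds in creating the directory, every other caller gets a failure.
# $1 = lock directory (hypothetical name, e.g. /tmp/vpnclient1-VPNFailover.lock)
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

# Remove the lock so that a later instance can start.
release_lock() {
    rmdir "$1" 2>/dev/null
}

# Hypothetical usage inside a monitoring script:
#   LOCK=/tmp/vpnclient1-VPNFailover.lock
#   if acquire_lock "$LOCK"; then
#       trap 'release_lock "$LOCK"' EXIT
#       # ... monitoring loop ...
#   else
#       echo "another instance already holds $LOCK"; exit 1
#   fi
```

A lock directory left behind by a killed process would block new instances, so a real script would also need a way to clear stale locks.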

However, if you have the time/motivation you can manually prove/test to see if you can create multiple instances.

First modify 'vpnclient1-up' to allow debugging from the commandline

i.e. add the debugging line
Code:
VPN_ID=${dev:4:1}
[ -z "$VPN_ID" ] && { SCR=$(basename $0); VPN_ID=${SCR:9:1}; } # Allow manual debugging from commandline
Then test, i.e. first make sure all of the rogue PIDs are 'killed', then run the 'up' script twice:
Code:
./VPN_Failover.sh   reset   1

./vpnclient1-up

./vpnclient1-up
to see if multiple VPN Failover instances are created for VPN Client 1.
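To count the instances after the test, the process list can be filtered for VPN_Failover.sh invocations whose first argument is the client number. This is a minimal sketch with a hypothetical function name; it reads the listing on stdin, and it assumes the command line is shown in full by `ps` on your firmware.

```shell
#!/bin/sh
# Count VPN_Failover.sh processes whose first argument is the given client
# number. Reads a process listing on stdin so it can be fed from `ps`.
# $1 = client number
count_monitors() {
    # Match "VPN_Failover.sh <N>" followed by whitespace or end of line, so
    # "VPN_Failover.sh reset 1" and "VPN_Failover.sh 11" are not counted.
    grep -cE "VPN_Failover\.sh[[:space:]]+$1([[:space:]]|\$)"
}

# Hypothetical usage on the router:
#   ps | count_monitors 1
```

A result greater than 1 for the same client would confirm that duplicate monitoring instances were created.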
 
No problem creating 2 instances of VPN_Failover manually.
So the command window that you used to test simply allowed the two instances to be created?:confused:

Sorry, no idea then :oops: - it could be the firmware version/Router Model you have or even the combination of args passed to invoke the script.
 
I will try a dirty hack:
Code:
sh /jffs/scripts/VPN_Failover.sh reset "$VPN_ID" && sleep 60 && sh /jffs/scripts/VPN_Failover.sh "$VPN_ID" "multiconfig" "interval=120" "delay=900" "ignore=2,3,4,5" "pingonly=1.1.1.1" &
 
It isn't a hack if it works! :p

As I can't replicate your issue, I really appreciate your time to assist in investigating the issue or at least prove a "work around".
 
Let me know if you want me to try anything else to understand what's going on with my setup (I have even simplified it and am now using only 1 client).
Thanks.
 
If your proposed "hack" works, please let me know. I'm sure you have other things to worry about, so in the meantime I'll see if I can improve the detection code for duplicate PIDs/processes.

However, it would be useful if you could state which router and firmware you are running.
 
RT-AX88U Fw 384.13
OK thanks.

If you are motivated and could spare the time to test VPN_Failover v1.22 you will find the link in your PM.
 
