Provider disconnects / VPN does not reconnect

Martineau · Dec 3, 2019

Olivier L said:
It is not specific to client1.

Hmmmmm, so post #13

Code:

0: from all lookup local
9990: from all fwmark 0x8000/0x8000 lookup main
9991: from all fwmark 0x7000/0x7000 lookup ovpnc4
9992: from all fwmark 0x3000/0x3000 lookup ovpnc5
9993: from all fwmark 0x1000/0x1000 lookup ovpnc1
9994: from all fwmark 0x2000/0x2000 lookup ovpnc2
9995: from all fwmark 0x4000/0x4000 lookup ovpnc3
10201: from 192.168.1.1 lookup main
10301: from 192.168.1.22 lookup ovpnc2
32766: from all lookup main
32767: from all lookup default

shows rule 10301 for VPN Client 2 is still there compared to the missing VPN Client 1 rules 10001-10108 in post #15

Code:

0: from all lookup local
9990: from all fwmark 0x8000/0x8000 lookup main
9991: from all fwmark 0x7000/0x7000 lookup ovpnc4
9992: from all fwmark 0x3000/0x3000 lookup ovpnc5
9993: from all fwmark 0x1000/0x1000 lookup ovpnc1
9994: from all fwmark 0x2000/0x2000 lookup ovpnc2
9995: from all fwmark 0x4000/0x4000 lookup ovpnc3
10001: from 192.168.1.1 lookup main
10002: from 192.168.1.192/28 to 192.168.0.254 lookup main
10003: from 192.168.1.192/28 to 185.246.211.0/24 lookup main
10004: from 192.168.1.192/28 to 193.200.164.0/24 lookup main
10005: from 192.168.1.192/28 to 212.8.242.0/23 lookup main
10006: from 192.168.1.192/28 to 185.59.222.0/24 lookup main
10007: from 192.168.1.192/28 to 84.17.60.0/23 lookup main
10008: from 192.168.1.192/28 to 185.132.176.0/22 lookup main
10009: from 192.168.1.192/28 to 51.77.0.0/16 lookup main
10010: from 192.168.1.192/28 to 145.239.0.0/16 lookup main
10011: from 192.168.1.192/28 to 54.38.0.0/16 lookup main
10012: from 192.168.1.192/28 to 185.172.88.0/22 lookup main
10101: from all to 74.125.0.0/16 lookup ovpnc1
10102: from all to 64.233.160.0/19 lookup ovpnc1
10103: from all to 66.102.0.0/20 lookup ovpnc1
10104: from all to 66.249.64.0/19 lookup ovpnc1
10105: from all to 72.14.192.0/18 lookup ovpnc1
10106: from all to 209.85.128.0/17 lookup ovpnc1
10107: from all to 216.239.32.0/19 lookup ovpnc1
10108: from 192.168.1.192/28 lookup ovpnc1
10201: from 192.168.1.1 lookup main
10301: from 192.168.1.22 lookup ovpnc2
32766: from all lookup main
32767: from all lookup default
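
To compare two such listings without reading them line by line, something like the following could be used. It is only a sketch: the helper name count_policy_rules is made up for illustration, and it simply counts the non-fwmark rules that point at a given routing table in "ip rule" style output.

```shell
#!/bin/sh
# Hypothetical helper (not part of the firmware or of VPN_Failover.sh).
# Reads "ip rule" style output on stdin and prints how many selective
# routing rules (i.e. everything except the fwmark rules) look up table $1.
count_policy_rules() {
    awk -v table="$1" '
        /fwmark/ { next }                                  # skip the 999x fwmark rules
        $(NF-1) == "lookup" && $NF == table { n++ }        # e.g. "... lookup ovpnc1"
        END { print n + 0 }                                # print 0 when nothing matched
    '
}
```

On the router this would be fed from the live table, e.g. `ip rule show | count_policy_rules ovpnc1`; a result of 0 while the client is up would correspond to the "missing rules" state shown in the first listing.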

Olivier L · Dec 3, 2019

You are right, but in that example it was client 1 that went down; client 2 was still up, so there was no change to the policy routing rules related to client 2.

Martineau · Dec 3, 2019

Olivier L said:
You are right, but in that example it was client 1 that went down; client 2 was still up, so there was no change to the policy routing rules related to client 2.

OK, thanks for the clarification.

So this rules out a global wipe of the RPDB rules; instead, the loss affects any VPN Client that experiences the 'SIGUSR1[soft,ping-restart]' event.

In that case I suggest you change the Logging debug level to 4
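
Once the extra logging is in place, the restart events can be pulled out of the log to line them up against the time the rules vanish. A rough sketch follows; the helper name is invented, and it assumes that VPN Client N logs under the tag "ovpn-clientN[pid]" and that the syslog lives at /tmp/syslog.log, both of which may differ on your firmware.

```shell
#!/bin/sh
# Hypothetical helper. Reads syslog text on stdin and prints only the lines
# where VPN Client $1 reported the ping-restart event.
# ASSUMPTION: the client logs under the tag "ovpn-client<N>[<pid>]".
ping_restart_events() {
    awk -v tag="ovpn-client$1[" '
        # index() does a plain substring match, so the [ ] need no escaping
        index($0, tag) && index($0, "SIGUSR1[soft,ping-restart]") { print }
    '
}
```

Example use (path is an assumption): `ping_restart_events 1 < /tmp/syslog.log`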

Olivier L · Dec 3, 2019

I have just played a bit with what you recommended and found some unexpected VPN_Failover behavior.

My set up is
- vpnclient 1 running and a 10min cron VPN_Failover task (new PID every 10 min) monitors it ([1 ignore=2,3,4,5 once multiconfig force curlrate=500000]) - no other VPN_Failover task is launched for client 1 (nothing in vpnclient1-up)
- vpnclient 2 running and a VPN_Failover task (PID 16074) is launched through the vpnclient2-up script ([2 multiconfig interval=600 delay=240 ignore=1,3,4,5 pingonly=1.1.1.1]) - no other VPN_Failover task is launched for client 2 (nothing in crontab).

At 1h50m43s I launched a manual VPN_Failover task (PID 1170) to kill client 1 and see what's happening ([1 ignore=2,3,4,5 once multiconfig force curlrate=50M]).

I then noticed 2 unexpected behaviors:
- VPN client 2 got killed by the same manual VPN_Failover task (PID 1170) even though the ignore=2,3,4,5 option was set.
- The PID 1170 VPN_Failover task entered a permanent loop: even after a successful reconnect, VPN_Failover re-checked the connection after 30 seconds despite the "once" option, causing client 1 to disconnect again (due to the unachievable 50M threshold).
I then manually killed PID 1170 to break the loop. Meanwhile I checked ip rule, and it was OK after client 1 reconnected.

Martineau · Dec 5, 2019

Olivier L said:
I have just played a bit with what you recommended and found some unexpected VPN_Failover behavior.

My set up is
- vpnclient 1 running and a 10min cron VPN_Failover task (new PID every 10 min) monitors it ([1 ignore=2,3,4,5 once multiconfig force curlrate=500000]) - no other VPN_Failover task is launched for client 1 (nothing in vpnclient1-up)
- vpnclient 2 running and a VPN_Failover task (PID 16074) is launched through the vpnclient2-up script ([2 multiconfig interval=600 delay=240 ignore=1,3,4,5 pingonly=1.1.1.1]) - no other VPN_Failover task is launched for client 2 (nothing in crontab).

At 1h50m43s I launched a manual VPN_Failover task (PID 1170) to kill client 1 and see what's happening ([1 ignore=2,3,4,5 once multiconfig force curlrate=50M]).

I then noticed 2 unexpected behaviors:
- VPN client 2 got killed by the same manual VPN_Failover task (PID 1170) even though the ignore=2,3,4,5 option was set.
- The PID 1170 VPN_Failover task entered a permanent loop: even after a successful reconnect, VPN_Failover re-checked the connection after 30 seconds despite the "once" option, causing client 1 to disconnect again (due to the unachievable 50M threshold).
I then manually killed PID 1170 to break the loop. Meanwhile I checked ip rule, and it was OK after client 1 reconnected.

Many thanks for the feedback.

Can't believe that the 'multiple clients will be killed' bug existed, but perhaps most users only have a single VPN Client ACTIVE at any given moment.

By design, the script (even with the 'once' directive) will tenaciously keep restarting the VPN Client until it is UP and satisfies the minimum throughput criteria as set by the 'curlrate=' directive before honouring the 'once' directive if specified.

So I have pushed v1.21 VPN_Failover.sh and added the 'nocurlrestart' directive, so if 'once' is also specified, the monitoring will terminate if the switched/restarted VPN Client is UP without reapplying the absurd 'curlrate=' threshold.

NOTE: Use the 'noswitch' directive if you need to test various 'curlrate=' values to find an appropriate value.
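
The interaction between 'once', 'curlrate=' and 'nocurlrestart' described above can be summarised as a small decision function. This is an illustrative sketch only, not code from VPN_Failover.sh: the function name and its arguments are hypothetical, and the case of 'nocurlrestart' without 'once' is not described in this thread, so the sketch just treats it like the default.

```shell
#!/bin/sh
# Hypothetical sketch of the decision taken after a VPN Client has been
# restarted/switched. NOT taken from VPN_Failover.sh.
#   $1 = "up" or "down"       state of the client after the restart
#   $2 = measured throughput  (integer, same unit as the threshold)
#   $3 = 'curlrate=' threshold (integer)
#   $4 = "once" or ""         was the 'once' directive given?
#   $5 = "nocurlrestart" or ""
# Prints one of: restart | exit | continue
next_action() {
    state="$1"; rate="$2"; min="$3"; once="$4"; nocurl="$5"

    # A client that is still down is always restarted.
    [ "$state" = "up" ] || { echo restart; return; }

    # v1.21 as described: 'once' + 'nocurlrestart' stops as soon as the
    # client is UP, without re-applying the throughput threshold.
    if [ "$once" = "once" ] && [ "$nocurl" = "nocurlrestart" ]; then
        echo exit; return
    fi

    # Original design: the throughput criterion must be met first ...
    if [ "$rate" -lt "$min" ]; then
        echo restart; return
    fi

    # ... and only then is 'once' honoured.
    if [ "$once" = "once" ]; then
        echo exit; return
    fi
    echo continue
}
```

With an unachievable threshold, `next_action up 100 50000000 once ""` yields "restart" every time, which is the loop reported above; adding "nocurlrestart" as the fifth argument yields "exit".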

Video v1.21 VPN_Failover test

P.S. Any luck in identifying when/how the RPDB rules mysteriously go AWOL?

Olivier L · Dec 5, 2019

thanks, will try.
so far, no new disconnect...

Olivier L · Dec 5, 2019

I have just had some disconnections, but no issues with the rules. Good.
The bad thing is that I now have 5 VPN_Failover daemons running! It looks like they are not killed by the commands below (from the vpnclient route-pre-down script):

#!/bin/sh
VPN_ID=${dev:4:1}
VPNFAILOVER="/tmp/vpnclient${VPN_ID}-VPNFailover"
# Also rely on the VPN_Failover.sh to test for the existence of the VPNFailover semaphore BEFORE it attempts a restart!
if [ -z "$(grep "NOKILL" "$VPNFAILOVER")" ]; then
    PID=$(cat "$VPNFAILOVER")
    [ "$PID" != "NOKILL" ] && kill "$PID"
    rm "$VPNFAILOVER"
    logger -st "($(basename "$0"))" $$ "VPN Failover Monitor self-destruct requested....." "$VPNFAILOVER" "RC=$?" # RC=1 means file was already deleted
fi
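
For what it is worth, a more defensive variant of the same kill step might look like the sketch below. It is an assumption-laden illustration rather than the script's real logic: kill_monitor is a made-up name, and it only relies on what the snippet above shows, namely that the semaphore file holds either a PID or the word NOKILL.

```shell
#!/bin/sh
# Hypothetical, more defensive version of the semaphore kill.
#   $1 = path of the semaphore file (e.g. /tmp/vpnclient1-VPNFailover)
kill_monitor() {
    sem="$1"

    # Nothing to do if the semaphore has already gone.
    [ -f "$sem" ] || return 0

    pid=$(cat "$sem")
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;   # "NOKILL", empty or garbage: leave it alone
    esac

    # Only signal the PID if it is still alive, then always clear the file.
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"
    fi
    rm -f "$sem"
}
```

Note that this can only ever stop the single PID recorded in the file; if several monitors were started for the same client, the others would survive.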

Olivier L · Dec 5, 2019

There are now 12 VPN_Failover scripts running...

Martineau · Dec 6, 2019

Olivier L said:
I just have disconnections. but no issues with rules. Good.

Weird that it suddenly fixed itself? Therefore no need for VPN Failover monitoring and its rogue PIDs!

Olivier L said:
The bad thing is that I now have 12 VPN_Failover daemons running! It looks like they are not killed by the vpnclient route-pre-down script:

Hmm strange indeed.

However, as can be seen in the video, there are three instances of the 'VPN_Failover.sh' script running, one for each of the three ACTIVE VPN Client connections, and each was created by its parent 'vpnclientX-up' script.

When the 'VPN_Failover.sh' script initialises, it checks to see if there is an instance of the script already running for the nominated VPN Client.

As can be seen in the video (around the 00:50 second mark), when I attempted to manually request another monitoring instance for VPN Client 5 it was rejected.

You can then see I had to explicitly use the 'reset 5' command to delete the current VPN Client 5 monitoring instance, to allow me to create the manual monitoring instance with the impossible/unrealistic 'curlrate='.

So in theory the issue isn't that 'vpnclientX-route-pre-down' failed to delete the monitoring instance; the root cause is more likely answered by "why did VPN_Failover.sh fail to regulate itself?"

Can't think why there would be any other external process that is responsible for firing off 5-12 requests for the script, unless there is an issue with the code that is supposed to limit the number of instances.
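
For readers wondering what such an instance limit can look like, here is a generic single-instance guard. It is a sketch under stated assumptions and does not claim to reproduce the check inside VPN_Failover.sh: acquire_monitor_slot is a hypothetical name, and the technique shown is plain shell noclobber (set -C) on a PID file plus a stale-PID test.

```shell
#!/bin/sh
# Hypothetical single-instance guard (NOT the actual VPN_Failover.sh code).
#   $1 = semaphore/PID file, $2 = PID to record (defaults to this shell)
# Returns 0 if the slot was acquired, 1 if another live instance holds it.
acquire_monitor_slot() {
    sem="$1"; me="${2:-$$}"

    # With noclobber the redirection fails if the file already exists,
    # so two starters cannot both "win" the creation.
    if ( set -C; echo "$me" > "$sem" ) 2>/dev/null; then
        return 0
    fi

    owner=$(cat "$sem" 2>/dev/null)
    case "$owner" in
        ''|*[!0-9]*) return 1 ;;       # e.g. NOKILL: treat as held
    esac

    # A live owner keeps the slot.
    if kill -0 "$owner" 2>/dev/null; then
        return 1
    fi

    # Stale file left by a dead process: clear it and try once more.
    rm -f "$sem"
    ( set -C; echo "$me" > "$sem" ) 2>/dev/null
}
```

A guard that instead does "test, then write" in two separate steps leaves a window in which two near-simultaneous starts can both pass the test, which is one way duplicate monitors could appear.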

However, if you have the time/motivation you can manually prove/test to see if you can create multiple instances.

First modify 'vpnclient1-up' to allow debugging from the commandline

i.e. add the debugging line

Code:

VPN_ID=${dev:4:1}
[ -z "$VPN_ID" ] && { SCR=$(basename $0); VPN_ID=${SCR:9:1}; } # Allow manual debugging from commandline

then test i.e. make sure all of the rogue PIDs are 'killed'

Code:

./VPN_Failover.sh   reset   1

./vpnclient1-up

./vpnclient1-up

to see if multiple VPN Failover instances are created for VPN Client 1.
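
To count the instances after each step, the process list can be filtered per VPN Client. The helper below is a hypothetical sketch; it assumes the client number is the first argument on the script's command line, as in the invocations quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper. Reads "ps" output on stdin and prints how many
# VPN_Failover.sh processes were started for VPN Client $1.
# ASSUMPTION: the client number directly follows the script name.
count_monitors() {
    awk -v id="$1" '
        # match "VPN_Failover.sh <id>" followed by whitespace or end of line
        $0 ~ ("VPN_Failover\\.sh[ \t]+" id "([ \t]|$)") { n++ }
        END { print n + 0 }
    '
}
```

Example: `ps w | count_monitors 1` should print 1 after the first './vpnclient1-up' and should still print 1 after the second if the duplicate check works.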

Olivier L · Dec 6, 2019

Which video are you referring to?

Martineau · Dec 6, 2019

Olivier L said:
Which video are you referring to?

That would be the link in post #25

Olivier L · Dec 6, 2019

no problem to create 2 instances of VPN_Failover manually.

Martineau · Dec 6, 2019

Olivier L said:
no problem to create 2 instances of VPN_Failover manually.

So the command window that you used to test simply allowed the two instances to be created?

Sorry, no idea then. It could be the firmware version/Router Model you have, or even the combination of args passed to invoke the script.

Olivier L · Dec 6, 2019

yes I wrote ./vpnclient1-up two times.

Olivier L · Dec 6, 2019

will try with a dirty hack

sh /jffs/scripts/VPN_Failover.sh reset "$VPN_ID" && sleep 60 && sh /jffs/scripts/VPN_Failover.sh "$VPN_ID" "multiconfig" "interval=120" "delay=900" "ignore=2,3,4,5" "pingonly=1.1.1.1" &

Martineau · Dec 6, 2019

Olivier L said:

will try with a dirty hack

Code:

sh /jffs/scripts/VPN_Failover.sh reset "$VPN_ID" && sleep 60 && sh /jffs/scripts/VPN_Failover.sh "$VPN_ID" "multiconfig" "interval=120" "delay=900" "ignore=2,3,4,5" "pingonly=1.1.1.1" &

It isn't a hack if it works!

As I can't replicate your issue, I really appreciate your time to assist in investigating the issue or at least prove a "work around".

Olivier L · Dec 6, 2019

let me know if you want me to try anything else to understand what's going on with my setup (I have even simplified, now using only 1 client)
thanks

Martineau · Dec 6, 2019

Olivier L said:
let me know if you want me to try anything else to understand what's going on with my setup (I have even simplified, now using only 1 client)

If your proposed "hack" works then let me know. I'm sure you have other things to worry about, but I'll see if I can improve the detection code for duplicate PIDs/processes.

However, it would be useful if you could kindly state which Router/firmware you are running.

Olivier L · Dec 6, 2019

RT-AX88U Fw 384.13

Martineau · Dec 6, 2019

Olivier L said:
RT-AX88U Fw 384.13

OK thanks.

If you are motivated and could spare the time to test VPN_Failover v1.22 you will find the link in your PM.
