WANFailover Dual WAN Failover Script

Thank you guys for the data; I will have a beta to test later this evening.
Here is the link to the beta test version. Rename it to wan-failover.sh, replacing your current production version of the script.
v1.4.4-Beta
- Testing issue where WAN interface(s) were stuck in Cold Standby without an issued IP Address from a DHCP Server.
- Added beta version check for Update Mode.
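A minimal sketch of that swap, assuming the beta was downloaded as a separate file and the production copy lives in /jffs/scripts. The helper name and the beta filename are hypothetical; the sketch keeps a .bak of the production script so you can roll back.

```shell
#!/bin/sh
# Hypothetical helper: put a downloaded beta in place as wan-failover.sh,
# keeping a backup of the current production copy for rollback.
# $1 = path of the downloaded beta file (whatever name your download has)
# $2 = script directory (defaults to /jffs/scripts)
install_beta() {
  beta_file="$1"
  script_dir="${2:-/jffs/scripts}"
  prod_file="$script_dir/wan-failover.sh"

  if [ ! -f "$beta_file" ]; then
    echo "beta file not found: $beta_file" >&2
    return 1
  fi
  # Back up the production version before replacing it
  if [ -f "$prod_file" ]; then
    cp -p "$prod_file" "$prod_file.bak" || return 1
  fi
  mv "$beta_file" "$prod_file" || return 1
  chmod 755 "$prod_file"
}
```

For example, `install_beta /jffs/scripts/wan-failover-beta.sh` (hypothetical filename); to go back to production, move wan-failover.sh.bak back over wan-failover.sh.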

 
I can try this early in the morning.
No rush, let me know how the testing goes. This seems to be a condition that happens with double-NAT'ed connections as the Secondary WAN.
 
I manually found wan1 in the full output.
I was hit by the quote bug too; apparently @Ranger802004's phone must change them. No more copy and paste... yeah, typing works fine. I didn't understand it; grep -e really isn't different from grep string.
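For anyone else hitting this: `grep -e PATTERN` and `grep PATTERN` match the same lines; `-e` only matters when the pattern starts with `-` or when you give several patterns. The "quote bug" is most likely a phone keyboard turning straight quotes into curly ones (“ ” ‘ ’), which the shell treats as ordinary characters instead of quoting. A small sketch that converts them back before you paste a command into the shell; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the "smart" quotes that phone keyboards
# insert with the plain ASCII quotes the shell understands.
# Each curly quote is replaced as a whole string (not inside a [...]
# bracket), so the multi-byte UTF-8 characters are matched correctly
# even in a C locale.
fix_quotes() {
  printf '%s\n' "$1" | sed \
    -e 's/“/"/g' -e 's/”/"/g' \
    -e "s/‘/'/g" -e "s/’/'/g"
}
```

For example, `fix_quotes 'nvram show | grep “wan1”'` prints `nvram show | grep "wan1"`.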
 
I guess I have to stop typing commands on my phone lol
 

All OK on two routers:
Code:
Jun  1 08:32:00 wan-failover.sh: WAN Status - wan0 enabled
Jun  1 08:32:00 wan-failover.sh: WAN Status - Creating route 77.88.8.8 via 46.32.76.254 dev eth0
Jun  1 08:32:00 wan-failover.sh: WAN Status - Created route 77.88.8.8 via 46.32.76.254 dev eth0
Jun  1 08:32:04 wan-failover.sh: WAN Status - wan0 has 0% packet loss
Jun  1 08:32:04 wan-failover.sh: WAN Status - wan1 enabled
Jun  1 08:32:05 wan-failover.sh: WAN Status - Restarting WAN1: eth2
Jun  1 08:32:09 wan-failover.sh: WAN Status - Restarted WAN1: eth2
Jun  1 08:33:00 wan-failover.sh: WAN Status - wan0 enabled
Jun  1 08:33:00 wan-failover.sh: WAN Status - Route already exists for 77.88.8.8 via 46.32.76.254 dev eth0
Jun  1 08:33:04 wan-failover.sh: WAN Status - wan0 has 0% packet loss
Jun  1 08:33:04 wan-failover.sh: WAN Status - wan1 enabled
Jun  1 08:33:05 wan-failover.sh: WAN Status - Creating route 77.88.8.1 via 10.100.0.1 dev eth2
Jun  1 08:33:05 wan-failover.sh: WAN Status - Created route 77.88.8.1 via 10.100.0.1 dev eth2
Jun  1 08:33:09 wan-failover.sh: WAN Status - wan1 has 0% packet loss
Jun  1 08:33:09 wan-failover.sh: WAN0 Active - Verifying WAN0
Jun  1 08:33:09 wan-failover.sh: WAN0 Failover Monitor - Monitoring WAN0 via 77.88.8.8 for Failure


But it seems the problems started on the router where the main WAN goes through double NAT.
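The "Creating route" / "Route already exists" lines suggest that each WAN's ping target is pinned to that WAN's gateway with a static host route before it is pinged. A minimal sketch of that check-then-add step, assuming iproute2-style `ip route` commands; the function name is hypothetical and this is an illustration, not the script's actual code.

```shell
#!/bin/sh
# Sketch: make sure a ping target is routed through a specific WAN
# gateway/interface, adding a host route only if it is missing.
# $1 = target IP, $2 = WAN gateway, $3 = WAN interface
ensure_target_route() {
  target="$1"; gw="$2"; dev="$3"
  # List the route(s) for this destination and look for the wanted next hop
  if ip route show "$target" | grep -qF "via $gw dev $dev"; then
    echo "Route already exists for $target via $gw dev $dev"
  else
    echo "Creating route $target via $gw dev $dev"
    ip route add "$target" via "$gw" dev "$dev"
  fi
}
```

With the values from the log above, `ensure_target_route 77.88.8.1 10.100.0.1 eth2` would add the WAN1 target route (run as root on the router).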
 
So are you good to go? The logs suggest it is working.
 
I made a mistake; after rebooting, everything is OK on the third router. I will check how the switching works later. Thanks.
No worries. Let me know if everything is working well with the beta and I'll put it in production after some good testing.
 
It does come up hot. I have to play with the switching some more: it still seems that when it goes to the secondary, it's connected and active, but then 20-30 seconds later it switches to disconnected (while it showed disconnected, I could ping -I eth5 8.8.8.8 fine). Then it took about 3 minutes or so to switch back to the primary.
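For comparison with the script's "has 0% packet loss" log lines, here is a minimal sketch of measuring loss through one specific interface, the same way the manual ping -I eth5 check above does. The function names, the probe count and the reply timeout are assumptions, not values taken from wan-failover.sh.

```shell
#!/bin/sh
# Read ping output on stdin and print just the packet-loss percentage, e.g.
# "3 packets transmitted, 3 packets received, 0% packet loss" -> 0
packet_loss() {
  sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Ping a target through one interface only (-I) and report the loss.
# 5 probes and a 2-second reply wait are arbitrary choices for this sketch.
# $1 = interface (e.g. eth5), $2 = target IP
wan_loss() {
  ping -I "$1" -c 5 -W 2 "$2" 2>/dev/null | packet_loss
}
```

`wan_loss eth5 8.8.4.4` prints a number such as 0 or 100; an empty result means ping itself failed (for example, a wrong interface name).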
 
Send logs, please.
 
Code:
admin@RT-AX86U-D7D0:/jffs/scripts# ./wan-failover.sh monitor
wan-failover.sh - Monitor Mode
Jun  1 09:33:42 wan-failover.sh: WAN0 Failover Monitor - Failure Detected - WAN0 Packet Loss: 100%
Jun  1 09:33:42 wan-failover.sh: WAN Switch - Switching wan1 to Primary WAN
Jun  1 09:33:42 wan-failover.sh: WAN Switch - WAN IP Address: 192.168.10.189
Jun  1 09:33:42 wan-failover.sh: WAN Switch - WAN Gateway: 192.168.10.1
Jun  1 09:33:42 wan-failover.sh: WAN Switch - WAN Interface: eth5
Jun  1 09:33:42 wan-failover.sh: WAN Switch - Setting Manual DNS Settings
Jun  1 09:33:42 wan-failover.sh: WAN Switch - DNS1 Server: 1.1.1.1
Jun  1 09:33:42 wan-failover.sh: WAN Switch - /tmp/resolv.conf already updated for wan1 DNS1 Server
Jun  1 09:33:42 wan-failover.sh: WAN Switch - DNS2 Server: 8.8.8.8
Jun  1 09:33:42 wan-failover.sh: WAN Switch - /tmp/resolv.conf already updated for wan1 DNS2 Server
Jun  1 09:33:42 wan-failover.sh: WAN Switch - Deleting default route via 100.64.0.1 dev eth0
Jun  1 09:33:42 wan-failover.sh: WAN Switch - Adding default route via 192.168.10.1 dev eth5
Jun  1 09:33:42 wan-failover.sh: WAN Switch - QoS is Enabled
Jun  1 09:33:42 wan-failover.sh: WAN Switch - Applying Manual QoS Bandwidth Settings
Jun  1 09:33:42 wan-failover.sh: WAN Switch - QoS Settings: Download Bandwidth: 0Mbps Upload Bandwidth: 0Mbps
Jun  1 09:33:43 wan-failover.sh: WAN Switch - Switched wan1 to Primary WAN
Jun  1 09:33:43 wan-failover.sh: Service Restart - Restarting qos service
Jun  1 09:33:43 wan-failover.sh: Service Restart - Restarted qos service
Jun  1 09:33:43 wan-failover.sh: Service Restart - Restarting leds service
Jun  1 09:33:51 wan-failover.sh: Service Restart - Restarted leds service
Jun  1 09:33:51 wan-failover.sh: Service Restart - Restarting dnsmasq service
Jun  1 09:33:52 wan-failover.sh: Service Restart - Restarted dnsmasq service
Jun  1 09:33:52 wan-failover.sh: Service Restart - Restarting firewall service
Jun  1 09:33:53 wan-failover.sh: Service Restart - Restarted firewall service

########################### Secondary comes up connected, then within 20 seconds disconnects here; primary in hot-standby.
But you can ping via its interface below.

ASUSWRT-Merlin RT-AX86U 386.5_2 Fri Mar 25 14:23:26 UTC 2022
admin@RT-AX86U-D7D0:/tmp/home/root# ping -I eth5 8.8.4.4
PING 8.8.4.4 (8.8.4.4): 56 data bytes
64 bytes from 8.8.4.4: seq=0 ttl=116 time=23.982 ms
64 bytes from 8.8.4.4: seq=1 ttl=116 time=23.616 ms
64 bytes from 8.8.4.4: seq=2 ttl=116 time=23.871 ms
^C
--- 8.8.4.4 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 23.616/23.823/23.982 ms

######################### nvram show|grep wan1  while in disconnected mode
admin@RT-AX86U-D7D0:/tmp/home/root# nvram show|grep wan1
link_wan1=1
switch_wan1prio=0
switch_wan1tagid=
wan1_6rd_ip4size=
wan1_6rd_prefix=
wan1_6rd_prefixlen=
wan1_6rd_router=
wan1_auth_x=
wan1_auxstate_t=2
wan1_clientid=
wan1_clientid_type=0
wan1_desc=
wan1_dhcp_qry=1
wan1_dhcpenable_x=1
wan1_dns=192.168.10.1
wan1_dns1_x=1.1.1.1
wan1_dns2_x=8.8.8.8
wan1_dnsenable_x=1
wan1_enable=1
wan1_expires=86553
wan1_gateway=192.168.10.1
wan1_gateway_x=0.0.0.0
wan1_gw_ifname=eth5
wan1_gw_mac=04:D4:C4:B6:14:40
wan1_heartbeat_x=
wan1_hostname=
wan1_hwaddr=24:4B:FE:2F:D7:D1
wan1_hwaddr_x=
wan1_hwname=
wan1_ifname=eth5
wan1_ipaddr=192.168.10.189
wan1_ipaddr_x=0.0.0.0
wan1_is_usb_modem_ready=0
wan1_lease=86400
wan1_mroute=
wan1_mtu=1500
wan1_nat_x=1
wan1_netmask=255.255.255.0
wan1_netmask_x=0.0.0.0
wan1_phytype=
wan1_ppp_echo=1
wan1_ppp_echo_failure=10
wan1_ppp_echo_interval=6
wan1_pppoe_ac=
wan1_pppoe_auth=
wan1_pppoe_hostuniq=
wan1_pppoe_idletime=0
wan1_pppoe_ifname=
wan1_pppoe_mru=1492
wan1_pppoe_mtu=1492
wan1_pppoe_options_x=
wan1_pppoe_passwd=
wan1_pppoe_relay=0
wan1_pppoe_service=
wan1_pppoe_username=
wan1_pptp_options_x=
wan1_primary=1
wan1_proto=dhcp
wan1_proto_t=dhcp
wan1_realip_ip=
wan1_realip_state=0
wan1_route=
wan1_s46_ealen_x=0
wan1_s46_offset_x=6
wan1_s46_peer_x=
wan1_s46_prefix4_x=
wan1_s46_prefix4len_x=0
wan1_s46_prefix6_x=
wan1_s46_prefix6len_x=0
wan1_s46_psid_x=0
wan1_s46_psidlen_x=0
wan1_sbstate_t=0
wan1_state_t=2
wan1_unit=1
wan1_upnp_enable=1
wan1_vendorid=
wan1_vpndhcp=1
wan1_wins=
wan1_xgateway=192.168.10.1
wan1_xipaddr=0.0.0.0
wan1_xnetmask=0.0.0.0
size: 75477 bytes (55595 left)

################## route while disconnected
(It was almost like the route command was hung; it took about 30 seconds to print output.)
admin@RT-AX86U-D7D0:/tmp/home/root# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.10.1    0.0.0.0         UG    0      0        0 eth5
1.1.1.1         100.64.0.1      255.255.255.255 UGH   1      0        0 eth0
8.8.4.4         192.168.10.1    255.255.255.255 UGH   0      0        0 eth5
8.8.8.8         100.64.0.1      255.255.255.255 UGH   1      0        0 eth0
34.120.255.244  *               255.255.255.255 UH    0      0        0 eth0
100.64.0.0      *               255.192.0.0     U     0      0        0 eth0
100.64.0.1      *               255.255.255.255 UH    0      0        0 eth0
127.0.0.0       *               255.0.0.0       U     0      0        0 lo
192.168.10.0    *               255.255.255.0   U     0      0        0 eth5
192.168.50.0    *               255.255.255.0   U     0      0        0 br0
192.168.100.1   *               255.255.255.255 UH    0      0        0 eth0
239.0.0.0       *               255.0.0.0       U     0      0        0 br0
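On the slow route output: plain `route` tries to turn every address into a hostname with a reverse DNS lookup, so if DNS is not answering at that moment, each lookup has to time out first, which is one likely reason for the ~30 second wait. `route -n` (or `ip route`) prints numeric addresses without those lookups. The table above still shows the default route on eth5, so the switch to wan1 did take effect at the routing level. A small sketch for checking that quickly; the function name is hypothetical.

```shell
#!/bin/sh
# Print the interface that carries the default route.
# Reads `route -n` output on stdin; with -n the default route appears as
# destination 0.0.0.0 with genmask 0.0.0.0, and no DNS lookups are done.
default_iface() {
  awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" { print $NF }'
}
```

Usage: `route -n | default_iface`, which for the table posted above would print eth5.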
 
Do you have logs for Failback monitoring? I’m a little confused by it stating that WAN1 goes disconnected.
 
Isn't the top of that the failback monitoring? The GUI shows the secondary as disconnected, and the internet isn't usable, except for a direct ping -I eth5 (secondary).
 
That’s Failover; Failback is when it is monitoring to restore back to WAN0.
 
Ah, it basically just hangs at the last line of the log shown above... the primary stays in Hot Standby when it comes back up.
 
Ping 8.8.4.4 for eth5; that is the target you set up for WAN1, right? Not 8.8.8.8.
 
I think it's on 8.8.8.8 now; whatever it was on, I could ping that but nothing else.
 
It’s getting stuck after the service restarts, where it runs the WAN event and sends the email. Do you have Alerts configured, and did you delete your wan-event script?
 
This is wan-event. I did set up alerts. I'll try disabling those.
#!/bin/sh

sh /jffs/scripts/wan-failover.sh cron # Wan-Failover

EDIT: I was just messing with this again and made a stupid user error: I set the WAN targets to the same address, thinking it just pings them to verify the connection. But it makes the static route... duh. It took me a while to figure out what the hell I did. After some thought, I think I need to figure out a different way of testing.
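Since the script builds a static host route for each WAN's target, the two targets need to be different addresses; with a shared target, one host route has to serve both checks, so at least one WAN is probably not being tested through its own gateway. A minimal sanity check, assuming you pass in whatever values you configured for the WAN0 and WAN1 targets; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical sanity check for the two failover ping targets.
# $1 = WAN0 target, $2 = WAN1 target
check_targets() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Both WAN0 and WAN1 targets must be set" >&2
    return 1
  fi
  if [ "$1" = "$2" ]; then
    echo "WAN0 and WAN1 targets must differ (both are $1)" >&2
    return 1
  fi
  echo "Targets OK: WAN0=$1 WAN1=$2"
}
```

`check_targets 8.8.8.8 8.8.4.4` passes, while `check_targets 8.8.8.8 8.8.8.8` fails with an error message.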
When Starlink boots up, it gives the router an IP, and then after it connects to a satellite, DHCP hands out another IP... The way I've been testing is by rebooting the Starlink router, and that might be screwing this up. I mentioned last Saturday that it had switched fine many times during some storms; that was just a lack of signal/internet, not a router reboot.

Just did ip link set dev eth0 down... waited until it switched... ip link set dev eth0 up... it switched back.
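That down/up test can be scripted so every run is the same. A minimal sketch, assuming it is run as root on the router and that eth0 is the primary WAN interface, as in this thread; the function name and the default 120-second wait are assumptions.

```shell
#!/bin/sh
# Hypothetical test helper: simulate a primary WAN outage by taking the
# interface down, waiting, then bringing it back up.
# $1 = interface (eth0 was the primary WAN here), $2 = seconds to stay down
simulate_outage() {
  iface="$1"
  down_secs="${2:-120}"
  ip link set dev "$iface" down || return 1
  echo "$iface down; waiting ${down_secs}s for failover"
  sleep "$down_secs"
  ip link set dev "$iface" up || return 1
  echo "$iface up; watch the log for failback"
}
```

For example, `simulate_outage eth0 180` keeps the primary down for three minutes, which should leave enough time to see the failover entries in the log before the link comes back.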

Additional edit: after doing this many times, it worked flawlessly, so that screwup in my testing was probably the cause of the whole issue. But the script does fix the Hot Standby issue.
So on my end, a warm reboot of Starlink takes about 1-2 minutes, but a cold reboot, like after a power outage, can take up to 15 minutes. So I need to make wan-failover.sh wait a while on router boot before it starts.
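A minimal sketch of such a start-up delay, assuming it runs from your own boot-time script before wan-failover.sh is launched: poll a readiness check until it succeeds or a timeout passes. The function names are hypothetical, and a 900-second timeout would match the ~15-minute cold boot mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: run a check command repeatedly until it succeeds
# or the timeout (seconds) runs out. Returns 0 on success, 1 on timeout.
# $1 = timeout, $2 = poll interval, remaining args = the check command
wait_until() {
  max_wait="$1"
  interval="$2"
  shift 2
  [ "$interval" -gt 0 ] || interval=1   # avoid a busy loop
  waited=0
  while ! "$@"; do
    if [ "$waited" -ge "$max_wait" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Example readiness check: the primary WAN (wan0) has been given an address.
# Uses the router's nvram command, so it only works on the router itself.
wan0_has_ip() {
  addr="$(nvram get wan0_ipaddr)"
  [ -n "$addr" ] && [ "$addr" != "0.0.0.0" ]
}
```

For example, `wait_until 900 10 wan0_has_ip || logger "wan0 still has no IP after 15 minutes"`, then start the script the way you normally do. Given the Starlink behavior described above (an address is handed out before the satellite link is up), a reachability check such as a ping to the WAN0 target may be a better condition than "has an IP".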
 