DUAL WAN: watchdog failback not working as it should.

stylish_me

Regular Contributor
Hi all.
I've got a dual WAN configuration: ISP cable as primary and a USB modem as secondary.
Failover works OK: as soon as the primary WAN is down for 60 seconds, it switches to the secondary (USB modem).
But then strange things happen. The watchdog is enabled and it's trying to ping 8.8.8.8 for 1 minute. As soon as the ping is stable for 1 minute, it should switch back to the primary WAN. Right now the ISP line is down (they have some problems; 8.8.8.8 is not pingable via the primary WAN at all). But the watchdog still switches back to primary after 1 minute. If I set the watchdog ping to 5 minutes, it switches to primary after 5 minutes. But the primary WAN is still down!
So it means that the watchdog switches back to the primary WAN after the failback execution time even if the primary WAN is down and ping is not possible. How can that be? Is anyone having the same problem? Right now I have internet (via the USB modem) for 1 minute, then a 1 minute break, then 1 minute of internet again, and so on.
 
I have also seen this - I think the issue is with half-bridge modems, when they don't have a true wan IP they give the router a local IP via dhcp with a short lease time, could be 30 seconds. I have a feeling the dhcp renew on the failed connection confuses the router - does this trigger the wanup script?

Another gripe I have is that you cannot easily access the modem gui on the failed connection. I do not know how they ping out of it - the route table looks wrong ("route -n" from command line). You should be able to define a host route to the modem on the correct interface, then add an iptables masquerade command in same manner as folk do with pppoe modems. To ping ISP gateways or dns servers I also expect to see route entries, are they there in other modes?

NB you should moan to Asus, Asuswrt-merlin uses their code, it is appropriate to discuss here because we could have workarounds using the great flexible script functionality merlin provides.

One other thing I have noticed: if you have scripts that use nvram vars such as wan_gateway or dns etc. and that work fine with a single WAN, you have to rewrite them to refer to wan0_ or wan1_. The original wan_ vars are not kept up to date with the current active WAN.
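
A minimal sketch of that rewrite, in Python for readability (on the router itself this would be a shell script calling `nvram get`). It assumes the firmware marks the active unit with a `wanN_primary` flag set to `1`; that flag name and the sample values are assumptions, not something confirmed in this thread, so check `nvram show` on your own unit first.

```python
def active_wan_prefix(nvram, units=(0, 1)):
    """Return 'wan0_' or 'wan1_' for the unit flagged as active.

    `nvram` is a plain dict standing in for the router's nvram store.
    The 'wanN_primary' flag name is an assumption.
    """
    for unit in units:
        if nvram.get(f"wan{unit}_primary") == "1":
            return f"wan{unit}_"
    return "wan0_"  # fall back to the first unit if no flag is set


def active_wan_value(nvram, key):
    """Look up e.g. 'gateway' under the active unit's prefix."""
    return nvram.get(active_wan_prefix(nvram) + key, "")


# Made-up sample data standing in for `nvram show` output.
sample = {
    "wan_gateway": "10.0.0.1",  # stale single-WAN variable
    "wan0_primary": "0",
    "wan0_gateway": "10.0.0.1",
    "wan1_primary": "1",
    "wan1_gateway": "192.168.8.1",
}
print(active_wan_value(sample, "gateway"))
```

The point of the helper is that a script never reads the bare wan_ variable, only the one belonging to whichever unit is currently active.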

I think the responsible code is opensource - take a look at wanduck.c
I wonder why some code to ping ISP specific dns servers is commented out here?
https://github.com/RMerl/asuswrt-merlin/blob/master/release/src/router/rc/wanduck.c#L421
 
@mstombs - I have to agree with you. It appears to me that it does not ping the DNS IP address before failback. It only appears to look for a viable ethernet connection to the ISP modem, which means that as long as the modem/VDSL gateway is connected via ethernet to the ASUS router, it will try to "fail-back" after the specified time, ignoring the ping to IP address criteria. Here I am talking about the actual behaviour of the router, not the code you listed.

NOTE: I have not had any time to pore over the code you reference, but the #if 0 -- #else preprocessor directive does not mean that there is no other code that executes for do_ping_detect(int wan_unit). I know that it is sometimes used to turn various blocks of test code on and off.
 
The code is complicated because of all the different connection possibilities: usb, wifi, pppoe etc. I am not too familiar with the asuswrt code, so I do not know for sure that this is the current code, nor which compile options are set. But I read the commented-out bit as a hint that someone considered a ping test to an ISP-specific address, BUT opted for an alternative where it just checks that the interface is up - which might well work for pppoe I guess.
I wonder if it is significant that the Asus Android app 'network diagnostics' also doesn't work on my main connection - it fails detecting the modem, so maybe Asus engineers don't normally test on my cable modem type! Of course I would need to flash back to stock Asus code to test and report the fault...
 
I hope you are not right, because that would assume that the connection is PPPoE. It clearly does not handle fail-back correctly in the general case. I can get fail-over to work, but fail-back is well . . . a FAIL!!

The correct behavior, if the ping watchdog is set up, is to test the primary periodically to see if it can ping the designated site or IP address. It should not attempt to switch back even if there is a good ethernet connection to the primary modem unless it can ping the provided site/address.
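
A rough sketch of the rule described above, in Python for readability. This is not the wanduck.c logic; the function and parameter names are hypothetical, and it only illustrates "link up is necessary but not sufficient".

```python
def should_fail_back(link_up, ping_results, required_successes):
    """Decide whether to switch back to the primary WAN.

    link_up            -- True if the primary has an ethernet link
    ping_results       -- list of booleans, oldest first, one per probe
                          sent through the primary to the watchdog target
    required_successes -- how many consecutive recent probes must succeed
    """
    if required_successes < 1:
        raise ValueError("required_successes must be at least 1")
    if not link_up:
        return False
    if len(ping_results) < required_successes:
        return False  # not enough evidence yet, stay on the secondary
    return all(ping_results[-required_successes:])


# Link is up but every probe through the primary fails: stay on secondary.
print(should_fail_back(True, [False, False, False], 3))
# Link is up and the last three probes succeeded: fail back.
print(should_fail_back(True, [False, True, True, True], 3))
```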

I have turned off fail-back for now and periodically simply check to see if a fail-over occurred. If I can correctly see a DHCP address I try a reboot. Unfortunately, I have not found a way to switch back through the interface.

It also does not appear to properly respect the aggressive vs normal DHCP setting during boot, only after, so my secondary, which is Charter, always comes up showing an X and not the dashed line that indicates standby. I have to disable WAN (secondary), apply, then enable WAN and apply again to get it to connect and show that the secondary is on standby.
 
I think there is some issue with the watchdog working properly. I am running RT-AC68U FW 380.61. My primary WAN is 4G connection via bridged Dovado, AC68U gets DHCP WAN IP lease time from my ISP of 300 seconds. The failover WAN2 connection is Huawei LTE router.

I am able to drop the primary WAN with a simple speedtest when the watchdog is set to ping 8.8.8.8 every 5 seconds, 12 times, before it switches to WAN2. However, it seems that as soon as the first ping fails, the Asus already drops the primary WAN and switches to WAN2, just to come back to the primary WAN a few seconds later. So it looks like the watchdog rules are ignored for the failover execution time. It's hard to believe that running a speedtest would make a ping fail 12 times in a row, especially since speed tests are shorter than 60 seconds.

As soon as I disable watchdog (ping), I am unable to force drop primary WAN by running speed tests.
 
yesterday I found another BUG with Asus DUAL WAN implementation

my primary WAN connection is COAX-cable connection from Vodafone (Vodafone cable modem configured in BRIDGE MODE so it feeds Public IP to ASUS Primary WAN interface - Automatic IP)

secondary WAN connection is Android phone in USB tethering mode

DUAL WAN in Fail-over setup (with fail-back enabled)

due to ISP issues cable modem was not giving any IP to ASUS primary WAN (0.0.0.0)

Asus AC68U with latest MerlinFW (380.60_2) went on jumping between Primary WAN and Secondary WAN
because Primary WAN was "connected" even though IP on Primary WAN interface is 0.0.0.0

fail-over was transferring connection from Primary WAN to Secondary WAN
fail-back was transferring connection from Secondary WAN back to Primary WAN (connected - 0.0.0.0) and
it went into loop, so connection was breaking every 30-60 seconds :(

I did a few restarts, but it didn't help

this is very easy to replicate: connect switch or computer (no dhcp) to Primary WAN (Automatic IP) and let ASUS DualWAN "do the magic" :(
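
To make the 0.0.0.0 case concrete, here is a small Python sketch of the check one would expect before the primary is treated as connected. It is an illustration only, not Asus code, and the function names are hypothetical.

```python
import ipaddress


def wan_has_usable_ip(ip_text):
    """Return True only for an address a WAN could actually route with."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # empty or malformed string
    # 0.0.0.0 is "unspecified"; 169.254.x.x is link-local (no DHCP answer).
    return not (ip.is_unspecified or ip.is_loopback or ip.is_link_local)


def primary_ready(link_up, ip_text):
    """Physical link alone is not enough; an address is needed too."""
    return link_up and wan_has_usable_ip(ip_text)


# Cable plugged into a switch with no DHCP server: link up, no address.
print(primary_ready(True, "0.0.0.0"))
# Link up with a real lease (203.0.113.7 is a documentation address).
print(primary_ready(True, "203.0.113.7"))
```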

Screenshots from ASUS AC68U

dual wan fail-over went to secondary WAN, should have stayed there until Primary WAN comes properly back

on the computer, connection to Internet is interrupted every 30-60 seconds

dual-wan config


going back to primary WAN even though IP is 0.0.0.0


meanwhile competition has this working: https://www.synology.com/en-global/knowledgebase/SRM/help/SRM/RouterApp/internet_smartwan#t1a

fail-over option checking multiple destinations (not only one destination like on the ASUS watchdog)


policy route (for dual-wan load-balancing mode) - Asus doesn't have this
 
I can confirm that even with the latest Merlin firmware, the RT-AC68 does not check the ping before failback. It will use the ping check for failover, but only looks for physical link status before failback. I dealt with this by making a very sensitive failover, 2 missed pings over 12 seconds, but a long failback setting that comes out to like 10 minutes. So the upshot is that when it fails over, it waits 10 minutes, tries the primary again, and if it doesn't get a good ping in 12 seconds, it will failover to the backup WAN again. So when the primary WAN is down, I get one 12-15 second connection interruption every 10 minutes until the primary is back. That seems to work well enough for me.
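
The trade-off in that workaround can be put into numbers with a simplified model, sketched here in Python. It assumes the connection is down for the whole failover detection window after each failed failback and up for the rest of the cycle. The function name is hypothetical and the model ignores switching overhead.

```python
def outage_fraction(failback_wait_s, failover_detect_s):
    """Share of each cycle spent offline while the primary stays dead.

    One cycle = waiting on the secondary (online) for failback_wait_s,
    then being dragged onto the dead primary until failover detects
    the failure again (offline) for failover_detect_s.
    """
    cycle = failback_wait_s + failover_detect_s
    return failover_detect_s / cycle


# Settings from the post above: 10 minute failback, 12 second failover.
print(f"{outage_fraction(600, 12):.1%}")
# Settings from the first post: 1 minute failback, 1 minute failover.
print(f"{outage_fraction(60, 60):.1%}")
```

Stretching the failback wait does not fix the bug; it only shrinks the share of time lost to each pointless switch back.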
 
That bug (or should we call it a feature :mad:) is not new, and it will persistently switch back and forth between WAN0 and WAN1.
Maybe if one poor soul at Asus actually tried it, they would spend some time in the Wanduck code and fix THAT part.
There are many more bugs, as you've noticed.
 
