You may want to mention these scripts in the README.
I've been having occasional outages of 1 or 2 minutes, sometimes up to 10 minutes, for the last six months, ever since my WISP replaced my antenna with Tarana equipment. For the last couple of months, I've been running PingPlotter continuously and reporting outages to the WISP's engineers so they can correlate them with what's happening at their tower. They identified RF interference at the tower a couple of weeks ago, and things have been better, but yesterday afternoon brought my worst outage yet: about 4 hours of pings looking like this. So this morning they came and replaced my antenna. Hopefully that will fix things.
I see, thank you for the explanation. It seems you are using a wireless link.
Then there should be a modem handling the upstream DHCP. I'm curious: what is the modem's local IP address?
May I know your upstream network configuration? Is it PPPoE or LAN?
This cannot be true. If the LAN has problems like dnsmasq handing out bogus IPs just because the WAN goes down, that would be unacceptable and nobody would buy Asus equipment. I've been using my RT-AC68U with this WISP for almost ten years and have never seen anything like this until I installed merlin_zerotier.
I believe something in your wan-event script is crashing and causing the router to be unstable. The first time this happened, I saw this in my syslog:
That's strange; I've had three RT-AC68Us, two RT-AC86Us, and two RT-AX86Us running this script, and it has never crashed any of them.
Every minute, this script merely checks the routing table and iptables, adds any of the desired rules that are missing, and starts ZeroTier if it is not running.
For debugging purposes, you might want to:
1. Comment out the `cru` lines in `myservices.sh`, use `cru l` to see what is in the crontab, then remove the `ZeroTierDaemon`, `cruGuard1`, and `cruGuard2` jobs with `cru d ZeroTierDaemon; cru d cruGuard1; cru d cruGuard2`.
2. Comment out the `lan-route-table.sh` call in `firewall-start` and remove the related IP routes and iptables rules manually, or simply restart the router if that is too tedious.
3. Change `ENABLED=yes` to `ENABLED=no` in `/opt/etc/init.d/S91zerotier-one`, then run `/opt/etc/init.d/S91zerotier-one stop` to stop the ZeroTier service, and see whether dnsmasq still crashes.
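The three steps above could be bundled into one helper roughly as follows. This is a minimal sketch, assuming Asuswrt-Merlin with Entware and the paths quoted in this thread; `disable_zt_for_debug`, `run`, the `DRY_RUN` switch, and `FW_START` (assumed to point at `/jffs/scripts/firewall-start`) are hypothetical names introduced here. Commenting out the `cru` lines in `myservices.sh` and removing leftover routes and iptables rules are still left to you.

```shell
#!/bin/sh
# Sketch only: hypothetical helper for the three debugging steps.
# Assumed paths; override them in the environment if yours differ.
ZT_INIT=${ZT_INIT:-/opt/etc/init.d/S91zerotier-one}
FW_START=${FW_START:-/jffs/scripts/firewall-start}

# Run a command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_zt_for_debug() {
    # Step 1: show the current crontab, then drop the three ZeroTier jobs
    run cru l
    for job in ZeroTierDaemon cruGuard1 cruGuard2; do
        run cru d "$job"
    done

    # Step 2: comment out any not-yet-commented line calling lan-route-table.sh
    run sed -i '/^[^#]*lan-route-table\.sh/s/^/#/' "$FW_START"

    # Step 3: stop Entware from starting ZeroTier, then stop the running service
    run sed -i 's/^ENABLED=yes/ENABLED=no/' "$ZT_INIT"
    run "$ZT_INIT" stop
}
```

Running `DRY_RUN=1 disable_zt_for_debug` prints every command instead of executing it, so you can review them first; run it without `DRY_RUN` to apply the changes.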
I noticed this; perhaps your dnsmasq crashes are more closely related to the YazDHCP add-on you're using.
I'm guessing, but my kernel does not seem to be normal. Sep 14 03:58:39 kernel: eth0 (Int switch port: 6) (Logical Port: 6) (phyId: 13) Link DOWN. Sep 14 03:58:42 kernel: FCACHE ERROR: fc_config_mcast_group: Mcast: client has already JOINed the mcast group (duplicate JOIN) Sep...
www.snbforums.com
This all hypothetically assumes the user is running 388.2 firmware with no "reliable" upstream DNS configured on their WAN page?
This really does look like your case.
May I know which Merlin and dnsmasq versions you are running?
Code:
Oct 14 16:37:55 WAN_Connection: WAN was restored.
Oct 14 16:37:55 dnsmasq[3330]: read /etc/hosts - 24 names
Oct 14 16:37:55 dnsmasq[3330]: read /jffs/addons/YazDHCP.d/.hostnames - 20 names
Oct 14 16:37:55 kernel: potentially unexpected fatal signal 6.
Oct 14 16:37:55 kernel: CPU: 0 PID: 3330 Comm: dnsmasq Tainted: P O 4.19.183 #1
Oct 14 16:37:55 kernel: Hardware name: RTAX86U_PRO (DT)
Oct 14 16:37:55 kernel: pstate: 00070010 (nzcv q A32 LE aif)
Oct 14 16:37:55 kernel: pc : 00000000f78c03a4
Oct 14 16:37:55 kernel: lr : 00000000ffeeab50
Oct 14 16:37:55 kernel: sp : 00000000ffeeab50
Oct 14 16:37:55 kernel: x12: 0000000000000000
Oct 14 16:37:55 kernel: x11: 00000000ffeeade0 x10: 00000000f79b6de0
Oct 14 16:37:55 kernel: x9 : 0000000000000002 x8 : 0000000000000001
Oct 14 16:37:55 kernel: x7 : 00000000000000af x6 : 00000000ffeead90
Oct 14 16:37:55 kernel: x5 : 00000000ffeead90 x4 : 0000000000000006
Oct 14 16:37:55 kernel: x3 : 0000000000000008 x2 : 0000000000000000
Oct 14 16:37:55 kernel: x1 : 00000000ffeeab50 x0 : 0000000000000000
Oct 14 16:37:59 rc_service: watchdog 2372:notify_rc start_dnsmasq
Oct 14 16:37:59 custom_script: Running /jffs/scripts/service-event (args: start dnsmasq)
Oct 14 16:37:59 custom_config: Appending content of /jffs/configs/dnsmasq.conf.add.
Oct 14 16:37:59 dnsmasq[12940]: started, version 2.89 cachesize 1500
Oct 14 16:37:59 dnsmasq[12940]: compile time options: IPv6 GNU-getopt no-R
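To see how often this happens, crashes like the one above can be pulled out of the syslog with a small filter. This is a sketch that assumes the two-line kernel pattern shown in the log (a `fatal signal` line followed by a `CPU: ... Comm: dnsmasq` line); `dnsmasq_crashes` is a hypothetical name, and `/tmp/syslog.log` is assumed to be where the firmware keeps the current syslog.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print the timestamp and
# signal number of every dnsmasq crash, based on the kernel lines shown above.
dnsmasq_crashes() {
    awk '
        # Remember the signal number from "potentially unexpected fatal signal N."
        /kernel: potentially unexpected fatal signal/ {
            sig = $NF
            sub(/\.$/, "", sig)
            pending = 1
            next
        }
        # Report it only if the following CPU line names dnsmasq as the process
        pending && /kernel: CPU:.*Comm: dnsmasq / {
            print $1, $2, $3, "signal", sig
        }
        # Any CPU line closes the pending crash report
        /kernel: CPU:/ { pending = 0 }
    '
}
```

Usage: `dnsmasq_crashes < /tmp/syslog.log`. For the excerpt above, this prints `Oct 14 16:37:55 signal 6`.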
What I meant was that whenever the WAN is down for 20 to 30 seconds or more, dnsmasq starts handing out bogus IPs.
From the log above, it seems that dnsmasq crashed. I have also noticed that your dnsmasq loads files from add-ons, such as `/jffs/addons/YazDHCP.d/.hostnames`.
Could something inside YazDHCP.d be generating the bogus IPs?
Possible Reasons for the 10.0.0.1 Return Address
- Fallback Configuration: The DNS resolver (possibly dnsmasq) may have a fallback configuration to respond with a predefined IP address (10.0.0.1 in this case) when the internet is down, or it can't reach the upstream DNS server for some reason.
- Cache Behavior: If dnsmasq or another local DNS resolver has a stale or incorrect entry in its cache, it may return the wrong IP address during an internet outage.
- Local Hosts File: There might be entries in the local hosts file that map the hostname marantz to the IP 10.0.0.1. During an internet outage, if the DNS resolver cannot query external DNS servers, it might rely more on local records, resulting in this IP being returned.
- Network Segmentation: If there are different network segments or VLANs, the DNS resolver might be configured to return different IPs based on the network's status or the querying device's network location.
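The fallback-configuration and hosts-file explanations can be narrowed down by checking whether any local file the router feeds to dnsmasq mentions 10.0.0.1 at all. A minimal sketch: `find_local_mapping` is a hypothetical helper, and the files in the usage line are only the ones visible in this thread's log (`/etc/hosts`, YazDHCP's `.hostnames`, and `/jffs/configs/dnsmasq.conf.add`).

```shell
#!/bin/sh
# Hypothetical helper: print file:line:text for every line in the given files
# that contains PATTERN as a whole word (so 10.0.0.1 does not match 10.0.0.10).
find_local_mapping() {
    pattern=$1
    shift
    for f in "$@"; do
        # -F: fixed string, -w: whole word, -H/-n: show file name and line number.
        # Missing files and files without a match are skipped silently.
        grep -H -n -F -w -- "$pattern" "$f" 2>/dev/null || :
    done
}
```

Usage: `find_local_mapping 10.0.0.1 /etc/hosts /jffs/addons/YazDHCP.d/.hostnames /jffs/configs/dnsmasq.conf.add`. Running it again with `marantz` as the pattern shows whether that host name is pinned anywhere.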
If ZeroTier is blocking the connection, we can kill and disable it before the WAN is fully up.
Another possibility: is there more than one dnsmasq instance running?
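A quick way to answer that on the router is to count the PIDs that `pidof` reports. This is a minimal sketch, assuming the firmware's BusyBox provides `pidof`; `dnsmasq_count` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print how many processes named dnsmasq are running.
# pidof prints one PID per matching process, separated by spaces.
dnsmasq_count() {
    # Intentionally unquoted: word-split the PID list into positional
    # parameters and count them (0 when pidof prints nothing)
    set -- $(pidof dnsmasq)
    echo "$#"
}
```

A result greater than 1 would be worth a closer look with `ps`.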
custom_script: Running /jffs/scripts/service-event
Apart from the extra log entries my scripts write, the router itself should log a similar line whenever it runs a script. So I believe `wan-event` might not have been called at all.
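One way to confirm this is to search the syslog for the `custom_script: Running ...` line that the firmware writes when it runs a user script, as in the `service-event` line above. A sketch under that assumption; `script_runs` is a hypothetical helper, and `/tmp/syslog.log` is again assumed to be the current log file.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print every line showing
# that the firmware ran /jffs/scripts/NAME (default: wan-event).
script_runs() {
    name=${1:-wan-event}
    # Require a space or end of line after the name, so "service-event"
    # does not also match "service-event-end"
    grep -E "custom_script: Running /jffs/scripts/$name( |\$)" || :
}
```

Usage: `script_runs wan-event < /tmp/syslog.log`. No output after unplugging and replugging the WAN cable would support the idea that `wan-event` never ran.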
With this kind of outage, the WAN is never *completely* down. The ping times just get longer and longer until various services start to fail. However, since the outage ended, I've been testing by simply unplugging my WAN Ethernet cable, and the same problem happens.
Should this script be added to the repository?
I have added `firewall-start`. It applies the iptables rules immediately after the network status changes. However, if the ZeroTier network hasn't started yet, adding the IP route may fail, so the cron job will try to add it again every minute.
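The "check first, add only if missing" pattern that `firewall-start` and the per-minute cron job rely on can be sketched like this. It is an illustration only, not the repository's code: the interface name, remote subnet, gateway, and the MASQUERADE rule are hypothetical placeholders, and `ensure_route` and `ensure_nat` are made-up names.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your ZeroTier interface, the LAN
# behind the remote node, and that node's ZeroTier address.
ZT_IF=${ZT_IF:-ztexample0}
REMOTE_LAN=${REMOTE_LAN:-192.168.200.0/24}
ZT_GW=${ZT_GW:-10.147.17.2}

# Add the route only when the ZeroTier interface exists and the route is
# missing; return non-zero while ZeroTier is not up yet, so a later cron
# run can try again.
ensure_route() {
    ip link show "$ZT_IF" >/dev/null 2>&1 || return 1
    if [ -z "$(ip route show "$REMOTE_LAN")" ]; then
        ip route add "$REMOTE_LAN" via "$ZT_GW" dev "$ZT_IF"
    fi
}

# iptables -C exits non-zero when the rule is absent; only then append it,
# so running this repeatedly never creates duplicate rules.
ensure_nat() {
    iptables -t nat -C POSTROUTING -o "$ZT_IF" -j MASQUERADE 2>/dev/null ||
        iptables -t nat -A POSTROUTING -o "$ZT_IF" -j MASQUERADE
}
```

Because both functions are safe to call repeatedly, the same code can run from `firewall-start` and from a cron job without piling up duplicate routes or rules.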
On further testing, I've found that I was wrong about the NAT iptables entries never getting restored. They do come back a few minutes after the WAN comes back up.
I believe the cron job was working correctly. As long as it invokes `S90zerotier-one.sh`, `lan-route-table.sh` will also be called.
For some reason, most of your sysLOG messages don't show up in my syslog. And when I try to run your sysLOG function from the console, I get an error:
Apologies; this issue has been fixed, and I have updated the GitHub repository.
However, this error only occurs when you run the scripts from sh.
Since wan-event and firewall-start don’t use sysLOG(), they are not affected by this bug.