I noticed a few issues with the supervision mechanism:
- I enabled DNS supervision and selected dnsmasq + adguard.
But in the supervision log I only see:
Code:
2022-07-13 19:32:01 1222.64 [SUPERVISION] addon_supervision.sh 13243: dnsmasq_is_running: Ok
2022-07-13 19:32:01 1222.72 [SUPERVISION] addon_supervision.sh 13243: openvpn_is_running: Ok
2022-07-13 19:32:01 1223.22 [SUPERVISION] addon_supervision.sh 13243: SUCCESS: Ping IP OK
Also, if I run "killall AdGuardHome", supervision never restarts AGH.
- This morning I had an issue where AGH was not responding. That would not be detected by simply checking whether the process is still running.
The only way to detect it is to actually resolve some names, e.g. by pinging some URLs or doing some nslookups.
But supervision.log suggests that supervision only pings IPs and does not ping URLs.
I only see "SUCCESS: Ping IP OK" and no "SUCCESS: Ping URL OK". Or is that only a naming thing, and does the ping module do both?
Aside from that, even if it does ping a URL, the router would resolve that name directly via the DNS servers in /etc/resolv.conf, so it would not detect issues with AGH, only issues with the upstream DNS (if you have configured AGH with the same servers as resolv.conf).
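To make the two points above concrete, here is a minimal sketch of a separate AGH check: a process check that can trigger a restart, plus IP and URL pings logged under their own labels. It is not the add-on's code. AGH_PIDFILE and restart_agh are hypothetical placeholders, the log strings only imitate the style of the existing log lines, and it assumes a ping that accepts -c and -W (BusyBox and iputils both do). As noted above, the URL ping still resolves through /etc/resolv.conf, so it does not prove that AGH itself is answering.

```shell
#!/bin/sh
# Sketch of an AGH supervision step (NOT the add-on's implementation).
# HYPOTHETICAL pid file path: point it at wherever AGH's PID is stored.
AGH_PIDFILE="${AGH_PIDFILE:-/var/run/AdGuardHome.pid}"
# Ping command: -c 1 = send one packet, -W 2 = wait at most 2 s for a reply.
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

# Return 0 if the PID stored in file $1 belongs to a live process.
pid_is_alive() {
    [ -r "$1" ] || return 1                  # no pid file: treat as not running
    pid=$(cat "$1" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;             # empty or non-numeric PID
    esac
    kill -0 "$pid" 2>/dev/null               # signal 0 only tests existence
}

# HYPOTHETICAL restart hook: replace with the add-on's real restart logic.
restart_agh() {
    echo "restarting AdGuardHome (placeholder)" >&2
}

# Log line format only imitates the existing supervision log style.
check_agh_process() {
    if pid_is_alive "$AGH_PIDFILE"; then
        echo "adguardhome_is_running: Ok"
    else
        echo "adguardhome_is_running: FAIL"
        restart_agh
    fi
}

# $1 = label used in the log line (e.g. IP or URL), $2 = address or hostname.
ping_check() {
    if $PING_CMD "$2" >/dev/null 2>&1; then
        echo "SUCCESS: Ping $1 OK ($2)"
        return 0
    fi
    echo "FAIL: Ping $1 failed ($2)"
    return 1
}
```

For example: `check_agh_process; ping_check IP 1.1.1.1; ping_check URL cloudflare.com` (example targets). kill -0 also fails if the caller may not signal the process (no issue when running as root), and a stale pid file whose PID was reused by another process would give a false Ok.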
So I was thinking of some potential solution to improve AGH supervision.
My idea: add two lines to /tmp/addons/firewall-start-adguardhome.sh:
Code:
# Redirect DNS queries (UDP and TCP) sent to the unused IP 1.2.3.4 to AGH on 192.168.1.1:5300
iptables -w -t nat -A OUTPUT -p udp -d 1.2.3.4 --dport 53 -j DNAT --to 192.168.1.1:5300
iptables -w -t nat -A OUTPUT -p tcp -d 1.2.3.4 --dport 53 -j DNAT --to 192.168.1.1:5300
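If that firewall-start script runs every time the firewall is restarted (an assumption about the add-on), plain -A would append a duplicate pair of rules on each run. A minimal sketch that first asks iptables -C whether the identical rule already exists and only appends when it does not. It uses --to-destination, the full name of the --to option in the lines above. IPTABLES is overridable so the function can be tried against a stub instead of the real firewall.

```shell
#!/bin/sh
# Sketch: add the probe DNAT rule only if it is not present yet, in case the
# firewall-start script runs more than once (assumption).
# IPTABLES can be overridden, e.g. with a stub for testing.
IPTABLES="${IPTABLES:-iptables}"

# $1 = protocol (udp or tcp), $2 = probe IP, $3 = AGH address:port
add_probe_dnat() {
    # -C exits 0 if an identical rule already exists in nat/OUTPUT.
    if ! $IPTABLES -w -t nat -C OUTPUT -p "$1" -d "$2" --dport 53 \
            -j DNAT --to-destination "$3" 2>/dev/null; then
        $IPTABLES -w -t nat -A OUTPUT -p "$1" -d "$2" --dport 53 \
            -j DNAT --to-destination "$3"
    fi
}
```

With the values from the post: `for proto in udp tcp; do add_probe_dnat "$proto" 1.2.3.4 192.168.1.1:5300; done`.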
Then do something like:
nslookup cloudflare.com 1.2.3.4
(or one of the other common hostnames that you already use for URL testing)
That is, tell nslookup to use the bogus IP 1.2.3.4 as its DNS server, and let the iptables rules force that lookup to go via AGH.
If that lookup fails, it could mean that AGH is hanging.
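A minimal sketch of that check as a shell function. Besides the exit status it records how long the lookup took, on the assumption that a resolver that never answers may show up as a late answer rather than a failure (for example if nslookup retries other servers). The 2-second threshold is an arbitrary example, date +%s only has whole-second resolution, and the sketch assumes nslookup exits non-zero when it gets no answer.

```shell
#!/bin/sh
# Classify one probe result.
# $1 = nslookup exit status, $2 = elapsed seconds, $3 = threshold in seconds
classify_lookup() {
    if [ "$1" -ne 0 ]; then
        echo FAIL          # no answer at all (assumes non-zero exit on failure)
    elif [ "$2" -gt "$3" ]; then
        echo SLOW          # answered, but later than the threshold
    else
        echo OK
    fi
}

# $1 = hostname, $2 = DNS server to ask (e.g. the probe IP 1.2.3.4),
# $3 = optional threshold in seconds (example default: 2)
probe_lookup() {
    start=$(date +%s)
    if nslookup "$1" "$2" >/dev/null 2>&1; then rc=0; else rc=$?; fi
    end=$(date +%s)
    classify_lookup "$rc" "$((end - start))" "${3:-2}"
}
```

For example, `probe_lookup cloudflare.com 1.2.3.4` prints OK, SLOW or FAIL. Timing is only a heuristic: an answer that arrives through a fallback resolver still counts as a success here.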
The results, however:
- Without the iptables rules, the nslookup command waits a few seconds (logical, as 1.2.3.4 isn't reachable) but then eventually does return a result.
Checking with tcpdump, it seems that after the specified server times out, nslookup falls back to the default system resolvers.
(So if the traffic is redirected to AGH and AGH is hanging, nslookup would still return a result, only much more slowly. That makes this check difficult to use reliably.)
- With the iptables rules added, the nslookup command returns a result immediately.
So that seemed to be working.
- Next idea, to work around the detection issue from the first point: add a bogus DNS entry to AGH (via a DNS rewrite rule)
and then run nslookup bogus.dns.name 1.2.3.4. If nslookup falls back to resolv.conf, that should give a "getaddrinfo failed" message.
If AGH is working, it should return the IP address that was configured in the DNS rewrite rule.
But this also didn't work.
It turns out that without the iptables rules there is a timeout followed by a fallback to resolv.conf. With the iptables rules there is NO timeout, but nslookup directly uses resolv.conf. I do see that the NAT rule is hit, but the traffic never seems to reach AGH.
So it's a mystery that I need to dive deeper into.
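For the rewrite-based check, supervision would have to verify that the answer actually contains the address from the DNS rewrite rule, not merely that nslookup succeeded. Below is a minimal sketch of that parser, plus a helper for one thing that may be worth checking for the open question: whether anything listens on the DNAT target 192.168.1.1:5300. That is only a guess, not something established in this post; an immediate fallback would be consistent with the redirected query being rejected, but it could have other causes. Assumptions: BusyBox-style nslookup output (a Name: line followed by Address lines), netstat -lnu output with the local address in column 4, and 10.10.10.10 as a hypothetical stand-in for whatever address the rewrite rule returns.

```shell
#!/bin/sh
# Return 0 if the nslookup output on stdin lists $1 as an answer address.
# Assumes BusyBox-style output; only lines after "Name:" are checked, so the
# Server/Address header describing the queried server is ignored.
answer_has_ip() {
    awk -v ip="$1" '
        /^Name:/ { in_answer = 1; next }
        in_answer && /^Address/ {
            for (i = 2; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Return 0 if netstat -lnu style output on stdin shows a UDP socket bound to
# address $1 (or to any IPv4/IPv6 address) on port $2.
# Assumes the local address is in column 4.
udp_listener_on() {
    awk -v addr="$1" -v port="$2" '
        $1 ~ /^udp/ && ($4 == (addr ":" port) || $4 == ("0.0.0.0:" port) || $4 == (":::" port)) { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Possible usage: `nslookup bogus.dns.name 1.2.3.4 2>&1 | answer_has_ip 10.10.10.10` (hypothetical rewrite address) and `netstat -lnu | udp_listener_on 192.168.1.1 5300`.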
tbc