storkinsj
Occasional Visitor
This is going to be a bit tricky to describe, so I apologize in advance for any mistakes I make describing it.
I am using a GT-AX11000 with Merlin 386.5_2. It had been working very well until I began running a blockchain validator on my intranet. I have a very high-bandwidth connection (at least 500 Mbps up and down) behind a double NAT.
My DNS setup uses DoT to both Quad9 servers and appears to work correctly when the validator is not running. I have 9.9.9.9 and 149.112.112.112 configured in the DoT fields as well as in the DNS Server1 and Server2 fields. Again, this works perfectly when the validator is not running.
When I run the validator, I lose all DNS capabilities.
The validator is maintaining in the neighborhood of 1000 peers which it connects to via "udp" on its "gossip" channel; here is a slice of the outgoing tracked connections, which are capped at "500" in the UI:
udp solana 8007 141.95.125.35 8009 Untracked
udp solana 8007 184.105.146.34 8008 Untracked
udp solana 8000 146.59.68.225 8000 Untracked
udp solana 8000 141.95.35.126 8000 Untracked
...
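To see how those tracked connections spread across the validator's local ports, the pasted rows can be tallied with a short script. This is only a sketch: it assumes the six-column layout shown above (protocol, local host, local port, remote IP, remote port, state), and the function name `tally_connections` is made up for illustration.

```python
from collections import Counter

def tally_connections(lines):
    """Count tracked connections per (protocol, local port)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Assumed columns: proto, local host, local port, remote IP, remote port, state
        if len(parts) != 6:
            continue  # skip "..." and anything that does not match the layout
        proto, _host, local_port, _remote_ip, _remote_port, _state = parts
        counts[(proto, local_port)] += 1
    return counts

# The four rows quoted in the post, plus the elision marker.
sample = """\
udp solana 8007 141.95.125.35 8009 Untracked
udp solana 8007 184.105.146.34 8008 Untracked
udp solana 8000 146.59.68.225 8000 Untracked
udp solana 8000 141.95.35.126 8000 Untracked
...
""".splitlines()

for (proto, port), n in sorted(tally_connections(sample).items()):
    print(proto, port, n)
```

Because the UI caps the list at 500 rows, any totals from this are a lower bound on what the router is really tracking.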
Within a few seconds of connecting to its peers, the validator breaks DNS for all machines on the router's intranet. I ran "service dnsmasq_off" first and then "dnsmasq --no-daemon --log-queries". As far as I can tell, essentially all queries fail with SERVFAIL:
dnsmasq: query[A] lp-push-server-300.lastpass.com from 192.168.0.71
dnsmasq: forwarded lp-push-server-300.lastpass.com to 127.0.1.1
dnsmasq: query[A] metrics.solana.com from 192.168.4.2
dnsmasq: forwarded metrics.solana.com to 127.0.1.1
dnsmasq: query[AAAA] metrics.solana.com from 192.168.4.2
dnsmasq: forwarded metrics.solana.com to 127.0.1.1
dnsmasq: query[A] bolt.dropbox.com from 192.168.0.200
dnsmasq: forwarded bolt.dropbox.com to 127.0.1.1
dnsmasq: query[AAAA] bolt.dropbox.com from 192.168.0.200
dnsmasq: forwarded bolt.dropbox.com to 127.0.1.1
dnsmasq: query[A] ipecho.net from 192.168.0.71
dnsmasq: forwarded ipecho.net to 127.0.1.1
dnsmasq: forwarded captive.apple.com to 127.0.1.1
dnsmasq: forwarded captive.apple.com to 127.0.1.1
dnsmasq: reply error is SERVFAIL
dnsmasq: reply error is SERVFAIL
dnsmasq: reply error is SERVFAIL
dnsmasq: reply error is SERVFAIL
dnsmasq: query[A] www.1k3j1blg.com from 192.168.0.200
dnsmasq: forwarded www.1k3j1blg.com to 127.0.1.1
dnsmasq: query[AAAA] www.1k3j1blg.com from 192.168.0.200
dnsmasq: forwarded www.1k3j1blg.com to 127.0.1.1
dnsmasq: query[AAAA] www.expressapisv2.net from 192.168.0.200
dnsmasq: forwarded www.expressapisv2.net to 127.0.1.1
dnsmasq: query[A] www.expressapisv2.net from 192.168.0.200
dnsmasq: forwarded www.expressapisv2.net to 127.0.1.1
dnsmasq: reply error is SERVFAIL
dnsmasq: reply error is SERVFAIL
dnsmasq: reply error is SERVFAIL
dnsmasq: reply error is SERVFAIL
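To put a number on "basically all queries fail", the dnsmasq output can be summarized with a few lines of Python. This is a rough sketch that assumes the log lines look exactly like the ones pasted above; `summarize` is a hypothetical helper name, and the sample is just three lines copied from the log.

```python
from collections import Counter

def summarize(log_lines):
    """Tally queries, forwards (per upstream) and reply errors from dnsmasq --log-queries output."""
    stats = Counter()
    for line in log_lines:
        # Drop the "dnsmasq: " prefix; also tolerates a clipped prefix such as "masq: ".
        _, _, msg = line.strip().partition(": ")
        if msg.startswith("query["):
            stats["queries"] += 1
        elif msg.startswith("forwarded "):
            stats["forwarded"] += 1
            stats["upstream " + msg.rsplit(" ", 1)[-1]] += 1
        elif msg.startswith("reply error is "):
            stats[msg[len("reply error is "):]] += 1
    return stats

sample = [
    "dnsmasq: query[A] ipecho.net from 192.168.0.71",
    "dnsmasq: forwarded ipecho.net to 127.0.1.1",
    "dnsmasq: reply error is SERVFAIL",
]

for key, count in sorted(summarize(sample).items()):
    print(key, count)
```

Fed the full capture, this shows at a glance whether every forward goes to 127.0.1.1 and how many SERVFAIL replies come back relative to the number of forwards.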
I would not have tried DNS over TLS except that I thought a TCP-based DNS transport might help DNS queries get through the sea of UDP traffic.
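One way to check the UDP-versus-TCP idea directly from a LAN client is to send the same query over both transports and compare the results. The sketch below uses only the Python standard library and speaks plain DNS on port 53, not DoT on port 853, so it tests the transport question rather than the router's DoT setup. The helper names (`build_query`, `rcode`, `query_udp`, `query_tcp`, `probe`) are invented for this example.

```python
import socket
import struct

def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query (RFC 1035) for `name`; qtype 1 = A record."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD flag set, one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN

def rcode(response):
    """Return the 4-bit response code from a DNS header (0 = NOERROR, 2 = SERVFAIL)."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F

def query_udp(server, name, timeout=3.0):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        data, _ = s.recvfrom(4096)
        return rcode(data)

def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf

def query_tcp(server, name, timeout=3.0):
    msg = build_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack("!H", len(msg)) + msg)  # DNS over TCP uses a 2-byte length prefix
        (length,) = struct.unpack("!H", _recv_exact(s, 2))
        return rcode(_recv_exact(s, length))

def probe(server, name="ipecho.net"):
    """Try the same lookup over UDP and TCP; report the rcode or the failure for each."""
    results = {}
    for label, fn in (("udp", query_udp), ("tcp", query_tcp)):
        try:
            results[label] = fn(server, name)
        except OSError as exc:  # includes timeouts and refused connections
            results[label] = f"failed: {exc}"
    return results
```

Calling `probe("9.9.9.9")` and `probe("192.168.0.1")` (substitute the router's actual LAN address, which is an assumption here) while the validator is running would show whether UDP alone is affected, or whether TCP lookups fail as well.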
I have also tried with DNSSEC turned on and off; it is currently off.
I also tried setting "Use local caching DNS server as system resolver (default: No)" to On, hoping to force the router itself to use DoT, but that didn't seem to work either.
I tried Adaptive QoS, Classic QoS, and the bandwidth limiter (applied specifically to the machine running the validator) set to a much smaller value, but all that did was make the validator fall behind on its work. DNS still failed.
---
Edit: An important detail I left out: When the validator is taken off the network, it takes about 10 minutes for the DNS to begin working again. This is true even if I reboot the router.
I thought this might be related to the building equipment (a 10.* space NAT with NTT equipment), so I tried plugging my laptop directly into the WAN cable. DNS works immediately on the laptop, even while the router is unable to do DNS.
In fact, the validator is pushing massive amounts of data through the router. That all works correctly and at high speed. But DNS fails. Again, if I reboot the router, the DNS will still not work until about 10 minutes have passed. It seems that some network state may be retained between reboots but I truly don't know that for a fact.
---
I think this is going to be difficult to get debugged (I've been trying off and on for about a week) and I appreciate that this could be a problem at the kernel level.
Any help GREATLY appreciated of course, but I will accept defeat if this doesn't work.
I am considering doing multiwan and somehow routing the DNS traffic (only) on the second connection if I can set that up.