Router performing crippling "DNSKEY" look-ups out of the blue

CB7

Occasional Visitor
Hi,

Sometimes, out of nowhere (I really cannot tie it to a time of day or any preceding event), the router suddenly starts sending out thousands of queries for "." (without quotes, so literally just a dot).
The type is "DNSKEY, plain DNS" and, probably due to the sheer volume of them (we're talking tens of thousands of requests suddenly), they take about 20 seconds to complete - each! The CPU load on the router skyrockets and DNS resolution comes to a halt (and other services stop responding as well).

... But why is it doing that? Digging deeeeep in my memory, I believe the . DNSKEY is to find root servers. But why on earth does it send tens of thousands of requests for it, and all at once? It's supposed to request it and then cache it for a while, not try to resolve it thousands and thousands of times. To be clear: the requests come from the router itself, not from a client on the network.
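
For reference, DNSKEY is record type 48 (RFC 4034) and holds a zone's DNSSEC public keys, so a DNSKEY query for "." asks for the keys of the root zone. Below is a minimal Python sketch of what one such plain-DNS query looks like on the wire. It assumes no EDNS options (a validating resolver would typically add them), and the helper name `build_query` and the query ID are made up for illustration.

```python
import struct

DNSKEY = 48    # RR type number for DNSKEY (RFC 4034)
CLASS_IN = 1   # Internet class

def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Build a minimal plain-DNS query packet (no EDNS, no DO bit)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending in a zero byte.
    # For "." there are no labels, so the name is just that zero byte.
    labels = [label for label in name.split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)

packet = build_query(".", DNSKEY)
print(len(packet), packet.hex())
```

Sending these bytes over UDP to port 53 of a resolver should reproduce a single lookup of the kind described above, which can help when testing how a resolver answers it.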

Anyone got any idea? :) Google is not being overly helpful.
I'm running Merlin on an AX3000-V2 with AdGuard Home running as the DNS service. Attached some screenshots.

Thanks!
 

Attachments

  • QueryLogExcerpt.png
  • Tally.png
Make sure DNSSEC is turned off on the WAN page. I can’t think of any other reason why the router would be making DNSKEY requests. Unless AdGuardHome is doing it to itself.
 
Thanks - if that's the "Enable DNSSEC support" one you're referring to: that one is indeed already disabled. Right above that, I do have the DNS server set to the router itself (so AGH) - would it be better to change that to an external service? (I mean, it still shouldn't be sending tens of thousands of DNSKEY-requests; but at least it won't cripple itself over it.) I assume that won't affect DNS Director for all other clients on the network.
 
What are the upstream servers for AGH? Maybe these queries are caught in a DNS loop.
 
Currently, they're configured to:

Code:
https://dns.cloudflare.com/dns-query
tls://one.one.one.one
https://doh.opendns.com/dns-query
tls://dns.opendns.com
[/lan/][::]:553
[//][::]:553
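
To make the list above easier to reason about when checking for a loop, here is a rough Python sketch that splits each entry into its optional `[/domain/]` prefix, its address, and its protocol. It assumes the AdGuard Home upstream syntax in which `[/lan/]` restricts an upstream to that domain and `[//]` stands for unqualified names. `parse_upstream` and `Upstream` are illustrative names, not part of AGH. What listens on `[::]:553` is not stated in the thread.

```python
from typing import NamedTuple, Optional

class Upstream(NamedTuple):
    domains: Optional[list]  # None = general upstream; [""] = the "[//]" form
    address: str
    protocol: str

def parse_upstream(line: str) -> Upstream:
    """Split one upstream line into domain prefix, address and protocol."""
    domains = None
    if line.startswith("[/"):
        end = line.index("/]")            # end of the [/domain/] prefix
        domains = line[2:end].split("/")
        line = line[end + 2:]
    if line.startswith("https://"):
        protocol = "DoH"
    elif line.startswith("tls://"):
        protocol = "DoT"
    else:
        protocol = "plain DNS"
    return Upstream(domains, line, protocol)

config = [
    "https://dns.cloudflare.com/dns-query",
    "tls://one.one.one.one",
    "https://doh.opendns.com/dns-query",
    "tls://dns.opendns.com",
    "[/lan/][::]:553",
    "[//][::]:553",
]
for entry in config:
    print(parse_upstream(entry))
```

Only the last two entries point back at the router itself, and both are restricted by a prefix. The sketch does not settle which rule, if any, AGH applies to a query for ".".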
 
I would look in AGH for an option to disable DNSSEC then. Something is trying to validate and it’s not likely to be dnsmasq at this point.
 
Alright, did that. Would prefer it to be enabled, but it's very far from the end of the world not to have it on - will see how it goes. I was looking in the statistics, and there's something else I find odd: it says the hits for the DNSKEY requests from the router have /etc/hosts as source.
 
I would start by simplifying this list. Specifically:
1. Leave only 2 servers.
2. In the case above I'd keep the two TLS-based entries for Cloudflare and OpenDNS and nuke everything else.
3. Hit Apply and Test Upstream and see what the results are.

In the logs, the lookups for "." (the root) are timing out, so it's having trouble getting started on name resolution.

I run several adguardhome servers and have not seen the issue you describe, but I've also not got an odd mix & match grouping of upstream servers.

My $0.02 :)
 
@cptnoblivious Thanks for the suggestion! If you don't mind me asking, what is the rationale behind that? I'm following the AGH manual on that end and doing pretty much exactly what they tell you to do, so it doesn't make sense that this would cause issues. (And I tested: the HTTPS resolvers perform root lookups effortlessly and are generally a lot faster than the TLS ones.) Don't get me wrong, I'm grateful for the suggestion and happy to test this - but if these settings would pose a threat to the proper functioning of AGH, there must be either a bug in AGH or a mistake in the documentation - and in both cases I'd like to report it to AGH's team to avoid the issue in the future. :)

However, I haven't changed this list yet. What I did do was change the DNS server under the WAN settings in the ASUS admin panel. The router no longer asks the local AGH instance; it asks Quad9 instead. So far, the problem has not occurred again. Fingers crossed. But if this solves it, I wonder if AGH is the problem at all or just the router doing something very odd.
 
AGH gives you examples of ways to configure DNS servers. That doesn't mean you should use every method and a metric f-ton of servers, mixing and matching the ways you connect to them.

What I gave you were troubleshooting steps. Step 1 of pretty much all troubleshooting is to remove complexity and see if the simplest way works. It wasn't about DoT vs DoH. It's not about speed either: I typically see <75 ms response times from tls://1.1.1.1, and you were seeing 20,000 ms response times from your queries. When you mouse over the "?" in the query log, do you see which servers are taking that long? Is it all of them? Start to narrow down the issue.

20 seconds to resolve a DNS query points to some serious misconfiguration or issues on the device AGH is running on, IMO.
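
The "which servers are taking that long" question can also be answered in bulk rather than by hovering over entries one at a time. A small Python sketch, assuming the query log has already been exported into dictionaries: the keys `upstream` and `elapsed_ms` are hypothetical names rather than AGH's own field names, and the sample values are made up purely to show the shape of the output.

```python
from collections import defaultdict
from statistics import mean

def slowest_upstreams(entries):
    """Group response times by upstream and sort by mean, slowest first."""
    by_upstream = defaultdict(list)
    for entry in entries:
        by_upstream[entry["upstream"]].append(entry["elapsed_ms"])
    # One row per upstream: (name, query count, mean ms, worst ms)
    summary = [(u, len(t), mean(t), max(t)) for u, t in by_upstream.items()]
    return sorted(summary, key=lambda row: row[2], reverse=True)

# Made-up sample entries, not taken from the screenshots in this thread.
sample = [
    {"upstream": "upstream-a", "elapsed_ms": 60},
    {"upstream": "upstream-b", "elapsed_ms": 20000},
    {"upstream": "upstream-a", "elapsed_ms": 70},
    {"upstream": "upstream-b", "elapsed_ms": 19000},
]
for name, count, avg, worst in slowest_upstreams(sample):
    print(f"{name}: {count} queries, mean {avg} ms, worst {worst} ms")
```

If every upstream turns out to be equally slow, that points at the device rather than at any one server.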
 
I'll respectfully disagree that 4 nameservers is "a metric f-ton". :)
Indeed, 20 seconds to resolve is very long. But that is explained by the tens of thousands of simultaneous requests - the router was essentially DoS'ing itself. The hardware simply can't cope with that many requests all at once, so it makes sense that the processing time goes way up. Huge queue of requests, high CPU load - that's to be expected when it suddenly gets so many requests at once.
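
As a back-of-the-envelope check of that queueing argument, here is a tiny Python sketch. It assumes a single queue drained at a constant rate, and the figure of 20,000 queued requests is only an illustrative stand-in for "tens of thousands", not a measured value. Both function names are made up.

```python
def implied_rate(queued: int, wait_seconds: float) -> float:
    """Queries per second the resolver drains if the last one waits wait_seconds."""
    return queued / wait_seconds

def wait_for_position(position: int, rate_per_second: float) -> float:
    """Seconds until the query at this queue position has been answered."""
    return position / rate_per_second

rate = implied_rate(20_000, 20.0)
print(rate)                             # 1000.0
print(wait_for_position(10_000, rate))  # 10.0
```

Under those assumptions, a 20-second wait at the back of the queue only requires the backlog to outrun the drain rate. No single upstream needs to be slow.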

Either way, changing the resolver under the router's WAN settings appears to have remediated the problem. Or rather: maybe the problem isn't actually solved, but the requests now go to an external server, so they won't bring AGH down. I don't want to be a nuisance elsewhere either though, so I'll continue trying to figure out _why_ it periodically attempted to do this and what went wrong. Maybe I can find a package that is the culprit; it doesn't make any sense why it would send so many requests. I tested AGH and it responds to DNSKEY the way it's supposed to, no issues at all.

Considering the issue disappears when telling the router to use a different resolver and AGH responds to the requests like you'd expect, I think the issue lies beyond AGH.
 
