dnsmasq stability

Same here. Working as expected in both strict-order and all-servers modes. With all-servers I got 2 retries out of 330 to 8.8.8.8. All other servers and combinations give me 0 retries.

What dnsmasq versions are we using? I have 2.75.
2.75 is what I stuck with for now on the fork. Someone reported a problem with dnsmasq 2.76 during the V17 beta, so I decided to stay put for the time being.
 
@mstombs We have the same ISP (Virgin Media) and both use their DNS servers, yet I never see any retries (0) whereas you are seeing ~16%. So I think your results are probably being skewed by a connection issue.
 
Hmm, is it possible to share info on this patch with John? I am curious whether it will help.

I settled on using all-servers with 2 servers in config that point to my server.

You misread my post. But I do plan to share the kernel when ready. It's for AC56/AC68/AC87.
 
Quick update guys, this is with almost 24 hours of uptime since the last dnsmasq restart :) Although there have been a couple of cache flushes for DNS list updates.

Code:
Apr 12 20:59:01 dnsmasq[23646]: time 2261379
Apr 12 20:59:01 dnsmasq[23646]: cache size 2000, 0/6165 cache insertions re-used unexpired cache entries.
Apr 12 20:59:01 dnsmasq[23646]: queries forwarded 7641, queries answered locally 8197
Apr 12 20:59:01 dnsmasq[23646]: DNSSEC memory in use 28380, max 35332, allocated 199980
Apr 12 20:59:01 dnsmasq[23646]: server 8.8.4.4#53: queries sent 50, retried or failed 0
Apr 12 20:59:01 dnsmasq[23646]: server 8.8.8.8#53: queries sent 87, retried or failed 14
Apr 12 20:59:01 dnsmasq[23646]: server 127.0.0.1#65057: queries sent 8241, retried or failed 0
Apr 12 20:59:01 dnsmasq[23646]: server 127.0.0.1#65058: queries sent 7781, retried or failed 1581
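
For anyone who wants to compare their own numbers, here is a rough Python sketch that pulls the per-server counters out of a stats dump like the one above and works out the retry/fail rate. The regex and the function name `retry_rates` are my own assumptions based on the line format shown here, not anything that ships with dnsmasq.

```python
import re

# Matches the per-server summary lines in the format shown in the log above.
SERVER_LINE = re.compile(
    r"server (?P<server>\S+): queries sent (?P<sent>\d+), "
    r"retried or failed (?P<failed>\d+)"
)

def retry_rates(log_text):
    """Return {server: (sent, failed, failed/sent)} for every server line found."""
    rates = {}
    for m in SERVER_LINE.finditer(log_text):
        sent, failed = int(m["sent"]), int(m["failed"])
        # Guard against servers that were never queried.
        rates[m["server"]] = (sent, failed, failed / sent if sent else 0.0)
    return rates

# The four server lines copied from the dump above.
LOG = """\
Apr 12 20:59:01 dnsmasq[23646]: server 8.8.4.4#53: queries sent 50, retried or failed 0
Apr 12 20:59:01 dnsmasq[23646]: server 8.8.8.8#53: queries sent 87, retried or failed 14
Apr 12 20:59:01 dnsmasq[23646]: server 127.0.0.1#65057: queries sent 8241, retried or failed 0
Apr 12 20:59:01 dnsmasq[23646]: server 127.0.0.1#65058: queries sent 7781, retried or failed 1581
"""

rates = retry_rates(LOG)
for server, (sent, failed, rate) in rates.items():
    print(f"{server}: {failed}/{sent} = {rate:.1%}")
```

Feeding it a saved copy of the syslog should give one line per upstream server, which makes it easier to see which server is carrying the failures.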
 
John, do you know which version Merlin is using, and do you have the 2.76 changelog handy? I cannot find it anywhere. :(
 
2.75 is the last official release. Now, dnsmasq is one of those components that has ASUS customizations applied (at least one of which is to support the reporting of remaining lease times...ask me how I know this :) ) so I only pick up this one after Merlin has pulled it in during one of his ASUS merges.
Based on the dates, it looks like Merlin is on 2.76-test1....they are now up to 2.76-test12
You can find the CHANGELOG in Merlin's github
https://github.com/RMerl/asuswrt-merlin/blob/master/release/src/router/dnsmasq/CHANGELOG
 
Asus tends to regularly upgrade dnsmasq to newer versions, including unofficial Git snapshots of it.
 
An update:

I moved queries to unbound and now have 0 failed lookups, plus no more DNSSEC issues (dnsmasq was giving me weird DNSSEC issues).

The guide for combining the two is below; take note of the first reply on that blog post to make sure LAN names still resolve.

https://blog.josefsson.org/2015/10/26/combining-dnsmasq-and-unbound/

The only downside is that unbound seems to have only one operational mode with forwarders, which is to randomise them, so you cannot specify a forwarder order the way dnsmasq's strict-order does.
 
Some more unbound data: here is a histogram of DNS lookup speeds.

Code:
May  1 05:23:45 unbound: [1419:1] info: server stats for thread 1: 7668 queries, 1644 answers from cache, 6024 recursions, 48 prefetch
May  1 05:23:45 unbound: [1419:1] info: server stats for thread 1: requestlist max 14 avg 1.39756 exceeded 0 jostled 0
May  1 05:23:45 unbound: [1419:1] info: average recursion processing time 0.160380 sec
May  1 05:23:45 unbound: [1419:1] info: histogram of recursion processing times
May  1 05:23:45 unbound: [1419:1] info: [25%]=0.0352996 median[50%]=0.104211 [75%]=0.216155
May  1 05:23:45 unbound: [1419:1] info: lower(secs) upper(secs) recursions
May  1 05:23:45 unbound: [1419:1] info:    0.000000    0.000001 230
May  1 05:23:45 unbound: [1419:1] info:    0.000128    0.000256 1
May  1 05:23:45 unbound: [1419:1] info:    0.000256    0.000512 403
May  1 05:23:45 unbound: [1419:1] info:    0.000512    0.001024 31
May  1 05:23:45 unbound: [1419:1] info:    0.001024    0.002048 9
May  1 05:23:45 unbound: [1419:1] info:    0.002048    0.004096 18
May  1 05:23:45 unbound: [1419:1] info:    0.004096    0.008192 43
May  1 05:23:45 unbound: [1419:1] info:    0.008192    0.016384 237
May  1 05:23:45 unbound: [1419:1] info:    0.016384    0.032768 463
May  1 05:23:45 unbound: [1419:1] info:    0.032768    0.065536 919
May  1 05:23:45 unbound: [1419:1] info:    0.065536    0.131072 1115
May  1 05:23:45 unbound: [1419:1] info:    0.131072    0.262144 1616
May  1 05:23:45 unbound: [1419:1] info:    0.262144    0.524288 738
May  1 05:23:45 unbound: [1419:1] info:    0.524288    1.000000 153
May  1 05:23:45 unbound: [1419:1] info:    1.000000    2.000000 25
May  1 05:23:45 unbound: [1419:1] info:    2.000000    4.000000 19
May  1 05:23:45 unbound: [1419:1] info:    4.000000    8.000000 1
May  1 05:23:45 unbound: [1419:1] info:    8.000000   16.000000 1
May  1 05:23:45 unbound: [1419:1] info:   16.000000   32.000000 2
May  1 05:23:45 unbound: [1417:0] info: server stats for thread 0: 8250 queries, 1966 answers from cache, 6284 recursions, 35 prefetch
May  1 05:23:45 unbound: [1417:0] info: server stats for thread 0: requestlist max 15 avg 1.22155 exceeded 0 jostled 0
May  1 05:23:45 unbound: [1417:0] info: average recursion processing time 0.191713 sec
May  1 05:23:45 unbound: [1417:0] info: histogram of recursion processing times
May  1 05:23:45 unbound: [1417:0] info: [25%]=0.0360909 median[50%]=0.102516 [75%]=0.216632
May  1 05:23:45 unbound: [1417:0] info: lower(secs) upper(secs) recursions
May  1 05:23:45 unbound: [1417:0] info:    0.000000    0.000001 184
May  1 05:23:45 unbound: [1417:0] info:    0.000064    0.000128 1
May  1 05:23:45 unbound: [1417:0] info:    0.000256    0.000512 459
May  1 05:23:45 unbound: [1417:0] info:    0.000512    0.001024 20
May  1 05:23:45 unbound: [1417:0] info:    0.001024    0.002048 16
May  1 05:23:45 unbound: [1417:0] info:    0.002048    0.004096 15
May  1 05:23:45 unbound: [1417:0] info:    0.004096    0.008192 41
May  1 05:23:45 unbound: [1417:0] info:    0.008192    0.016384 242
May  1 05:23:45 unbound: [1417:0] info:    0.016384    0.032768 492
May  1 05:23:45 unbound: [1417:0] info:    0.032768    0.065536 996
May  1 05:23:45 unbound: [1417:0] info:    0.065536    0.131072 1198
May  1 05:23:45 unbound: [1417:0] info:    0.131072    0.262144 1607
May  1 05:23:45 unbound: [1417:0] info:    0.262144    0.524288 748
May  1 05:23:45 unbound: [1417:0] info:    0.524288    1.000000 204
May  1 05:23:45 unbound: [1417:0] info:    1.000000    2.000000 26
May  1 05:23:45 unbound: [1417:0] info:    2.000000    4.000000 12
May  1 05:23:45 unbound: [1417:0] info:    4.000000    8.000000 11
May  1 05:23:45 unbound: [1417:0] info:    8.000000   16.000000 6
May  1 05:23:45 unbound: [1417:0] info:   16.000000   32.000000 6
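
As a sanity check on how to read that histogram, here is a small Python sketch that recomputes the quartiles for thread 1 from the bucket counts. It assumes plain linear interpolation inside each bucket; I have not checked unbound's source, but that assumption lands on the logged [25%], median and [75%] values. The `quantile` helper is hypothetical, written just for this post.

```python
# (lower_secs, upper_secs, recursions) for thread 1, copied from the log above.
BUCKETS = [
    (0.000000, 0.000001, 230),
    (0.000128, 0.000256, 1),
    (0.000256, 0.000512, 403),
    (0.000512, 0.001024, 31),
    (0.001024, 0.002048, 9),
    (0.002048, 0.004096, 18),
    (0.004096, 0.008192, 43),
    (0.008192, 0.016384, 237),
    (0.016384, 0.032768, 463),
    (0.032768, 0.065536, 919),
    (0.065536, 0.131072, 1115),
    (0.131072, 0.262144, 1616),
    (0.262144, 0.524288, 738),
    (0.524288, 1.000000, 153),
    (1.000000, 2.000000, 25),
    (2.000000, 4.000000, 19),
    (4.000000, 8.000000, 1),
    (8.000000, 16.000000, 1),
    (16.000000, 32.000000, 2),
]

def quantile(buckets, q):
    """Estimate the q-quantile (0..1) by interpolating linearly inside a bucket."""
    total = sum(count for _, _, count in buckets)
    target = q * total
    seen = 0
    for lower, upper, count in buckets:
        if seen + count >= target:
            # Fraction of the way through this bucket where the target falls.
            return lower + (target - seen) / count * (upper - lower)
        seen += count
    return buckets[-1][1]

total_recursions = sum(count for _, _, count in BUCKETS)
q25, q50, q75 = (quantile(BUCKETS, q) for q in (0.25, 0.50, 0.75))
print(total_recursions, q25, q50, q75)
```

The bucket counts add up to the 6024 recursions reported for thread 1, so the histogram covers recursions only, not the answers served from cache.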

Current stats with uptime of over a day:

Code:
num.answer.rcode.NOERROR=14532
num.answer.rcode.FORMERR=0
num.answer.rcode.SERVFAIL=66
num.answer.rcode.NXDOMAIN=1467
num.answer.rcode.NOTIMPL=0
num.answer.rcode.REFUSED=0
num.answer.rcode.nodata=130
num.answer.secure=571
num.answer.bogus=0
num.rrset.bogus=0
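
To put those counters in proportion, here is a quick Python sketch that turns the rcode lines into shares of all answers. One assumption to flag: as I understand the unbound-control documentation, `nodata` is a pseudo-rcode whose answers are already counted under NOERROR, so it is left out of the total. The function name `rcode_shares` is made up for this example.

```python
# The rcode counters copied from the stats dump above.
STATS = """\
num.answer.rcode.NOERROR=14532
num.answer.rcode.FORMERR=0
num.answer.rcode.SERVFAIL=66
num.answer.rcode.NXDOMAIN=1467
num.answer.rcode.NOTIMPL=0
num.answer.rcode.REFUSED=0
num.answer.rcode.nodata=130
"""

PREFIX = "num.answer.rcode."

def rcode_shares(text):
    """Return {rcode: share of all answers} from key=value stats lines."""
    counts = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key.startswith(PREFIX):
            counts[key[len(PREFIX):]] = int(value)
    # Assumption: nodata answers are already included in NOERROR, so skip them here.
    total = sum(v for k, v in counts.items() if k != "nodata")
    return {k: v / total for k, v in counts.items()}

shares = rcode_shares(STATS)
print(f"SERVFAIL share: {shares['SERVFAIL']:.2%}")
```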

I have filed a bug report asking for a strict-order feature for forwarders.

Per Merlin's information, I will also refile my dnsmasq bug report on the dnsmasq mailing list.
 
Although I wasn't seeing a high retry/fail rate (about 2%), I still wanted to do some investigation on dnsmasq. I patched dnsmasq to make a syslog entry for the 'retried/failed' lookups. Not sure what it means, but the results are somewhat interesting: about 90% of the fails are to the same domains, all tracking domains....
cloudfront.net
parsley.com
whereisip.net
scorecardresearch.com
livefyre.com
imrworldwide.com
whatismyipaddress.com
 
Could it simply be that some of these have a non-working authoritative nameserver, and dnsmasq logs these rather than quietly trying another authoritative server?

Get the list of authoritative servers for a failing domain, and run nslookup against each authoritative server directly.
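
A minimal Python sketch of that check, assuming `nslookup` is installed and on the PATH. The helper names are hypothetical, and the exit status is only a rough signal, so read the actual output before drawing conclusions.

```python
import subprocess

def auth_check_commands(domain, auth_servers):
    """Build nslookup command lines: one NS lookup, then one direct query per server."""
    commands = [["nslookup", "-type=NS", domain]]
    commands += [["nslookup", domain, server] for server in auth_servers]
    return commands

def run_checks(domain, auth_servers, timeout=10):
    """Query each authoritative server directly (needs nslookup and network access)."""
    results = {}
    for cmd in auth_check_commands(domain, auth_servers)[1:]:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            # Treat a clean exit as "this server answered"; inspect proc.stdout for details.
            results[cmd[-1]] = proc.returncode == 0
        except (subprocess.TimeoutExpired, FileNotFoundError):
            results[cmd[-1]] = False
    return results

# Placeholder names only; substitute the NS records returned for the failing domain.
cmds = auth_check_commands("example.com", ["ns1.example.net", "ns2.example.net"])
```

Run the first command by hand to get the real NS list for one of the failing domains, then pass those names to `run_checks` to see which servers do not answer.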
 
More data..... I'm running through a VPN which pushes 2 DNS servers, and dnsmasq is set for strict-order mode. Of the 2% 'retried or failed', about 90% get successfully retried on the second VPN DNS server. The remaining 10% 'leak' to my first ISP DNS server and are successful there.
Code:
May  3 12:15:01 dnsmasq[2546]: server 209.222.18.222#53: queries sent 6402, retried or failed 146
May  3 12:15:01 dnsmasq[2546]: server 209.222.18.218#53: queries sent 146, retried or failed 15
May  3 12:15:01 dnsmasq[2546]: server 68.105.28.11#53: queries sent 15, retried or failed 0
May  3 12:15:01 dnsmasq[2546]: server 68.105.29.11#53: queries sent 0, retried or failed 0
May  3 12:15:01 dnsmasq[2546]: server 68.105.28.12#53: queries sent 0, retried or failed 0
May  3 12:15:01 dnsmasq[2546]: server 2001:578:3f::30#53: queries sent 0, retried or failed 0
May  3 12:15:01 dnsmasq[2546]: server 2001:578:3f:1::30#53: queries sent 0, retried or failed 0
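
The fall-through is easy to verify from the counters themselves. A small Python sketch with the first three server rows typed in by hand (the variable and function names are mine):

```python
# (server, queries sent, retried or failed), copied from the log above.
stats = [
    ("209.222.18.222", 6402, 146),
    ("209.222.18.218", 146, 15),
    ("68.105.28.11", 15, 0),
]

def falls_through(rows):
    """True if each server was sent exactly the queries the previous one failed."""
    return all(prev[2] == cur[1] for prev, cur in zip(rows, rows[1:]))

cascade_ok = falls_through(stats)
first_fail_rate = stats[0][2] / stats[0][1]  # share of all queries failing on the first VPN server
leak_share = stats[1][2] / stats[1][1]       # share of those retries that reach the ISP server

print(cascade_ok, f"{first_fail_rate:.1%}", f"{leak_share:.1%}")
```

Each server's "queries sent" equals the previous server's "retried or failed", which is what you would expect with strict-order: a query only moves down the list after the server above it fails.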

It just struck me how many of the initial 2% retried or failed were to tracking domains.
 
Sweet, can I test this build please, to see if I get the same findings?

Unbound is really bad at handling tracking files; it uses too much memory, so my unbound forwards all domains in my tracking list to dnsmasq to process.

But I will retest dnsmasq with the logging to see what shows up.

Also, the dev did get back to me eventually, but he is blaming network conditions.
 
May I ask, where can I edit or customize my dnsmasq.conf file?
 
