To test things, I created two DNS zones, “toomanyns.test.globnix.net” and “toomanyns-eth.test.globnix.net”; the label “www” exists within those. They are set up with, respectively, 7 and 20 NS resource records (RRs), each with a leading label 63 octets long. As a result, a response which includes the NS records will overflow either a normal 512-octet UDP packet or a 1500-octet, sized-for-Ethernet UDP packet.
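A back-of-envelope estimate shows why those record counts were chosen. This is my own sketch; it assumes the owner names compress down to 2-octet pointers and that each NS target is the 63-octet label followed by a pointer back to the zone name:

```python
def wire_len(name: str) -> int:
    """Uncompressed wire length of a domain name: one length octet per
    label, plus the terminating root octet."""
    return sum(len(label) + 1 for label in name.rstrip(".").split(".")) + 1

def response_size(zone: str, n_records: int) -> int:
    """Rough size of an NS response carrying n_records NS RRs."""
    header = 12
    question = wire_len(zone) + 4  # QNAME + QTYPE + QCLASS
    # Per NS record: a 2-octet compression pointer for the owner name,
    # 10 octets of TYPE/CLASS/TTL/RDLENGTH, then RDATA consisting of
    # the 63-octet leading label and a pointer back to the zone name.
    per_record = 2 + 10 + (1 + 63 + 2)
    return header + question + n_records * per_record

print(response_size("toomanyns.test.globnix.net", 7))       # 590 > 512
print(response_size("toomanyns-eth.test.globnix.net", 20))  # 1608 > 1500
```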
This let me confirm that dnspython really does make it trivial to retry a query over TCP. The library also makes it trivial to turn on EDNS0 for the queries, so I did that too.
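dnspython hides all of this behind its query functions; purely for illustration, here is roughly what the UDP-then-TCP dance looks like at the socket level. This is my own standard-library sketch (the function names and the fixed query ID are invented), not dnspython's API or my test script:

```python
import socket
import struct

TC_FLAG = 0x0200  # truncation bit in the DNS header flags field

def build_query(name: str, qtype: int = 2, use_edns0: bool = False,
                payload: int = 1500) -> bytes:
    """Build a minimal DNS query (qtype 2 = NS); flags 0 mirrors +norec."""
    header = struct.pack("!HHHHHH", 0x1234, 0x0000, 1, 0, 0,
                         1 if use_edns0 else 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    packet = header + qname + struct.pack("!HH", qtype, 1)  # class IN
    if use_edns0:
        # OPT pseudo-RR in the additional section: root owner name,
        # type 41, requestor's UDP payload size in the class field.
        packet += b"\x00" + struct.pack("!HHIH", 41, payload, 0, 0)
    return packet

def is_truncated(response: bytes) -> bool:
    """True if the TC bit is set in the response header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)

def query(name: str, server: str, use_edns0: bool = False) -> bytes:
    """Try UDP first; on truncation, retry over TCP (2-octet length
    prefix, per the DNS-over-TCP framing)."""
    q = build_query(name, use_edns0=use_edns0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(5)
        s.sendto(q, (server, 53))
        resp, _ = s.recvfrom(4096)
    if not is_truncated(resp):
        return resp
    with socket.create_connection((server, 53), timeout=5) as s:
        s.sendall(struct.pack("!H", len(q)) + q)
        (length,) = struct.unpack("!H", s.recv(2))
        resp = b""
        while len(resp) < length:
            resp += s.recv(length - len(resp))
    return resp
```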
However, manual testing with dig(1) revealed something curious. The zone is served by bind9, and I saw:
% mv ~/.digrc ~/.digrc--
% dig +norec +ignore -t ns toomanyns-eth.test.globnix.net @nlns.globnix.net
[ TC set, 5 NS responses (answer section) ]
% dig +bufsize=1500 +norec +ignore -t ns toomanyns-eth.test.globnix.net @nlns.globnix.net
[ TC set, EDNS enabled, 0 NS responses ]
I got the same result with +bufsize=512; it's the EDNS0 which triggered the “problem”.
So I checked with a more experienced DNS admin I know, who in turn checked with another similarly experienced DNS admin; both were surprised by it. At that point it seemed worth a mail to ISC's bind9-bugs ticketing system.
I got a prompt response from Mark Andrews, who kept digging into the problem to diagnose it; the fact that he didn't immediately know the reason is actually reassuring, as it confirms this is an obscure point.
He determined, though, that this is deliberate behaviour on bind's part. I'll quote his succinct summation:
“It's not a bug. EDNS uses an OPT record in the additional section.
Some nameservers assume that only the last section is incomplete
if TC is set, so we don't attempt to send partial rrsets, to prevent
them being accidentally cached when EDNS is in use.”
What an awkward situation. The additional section makes the most sense for the OPT pseudo-RR, and given that assumption by some resolvers, a response can't carry data in any section after the truncated one; so once the answer section would have to be truncated, it must be emptied entirely.
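Mark's explanation can be rendered as a tiny decision function. This is entirely my own sketch of the policy he describes, not bind9's actual code:

```python
def answer_records_to_send(record_sizes, budget, edns_in_use):
    """Decide how many records of the answer RRset fit in `budget` octets.

    record_sizes: wire sizes of the individual records in the RRset.
    Returns (count_to_include, tc_flag).
    """
    if sum(record_sizes) <= budget:
        return len(record_sizes), False
    if edns_in_use:
        # The OPT pseudo-RR occupies the additional (final) section, so
        # a partial RRset in the answer section would precede complete
        # data and risk being cached as if it were complete: send none.
        return 0, True
    # Plain DNS: the answer section is effectively the last one carrying
    # data, so include as many whole records as fit, then set TC.
    included = used = 0
    for size in record_sizes:
        if used + size > budget:
            break
        included += 1
        used += size
    return included, True
```

This matches what dig showed: without EDNS, a truncated response still carried some NS records; with EDNS enabled, the same truncation yielded none at all.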
Against this, there are people who persist in denying TCP access to their DNS servers, on the basis that only UDP should be used. They're wrong, of course, but they're out there. And as DNSSEC is rolled out more and more widely, overflowing the bare 512-octet UDP response limit becomes ever more likely.
I can see a horrible scenario where such setups use EDNS0 to indicate a capability for larger packet sizes but keep TCP turned off. As long as responses fit within the raised limit, this appears to work: they get all the data. But as soon as truncation happens anyway, they are suddenly left with *no* data and no ability to resolve even a subset of the information.
The moral of this story: if you implement EDNS0, you MUST implement TCP fallback too.