The Grumpy Troll

Ramblings of a grumpy troll.

DNS: don't implement EDNS0 to bypass implementing TCP fallback

This grumpy troll occasionally hacks on scripts which use DNS in slightly unusual ways. As part of this, I was sending queries directly to authoritative DNS servers, which with dnspython necessitated sending UDP queries explicitly, rather than invoking an interface that would fall back to TCP.

To test things, I created two DNS zones, “” and “”; the label “www” exists within those. These are set up with, respectively, 7 and 20 NS resource records (RRs), each with a leading label 63 octets long. As a result, a response which includes the NS records will overflow either a normal 512 octet UDP packet, or a 1500 octet sized-for-ethernet UDP packet.

This let me confirm that dnspython really did make it easy to retry with TCP. I also found that the library made it trivially easy to turn on EDNS0 for the queries, so I did that too.
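
For readers who want to see what the retry amounts to on the wire, here is a minimal sketch using only the Python standard library rather than dnspython. It is not the script described in this post: the function names, the fixed message ID and the five-second timeout are all hypothetical choices made for illustration. It sends a non-recursive NS query over UDP, checks the TC bit in the response header, and repeats the query over TCP (with the two-octet length prefix) only if the response was truncated.

```python
import socket
import struct

TC_FLAG = 0x0200  # truncation bit within the 16-bit DNS header flags field


def build_query(qname, qtype=2, msg_id=0x1234):
    """Build a minimal non-recursive query (qtype 2 = NS, class IN)."""
    # Flags are all zero: RD is clear, comparable to dig +norec.
    header = struct.pack("!HHHHHH", msg_id, 0, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.split(".")
        if label
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def is_truncated(response):
    """Return True if the TC bit is set in a wire-format DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock, count):
    """Read exactly count bytes from a stream socket, or raise on early EOF."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return data


def query_with_fallback(server, qname, timeout=5.0):
    """Query over UDP first; repeat over TCP only if the reply has TC set."""
    query = build_query(qname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        # Sketch only: a careful client would also check the reply's
        # source address and message ID before trusting it.
        response, _ = udp.recvfrom(65535)
    if not is_truncated(response):
        return response
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        # DNS over TCP prefixes each message with a two-octet length.
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

Calling query_with_fallback() with the address of an authoritative server and a zone name returns the raw response bytes; nothing touches the network until that call is made.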

However, manual testing with dig(1) revealed something curious. The zone is served by bind9, and I saw:

% mv ~/.digrc ~/.digrc--
% dig +norec +ignore -t ns
[ TC set, 5 NS responses (answer section) ]
% dig +bufsize=1500 +norec +ignore -t ns
[ TC set, EDNS enabled, 0 NS responses ]

I got the same result with +bufsize=512; it's the EDNS0 which triggered the “problem”.
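
The two things worth watching in those responses are the TC bit and the answer count, both of which live in the fixed 12-octet DNS header. The following sketch pulls them out of a wire-format message. The two sample headers are synthetic, built to mirror the dig results described above rather than captured from the real server, and the helper name is hypothetical.

```python
import struct

TC_FLAG = 0x0200  # truncation bit within the header flags field


def summarize_header(message):
    """Return (tc_set, ancount, arcount) from a wire-format DNS message."""
    _id, flags, _qdcount, ancount, _nscount, arcount = struct.unpack(
        "!HHHHHH", message[:12]
    )
    return bool(flags & TC_FLAG), ancount, arcount


# Synthetic headers (assumed values, not captured packets).
# 0x8600 = QR (response) + AA (authoritative) + TC (truncated).
plain_reply = struct.pack("!HHHHHH", 1, 0x8600, 1, 5, 0, 0)
# With EDNS0 the OPT pseudo-RR occupies one slot in the additional section.
edns_reply = struct.pack("!HHHHHH", 2, 0x8600, 1, 0, 0, 1)

print(summarize_header(plain_reply))  # TC set, 5 answers, no additional RRs
print(summarize_header(edns_reply))   # TC set, 0 answers, 1 additional RR
```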

So I checked with a more experienced DNS admin I know, who in turn checked with another similarly experienced DNS admin. Both were surprised by it, at which point it became worth a mail to the ISC bind9-bugs ticketing system.

I got a prompt response from Mark Andrews, who continued to look into the problem to diagnose it; the fact that he didn't immediately know the reason is actually reassuring, confirming that this is an obscure point.

He determined, though, that this is deliberate behaviour on the part of bind. I'll quote his succinct summary:

“It's not a bug. EDNS uses a OPT record in the additional section.
Some nameservers assume that only the last section is incomplete
if TC is set so we don't attempt to send partial rrsets to prevent
them being accidently cached when EDNS in in use.”

What an awkward situation. The additional section makes the most sense for the OPT pseudo-RR. But given the assumption made by some resolvers (explicitly correct per the RFC) that only the last section is incomplete when TC is set, a response can't carry data in any section after the truncated one.
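
To make the layout concrete, here is a rough sketch of what turning on EDNS0 does to a query on the wire: an OPT pseudo-RR (type 41) is appended after everything else and ARCOUNT is incremented, so the OPT record always sits in the final section of the message. The function name is hypothetical, and the sketch assumes a query that has no additional records yet and sets no EDNS flags or options.

```python
import struct

OPT_TYPE = 41  # type code of the EDNS0 OPT pseudo-RR


def add_edns0(query, bufsize=1500):
    """Append an OPT pseudo-RR to a wire-format query and bump ARCOUNT.

    Assumes the query carries no OPT record already.
    """
    header = struct.unpack("!HHHHHH", query[:12])
    new_header = struct.pack("!HHHHHH", *header[:5], header[5] + 1)
    opt = (
        b"\x00"  # owner name: the root
        # TYPE, CLASS (reused as the advertised UDP payload size),
        # TTL (extended RCODE, version and flags, all zero here), RDLENGTH.
        + struct.pack("!HHIH", OPT_TYPE, bufsize, 0, 0)
    )
    return new_header + query[12:] + opt
```

The OPT record costs 11 octets, and because it belongs to the additional section, anything truncated in the answer section necessarily comes before it in the message.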

Against this, there are people who persist in denying the use of TCP to their DNS servers, on the basis that only UDP should be used. They're wrong, of course, but they're out there. Today, DNSSEC is being rolled out more and more widely, so overflowing the bare 512-octet UDP response limit becomes more likely.

I can see a horrible scenario where such setups use EDNS0 to indicate a capability for larger packet sizes but keep TCP turned off. As long as responses fit within the raised limit, this appears to work: they get all the data. But as soon as truncation happens anyway, they are suddenly left with *no* data and no ability to resolve even a subset of the information.


The moral of this story: if you implement EDNS0, you MUST implement TCP fallback too.


rick jones
"As a result, a response which includes the NS records will overflow either a normal 512 octet UDP packet, or a 1500 octet sized-for-ethernet UDP packet."

There seems to be a bit of mixing of sizes with and without the UDP datagram header there. IIRC 512 octets is how much DNS/BIND would put into a message being sent by UDP, which would then be 520 bytes of UDP datagram. A UDP datagram that was 1500 bytes in size would be larger than an unfragmented IP datagram over a "standard" 1500 byte Ethernet IP MTU.

The maximum size UDP datagram that would fit without triggering IP fragmentation would be 1480 bytes including UDP header or 1472 bytes of user data (to match the previous 512).
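
The arithmetic behind those figures can be written out as a short sketch, assuming IPv4 with no IP options (a 20 byte header); the constant and function names are only illustrative.

```python
ETHERNET_MTU = 1500      # "standard" Ethernet IP MTU, in bytes
IPV4_HEADER = 20         # assumes no IP options
UDP_HEADER = 8
CLASSIC_DNS_LIMIT = 512  # classic DNS-over-UDP message limit


def max_udp_datagram(mtu=ETHERNET_MTU):
    """Largest UDP datagram (header included) in one unfragmented IP datagram."""
    return mtu - IPV4_HEADER


def max_udp_payload(mtu=ETHERNET_MTU):
    """Largest DNS message that fits in such a UDP datagram."""
    return max_udp_datagram(mtu) - UDP_HEADER


print(CLASSIC_DNS_LIMIT + UDP_HEADER)  # 520
print(max_udp_datagram())              # 1480
print(max_udp_payload())               # 1472
```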

An alternative wording:

"As a result, a response which includes the NS records will overflow either the classic 512 octet DNS over UDP message limit, or the 1472 octet limit to allow the message to be carried in a UDP datagram itself carried in an unfragmented IP datagram over a "standard" 1500 byte MTU Ethernet."
Categories: TCP dns EDNS0 UDP debugging