Extended DNS Errors: Unlocking the Full Potential of DNS Troubleshooting

Sept 28, 2023 · Yevheniya Nosyk, Maciej Korczyński, and Andrzej Duda

The Domain Name System (DNS) has traditionally relied on response codes to signal anomalies, but these codes do little to pinpoint the root causes of failures. This shortcoming was addressed in RFC 8914, which introduced Extended DNS Errors (EDE), a new mechanism that provides extra feedback on DNS resolutions. At Université Grenoble Alpes, we recently studied the implementation of this proposed standard and enumerated domain misconfigurations in the wild. This blog post summarizes the key findings of our paper.

Background

Extended DNS Errors rely on EDNS(0) to carry data inside the OPT resource record under option code 15. As of September 2023, the Extended DNS Error Codes registry at IANA contains 30 entries, 5 of which were added after the release of the original RFC 8914. The table below lists them all. The codes cover different aspects of DNS, such as DNSSEC validation (1, 2, 5-12, 25, 27), caching (3, 13, 19, 29), resolver policies (4, 15-18, 20), software operation (14, 21-23), etc. These EDE codes exist independently of traditional response codes, and the EDE specification does not prohibit any combination of the two. Importantly, any DNS system, whether a recursive resolver, a forwarder, or an authoritative nameserver, can generate, forward, and parse EDE codes.
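To make the wire format concrete, here is a minimal Python sketch of how an EDE option could be pulled out of the RDATA of an OPT record. It follows the layout in RFC 8914 (a 16-bit INFO-CODE followed by optional UTF-8 EXTRA-TEXT); the function names and the sample text are our own illustrative choices and do not come from any resolver.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS(0) option code assigned to Extended DNS Errors


def parse_edns_options(rdata: bytes):
    """Yield (option_code, option_data) pairs from the RDATA of an OPT record."""
    offset = 0
    while offset + 4 <= len(rdata):
        code, length = struct.unpack_from("!HH", rdata, offset)
        offset += 4
        yield code, rdata[offset:offset + length]
        offset += length


def parse_ede(option_data: bytes):
    """Split one EDE option into (info_code, extra_text)."""
    if len(option_data) < 2:
        raise ValueError("EDE option must carry a 2-byte INFO-CODE")
    (info_code,) = struct.unpack_from("!H", option_data)
    extra_text = option_data[2:].decode("utf-8", errors="replace")
    return info_code, extra_text


# Hand-built OPT RDATA with a single EDE option: INFO-CODE 6 plus some text.
text = b"signature check failed"  # illustrative text, not from a real resolver
rdata = struct.pack("!HHH", EDE_OPTION_CODE, 2 + len(text), 6) + text

for code, data in parse_edns_options(rdata):
    if code == EDE_OPTION_CODE:
        print(parse_ede(data))  # prints (6, 'signature check failed')
```

A single response may carry several EDE options, so the loop over options matters: each one is parsed independently.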

Code  Description                   Code  Description
0     Other Error                   15    Blocked
1     Unsupported DNSKEY Algorithm  16    Censored
2     Unsupported DS Digest Type    17    Filtered
3     Stale Answer                  18    Prohibited
4     Forged Answer                 19    Stale NXDomain Answer
5     DNSSEC Indeterminate          20    Not Authoritative
6     DNSSEC Bogus                  21    Not Supported
7     Signature Expired             22    No Reachable Authority
8     Signature Not Yet Valid       23    Network Error
9     DNSKEY Missing                24    Invalid Data
10    RRSIGs Missing                25    Signature Expired before Valid
11    No Zone Key Bit Set           26    Too Early
12    NSEC Missing                  27    Unsupported NSEC3 Iterations Value
13    Cached Error                  28    Unable to conform to policy
14    Not Ready                     29    Synthesized
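For scripting purposes, the table and the grouping described above can be captured in a small lookup. The sketch below is only a convenience: the names are copied from the table, the four groups mirror the code ranges given in the text, and the "other" bucket for codes outside those ranges is our own assumption.

```python
# Code-to-name mapping, copied from the table above.
EDE_NAMES = {
    0: "Other Error", 1: "Unsupported DNSKEY Algorithm",
    2: "Unsupported DS Digest Type", 3: "Stale Answer",
    4: "Forged Answer", 5: "DNSSEC Indeterminate",
    6: "DNSSEC Bogus", 7: "Signature Expired",
    8: "Signature Not Yet Valid", 9: "DNSKEY Missing",
    10: "RRSIGs Missing", 11: "No Zone Key Bit Set",
    12: "NSEC Missing", 13: "Cached Error",
    14: "Not Ready", 15: "Blocked",
    16: "Censored", 17: "Filtered",
    18: "Prohibited", 19: "Stale NXDomain Answer",
    20: "Not Authoritative", 21: "Not Supported",
    22: "No Reachable Authority", 23: "Network Error",
    24: "Invalid Data", 25: "Signature Expired before Valid",
    26: "Too Early", 27: "Unsupported NSEC3 Iterations Value",
    28: "Unable to conform to policy", 29: "Synthesized",
}

# Groups as listed in the text; codes outside them fall into "other".
EDE_CATEGORIES = {
    "DNSSEC validation": {1, 2, *range(5, 13), 25, 27},
    "caching": {3, 13, 19, 29},
    "resolver policy": {4, *range(15, 19), 20},
    "software operation": {14, *range(21, 24)},
}


def categorize(info_code: int) -> str:
    """Return the group an EDE code belongs to, or "other" if it is ungrouped."""
    for name, codes in EDE_CATEGORIES.items():
        if info_code in codes:
            return name
    return "other"


print(EDE_NAMES[22], "->", categorize(22))
```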

Implementation

As of May 2023, Extended DNS Errors are implemented by major resolver software vendors (BIND9, Unbound, Knot Resolver, PowerDNS Recursor) and public resolvers (Cloudflare, Quad9, OpenDNS). Note that Google DNS announced its support of RFC 8914 two months after our experiments, in July 2023.

We wanted to know what kinds of issues cause recursive resolvers to return EDE codes. To answer this question, we set up 63 domains reflecting different misconfigurations and corner cases, such as erroneous DNSSEC configurations (wrong keys, signatures, digests, very old/new algorithms), unreachable nameservers, restrictive ACLs, etc. Refer to https://extended-dns-errors.com for a full list of domains and feel free to use them for your own tests.
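If you want to try the test domains yourself, the following standard-library Python sketch shows one way to do it: it builds a query with an empty EDNS(0) OPT record, sends it over UDP, and extracts any EDE options from the reply. This is not the tooling used in our study. The helper names are hypothetical, 1.1.1.1 (Cloudflare's public resolver address) is just a default, and the parser ignores truncation and the extended RCODE bits.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for `name` (default type A) with an empty OPT record."""
    # Flags 0x0100 = recursion desired; 1 question, 1 additional record (OPT).
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT RR: root name, type 41, UDP payload size 1232, TTL field 0, no options.
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)
    return header + question + opt


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past the (possibly compressed) name at `offset`."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def extract_ede(msg: bytes):
    """Return (rcode, [(info_code, extra_text), ...]) from a raw DNS message."""
    _id, flags, qd, an, ns, ar = struct.unpack_from("!HHHHHH", msg)
    offset = 12
    for _ in range(qd):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    errors = []
    for _ in range(an + ns + ar):
        offset = skip_name(msg, offset)
        rtype, _cls, _ttl, rdlen = struct.unpack_from("!HHIH", msg, offset)
        offset += 10
        rdata = msg[offset:offset + rdlen]
        offset += rdlen
        if rtype != 41:  # only the OPT record can carry EDE options
            continue
        pos = 0
        while pos + 4 <= len(rdata):
            code, length = struct.unpack_from("!HH", rdata, pos)
            data = rdata[pos + 4:pos + 4 + length]
            pos += 4 + length
            if code == 15 and len(data) >= 2:
                info = struct.unpack_from("!H", data)[0]
                errors.append((info, data[2:].decode("utf-8", "replace")))
    return flags & 0x000F, errors


def query(name: str, resolver: str = "1.1.1.1", timeout: float = 3.0):
    """Send the query over UDP and return (rcode, EDE list) from the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver, 53))
        response, _addr = sock.recvfrom(4096)
    return extract_ede(response)


# Example call (needs network access):
# rcode, errors = query("valid.extended-dns-errors.com")
```

Swapping the `resolver` argument for another public resolver's address lets you compare how different systems report the same misconfiguration.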

Next, we queried Cloudflare, Quad9, OpenDNS, as well as our own instances of BIND 9.19.9, Unbound 1.16.2, PowerDNS Recursor 4.8.2, and Knot Resolver 5.6.0. Overall, our 63 test domains generated 12 different EDE codes. Only 4 of the 63 test cases produced the same results across all seven tested systems: the no-ds, nsec3-iter-200, unsigned, and valid subdomains did not result in any extended error code. The following factors contributed to the inconsistency among the remaining 94% of tests:

Misconfigurations in the wild

We now set out to discover the most prevalent issues in the wild. We gathered a dataset of more than 303 million registered domains across 1,475 TLDs and requested Cloudflare public DNS to resolve their A records. Overall, 17.7 million domain names triggered 14 individual EDE codes or their combinations.

Lame delegations are the most common issue we encountered: 14.8 million domains triggered the No Reachable Authority and/or Network Error EDEs. These codes cover cases in which recursive resolvers cannot reach some or all of a domain's authoritative nameservers. Cloudflare used the EXTRA-TEXT field of the EDE entry to report that some nameservers returned REFUSED or SERVFAIL response codes and thus did not serve the authoritative data.

DNSSEC misconfigurations are another prevalent problem. Expired, missing, or not-yet-valid signatures, missing keys or proofs of non-existence, DNSKEYs that do not correspond to DS records, and broken chains of trust all make the affected domains inaccessible to end users behind validating DNS resolvers. However, when a domain uses unsupported cryptographic algorithms, resolution does not fail; instead, the response is accompanied by the Unsupported DNSKEY Algorithm or Unsupported DS Digest Type EDE.

Finally, two debugging EDEs were returned to signal that we were served stale answers (Stale Answer) or a previously cached SERVFAIL response (Cached Error).
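Counting codes and their combinations across many domains is straightforward once the EDE codes per domain are collected. The sketch below illustrates the idea only; it is not our measurement pipeline, the function name is hypothetical, and the sample input is made up to exercise the code rather than taken from our dataset.

```python
from collections import Counter

LAME_DELEGATION_CODES = {22, 23}  # No Reachable Authority, Network Error


def summarize(results):
    """Tally EDE codes from a mapping of domain -> iterable of info codes.

    Returns (per_code, per_combination, lame_count), where lame_count is the
    number of domains that triggered code 22 and/or code 23.
    """
    per_code = Counter()
    per_combination = Counter()
    lame_count = 0
    for codes in results.values():
        unique = frozenset(codes)
        if not unique:
            continue  # domain resolved without any extended error
        per_code.update(unique)
        per_combination[tuple(sorted(unique))] += 1
        if unique & LAME_DELEGATION_CODES:
            lame_count += 1
    return per_code, per_combination, lame_count


# Made-up sample input, for illustration only.
sample = {
    "a.example": [22],
    "b.example": [22, 23],
    "c.example": [10],
    "d.example": [],
}
per_code, per_combination, lame_count = summarize(sample)
print(per_code[22], per_combination[(22, 23)], lame_count)  # prints 2 1 2
```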

Interestingly, 2.47 million domain names under two European ccTLDs triggered the RRSIGs Missing EDE code without leading to DNSSEC validation failures. We reached out to one of the TLD operators, who explained that although the TLD zone was correctly configured, Cloudflare DNS signaled a problem with a so-called stand-by KSK, i.e., a key published in the zone file in case an emergency key rollover is needed, but not actively used to establish the chain of trust. We identified 22 more public suffixes and TLDs with stand-by DNSSEC keys triggering the same error. We contacted Cloudflare and reported our findings. They, in turn, confirmed that this was expected behavior and updated their documentation to note that “key rollover in-progress, stand-by key, and attacker stripping signatures” may trigger the RRSIGs Missing EDE.

Conclusions

Our measurements revealed that all the systems implementing RFC 8914 were successful in determining root causes of misconfigurations with different levels of specificity. Moreover, this standard is particularly useful to enumerate misconfigurations at scale. Therefore, we believe that EDE is a promising technique that assists DNS operators, domain owners, and end clients in identifying and resolving DNS issues.