Hi,

Please find below my review, as a member of the INT Area Directorate, of the following document:

dprive                                         S. Bortzmeyer
Internet-Draft                                         AFNIC
Obsoletes: 7626 (if approved)                   S. Dickinson
Intended status: Informational                    Sinodun IT
Expires: July 19, 2020                      January 16, 2020

DNS Privacy Considerations
draft-ietf-dprive-rfc7626-bis-04

1. Introduction

Let us begin with a simplified reminder of how the DNS works (See also [RFC8499]). A client, the stub resolver, issues a DNS query to a server, called the recursive resolver (also called caching resolver or full resolver or recursive name server). Let's use the query "What are the AAAA records for www.example.com?" as an example. AAAA is the QTYPE (Query Type), and www.example.com is the QNAME (Query Name). (The description that follows assumes a cold cache, for instance, because the server just started.) The recursive resolver will first query the root name servers. In most cases, the root name servers will send a referral. In this example, the referral will be to the .com name servers. The resolver repeats the query to one of the .com name servers. The .com name servers, in turn, will refer to the example.com name servers. The example.com name server will then return the answer. The root name servers, the name servers of .com, and the name servers of example.com are called authoritative name servers. It is important, when analyzing the privacy issues, to remember that the question asked to all these name servers is always the original question, not a derived question. The question sent to the root name servers is "What are the AAAA records for www.example.com?", not "What are the name servers of .com?". By repeating the full question, instead of just the relevant part of the question to the next in line, the DNS provides more information than necessary to the name server.
In this simplified description, recursive resolvers do not implement QNAME minimization as described in [RFC7816], which will only send the relevant part of the question to the upstream name server.

IMHO, it would be clearer to split the previous paragraph into two paragraphs:
- one explaining the general DNS process;
- one showing the privacy issue related to the fact that the question is not derived.

BTW, the construction of the end of the previous paragraph suggests that question derivation and QNAME minimization are two different things.

At the time of writing, almost all this DNS traffic is currently sent in clear (i.e., unencrypted). However there is increasing deployment of DNS-over-TLS (DoT) [RFC7858] and DNS-over-HTTPS (DoH) [RFC8484], particularly in mobile devices, browsers, and by providers of anycast recursive DNS resolution services. There are a few cases where there is some alternative channel encryption, for instance, in an IPsec VPN tunnel, at least between the stub resolver and the resolver.

IPsec: a reference is missing.

o Tertiary requests: these are the additional requests performed by the DNS system itself. For instance, if the answer to a query is a referral to a set of name servers, and the glue records are not returned, the resolver will have to do additional requests to turn the name servers' names into IP addresses. Similarly, even if glue records are returned, a careful recursive server will do tertiary requests to verify the IP addresses of those records.

“glue records”: IMHO, either a reference or a definition is needed.

2. Scope

This document focuses mostly on the study of privacy risks for the end user (the one performing DNS requests). We consider the risks of pervasive surveillance [RFC7258] as well as risks coming from a more focused surveillance.

From my point of view, but maybe I am wrong, this document is the “Problem Statement” document regarding DNS privacy mechanisms.
If so, I regret that there is no text about the impact(s), in a security context, of deploying privacy mechanisms (e.g., DoT, DoH). Please find more comments on this point under the Security Considerations section below.

3.2. Data in the DNS Request

For the communication between the stub resolver and the recursive resolver, the source IP address is the address of the user's machine. Therefore, all the issues and warnings about collection of IP addresses apply here.

For the communication between the recursive resolver and the authoritative name servers, the source IP address has a different meaning; it does not have the same status as the source address in an HTTP connection. It is typically the IP address of the recursive resolver that, in a way, "hides" the real user. However, hiding does not always work. Sometimes EDNS(0) Client subnet [RFC7871] is used (see its privacy analysis in [denis-edns-client-subnet]). Sometimes the end user has a personal recursive resolver on her machine. In both cases, the IP address is as sensitive as it is for HTTP [sidn-entrada].

A note about IP addresses: there is currently no IETF document that describes in detail all the privacy issues around IP addressing in general, although [RFC7721] does discuss privacy considerations for IPv6 address generation mechanisms. In the meantime, the discussion here is intended to include both IPv4 and IPv6 source addresses. For a number of reasons, their assignment and utilization characteristics are different, which may have implications for details of information leakage associated with the collection of source addresses. (For example, a specific IPv6 source address seen on the public Internet is less likely than an IPv4 address to originate behind an address sharing scheme.) However, for both IPv4 and IPv6 addresses, it is important to note that source addresses are propagated with queries and comprise metadata about the host, user, or application that originated them.
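To illustrate the distinction drawn in Section 3.2, here is a minimal, illustrative Python sketch of what an authoritative name server can learn about the end user, with and without EDNS(0) Client Subnet. The helper names are hypothetical, the addresses are taken from the documentation ranges, and the /24 (IPv4) and /56 (IPv6) source prefix lengths are assumed defaults, following the recommendation in RFC 7871; an actual resolver may be configured differently.

```python
import ipaddress
from typing import Optional

# Assumed ECS source prefix lengths (RFC 7871 recommends 24 bits for IPv4
# and 56 bits for IPv6); real resolvers may be configured with other values.
DEFAULT_PREFIX = {4: 24, 6: 56}


def ecs_client_subnet(client_ip: str, prefix_len: Optional[int] = None) -> str:
    """Hypothetical helper: the client subnet a recursive resolver would
    put in an ECS option for the given end-user address."""
    addr = ipaddress.ip_address(client_ip)
    if prefix_len is None:
        prefix_len = DEFAULT_PREFIX[addr.version]
    # strict=False masks off the host bits instead of raising ValueError
    return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))


def seen_by_authoritative(client_ip: str, resolver_ip: str, use_ecs: bool) -> dict:
    """Hypothetical helper: what an authoritative server observes about a
    query that the recursive resolver forwards on behalf of client_ip."""
    return {
        # The packet's source address is the resolver's, which "hides" the user...
        "source": resolver_ip,
        # ...unless ECS leaks a prefix of the user's own address.
        "ecs": ecs_client_subnet(client_ip) if use_ecs else None,
    }


print(seen_by_authoritative("192.0.2.77", "198.51.100.53", use_ecs=False))
# -> {'source': '198.51.100.53', 'ecs': None}
print(seen_by_authoritative("192.0.2.77", "198.51.100.53", use_ecs=True))
# -> {'source': '198.51.100.53', 'ecs': '192.0.2.0/24'}
print(ecs_client_subnet("2001:db8:1234:5678::1"))
# -> 2001:db8:1234:5600::/56
```

The sketch only models which address information is visible to the authoritative server; it does not encode the actual ECS option (wire format, SCOPE PREFIX-LENGTH handling, etc.).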
“It is typically the IP address of the recursive resolver that, in a way, "hides" the real user.”
“... it is important to note that source addresses are propagated with queries and comprise metadata about the host, user, or application that originated them.”

IMHO, with such a construction, a reader may be misled (i.e., may conclude that, in the end, a recursive resolver propagates the end user's source address). Maybe the last paragraph should be moved to the beginning of the section.

3.4. On the Wire

3.4.1. Unencrypted Transports

o The recursive resolver can be in the IAP network. For most residential users and potentially other networks, the typical case is for the end user's device to be configured (typically automatically through DHCP or RA options) with the addresses of the DNS proxy in the CPE, which in turns points to the DNS recursive resolvers at the IAP. The attack surface for on-the-wire attacks is therefore from the end user system across the local network and across the IAP network to the IAP's recursive resolvers.

IMHO, it should be: “The best attack surface for on-the-wire attacks is therefore from the end user system to the CPE (i.e., the DNS proxy). From the CPE to the IAP's recursive resolvers, eavesdropping is more complex, as the end user's source address may be "hidden", as explained in Section 3.2.”

It is also noted that typically a device connected _only_ to a modern cellular network is
o directly configured with only the recursive resolvers of the IAP and
o afforded some level of protection against some types of eavesdropping for all traffic (including DNS traffic) due to the cellular network link-layer encryption.

Sorry, but I don't agree, unless the recursive resolvers are located inside the mobile antennas :) More seriously, AFAIK, even if there is L2 encryption on the cellular network and either L2 encryption (e.g., MACsec) or L3 encryption (e.g., IPsec) on the fixed network from the RAN to the recursive resolvers, this encryption is not end-to-end with the recursive resolvers.
BTW, the recursive resolvers may be the same for “Mobile” customers and “Fixed” (e.g., DSL, fiber) customers.

4. Actual "Attacks"

Many research papers about malware detection use DNS traffic to detect "abnormal" behavior that can be traced back to the activity of malware on infected machines. Yes, this research was done for the good, but technically it is a privacy attack and it demonstrates the power of the observation of DNS traffic. See [dns-footprint], [dagon-malware], and [darkreading-dns].

“... but technically, it is a privacy attack”
Please add either a definition of “privacy attack” to the document or a reference to an existing definition. By the way, I am curious to check, against such a definition, whether anti-virus software would also be considered a privacy attacker.

6. Security Considerations

This document is entirely about security, more precisely privacy. It just lays out the problem; it does not try to set requirements (with the choices and compromises they imply), much less define solutions. Possible solutions to the issues described here are discussed in other documents (currently too many to all be mentioned); see, for instance, 'Recommendations for DNS Privacy Operators' [I-D.ietf-dprive-bcp-op].

As I mentioned under Section 2, if this document is considered a “Problem Statement” document, then IMHO the impact(s) of privacy on security is (are) missing. In particular, there is no text about the following points (at least as far as I can see; there may be others):

- DNS Tunneling
As DNS flows are generally not filtered/blocked, this technique may be used for malicious activities (e.g., botnet C&C, malware propagation, data exfiltration from compromised devices, fraud). One way to mitigate such malicious activities is to monitor DNS flows. As encryption prevents such monitoring, it may encourage the filtering/blocking of encrypted DNS flows (cf. the topic of Section 3.5.1.3).

- DDoS attacks based on DNS amplification
I am not a DDoS expert, but I am wondering about potential mechanisms, close to the sources, for detecting DDoS attacks based on DNS amplification: would deploying DoH/DoT have any impact on them?

Thanks in advance for your replies.

Best regards,

JMC.