[noise] Security audit of Noise-based DNS tunnel, protocol layering

Sun Apr 25 14:17:05 PDT 2021

A Noise-based DNS tunnel I wrote has recently had a security audit. You
can see the report and a summary of changes here:
https://www.bamsoftware.com/software/dnstt/security.html#cure53-turbotunnel-2021
Of the 4 Noise-related issues discovered, 3 had to do with the Noise
library the tunnel depends on, which has already been updated:
https://github.com/flynn/noise/security/advisories/GHSA-g9mp-8g3h-3c5c

I want to talk about this one: "UCB-02-005: Client ID security
considerations & Noise authenticated data."
https://www.bamsoftware.com/software/dnstt/cure53-turbotunnel-2021.pdf#page=8
To understand what it's about, you need to know two things about how the
tunnel is structured:

1. The DNS tunnel server needs to be able to distinguish among multiple
   concurrent client sessions. When the server receives a DNS query
   containing some encapsulated data, it needs to know which ongoing
   session to deliver that data to. In a non-tunneled TCP-based service,
   sessions would be distinguished (inside the kernel's TCP stack) by their
   source IP address and port. But a network-layer source address won't
   work for a DNS tunnel, as queries are generally forwarded by an
   intermediate recursive resolver, removing the original source
   information. So in my implementation, DNS tunnel clients attach to
   every query a "client ID," a 64-bit number randomly generated by the
   client. The client ID may be compared to connection IDs in QUIC:
   abstract identifiers not tied to any particular network address. The
   DNS tunnel server maintains a mapping from client IDs to sessions,
   which in this case are outgoing TCP connections. The server creates a
   new session the first time it sees a new client ID. When a DNS query
   arrives at the server bearing a certain client ID, not only is the
   data it contains delivered upstream to the associated session, but
   the response to the query is also eligible to contain downstream data
   from the same session.
2. UDP-based DNS is unreliable and unordered. In order to establish a
   reliable stream abstraction over DNS messages, we have the DNS
   messages encode not raw application-layer data, but *packets* of a
   user-space sequencing and reliability protocol, in this case KCP. The
   details are not important; it's enough to know that KCP, like TCP,
   uses sequence numbers and acknowledgements, and does retransmission
   of lost data. Rather than let KCP interface with the network
   directly, we intercept its network calls and encapsulate the packets
   in DNS messages.
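
To make point 1 concrete, here is a minimal sketch of the client-ID-to-
session mapping. It is not the actual tunnel code: the names ClientID,
Session, SessionMap, Lookup, and HandleQuery are hypothetical, the
session is reduced to two in-memory queues rather than a KCP stream and
an outgoing TCP connection, and there is no locking or session expiry.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// ClientID is the 64-bit identifier a client attaches to every query.
// (Hypothetical type name, for illustration only.)
type ClientID [8]byte

// NewClientID draws a random client ID from the system CSPRNG.
func NewClientID() (ClientID, error) {
	var id ClientID
	_, err := rand.Read(id[:])
	return id, err
}

// Session stands in for per-client state. A real server would hold a
// KCP stream and an outgoing TCP connection here.
type Session struct {
	Upstream   [][]byte // payloads received from the client
	Downstream [][]byte // payloads waiting to ride in a DNS response
}

// SessionMap maps client IDs to sessions. Not safe for concurrent use.
type SessionMap struct {
	sessions map[ClientID]*Session
}

func NewSessionMap() *SessionMap {
	return &SessionMap{sessions: make(map[ClientID]*Session)}
}

// Lookup returns the session for id, creating one the first time the
// ID is seen. The second result reports whether a session was created.
func (m *SessionMap) Lookup(id ClientID) (*Session, bool) {
	if sess, ok := m.sessions[id]; ok {
		return sess, false
	}
	sess := &Session{}
	m.sessions[id] = sess
	return sess, true
}

// HandleQuery delivers payload to the session named by id and returns
// downstream data (if any) that may be carried in the response.
func (m *SessionMap) HandleQuery(id ClientID, payload []byte) []byte {
	sess, _ := m.Lookup(id)
	if len(payload) > 0 {
		sess.Upstream = append(sess.Upstream, payload)
	}
	if len(sess.Downstream) == 0 {
		return nil
	}
	resp := sess.Downstream[0]
	sess.Downstream = sess.Downstream[1:]
	return resp
}

func main() {
	id, err := NewClientID()
	if err != nil {
		panic(err)
	}
	m := NewSessionMap()
	sess, created := m.Lookup(id)
	sess.Downstream = append(sess.Downstream, []byte("downstream data"))
	resp := m.HandleQuery(id, []byte("upstream data"))
	fmt.Printf("created=%v response=%q\n", created, resp)
}
```

Note that in this sketch the client ID is the only thing that selects a
session: whoever presents it gets the session's downstream data.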

On top of a session discriminated by client ID, and a reliable stream
provided by KCP, we overlay a Noise protocol. Noise messages are
delimited by 16-bit length prefixes, as you might do if you were
transferring them over TCP. The protocol stack looks like this:
	user data
	Noise
	KCP
	DNS messages
Notably, the KCP-layer headers are not protected by the Noise layer.
It's similar to running TLS over a TCP connection: the TLS is internally
encrypted and integrity-protected, but the containing TCP headers are
subject to certain kinds of manipulation. For more discussion, see
https://www.bamsoftware.com/software/dnstt/protocol.html#crypto.
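
As a rough illustration of the length-prefixed framing, here is a small
sketch that appends messages to a byte stream and splits them back out.
The function names appendMessage and nextMessage are hypothetical, and
the big-endian byte order of the prefix is an assumption on my part for
this example (it matches what the Noise specification suggests for
length fields).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errIncomplete = errors.New("incomplete message")

// appendMessage appends msg to stream, preceded by its length as a
// 16-bit integer (big-endian is an assumption of this sketch).
func appendMessage(stream, msg []byte) ([]byte, error) {
	if len(msg) > 0xffff {
		return stream, errors.New("message too long for a 16-bit length prefix")
	}
	var prefix [2]byte
	binary.BigEndian.PutUint16(prefix[:], uint16(len(msg)))
	stream = append(stream, prefix[:]...)
	return append(stream, msg...), nil
}

// nextMessage splits the first complete message off the front of
// stream. It returns errIncomplete if more bytes are needed.
func nextMessage(stream []byte) (msg, rest []byte, err error) {
	if len(stream) < 2 {
		return nil, stream, errIncomplete
	}
	n := int(binary.BigEndian.Uint16(stream))
	if len(stream)-2 < n {
		return nil, stream, errIncomplete
	}
	return stream[2 : 2+n], stream[2+n:], nil
}

func main() {
	var stream []byte
	var err error
	for _, m := range []string{"first", "second"} {
		if stream, err = appendMessage(stream, []byte(m)); err != nil {
			panic(err)
		}
	}
	for len(stream) > 0 {
		var msg []byte
		if msg, stream, err = nextMessage(stream); err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", msg)
	}
}
```

Because the stream underneath is reliable and ordered (KCP's job), a
Noise message can be split across any number of DNS messages and
reassembled before it is decrypted.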

Back to UCB-02-005. Though it is outside the censorship circumvention
threat model (in that model, encrypted DNS hides the client ID from all
relevant adversaries), the audit rightly notes that if an attacker can
discover or guess a user's client ID, they can send queries on behalf
of, and receive responses intended for, that client. This is because a
client ID attached to a DNS query is a token that entitles the sender to
interact with the data stream associated with that client ID. It's a bit
like TCP hijacking, where an attacker can manipulate a connection if it
knows a client's source IP address, port, and current sequence numbers.
The attacker cannot actually *do* anything with the data it hijacks,
because the reassembled KCP packets contain Noise messages, to which the
attacker does not know the key; and any injected data will invariably
fail integrity checks and at worst cause the session to terminate. But
the fact is, certain kinds of manipulation are possible.

To mitigate this risk, the audit report recommends folding the client ID
into the associated data in the Noise AEAD construction. While a
reasonable suggestion, I don't think it actually solves the problem in
this case. The reason is the way the protocol stack is layered. Noise
messages are at a higher layer than KCP packets and the DNS messages
that carry them. DNS messages are not independently authenticated. An
attacker that sends DNS queries bearing the client ID of some client
would still be able to claim DNS responses intended for that client.

To make it work the way the audit report intends, the protocol stack
would instead have to look like this:
	user data
	KCP
	Noise
	DNS messages
That is, let every DNS message contain an encapsulated Noise message,
independently authenticatable along with the attached client ID as
additional data. The Noise messages contain KCP packets, which carry
user data. In this model, the server would authenticate each incoming
query before acting on it, which would prevent processing of any queries
other than those sent by the legitimate Noise-layer peer.
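
Here is a minimal sketch of what per-query authentication could look
like in that alternative layering. Everything in it is an assumption
for illustration: the names sealQuery and openQuery are hypothetical,
AES-GCM from the Go standard library stands in for whatever cipher the
Noise handshake would negotiate, the key is taken as given rather than
derived from a handshake, and the type indicator and replay protection
are omitted.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"errors"
	"fmt"
)

// ClientID is the 64-bit identifier attached to every query.
type ClientID [8]byte

// newAEAD builds an AES-GCM AEAD from a session key. AES-GCM is a
// stand-in chosen so the sketch needs only the standard library.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealQuery encrypts one KCP packet for one DNS message. The client ID
// is bound in as associated data. The result is laid out as
// 8-byte explicit nonce || ciphertext || 16-byte tag.
func sealQuery(aead cipher.AEAD, id ClientID, counter uint64, kcpPacket []byte) []byte {
	var nonce [12]byte // 4 zero bytes, then the 64-bit counter
	binary.BigEndian.PutUint64(nonce[4:], counter)
	out := append([]byte{}, nonce[4:]...)
	return aead.Seal(out, nonce[:], kcpPacket, id[:])
}

// openQuery authenticates and decrypts a message made by sealQuery.
// It fails if the message or the claimed client ID was altered, or if
// the sender did not know the key.
func openQuery(aead cipher.AEAD, id ClientID, msg []byte) ([]byte, error) {
	if len(msg) < 8+aead.Overhead() {
		return nil, errors.New("message too short")
	}
	var nonce [12]byte
	copy(nonce[4:], msg[:8])
	return aead.Open(nil, nonce[:], msg[8:], id[:])
}

func main() {
	aead, err := newAEAD(make([]byte, 32)) // all-zero demo key
	if err != nil {
		panic(err)
	}
	id := ClientID{1, 2, 3, 4, 5, 6, 7, 8}
	msg := sealQuery(aead, id, 0, []byte("kcp packet"))

	pt, err := openQuery(aead, id, msg)
	fmt.Printf("legitimate: %q %v\n", pt, err)

	_, err = openQuery(aead, ClientID{9}, msg)
	fmt.Printf("wrong client ID: %v\n", err)
}
```

In this sketch the server would look up the session key by client ID and
act on a query only if openQuery succeeds, so a party that knows the
client ID but not the key cannot get a query accepted.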

I actually considered this alternative layering while designing the DNS
tunnel protocol, and decided against it for two reasons: somewhat
greater implementation complexity (you need to account for
retransmission of handshake messages outside of KCP), and bandwidth
limitations. DNS messages, especially queries, are squeezed for
bandwidth. See discussion here:
https://www.bamsoftware.com/software/dnstt/survey.html#Generalobservations
After you account for the need for DNS-safe encoding, the amount of raw
data you can pack into a query is only about 140 bytes. An 8-byte
explicit nonce, 16-byte AEAD tag, and 1-byte type indicator (to
distinguish handshake messages from transport messages) eat about 18% of
available space, not counting the additional overhead of client ID and
KCP headers. The benefit of placing Noise higher in the protocol stack
is that Noise messages can be longer, explicit nonces are not needed,
and AEAD tag overhead is amortized over multiple DNS messages.
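
The arithmetic behind that percentage, as a quick sketch. The constants
come from the figures above; the function names and the amortization
example (a Noise message spread evenly over k DNS messages, paying one
tag and one 16-bit length prefix) are illustrative assumptions, not
measurements.

```go
package main

import "fmt"

const (
	queryCapacity = 140 // approximate raw bytes per query after DNS-safe encoding
	nonceLen      = 8   // explicit nonce
	tagLen        = 16  // AEAD tag
	typeLen       = 1   // handshake/transport type indicator
	prefixLen     = 2   // 16-bit length prefix used in the current layering
)

// perMessageOverhead is the fixed cost per DNS message when every DNS
// message carries its own Noise message.
func perMessageOverhead() int {
	return nonceLen + tagLen + typeLen
}

// overheadFraction is that cost as a fraction of query capacity.
func overheadFraction() float64 {
	return float64(perMessageOverhead()) / float64(queryCapacity)
}

// amortizedOverhead is the average per-DNS-message cost in the current
// layering when one Noise message spans k DNS messages.
func amortizedOverhead(k int) float64 {
	return float64(tagLen+prefixLen) / float64(k)
}

func main() {
	fmt.Printf("Noise per DNS message: %d of %d bytes (%.1f%%)\n",
		perMessageOverhead(), queryCapacity, 100*overheadFraction())
	fmt.Printf("Noise over KCP, k=4: %.1f bytes per DNS message\n",
		amortizedOverhead(4))
}
```

This prints 25 of 140 bytes (17.9%) for the per-message design, against
4.5 bytes per DNS message for the amortized case with k=4.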

But the design choice was based mostly on intuition, and I did not
actually try the alternative design to see how much it affects
efficiency. Now I'm wondering if it's worth rearchitecting the protocol
stack to be based on out-of-order Noise messages, something like
https://noiseprotocol.org/noise.html#out-of-order-transport-messages
https://www.wireguard.com/papers/wireguard.pdf#page=12 (Section 5.4.6)
Questions of bandwidth efficiency aside, it would be a better design
cryptographically. On the other hand, it would break compatibility with
existing installations, and as I understand things, it would not greatly
diminish the class of attackers able to interfere with a session: it
would exclude on-path, non-flow-blocking attackers who can passively
listen for client IDs and inject traffic; but an in-path, flow-blocking
attacker could still deny service by dropping packets.

I'm writing this to invite comment. What do you think? Am I overlooking
any more subtle attack, which the current protocol design is vulnerable
to, but the alternative would not be?