[noise] New branch: hkdf

Tue Oct 13 22:25:16 PDT 2015

On Tue, Oct 13, 2015 at 5:15 PM, Kenton Varda <kenton at sandstorm.io> wrote:
> On Sat, Oct 10, 2015 at 12:21 PM, Trevor Perrin <trevp at trevp.net> wrote:
>>
>> But assuming well-optimized code:  If the HKDF version is, say, 8-9
>> HMACs slower, then it's hashing an extra ~2000 bytes (with SHA256).
>> To put very rough numbers on it, that might be 20K Haswell cycles [1].
>> Compared to 3 ECDHs + 1 keygen at ~500K cycles [2], this is < 5% speed
>> difference for the handshake.
>
>
> FWIW, as a system builder but not a cryptographer, dismissing 5% speed
> losses makes me uncomfortable, for a couple reasons:

I wouldn't dismiss it, but we're weighing tradeoffs, so a sub-5%
performance difference doesn't carry much weight.

To be more precise about the difference:  In Noise_XX there would be 4
HKDF calls (3 MixKey, 1 Split).  Each has 3 HMACs, and each HMAC
processes 4 blocks, so 48 blocks are processed in total.  Total HKDF
cycles:
 - SHA256 on Haswell (~10 cycles/byte, 64-byte blocks):
   48 * 64 * 10 = 30720
 - BLAKE2b on Haswell (~3 cycles/byte, 128-byte blocks):
   48 * 128 * 3 = 18432

Compared to ~500K cycles for a fast Curve25519 (3 ECDH + keygen),
HKDF is ~6% (SHA256) or ~3.6% (BLAKE2b) of total computation cost.
Jason's suggestion of doing just a single HMAC for each MixKey only
works with a 64-byte hash like BLAKE2b; it removes 2 out of 3 HMACs,
so the full HKDF handshake would be only ~2.4% slower (with BLAKE2b).
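The arithmetic above can be checked with a short Python sketch.  All
inputs are the rough estimates from the text (4 HKDF calls, 3 HMACs
each, 4 blocks per HMAC, ~10 or ~3 cycles/byte, ~500K cycles of ECDH),
not measurements, and the 128-byte-block hash is BLAKE2b.  Percentages
are taken relative to the total (ECDH plus HKDF).  The single-HMAC line
assumes all 4 HKDF calls, Split included, shrink to one HMAC, which is
what the ~2.4% figure appears to imply.

```python
# Rough HKDF cost model for a Noise_XX handshake (estimates, not measurements).
HKDF_CALLS = 4          # 3 MixKey + 1 Split
HMACS_PER_HKDF = 3      # per the estimate above
BLOCKS_PER_HMAC = 4     # assumes short inputs: ~2 inner + ~2 outer compressions
ECDH_CYCLES = 500_000   # ~3 ECDH + 1 keygen with a fast Curve25519

def hkdf_cycles(block_bytes, cycles_per_byte, hmacs_per_hkdf=HMACS_PER_HKDF):
    """Estimated cycles spent in HKDF over the whole handshake."""
    blocks = HKDF_CALLS * hmacs_per_hkdf * BLOCKS_PER_HMAC
    return blocks * block_bytes * cycles_per_byte

def share(extra, base=ECDH_CYCLES):
    """Fraction of the total cost (base + extra) spent on `extra`."""
    return extra / (base + extra)

sha256 = hkdf_cycles(block_bytes=64, cycles_per_byte=10)    # 48 blocks
blake2b = hkdf_cycles(block_bytes=128, cycles_per_byte=3)   # 48 blocks
# Hypothetical single-HMAC variant: 1 HMAC per HKDF call instead of 3.
blake2b_single = hkdf_cycles(block_bytes=128, cycles_per_byte=3, hmacs_per_hkdf=1)

print(sha256, blake2b)                               # 30720 18432
print(f"{share(sha256):.1%} {share(blake2b):.1%}")   # 5.8% 3.6%
# Extra cost of full HKDF relative to the single-HMAC handshake:
slowdown = (blake2b - blake2b_single) / (ECDH_CYCLES + blake2b_single)
print(f"{slowdown:.1%}")                             # 2.4%
```

The SHA256 share (~5.8%) is what rounds to the ~6% quoted above.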

With n0, the difference versus HKDF is even smaller, since GETKEY also
costs something - potentially a lot with some ciphers (e.g. AES-GCM),
where implementations often build lookup tables for each new key.

> - Asserting that performance of one component doesn't matter because it is
> dwarfed by some other connected component implies a strong coupling between
> these components and may be a problem if someone wants to reuse the first
> component in some other context some day. Specifically, in this case, I
> wonder what will happen when I use Noise as a transport for Cap'n Proto,
> which emphasizes fast, live introductions (where one party instructs two
> other parties to start a session). It's conceivable that an introduction can
> be done securely with no asymmetric crypto (by having the introducer provide
> key material, if MITM by the introducer is a non-threat as in Cap'n Proto),
> and in high-performance cluster scenarios I can imagine the ability to avoid
> ECDH ops on every introduction being valuable. With the ECDH out, the slower
> KDF becomes more significant, possibly dominating connection setup.

If you're just passing around symmetric keys you might not need MixKey
/ KDF steps - you could skip the handshake and go straight to
transport messages.

> I'm not personally qualified to judge the relative security risks of HKDF
> vs. the n0 approach. But the impression I get is that the change is trading
> off performance mainly for familiarity,

Familiarity is a big deal:  It means peace of mind for designers
(we're using something widely studied that doesn't need novel
analysis) and for adopters.  It would be bad if the idea took root
that we cut corners on security and use an exotic / unstudied KDF
instead of the "standard" one.

As for security specifics, let's look at Jason's suggestion of doing
a single HMAC (aka HKDF-Extract) with a 64-byte hash output.  This is
arguably simpler and more efficient than both HKDF and n0, but it also
departs further from both:

The HKDF paper and spec consider using the output from HKDF-Extract
directly, without HKDF-Expand.  But they don't recommend it:

https://eprint.iacr.org/2010/264.pdf, Appendix D:
"""
when SKM is from a source or form that requires the use of the
extraction step but the number of key-material bits required is no
larger than the output PRK of the extractor, one could use directly
this key as the output of the KDF. Yet, in this case one could still
apply the PRF* part as an additional level of “smoothing” of the
output of XTR. In particular, this can help against potential attacks
on XTR since by applying PRF* we make sure that the raw output from
XTR is never directly exposed.
"""

RFC 5869, 3.3:
"""
In the case where the amount of required key bits, L, is no more than
HashLen, one could use PRK directly as the OKM.  This, however, is NOT
RECOMMENDED, especially because it would omit the use of 'info' as
part of the derivation process (and adding 'info' as an input to the
extract step is not advisable -- see [HKDF-paper]).
"""

The HKDF paper also argues for security benefits from truncating
larger hashes, suggesting, for example, SHA-512 truncated to 256 bits
in the extract phase.

So we'd be disregarding the most conservative advice from HKDF, and
we'd also have a design that wouldn't work - or would have to work
differently - with 256-bit hashes like SHA256, which is awkward.
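To make the hash-size point concrete, here is a minimal sketch of a
single-HMAC MixKey along the lines of Jason's suggestion.  The function
name `mixkey_single_hmac`, and splitting the 64-byte HMAC-BLAKE2b output
into a 32-byte chaining key followed by a 32-byte cipher key, are
illustrative assumptions, not a spec.  With HMAC-SHA256 the same single
call yields only 32 bytes.

```python
import hashlib
import hmac

def mixkey_single_hmac(ck: bytes, ikm: bytes) -> tuple[bytes, bytes]:
    """Hypothetical single-HMAC MixKey: one HMAC-BLAKE2b call whose 64-byte
    output is split into (new chaining key, cipher key). Illustrative only."""
    out = hmac.new(ck, ikm, hashlib.blake2b).digest()  # 64 bytes
    return out[:32], out[32:]

ck = bytes(32)          # placeholder chaining key
dh = bytes(range(32))   # placeholder for a DH output

new_ck, k = mixkey_single_hmac(ck, dh)
print(len(new_ck), len(k))    # 32 32

# The same single HMAC with a 256-bit hash yields only 32 bytes:
# enough for one key, not two, so SHA256 would need another HMAC.
print(len(hmac.new(ck, dh, hashlib.sha256).digest()))   # 32
```

HMAC-BLAKE2b is used here to match the HMAC framing of the discussion;
BLAKE2b also has a native keyed mode that a real design might prefer.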

>
> Also, FWIW, as a non-cryptographer I find the HKDF approach significantly
> less intuitive than the n0 approach, which I guess is ironic since HKDF is
> supposed to be more intuitive to cryptographers. I find myself wondering:
> "Isn't extracting entropy in a non-reversible way exactly what a stream
> cipher does?

This probably depends on what you've seen before - people who've
worked with IPsec, TLS, TextSecure, or QUIC would be more familiar
with HKDF, for example.

I also think you're using "extract" for what the HKDF paper would call
"expanding"; maybe that helps...

Trevor