[noise] Analysis of Noise KDF

Thu Apr 28 17:29:32 PDT 2016

On Thu, Apr 28, 2016 at 3:09 PM, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
>> (Note on "Dual-PRF": The HMAC proof in [BELLARE2006] assumes the
>> compression function is a "Dual-PRF", i.e. a PRF when keyed either
>> through the message, or through the IV.  Bellare uses this to go from
>> NMAC -> HMAC, since the HMAC key is passed through the message input.
>> Dual-PRF is a reasonable assumption, since hash functions are designed
>> to be random if any part of the input is random, not just the IV.)
>
> Keyed-BLAKE2 is also a Dual-PRF. Why not use HMAC-SHA2-n for the
> SHA2-256 and SHA2-512 families, and Keyed-BLAKE2n for the BLAKE2s and
> BLAKE2b constructions? You get a Dual-PRF out of SHA2 with HMAC. You
> get a Dual-PRF out of BLAKE2 with its built in PRF mode.

Not exactly.  You can make the Dual-PRF assumption about the
compression function of either hash (or any good hash); HMAC isn't
needed for this.

Your proposal reduces the amount of hashing applied to inputs.  So the
current design has more security margin, if the hash turns out to be
bad.

It also gives up some theoretical arguments, for example Lemma 5
from #3, or anything based on assuming separate properties of HMAC's
inner and outer hashes.

If we did this, it would also be hard to explain why HMAC is used for
SHA2 but not BLAKE2:
 * HMAC prevents length-extension, but we're using the KDF with
prefix-free inputs, so that doesn't matter here.  So if a cascade-PRF
like keyed BLAKE2 is fine here, it should be fine for BLAKE2 *and*
SHA2.
 * SHA-3 competitors (like BLAKE, BLAKE2's ancestor, and Keccak) were
required to support HMAC, so there's no reason they can't be used with
it.

It also opens a can of worms, because then all hash function designers
would want a special API for their sponge features, or tweak features,
or parallel tree-hashing modes, or whatever.

It's easiest to just funnel everyone through a single HASH() API.
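
As a rough illustration of the single-API point (my sketch, not text from
the Noise spec): HMAC can be built from nothing more than an unkeyed
HASH() function and its block length, so the same construction covers
SHA-256 and BLAKE2s.  The helper name hmac_from_hash is hypothetical;
hashlib and hmac are Python's standard library modules.

```python
import hashlib
import hmac

def hmac_from_hash(hash_fn, block_len, key, msg):
    """HMAC (RFC 2104) built only from an unkeyed HASH() function."""
    if len(key) > block_len:
        key = hash_fn(key)               # over-long keys are hashed first
    key = key.ljust(block_len, b"\x00")  # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hash_fn(ipad + msg)          # inner hash: key enters via the message
    return hash_fn(opad + inner)         # outer hash over the inner digest

def sha256(data):
    return hashlib.sha256(data).digest()

def blake2s(data):
    return hashlib.blake2s(data).digest()

key, msg = b"k" * 32, b"handshake input"

# Same code path for both hashes; only HASH() and the block length vary.
mac_sha256 = hmac_from_hash(sha256, 64, key, msg)
mac_blake2s = hmac_from_hash(blake2s, 64, key, msg)

# By contrast, BLAKE2's built-in keyed mode needs a hash-specific API
# and produces a different value from HMAC-BLAKE2s.
keyed_blake2s = hashlib.blake2s(msg, key=key).digest()
```

The keyed-BLAKE2 line is there only to show what a per-hash special case
looks like next to the generic path.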

My sense from talking to the BLAKE2 designers is they think HMAC is
overkill here, but aren't too upset about it.  And I'm fine with
overkill.

>> HKDF(ck, input):
>>   temp       = HMAC(key=ck, input)
>>   new_ck     = HMAC(key=temp, 0x01)
>>   output_key = HMAC(key=temp, new_ck || 0x02)
>>   return (new_ck, output_key)
>
> I'm wondering, since Noise only ever needs two new values out of the
> KDF, why not use something simpler like:
>
> KDF(ck, input):
>   temp       = HMAC(key=ck, input)
>   new_ck     = HMAC(key=temp, [empty])
>   output_key = HMAC(key=temp, new_ck)
>   return (new_ck, output_key)
>
> This is simpler and less expensive computationally

It's not simpler if you already have an HKDF function.  Then it's just one call:

  HKDF(ck, input, "", 2*HASHLEN).

There's no computational difference: either way it's a single block
of data to one compression function (with Noise's current hash
functions).
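
A minimal Python sketch of both claims, assuming SHA-256 as the hash
(the function names here are mine, chosen for illustration).  It checks
that the two-output form quoted above matches a generic RFC 5869 HKDF
called with salt=ck, an empty info string, and 2*HASHLEN bytes of
output, and it counts SHA-256 compression calls for the inner HMAC hash
with and without the counter byte.

```python
import hashlib
import hmac

HASHLEN = 32  # SHA-256 output length in bytes; its block length is 64

def mac(key, data):
    return hmac.new(key, data, hashlib.sha256).digest()

def hkdf_rfc5869(salt, ikm, info, length):
    """Generic HKDF: extract, then expand with a one-byte counter."""
    prk = mac(salt, ikm)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = mac(prk, block + info + bytes([counter]))
        okm += block
        counter += 1
    return okm[:length]

def noise_hkdf(ck, data):
    """The two-output HKDF form quoted above."""
    temp = mac(ck, data)
    new_ck = mac(temp, b"\x01")
    output_key = mac(temp, new_ck + b"\x02")
    return new_ck, output_key

def simplified_kdf(ck, data):
    """The proposed variant with the counter bytes removed."""
    temp = mac(ck, data)
    new_ck = mac(temp, b"")
    output_key = mac(temp, new_ck)
    return new_ck, output_key

def sha256_compressions(n):
    """Compression calls to hash n bytes (0x80 byte plus 8-byte length)."""
    return (n + 9 + 63) // 64

ck, data = bytes(HASHLEN), b"dh output"
okm = hkdf_rfc5869(ck, data, b"", 2 * HASHLEN)
new_ck, output_key = noise_hkdf(ck, data)

# Inner HMAC hash input is one 64-byte key block plus the message.
with_counter = sha256_compressions(64 + HASHLEN + 1)  # new_ck || 0x02
without_counter = sha256_compressions(64 + HASHLEN)   # new_ck alone
```

Under these assumptions okm equals new_ck || output_key, and both
message lengths cost the same number of compression calls.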

That's also not simpler to justify.  Right now the discussion is:

Q: How do you do key derivation?
A: HKDF, like everyone else

With your change, it becomes:

Q: How do you do key derivation?
A: We use an HKDF-like KDF
Q: HKDF-like?
A: Yeah, we simplified it by removing some irrelevant inputs to a hash-
Q: You did what?!  Why didn't you just do HKDF?
A: Well, it only needs to be a PRF, but it had some security feature
preventing cycles with lots of output which we didn't need, so we
thought it looked cleaner if...
Q: Uh, you removed a security feature?  For cosmetics?

Also if people do more cryptanalysis on HKDF-SOMEHASH as used in
IPsec, Signal, etc., they might include the counters in the analysis,
and we want that cryptanalysis to apply to Noise.

Trevor