[noise] Stateful Hash Object Proposal

Sun Nov 25 23:58:01 PST 2018

Trevor Perrin <trevp at trevp.net> wrote:
> Hey Peter,

Hey Trevor, hey all,

> Glad you're here! I wanted your feedback; see below:

I'm including Andy Hülsing in CC; he is also interested in this, but
isn't on the Noise mailing list.

> On Thu, Nov 22, 2018 at 11:03 AM Peter Schwabe <peter at cryptojedi.org> wrote:
> >
> > Some post-quantum KEMs need multiple hash functions, i.e., they would
> > need such a StatefulHashObject to be domain-separated. Is the idea that
> > the caller needs to first absorb the domain-separation string then?
> 
> Yes, I was thinking the PQ KEM could create multiple new SHO objects,
> and Absorb() a different domain-separation string into each of them,
> e.g. FrodoKEM could absorb a 16-bit customization label at the start
> of each SHO.
> 
> 
> > When using SHA-2 you'd probably want to put the domain-separation string
> > into a separate block, so the callee needs to know that some input is
> > not a part of the message, but a domain separation. That would require
> > making domain-separators a separate argument of the first call to Absorb
> > or alternatively an argument of an Init function.
> 
> If you want a longer domain-separation string (e.g. including the name
> "FrodoKEM" and other parameters) then you might choose to Ratchet()
> after absorbing the domain-separation string(s).  This would enable
> implementers to store the precalculated chaining variable, so they
> could skip re-calculating the Absorb(domain_separator)+Ratchet
> operations for every PQ operation.
> 
> If that's the only reason to put the domain-separation string in a
> separate hash block, then I think Ratchet() supports it adequately?
> (and for SHA2, Ratchet would just zero-pad the block, like you want).

So your idea is to export Ratchet() as a function to the caller?
Shouldn't Absorb and Squeeze know when to ratchet internally? I think
this boils down to how much the caller knows (and has to know) about the
details of the underlying hash function inside the SHO. To me it feels
cleaner to abstract those away and to deal with padding, ratcheting,
etc. inside the object.

If that's the case, what I would probably want is domain separation to
be an optional argument to the SHO constructor (or, in C, init function).
The intuition would then be something like "I would like to have two
stateful hash objects A and B from the same hash family":

  A = SHO("first domain")
  B = SHO("second domain")

What precisely happens with those domain separators is then left to the
constructor and can depend on the hash function (that the caller doesn't
need to know anything about).
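
To make the constructor idea concrete, here is a minimal sketch for a
SHA-256-backed object. Everything in it is an assumption made for
illustration: the class layout, the method names absorb() and digest(),
and the choice to encode the separator as a length byte followed by the
zero-padded string filling exactly one 64-byte block. It is not the
Noise SHO specification; a SHAKE-based object could handle the
separator quite differently without the caller noticing.

```python
import hashlib


class SHO:
    """Hypothetical stateful hash object with constructor-level domain separation."""

    BLOCK = 64  # SHA-256 block size in bytes

    def __init__(self, domain: bytes = b""):
        self._h = hashlib.sha256()
        if domain:
            if len(domain) >= self.BLOCK:
                raise ValueError("domain separator must fit in one block")
            # Assumed encoding: one length byte, then the separator,
            # zero-padded so that the prefix fills exactly one hash block.
            prefix = bytes([len(domain)]) + domain.ljust(self.BLOCK - 1, b"\x00")
            self._h.update(prefix)

    def absorb(self, data: bytes) -> None:
        self._h.update(data)

    def digest(self) -> bytes:
        # Work on a copy so the object can keep absorbing afterwards.
        return self._h.copy().digest()


A = SHO(b"first domain")
B = SHO(b"second domain")
A.absorb(b"same message")
B.absorb(b"same message")
```

Because the prefix is block-aligned, an implementation could also cache
the state after the first compression call, without exposing a
ratcheting operation to the caller.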

> > I'm not sure how you want to do domain separation in the Squeeze'
> > function. This wouldn't really be (input) domain separation but some
> > sort of output separation?
> 
> Yes, you're right.  There would have to be some output-separator
> field that gets encoded at the end of the hash input to distinguish
> the normal Squeeze from the Squeeze' that happens as part of
> Encrypt().
> 
> STROBE does this, using cSHAKE and adding an extra operation byte at
> the end to distinguish STROBE operations like PRF from ENCRYPT.
> 
> You could also imagine a non-STROBE use of Keccak with a
> non-SHA3/cSHAKE/SHAKE padding suffix, but I think the options are
> dwindling, IIRC the current NIST-allocated suffixes are:
> 
> 00 = cSHAKE
> 01 = SHA3
> 11 = SHAKE

All these things are very elegant using SHA-3 or SHAKE, but I'm not sure
how solid this is when using SHA-2 (Andy can probably say more about
this). You can almost certainly not get collision resilience there, but
maybe this is also not something you'd necessarily need. 

> > About instantiating Squeeze for SHA-2: Wouldn't the "standard" way to
> > build a XOF from SHA-2 be to use MGF1 [1]?
> 
> I think MGF1 is just:
>  HASH(input || uint32(0)) ||
>  HASH(input || uint32(1)) ||
>  HASH(input || uint32(2)) ||
>  ...
> 
> That doesn't fix SHA2's length-extension problem, and isn't very
> efficient if input is long, so for a SHA2 SHO I was suggesting
> Absorb(s) followed by Squeeze would result in:
> 
>  HASH(HASH(input) || varint(0)) ||
>  HASH(HASH(input) || varint(1)) ||
>  HASH(HASH(input) || varint(2)) ||
>  ...
> 
> https://moderncrypto.org/mail-archive/noise/2018/001876.html
> 
> The space-savings of the varint doesn't matter since the HASH(input)
> and counter will fit into a single hash block however it's encoded, so
> a uint64 might be a simpler choice?

The performance difference is negligible: In MGF1 you can remember the
state before the last block to not always hash over the whole message
again, so it's essentially also one compression-function call per output
block.
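
A rough sketch of that state-saving trick, using SHA-256 from Python's
hashlib: the function name mgf1_sha256 is made up for this example, and
copy() stands in for "remember the state before the last block".
hashlib buffers any partial final block internally, so each output
block only has to process the buffered tail plus the 4-byte counter and
padding.

```python
import hashlib
import struct


def mgf1_sha256(seed: bytes, length: int) -> bytes:
    """MGF1-style output: HASH(seed || uint32(i)) for i = 0, 1, 2, ..."""
    base = hashlib.sha256()
    base.update(seed)  # absorb the (possibly long) seed only once
    out = b""
    counter = 0
    while len(out) < length:
        h = base.copy()  # resume from the saved state
        h.update(struct.pack(">I", counter))  # 4-byte big-endian counter
        out += h.digest()
        counter += 1
    return out[:length]
```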

Length-extension attacks are a good point. It's probably not too hard to
prove, but is there a proof that your construction is secure for secret
input as a PRG?
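
For reference, the construction I am asking about could be sketched as
follows. This is only my reading of the quoted formulas: the function
name squeeze_sha256 is hypothetical, and I use a big-endian uint64
counter in place of the varint (the byte order is my assumption). The
32-byte inner digest plus the 8-byte counter is 40 bytes, so together
with SHA-256 padding it fits into a single 64-byte block.

```python
import hashlib
import struct


def squeeze_sha256(absorbed: bytes, length: int) -> bytes:
    """Output HASH(HASH(absorbed) || uint64(i)) for i = 0, 1, 2, ..."""
    inner = hashlib.sha256(absorbed).digest()  # hash the input once
    out = b""
    counter = 0
    while len(out) < length:
        # Assumed counter encoding: 8-byte big-endian integer.
        block_input = inner + struct.pack(">Q", counter)
        out += hashlib.sha256(block_input).digest()
        counter += 1
    return out[:length]
```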

Cheers,

Peter
