[noise] Stateful Hash Object Proposal

Tue Nov 27 00:15:09 PST 2018

On Mon, Nov 26, 2018 at 7:58 AM Peter Schwabe <peter at cryptojedi.org> wrote:
>
> So your idea is to export Ratchet() as a function to the caller?

Yes, I was thinking Absorb, Squeeze, Ratchet, and Clone would all be
exposed to caller.

> Shouldn't Absorb and Squeeze know when to ratchet internally? I think
> this boils down to how much the caller knows (and has to know) about the
> details of the underlying hash function inside the SHO. To me it feels
> more clean to abstract those away, deal with padding, ratcheting, etc.
> inside the object.

If Absorb always ratcheted, then it wouldn't behave like a typical
streaming API where Absorb("abc") == Absorb("a") followed by
Absorb("bc").
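
To make the streaming point concrete, here is a minimal sketch using
Python's hashlib. The absorb_with_ratchet helper and its
state' = H(state || data) rule are assumptions made up for
illustration, not anything specified for the SHO.

```python
import hashlib

# Typical streaming behaviour: chunk boundaries do not affect the result.
one_shot = hashlib.sha3_256()
one_shot.update(b"abc")

chunked = hashlib.sha3_256()
chunked.update(b"a")
chunked.update(b"bc")

assert one_shot.digest() == chunked.digest()


def absorb_with_ratchet(state: bytes, data: bytes) -> bytes:
    """Hypothetical Absorb that ratchets on every call: state' = H(state || data)."""
    return hashlib.sha3_256(state + data).digest()


init = bytes(32)  # arbitrary all-zero starting state for the illustration
r_one_shot = absorb_with_ratchet(init, b"abc")
r_chunked = absorb_with_ratchet(absorb_with_ratchet(init, b"a"), b"bc")

# With a ratchet on every Absorb, the chunking becomes part of the result.
assert r_one_shot != r_chunked
```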

I think that's a useful API, so it's good to give callers explicit
control over "Ratcheting", rather than have it happen on every Absorb.
Callers would Ratchet:
 - to make sure that any "buffered" state is flushed through a 1-way
function for compromise resistance (so the buffered data gets mixed
with old entropy)
 - to compress the SHO state (clearing the buffer and the Sponge
"rate") to save space in RAM or ROM (e.g. if storing a
domain-separated SHO state - like a cSHAKE state after processing the
initial block - it would be better to Ratchet so you're storing a 32
or 64-byte Keccak capacity rather than a 200-byte full state).

For example, Noise would call Ratchet after sending each message, for
forward-security (a compromise won't find useful data in the SHO input
buffer), and to minimize the state that has to be kept around
in-between messages.
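
A rough sketch of how the four operations could fit together, as a toy
built on hashlib's SHAKE256. Everything here is an assumption for
illustration: the ToySHO class, the 32-byte chain value, and the op
bytes are invented, and a real sponge-based SHO would ratchet by
working on the Keccak state directly rather than by rehashing a buffer.

```python
import hashlib


class ToySHO:
    """Toy stand-in for a SHO. Not the proposal's construction, and not secure."""

    CHAIN_BYTES = 32  # assumed size of the compressed state

    def __init__(self) -> None:
        self._chain = bytes(self.CHAIN_BYTES)  # compact state kept across ratchets
        self._buffer = bytearray()             # input absorbed since the last ratchet

    def absorb(self, data: bytes) -> None:
        # Buffering only, so chunk boundaries never affect the result.
        self._buffer += data

    def _hash(self, op: bytes, n: int) -> bytes:
        # A trailing op byte keeps Squeeze output distinct from the Ratchet output.
        return hashlib.shake_256(self._chain + bytes(self._buffer) + op).digest(n)

    def squeeze(self, n: int) -> bytes:
        # Toy simplification: squeezing does not advance the state.
        return self._hash(b"\x00", n)

    def ratchet(self) -> None:
        # One-way step: fold the buffered input into a fresh fixed-size chain
        # value, then drop the buffer. (Python cannot reliably wipe memory;
        # a real implementation would zeroize the old buffer.)
        self._chain = self._hash(b"\x01", self.CHAIN_BYTES)
        self._buffer = bytearray()

    def clone(self) -> "ToySHO":
        other = ToySHO()
        other._chain = self._chain
        other._buffer = bytearray(self._buffer)
        return other

    def state_size(self) -> int:
        return len(self._chain) + len(self._buffer)


a = ToySHO()
a.absorb(b"abc")
b = ToySHO()
b.absorb(b"a")
b.absorb(b"bc")
assert a.squeeze(16) == b.squeeze(16)  # streaming property holds

a.absorb(b"x" * 1000)      # e.g. a handshake message
before = a.state_size()    # chain value plus everything buffered so far
a.ratchet()                # e.g. after sending the message
after = a.state_size()     # only the fixed-size chain value remains
```

After the ratchet the object holds only the chain value, which is the
"minimize the state kept in-between messages" point above.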

> If that's the case, what I would probably want is domain separation to
> be an optional argument to the SHO constructor (or, in C, init function).
> The intuition would then be something like "I would like to have two
> stateful hash objects A and B from the same hash family":
>
>   A = SHO("first domain")
>   B = SHO("second domain")
>
> What precisely happens with those domain separators is then left to the
> constructor and can depend on the hash function (that the caller doesn't
> need to know anything about).

Here are my counter-arguments for just giving people Absorb and having
them do this themselves:

If you're using a small value (like a single byte) for a domain
separator then it makes sense just to prepend it to your "real" data.
But if you're using a larger value (like multiple name strings or
parameter blocks) it might be more efficient to process it in a single
block by itself, so the result of processing that block can be
hardcoded as a constant.

Given those differences, it's hard to say what SHO(domain_separator) should do.
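
The two cases can be sketched with plain Absorb calls, here using
hashlib's SHA3-256. The separator string, the zero-padding to one rate
block, and the helper names are hypothetical. hashlib doesn't expose
the raw state, so copy() stands in for "hardcode the processed block
as a constant".

```python
import hashlib

RATE = 136  # SHA3-256 rate in bytes (1088 bits)

# Hypothetical large separator, zero-padded to fill exactly one rate block.
# (Zero-padding is ambiguous for separators that end in zero bytes; a real
# scheme would use a proper encoding.)
separator = b"ExampleProtocol/v1|param-block"
block = separator.ljust(RATE, b"\x00")


def hash_small(tag: int, data: bytes) -> bytes:
    """Small separator: just prepend one byte to the real data."""
    return hashlib.sha3_256(bytes([tag]) + data).digest()


# Large separator: absorb the block once, then reuse the resulting state.
_base = hashlib.sha3_256(block)


def hash_large(data: bytes) -> bytes:
    h = _base.copy()  # start from the state after the separator block
    h.update(data)
    return h.digest()


digest_a = hash_small(1, b"msg")
digest_b = hash_small(2, b"msg")
digest_c = hash_large(b"msg")
```

Both variants are ordinary Absorbs as far as the hash is concerned;
they differ only in how much of the work can be done ahead of time.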

Also, I'd argue for parsimony / minimalism:  Is a domain separator
that different from Absorbing input?  If not, better to not add new
mechanisms.

> > > I'm not sure how you want to do domain separation in the Squeeze'
> > > function. This wouldn't really be (input) domain separation but some
> > > sort of output separation?
> >
> > Yes, you're right.  There would have to be some output-separator
> > field that gets encoded at the end of the hash input to distinguish
> > the normal Squeeze from the Squeeze' that happens as part of
> > Encrypt().
> >
> > STROBE does this, using cSHAKE and adding an extra operation byte at
> > the end to distinguish STROBE operations like PRF from ENCRYPT.
[...]
>
> All these things are very elegant using SHA-3 or SHAKE, but I'm not sure
> how solid this is when using SHA-2 (Andy can probably say more about
> this). You can almost certainly not get collision resilience there, but
> maybe this is also not something you'd necessarily need.

Yeah, the Squeeze' thing is just a way of modelling Encrypt/Decrypt,
but since SHA2 doesn't lend itself to a Duplex encryption like Keccak,
I think that a SHO/SHA2 would not provide Encrypt/Decrypt.
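
For the Keccak-style case, the output-separator idea can be sketched
roughly as below. The op byte values and function names are invented
for illustration and are not STROBE's actual encoding. This is also
not a real Duplex: it doesn't absorb the ciphertext back into the
state and provides no authentication.

```python
import hashlib

OP_SQUEEZE = b"\x00"  # hypothetical op byte for the normal Squeeze
OP_ENCRYPT = b"\x01"  # hypothetical op byte for the Squeeze' used by Encrypt


def squeeze(transcript: bytes, n: int) -> bytes:
    # The op byte is encoded at the end of the hash input.
    return hashlib.shake_256(transcript + OP_SQUEEZE).digest(n)


def squeeze_prime(transcript: bytes, n: int) -> bytes:
    return hashlib.shake_256(transcript + OP_ENCRYPT).digest(n)


def encrypt(transcript: bytes, plaintext: bytes) -> bytes:
    # XOR the plaintext with a keystream taken from Squeeze'.
    keystream = squeeze_prime(transcript, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


def decrypt(transcript: bytes, ciphertext: bytes) -> bytes:
    # XOR with the same keystream undoes the encryption.
    return encrypt(transcript, ciphertext)


transcript = b"key material and handshake data"  # stand-in for absorbed input
ciphertext = encrypt(transcript, b"hello")
recovered = decrypt(transcript, ciphertext)
```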

Trevor