[noise] Nonce Post-Increment vs Pre-Increment

Mon Nov 9 14:48:44 PST 2015

On Mon, Nov 9, 2015 at 5:13 AM, Jason A. Donenfeld <Jason at zx2c4.com> wrote:
> Hi Trevor,
>
> We spoke a while back about nonce++ vs ++nonce. At the time I said I
> preferred ++nonce for my implementation particularities, but that was
> just me, and you shouldn't take that into consideration. Now, when
> trying to adjust my implementation to interop with your rust one, I've
> gone back through seeing what it'd be like to do nonce++. It turns out
> that it makes the code considerably more complex and less performant.
>
> On the receiving end, I use a 64-bit counter to store the greatest
> nonce received so far, and an "unsigned long"-size bitfield for a
> backtrack of the "sizeof(unsigned long)*8" previous nonces. I thought
> I had invented this clever-clever method, and was feeling quite smug
> and dandy, when Bellovin pointed out to me that this is actually the
> exact algorithm of Appendix C of RFC 2401; there are no original
> ideas, alas. But this does give me some assurance that for what I'm
> doing, this is the correct algorithm. In this algorithm, since the
> 64-bit counter stores the _greatest_ counter yet received, there is not
> room for receiving 0, since the variable is already initialized to it.

Couldn't you just as easily store the _next_ counter you haven't received?

This seems like an arbitrary choice.  Maybe IPsec sequence numbers
start at 1, but TLS (and DTLS) start at 0, e.g. DTLS RFC 6347: ("This
counter is initially set to zero [...] DTLS implementations maintain
(at least notionally) a next_receive_seq counter").
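
To make the "store the _next_ counter" idea concrete, here is a rough
sketch of a receive window in that style.  It is only an illustration
under stated assumptions, not code from either implementation: the
names (replay_window, replay_check_and_update, WINDOW_BITS) are made
up, the window is fixed at 64 bits instead of
"sizeof(unsigned long)*8", and the all-ones counter is simply rejected
so that "next" can never wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed window size; the quoted design uses sizeof(unsigned long)*8. */
#define WINDOW_BITS 64

/* Hypothetical receive-side state, zero-initialised at session start. */
struct replay_window {
    uint64_t next;    /* one past the greatest counter accepted; starts at 0 */
    uint64_t bitmap;  /* bit i set => counter (next - 1 - i) was accepted   */
};

/* Returns true and records the counter if it is fresh,
 * false if it is a replay or has fallen out of the window. */
static bool replay_check_and_update(struct replay_window *w, uint64_t counter)
{
    /* Assumption: refuse the maximum value so next = counter + 1 cannot wrap. */
    if (counter == UINT64_MAX)
        return false;

    if (counter >= w->next) {
        /* New greatest counter: slide the window forward. */
        uint64_t shift = counter - w->next + 1;
        w->bitmap = (shift >= WINDOW_BITS) ? 0 : (w->bitmap << shift);
        w->bitmap |= 1;            /* bit 0 now stands for `counter` */
        w->next = counter + 1;
        return true;
    }

    /* Older counter: accept only if inside the window and not yet seen. */
    uint64_t offset = w->next - 1 - counter;
    if (offset >= WINDOW_BITS)
        return false;

    uint64_t bit = (uint64_t)1 << offset;
    if (w->bitmap & bit)
        return false;

    w->bitmap |= bit;
    return true;
}
```

With both fields starting at zero, counter 0 is accepted as the first
message and rejected if it shows up again, so no special case is
needed for zero-based counting.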

> On the sending end, I use atomic instructions to "increment, then
> return". The kernel's library routines don't even contain a function
> for "return, then increment". Yes, I could just subtract one every
> time, but in addition to the extra instruction (probably negligible
> anyway), this again complicates the wraparound logic and is
> burdensome.

The case where you have multiple threads contending to send out the
next transport message is already pretty complicated.  I don't see why
it's burdensome to use your "increment, then return" instruction, and
just subtract one to get zero-based indexing.
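
Roughly, the subtract-one approach might look like the sketch below.
It uses C11 atomics as a stand-in for the kernel's primitives, and
inc_return and next_send_nonce are hypothetical names chosen for
illustration, so treat it as a sketch under those assumptions rather
than what either implementation actually does.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an "increment, then return" primitive:
 * atomically adds 1 and returns the NEW value. */
static uint64_t inc_return(_Atomic uint64_t *v)
{
    return atomic_fetch_add(v, 1) + 1;
}

/* Zero-based nonce allocation: the counter is initialised to 0 and
 * the first caller gets nonce 0, the second gets 1, and so on.
 * Each concurrent caller receives a distinct value. */
static uint64_t next_send_nonce(_Atomic uint64_t *counter)
{
    return inc_return(counter) - 1;
}
```

The counter itself still holds the next unused nonce, so any
exhaustion check can compare against it directly.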

Initializing to zero and counting from zero is the simpler and more
obvious thing to do, so I'm inclined to leave this as-is.

If you want to optimize for a very specific case (out-of-order
transport messages; multiple threads; atomic increment) then spending
a tiny bit of extra effort to subtract 1 doesn't seem like a big deal.

Trevor