[noise] Lightweight ciphers and Noise

Trevor Perrin trevp at trevp.net
Tue Nov 21 20:49:52 PST 2017

On Wed, Nov 22, 2017 at 2:56 AM, Rhys Weatherley
<rhys.weatherley at gmail.com> wrote:
> On Wed, Nov 22, 2017 at 12:10 PM, Trevor Perrin <trevp at trevp.net> wrote:
>> > Sort of related, I've been doing some research and implementation
>> > testing on light-weight ciphers for IoT environments as part of my
>> > Arduino crypto library
>> Once you've chosen these variants to be comparable to our existing
>> symmetric crypto - and added authentication/MAC to create an "AEAD" -
>> I wonder how performance compares to our existing crypto?
> The raw figures for the algorithms on Arduino (AVR and ARM) can be found at
> [1] and [2].

OK!  That's a thorough answer.

> SPECK with a 256-bit key and EAX mode for AEAD operation is faster than
> ChaChaPoly but uses more RAM for permanent state.

Comparing 256-bit key / 128-bit block Speck, your numbers show this
for Uno (8-bit AVR) but not Due (ARM), where ChaChaPoly is faster as
well as smaller.

The Uno case seems like a slow Poly1305 implementation, though?  You
have Poly1305 at ~175% of the per-byte time of ChaCha20, but [1] shows it at
~75% of Salsa20 (similar to ChaCha20).  If your numbers were more like
[1], I think ChaChaPoly would be neck-and-neck with EAX<Speck> on the

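As a rough sketch of that arithmetic: this reads both percentages as
time per byte relative to the stream cipher (my assumption), and treats
ChaChaPoly as one cipher pass plus one MAC pass over each byte, ignoring
per-message overhead.  The function name and the unit of "1.0 per
ChaCha20 byte" are just for illustration, not measurements.

```python
def aead_time_per_byte(cipher_time, mac_ratio):
    """Time per byte when every byte goes through the cipher once and the
    MAC once; mac_ratio is the MAC's time relative to the cipher's."""
    return cipher_time * (1.0 + mac_ratio)

# Normalise the stream cipher to 1.0 time units per byte (assumption).
measured = aead_time_per_byte(1.0, 1.75)   # Poly1305 at ~175% of the cipher
reference = aead_time_per_byte(1.0, 0.75)  # Poly1305 at ~75%, as in [1]

# How much faster ChaChaPoly would get with the faster Poly1305 ratio.
speedup = measured / reference
print(f"measured={measured:.2f} reference={reference:.2f} speedup={speedup:.2f}x")
```

Under those assumptions the faster Poly1305 ratio alone is worth
something like a 1.5x improvement to the combined AEAD.
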
> We may want to have a separate discussion as to when it is acceptable to use
> 64-bit block ciphers with Noise.  A lot of the research in lightweight
> crypto is focused on that size block.  Since data volumes on small devices
> aren't high, maybe 64-bit would be OK?

The Noise spec currently has a discussion about the (small) security
concern with large data volumes and 128-bit block ciphers like AES.
So I'd prefer if things went the other direction (towards PRFs like
ChaCha with *less* risk than 128-bit PRPs; rather than towards more
risk and tighter limits).
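
To put rough numbers on the block-size concern, here is a small sketch
using the usual birthday approximation for an n-bit block cipher after q
blocks under one key, p ~= 1 - exp(-q(q-1)/2^(n+1)).  The function name
is made up for illustration, and this is a back-of-envelope estimate,
not a security proof for any particular mode.

```python
import math

def collision_probability(block_bits, blocks):
    """Birthday approximation: chance of a collision among `blocks`
    random values drawn from a space of 2**block_bits."""
    exponent = blocks * (blocks - 1) / 2.0 ** (block_bits + 1)
    return -math.expm1(-exponent)  # 1 - exp(-exponent), accurate when tiny

# 2**32 blocks under one key: 32 GiB with 64-bit blocks, 64 GiB with 128-bit.
p64 = collision_probability(64, 2 ** 32)
p128 = collision_probability(128, 2 ** 32)
print(f"64-bit block:  {p64:.3f}")
print(f"128-bit block: {p128:.3e}")
```

With 64-bit blocks the bound is already near 40% at that volume, while
with 128-bit blocks it is still negligible, so a 64-bit block would need
much tighter per-key data limits.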

>  Converting a 256-bit key from the
> Noise handshake into a 128-bit key is easy - XOR the two halves together.
> EAX authentication tags would be 8 bytes in size rather than 16.

I hate to discourage fun experiments, but have to raise red flags here:

 * Noise was designed for (and requires) 256-bit keys, which this
isn't.  256-bit keys provide a larger security margin, including
against quantum attacks; and also affect protocol design.  For
example, the IETF prefers 128-bit keys, but then asks protocol
designers to stuff extra entropy into their nonces for added
resistance against time/memory-tradeoff / multiuser attacks [2].  We
don't do messy things like that, but we do require 256-bit keys.

 * We also require 128-bit authentication tags.  Cutting this down is
perhaps less of a concern.  But a general library can't know the
consequences and risks of repeated guessing attacks in every
application, and future designers might swap in different
authentication types (like GCM) where truncation has different effects
on security.  So it's best to stick with the current design.
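
For concreteness, a sketch of what the quoted proposal amounts to: the
XOR fold of a 256-bit key down to 128 bits, and the odds of guessing an
ideal tag of a given length.  Both function names are hypothetical,
nothing here is part of Noise, and the guessing bound assumes an ideal
MAC (which is exactly the assumption that may not hold for other
authentication types once truncated).

```python
def fold_key(key256: bytes) -> bytes:
    """XOR the two 16-byte halves of a 32-byte key into a 16-byte key.
    The result has at most 128 bits of entropy, whatever the input had."""
    if len(key256) != 32:
        raise ValueError("expected a 32-byte key")
    return bytes(a ^ b for a, b in zip(key256[:16], key256[16:]))

def forgery_bound(tag_bits: int, attempts: int) -> float:
    """Upper bound on the chance that at least one of `attempts` random
    guesses matches an ideal tag of `tag_bits` bits."""
    return min(1.0, attempts / 2.0 ** tag_bits)

folded = fold_key(bytes(range(32)))
print(folded.hex())
print(forgery_bound(64, 2 ** 32))    # 8-byte tag, 2**32 guesses
print(forgery_bound(128, 2 ** 32))   # 16-byte tag, 2**32 guesses
```

The fold itself is trivial, which is sort of the point: the cost isn't
in the code, it's in the margin you give up on both the key and the tag.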

>> Also - if code size / area is a concern, does something like STROBE /
>> Disco start to become the better strategy, to eliminate the separate
>> hash function?
> In the low-end space, flash memory (code size) is usually pretty cheap, but
> RAM (runtime state and stack) is not.  So I usually don't care about the
> code size in my comparisons - the size of the crypto algorithms will be
> small compared to whatever task the application upstairs is performing.
> SHA3 is pretty RAM hungry - 400 bytes of permanent state and another 400
> bytes of stack when the core block operation is evaluated.  Performance in
> software implementations, even my ridiculously optimised assembly version
> for AVR, is also not pretty.  I must admit though that I haven't studied
> Strobe/Disco enough to make a fair comparison yet.
> Stack space isn't that big of a deal - from my experiments Curve25519 needs
> about 1k of stack space to evaluate the curve.  Once you have enough RAM to
> pay that cost, the stack costs of the other algorithms don't matter much -
> they can reuse the 1k that is free when Curve25519 finishes.  The hash
> algorithms in Noise operate on the stack - there's no permanent state other
> than ck and h, so the hash contexts can be stacked when needed and tossed
> afterwards.

Thanks, that's all a great analysis and good info.

My takeaway is that ChaChaPoly/BLAKE2s looks pretty good on these
devices.  The speedup from faster options seems like it comes mostly
just from cutting down the security level, which is probably not
advisable for a general-purpose crypto protocol like Noise.

If you did want to explore more exotic / risky speedups, you might get
more benefit from looking at different DH choices, but that's a whole
other discussion....


[1] https://cryptojedi.org/papers/avrnacl-20130220.pdf
[2] https://eprint.iacr.org/2016/564.pdf
