[noise] Lightweight ciphers and Noise
Trevor Perrin
trevp at trevp.net
Tue Nov 21 20:49:52 PST 2017
On Wed, Nov 22, 2017 at 2:56 AM, Rhys Weatherley
<rhys.weatherley at gmail.com> wrote:
> On Wed, Nov 22, 2017 at 12:10 PM, Trevor Perrin <trevp at trevp.net> wrote:
>>
>> > Sort of related, I've been doing some research and implementation
>> > testing on light-weight ciphers for IoT environments as part of my
>> > Arduino crypto library
[...]
>> Once you've chosen these variants to be comparable to our existing
>> symmetric crypto - and added authentication/MAC to create an "AEAD" -
>> I wonder how performance compares to our existing crypto?
>
>
> The raw figures for the algorithms on Arduino (AVR and ARM) can be found at
> [1] and [2].
OK! That's a thorough answer.
> SPECK with a 256-bit key and EAX mode for AEAD operation is faster than
> ChaChaPoly but uses more RAM for permanent state.
Comparing 256-bit key / 128-bit block Speck, your numbers show this
for Uno (8-bit AVR) but not Due (ARM), where ChaChaPoly is faster as
well as smaller.
The Uno case seems like a slow Poly1305 implementation, though? You
have Poly1305 at ~175% of the cost of ChaCha20, but [1] shows it at
~75% of Salsa20 (similar to ChaCha20). If your numbers were more like
[1], I think ChaChaPoly would be neck-and-neck with EAX<Speck> on the
Uno.
> We may want to have a separate discussion as to when it is acceptable to use
> 64-bit block ciphers with Noise. A lot of the research in lightweight
> crypto is focused on that block size. Since data volumes on small devices
> aren't high, maybe 64-bit would be OK?
The Noise spec currently has a discussion about the (small) security
concern with large data volumes and 128-bit block ciphers like AES.
So I'd prefer if things went the other direction (towards PRFs like
ChaCha with *less* risk than 128-bit PRPs; rather than towards more
risk and tighter limits).
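To put rough numbers on that concern, here is a minimal sketch of the
standard birthday approximation for block collisions under a single
key. The function name and the sample data volumes are illustrative
assumptions, not anything taken from the Noise spec.

```python
import math

def collision_probability(block_bits: int, blocks: int) -> float:
    """Birthday approximation: chance that at least two of `blocks`
    uniformly random block_bits-bit values collide."""
    exponent = blocks * (blocks - 1) / 2 ** (block_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

# 2**32 blocks is 32 GiB of data with 64-bit (8-byte) blocks.
p64 = collision_probability(64, 2 ** 32)
# The same number of blocks with a 128-bit block cipher.
p128 = collision_probability(128, 2 ** 32)
print(f"64-bit blocks:  {p64:.3f}")
print(f"128-bit blocks: {p128:.3e}")
```

With 64-bit blocks the estimate becomes significant at volumes a
long-lived device could plausibly reach under one key, so the per-key
limits would have to be much tighter than for 128-bit blocks.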
> Converting a 256-bit key from the
> Noise handshake into a 128-bit key is easy - XOR the two halves together.
> EAX authentication tags would be 8 bytes in size rather than 16.
I hate to discourage fun experiments, but have to raise red flags here:
* Noise was designed for (and requires) 256-bit keys, which this
isn't. 256-bit keys provide a larger security margin, including
against quantum attacks; and also affect protocol design. For
example, the IETF prefers 128-bit keys, but then asks protocol
designers to stuff extra entropy into their nonces for added
resistance against time/memory-tradeoff / multiuser attacks [2]. We
don't do messy things like that, but we do require 256-bit keys.
* We also require 128-bit authentication tags. Cutting this down is
perhaps less of a concern. But for a general library, where you can't
know the consequences and risks of repeated guessing attacks, and
where future designers might swap in different authentication types
(like GCM) where truncation has different effects on security, it's
best to stick with the current design.
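For concreteness, the quoted proposal can be sketched as follows. This
only illustrates what is being discussed, not something Noise permits.
`fold_key` and `forgery_bound` are hypothetical names, and the forgery
estimate assumes an ideal MAC where each guess succeeds with
probability 2^-t for a t-bit tag.

```python
def fold_key(key256: bytes) -> bytes:
    """XOR the two halves of a 32-byte key into a 16-byte key."""
    if len(key256) != 32:
        raise ValueError("expected a 32-byte key")
    return bytes(a ^ b for a, b in zip(key256[:16], key256[16:]))

def forgery_bound(tag_bits: int, guesses: int) -> float:
    """Upper bound on forgery success after `guesses` tries (ideal MAC)."""
    return min(1.0, guesses / 2 ** tag_bits)

folded = fold_key(bytes(range(32)))
print(folded.hex())
print(forgery_bound(64, 2 ** 32))   # 8-byte tag
print(forgery_bound(128, 2 ** 32))  # 16-byte tag
```

The folded key carries at most 128 bits of entropy, which is the issue
in the first bullet. The ideal bound in `forgery_bound` should not be
assumed to hold for every mode once tags are truncated, which is the
issue in the second.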
>> Also - if code size / area is a concern, does something like STROBE /
>> Disco start to become the better strategy, to eliminate the separate
>> hash function?
>
>
> In the low-end space, flash memory (code size) is usually pretty cheap, but
> RAM (runtime state and stack) is not. So I usually don't care about the
> code size in my comparisons - the size of the crypto algorithms will be
> small compared to whatever task the application upstairs is performing.
>
> SHA3 is pretty RAM hungry - 400 bytes of permanent state and another 400
> bytes of stack when the core block operation is evaluated. Performance in
> software implementations, even my ridiculously optimised assembly version
> for AVR, is also not pretty. I must admit though that I haven't studied
> Strobe/Disco enough to make a fair comparison yet.
>
> Stack space isn't that big of a deal - from my experiments Curve25519 needs
> about 1k of stack space to evaluate the curve. Once you have enough RAM to
> pay that cost, the stack costs of the other algorithms don't matter much -
> they can reuse the 1k that is free when Curve25519 finishes. The hash
> algorithms in Noise operate on the stack - there's no permanent state other
> than ck and h, so the hash contexts can be stacked when needed and tossed
> afterwards.
Thanks, that's all a great analysis and good info.
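The stack-reuse argument in the quoted text can be sketched as a simple
budget: permanent state adds up, while stack is shared, so only the
hungriest operation counts. This assumes the operations run one after
another rather than nested. `peak_ram` is a hypothetical helper, the
SHA3 and Curve25519 stack figures are the rough ones quoted above, and
the zero permanent state for Curve25519 is an assumption made purely
for illustration.

```python
def peak_ram(algorithms):
    """algorithms: iterable of (name, permanent_bytes, stack_bytes).

    Permanent state is summed. Stack is reused between sequential
    operations, so only the largest stack requirement is counted."""
    algorithms = list(algorithms)
    permanent = sum(p for _, p, _ in algorithms)
    stack = max((s for _, _, s in algorithms), default=0)
    return permanent + stack

budget = [
    ("Curve25519", 0, 1024),  # ~1k stack per the thread; permanent state assumed 0
    ("SHA3", 400, 400),       # permanent state and stack as quoted in the thread
]
print(peak_ram(budget))
```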
My takeaway is that ChaChaPoly/BLAKE2s looks pretty good on these
devices. The speedup from faster options seems like it comes mostly
just from cutting down the security level, which is probably not
advisable for a general-purpose crypto protocol like Noise.
If you did want to explore more exotic / risky speedups, you might get
more benefit from looking at different DH choices, but that's a whole
other discussion....
Trevor
[1] https://cryptojedi.org/papers/avrnacl-20130220.pdf
[2] https://eprint.iacr.org/2016/564.pdf