[noise] The STROBE lite framework

Michael Hamburg mike at shiftleft.org
Thu Jul 30 18:06:48 PDT 2015


Hello all,

I’ve been working recently on a protocol framework based on Noise, Keyak and on MJ Saarinen’s BLINKER, but targeted specifically at constrained devices like IoT.  I’d very much appreciate your feedback on the design.

The goal is to use only one symmetric primitive in the whole system, and to make it easier to design both protocols and symmetric crypto (just like Noise has pipes and boxes).  The security target is about 128-bit security in the random oracle model, even if an attacker interacts with many devices many times.  You could surely scale this up at the cost of performance.

It’s also important to fit in a small amount of code and data.  In particular, I’ve implemented the entire symmetric part of the library and a few protocol building blocks in C compiling to something like ~1650 bytes on ARM, so it should fit nicely in a Cortex-M system.  This is with rolled loops so it’s slow; unrolling the cipher costs about +450 bytes.  The session states are 100 bytes, but can be compressed to 32 bytes.  I haven’t done speed benchmarks yet.


Disclaimer:
I can’t release the code yet, because I have to clear it with legal.  I don’t know of patents that cover this framework, but my company (Rambus/CRI) can file patents on anything novel I came up with (i.e. a few things, but much of this design is derivative).  I’m working on this at work, and even if I wasn’t, all my base would belong to CRI because this relates to foreseeable company business.  It is also possible to build protocols out of this that infringe patents, including CRI’s existing patents.


Overview and ops:
In the STROBE lite framework, the two parties with a session maintain matching states for a sponge construction, which is currently Keccak-f[800] with a 256-2-bit capacity and a 544+2-bit rate.  The 2 bits are padding.  Protocols are broken down into ops which have a protocol-defined tag byte, an opcode, and some data.  The opcodes are squeeze, absorb, duplex and unduplex.  Unduplex is like duplex, except that the state becomes the input instead of state^input; this is effectively a Keyak decryption.  To perform an op:

Assemble a tag word from:
    the protocol-defined tag byte
    which party sent the message, if any
    the opcode (if the sender and receiver do different operations, then the one the sender does)
    a 2-byte length field
Unduplex the tag word and the padding 0b11 = 3.  It could be absorb, but unduplex is probably better for reasons explained below.
Run the sponge
Squeeze/absorb/duplex/unduplex the data.  If at any point you’d overrun the rate, pad with 0b1 and run the sponge.

Higher-level primitives are special cases of these ops.  For example,

Setting a key is (TAG_SETKEY, absorb, keybytes)
Setting a nonce or associated data is also absorb.
Encrypting a message (with no authentication) is (TAG_ENC, duplex, plaintext)
Decrypting a message is (TAG_ENC, unduplex, ciphertext)
Sending a MAC is (TAG_MAC, unduplex, [0]*mac_length).  It could also be squeeze, but unduplexing is probably better.
Checking a MAC is (TAG_MAC, duplex, mac).  If mac_length <= rate, this sets the first mac_length bytes of the state to 0, which you can check.

The design ensures that the output from the sponge (i.e. what is returned from squeeze, or xor’d into data for duplex/unduplex) is indifferentiable from a random oracle called on the entire history of ops.  Furthermore, all the ops are parseable from that history, and the purpose of the current output (its tag, opcode and length) can be determined from that history.  The length field is not necessary to parse the transactions.  This means you can have a variant which omits it for streaming operations.  It is included to make eg MACs different for different lengths of data, or in a more general sense, to make it more clear what the “purpose” of a sponge output is.

One nice property of this sort of design is that every message passed between the two parties passes through the sponge in one of maybe four ways.  This simplifies library design.  Also, everything other than the messages is 4-byte aligned, so if you use 4-byte aligned messages, you don’t need any byte alignment in your CPU.

Unfortunately, the design is not nonce misuse resistant.  But you mainly need nonces for protocols with out-of-order messages, which isn’t the forte of STROBE lite anyway.


Example protocols:
Here’s a TLS-like key exchange.  The client is on the left.  A message marked with absorb and a direction is sent in cleartext.  An operation with stars instead of an arrow is done privately by both parties.

(HELLO, absorb, version and ciphersuite info, maybe some SNI) ->
(DH_EPH, absorb, g^x) ->

<- (HELLO, absorb, version and ciphersuite info)
<- (DH_EPH, absorb, g^y)
** (DH_KEY, absorb, g^xy) **
<- (CERT, duplex, certificate on g^s)+ // duplex mode means it’s encrypted.
<- (SIG_EPH, absorb, g^k)
** c = (SIG_CHAL, squeeze, [0]*size of group) **
<- (SIG_RESP, duplex, cs+k mod q)

// possibly client sends cert and sig here

(APP_DATA, duplex, request) ->
(MAC, unduplex, [0]*mac_length) ->

… responses etc.

A real TLS replacement would go full-duplex after the handshake, but the above design is half-duplex.  A straightforward fix for this is to initialize a new sponge for the other direction with its rate half set to 0, and its capacity half randomized by (FORK, squeeze, capacity_bytes).

It’s also straightforward to use this design for other handshakes.  If you’re doing FHMQV, you can simply squeeze the exponents out of the sponge.  For example (HELLO etc omitted):

(DH_LONGTERM, absorb, g^X) ->
(DH_EPH, absorb, g^x) ->

<- (DH_LONGTERM, absorb, g^Y)
<- (DH_EPH, absorb, g^y)
** a,b = (FHMQV_CHAL, squeeze, 2*group size) **
** (FHMQV_KEY, absorb, g^((x+aX)(y+aY))) **
<- (MAC, unduplex, [0]*mac_len)

(MAC, unduplex, [0]*mac_len) ->
<- … data … ->


Likewise it is easy to use the design with a PAKE such as SPAKE2:

(DH_PAKE, absorb, g^x h^hashed_password) ->
<- (DH_EPH, absorb, g^y) // or g^y h2^password, but SPAKE2’s security proof doesn’t need this unless it’s simultaneous.
** (DH_KEY, absorb, g^xy) **
** (HASHED_PASSWORD, absorb, hashed_password) ** // SPAKE2’s security proof needs this
<- (MAC, …)
(MAC, …) ->

It is also easy to design a fixed-key protocol like PSK.

Of course, there’s no need to use this design in an interactive protocol.  You can use it like Noise boxes for statically encrypted/signed data as well.  In particular, you can use it as a cipher:

(FIXED_KEY, absorb, k)
(NONCE, absorb, nonce_data)
(ASSOCIATED_DATA, absorb, ad)
(ENC, duplex, plaintext) // to decrypt: (ENC, unduplex, ciphertext)
(MAC, unduplex, [0]*mac_length) // to decrypt: (MAC, duplex, mac)

This means that if you need out-of-order message support or otherwise can’t use the state as a session, you can still use the design as a cipher.  Also, you can use it as a DRBG, with a STIR and RANDOM keywords to stir entropy and extract pseudorandomness, followed by FORGET to compress the state and prevent rollback.  Presumably in a fancier system you’d want to use a Fortuna-like collection system, but in an IoT device your best bet is probably to read a ton of bits from your TRNG or radio or whatever, maybe do a simple quality test, and then feed them to the DRBG hope they’re good.



Framing:
This layer of the protocol doesn’t care about framing, but there are a few straightforward options.  A simple framing protocol is just to send the tag word, which includes a length, and then the data.  However, when the tag and/or length are known, omitting them would save bandwidth.  The 1650 byte version of the library described above includes the ability to send or receive data from sockets (stubbed into an external library) or memory buffers, either framed or unframed; a few dozen more bytes adds support for including just the tag or just the length in the frame instead of both.


Forward security:
Sponge functions are permutations.  To prevent the sponge state from being rolled back, you can forget state by unduplexing a block of zeros and throwing away the output.  The obvious amount to forget is 128 bits, but it’s better to forget rate-4 = 64 bytes; you can also effectively forget the next 4 bytes, which will hold the tag word.  (This is why the tag word is unduplexed instead of duplexed.)

To save time, this operation is done by the default MAC function right after the MAC, forgetting rate - mac_bytes - 4 bytes.  Therefore the usual pattern is to encrypt and then mac/forget, which compresses the state down to its capacity = 32 bytes.

If you were to replace Keccak-f[800] with a sponge having a much smaller rate, this is the only thing you’d really have to change.


Initialization:
The sponge is initialized in a way that makes it different from any other Keccak modes that I know of.

The end of the capacity component is set to the 16-byte string “STROBE lite v1.0”.

The byte (or word, since Keccak is little-endian) immediately after the rate component is set to 0x4.  Since the padding is either 1 or 3, that bit will remain set for the Keccak-f call no matter what is duplexed in.

Finally, a protocol-specific perso string is duplexed in.  This may be more than one block long, and the usual append-1 padding will be used.


Side channels:
Because the state changes on every block, this design should naturally resist DPA, except for the block when a fixed key is injected and the next few blocks after that.  So for protocols using a fixed key, the pattern should be to initialize the sponge in a fixed state (or one of a few fixed states), immediately inject the key, and then duplex in a random or otherwise unique value a couple bits at a time with the usual 1-padding afterward.  A FORGET operation prevents the derived key from being rolled back, should it somehow be recovered.  (This is CRI’s patented “key tree" algorithm.)  This behavior can be parsed from the transaction history, because it is the only way that less than one byte can be duplexed in.

Support for this costs about 160 extra bytes, including some protocol building block functions.



Design variants and extensions:
The original STROBE code had variable-size control words.  I dropped this because it adds code size.

BLINKER xors its control word into a fixed place in every block.  I dropped this because it decreases performance, adds complexity to the spec, and adds complexity to the framing; but it saves a little code so it might be worth doing.

You can obviously vary the duplex vs unduplex to trade simplicity of spec vs forgetting/compression features.

It may be worthwhile to divide the control word up in some other way to increase flexibility eg for multiparty protocols.

You can use a different sponge.  NORX seems like a reasonable choice, as do the various tiny hardware ones.  If you use a tiny hardware sponge, you’ll need a different story for forward security.




That’s all I’ve got.  Thoughts?

Thanks for your time,
— Mike


More information about the Noise mailing list