[messaging] saltpack spec and library

Thu Feb 11 12:09:14 PST 2016

Thanks Mike. I'll try to answer the parts that I can here.

> 1) Why MessagePack vs other binary serialisation formats?  (i tend to use protobufs)

The honest answer is that we're not very familiar with protobufs. Here
are some things we like about MessagePack, though, and I'd be curious
to learn how protobufs handles each of them:
- It's really easy to implement.
- It's self-delimiting, so we don't need to do anything special to
parse a stream of payload packets.
- If we avoid maps (because of questions around how duplicate keys and
non-hashable keys get handled), we can be pretty sure that all
implementations of MessagePack will parse the same blob in the same
way. Though there's a good chance there are more problems here I
haven't thought of...
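
To illustrate the self-delimiting point, here is a minimal sketch of a hand-rolled decoder for a tiny MessagePack subset (positive fixint, fixarray, and bin 8 only). The function names are hypothetical and this is not how any saltpack implementation is written; it only shows that each object carries its own length, so concatenated packets can be split without extra framing.

```python
# Sketch only: supports positive fixint (0x00-0x7f), fixarray (0x90-0x9f)
# and bin 8 (0xc4). A real implementation would use a MessagePack library.

def pack_bin8(data):
    """Encode up to 255 bytes as a MessagePack 'bin 8' object."""
    if len(data) > 0xFF:
        raise ValueError("bin 8 holds at most 255 bytes")
    return bytes([0xC4, len(data)]) + data


def pack_fixarray(encoded_items):
    """Wrap up to 15 already-encoded objects in a 'fixarray'."""
    if len(encoded_items) > 15:
        raise ValueError("fixarray holds at most 15 items")
    return bytes([0x90 | len(encoded_items)]) + b"".join(encoded_items)


def unpack_one(buf, pos=0):
    """Decode one object starting at pos; return (value, next_pos)."""
    tag = buf[pos]
    if tag <= 0x7F:                      # positive fixint
        return tag, pos + 1
    if tag & 0xF0 == 0x90:               # fixarray: low nibble is the count
        count = tag & 0x0F
        pos += 1
        items = []
        for _ in range(count):
            value, pos = unpack_one(buf, pos)
            items.append(value)
        return items, pos
    if tag == 0xC4:                      # bin 8: one length byte, then data
        length = buf[pos + 1]
        start, end = pos + 2, pos + 2 + length
        if end > len(buf):
            raise ValueError("truncated bin 8 object")
        return buf[start:end], end
    raise ValueError("type byte 0x%02x not handled by this sketch" % tag)


def unpack_stream(buf):
    """Split a concatenation of objects with no external framing."""
    pos, objects = 0, []
    while pos < len(buf):
        value, pos = unpack_one(buf, pos)
        objects.append(value)
    return objects


# Two array "packets" written back to back, the second with an empty payload.
stream = (pack_fixarray([pack_bin8(b"chunk one"), bytes([1])])
          + pack_fixarray([pack_bin8(b""), bytes([2])]))
print(unpack_stream(stream))
```

Note that the sketch uses only arrays and byte strings, never maps, in line with the third point above.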

> 2) When staying in binary, what sort of overhead does the format impose?

The biggest part of the header is the keys in the recipients list:
with packing overhead and all, that comes out to 52 bytes (anonymous)
or 85 bytes (visible) per recipient. The rest of the header is ~100
bytes. In the payload of the message, for every 1 MB payload chunk, we
use 34 bytes per recipient to authenticate it, and another ~20 bytes
of per-packet overhead. Also every message includes an empty payload
packet at the end, which is authenticated like the others.

Examples: A 1-byte message, encrypted to a single anonymous recipient,
is 262 bytes. With ten visible recipients, it's 1673 bytes. Increasing
the plaintext length doesn't add any more overhead, though, up to 1 MB,
since the plaintext still fits in a single payload chunk.
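
As a rough cross-check of those numbers, the per-recipient and per-packet figures above can be plugged into a small estimator. This is a back-of-the-envelope sketch: the function name is made up, the "~100" and "~20" figures are approximate, and treating 1 MB as 10^6 bytes is an assumption. It lands within a couple of bytes of the measured sizes.

```python
import math

HEADER_BASE = 100                  # "~100 bytes" for the rest of the header
HEADER_PER_RECIPIENT = {"anonymous": 52, "visible": 85}
AUTH_PER_RECIPIENT = 34            # per recipient, per payload packet
PACKET_BASE = 20                   # "~20 bytes" of per-packet overhead
CHUNK_SIZE = 1_000_000             # assumption: 1 MB means 10**6 bytes


def estimate_size(plaintext_len, recipients, kind="anonymous"):
    """Approximate encrypted size in bytes, using the figures quoted above."""
    header = HEADER_BASE + HEADER_PER_RECIPIENT[kind] * recipients
    data_packets = math.ceil(plaintext_len / CHUNK_SIZE)
    packets = data_packets + 1     # every message ends with an empty packet
    per_packet = AUTH_PER_RECIPIENT * recipients + PACKET_BASE
    return header + packets * per_packet + plaintext_len


print(estimate_size(1, 1, "anonymous"))   # 261, versus 262 measured
print(estimate_size(1, 10, "visible"))    # 1671, versus 1673 measured
```

The small gap is expected given the rounded header and per-packet figures.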

So far we've preferred ease of implementation and sticking to NaCl's
defaults and high-level interfaces over optimizing for the size of
small messages. Here are some optimizations that we've deliberately
avoided:
- Defining additional non-streaming modes. We could avoid
authenticating the empty payload packet at the end, if there were
separate encryption and signing modes that specified a single payload
packet.
- Using smaller keys. Truncating the payload key and the HMACs to 128
bits (or using crypto_onetimeauth instead) would save a lot. I read
that Curve25519 has 128 bits of security, so it's possible that our
256-bit keys/authenticators are overkill, even apart from the question
of whether 256 bits is overkill in general.
- Using crypto_stream_xor instead of crypto_secretbox. In the places
where we use crypto_secretbox, the 16-byte authenticator it prepends
is redundant. (Though note that many languages, including, I think,
Ruby and JS, don't have NaCl/libsodium wrappers that expose low-level
functions like crypto_stream_xor.)
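
To make the second optimization concrete, here is a minimal sketch of what truncating an HMAC to 128 bits would look like and what it would save. The choice of HMAC-SHA512 and the helper names are assumptions for illustration, not a statement of what the format specifies.

```python
import hashlib
import hmac


def authenticator(key, data, size):
    """HMAC over data, truncated to `size` bytes.
    Assumption: SHA-512 is used here only as an example hash."""
    return hmac.new(key, data, hashlib.sha512).digest()[:size]


def saved_bytes_per_packet(recipients, old_size=32, new_size=16):
    """Raw authenticator bytes saved per payload packet by truncating."""
    return recipients * (old_size - new_size)


key = b"k" * 32                      # placeholder key, not a real secret
tag_256 = authenticator(key, b"payload", 32)
tag_128 = authenticator(key, b"payload", 16)

# The shorter tag is just a prefix of the longer one.
print(hmac.compare_digest(tag_128, tag_256[:16]))
# With ten recipients, each packet would shrink by this many bytes.
print(saved_bytes_per_packet(10))
```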

> 3) If you imagine a mix network for routing of small binary messages, is saltpack an appropriate format to use for protecting the messages in your estimation? Or are there gotchas that its replacement-for-pgp design would create for the case of pure machine-to-machine messaging?

I don't know anything about mix networks -- would you ever have
multiple recipients in a mix network? -- but I can point out at least
a couple of anonymity issues that could come up, in addition to what
Jeff pointed out. Trevor caught one of them in this thread: we leak
when two recipients have the same key, and we need to fix that by
changing our nonces. Another concern is that even when recipients are
anonymous, the *number* of recipients is visible. If, for example, a
message has 37 recipient keys, and I'm the only person in the world
who owns exactly 37 laptops, that could identify me.
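
The duplicate-key leak can be shown with a toy model. The sketch below uses a deliberately insecure stand-in cipher built from SHAKE-256 (it is not NaCl, and all names are hypothetical) to show that encrypting the same payload key under the same recipient key and the same nonce yields identical bytes, while mixing the recipient's index into the nonce, one possible fix assumed here, makes the entries differ.

```python
import hashlib


def toy_box(key, nonce, plaintext):
    """Toy deterministic cipher: XOR with a keystream derived from
    (key, nonce). For illustration only; NOT secure and NOT NaCl."""
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def recipient_boxes(payload_key, recipient_keys, per_recipient_nonce):
    """Encrypt the payload key once per recipient."""
    boxes = []
    for index, key in enumerate(recipient_keys):
        if per_recipient_nonce:
            # Assumed fix: make the nonce depend on the recipient's position.
            nonce = b"toy_nonce" + index.to_bytes(8, "big")
        else:
            nonce = b"toy_nonce_fixed"
        boxes.append(toy_box(key, nonce, payload_key))
    return boxes


payload_key = b"P" * 32
keys = [b"A" * 32, b"B" * 32, b"A" * 32]   # recipients 0 and 2 share a key

fixed = recipient_boxes(payload_key, keys, per_recipient_nonce=False)
varied = recipient_boxes(payload_key, keys, per_recipient_nonce=True)

print(fixed[0] == fixed[2])     # an observer can see the repeated key
print(varied[0] == varied[2])   # the repeat is no longer visible
```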

Another general downside of using saltpack for machine-to-machine
messaging is that if you want ephemeral recipient keys for forward
secrecy, you'll have to arrange that outside the format. It might make
more sense to use a protocol that was designed for having both parties
online?

> 4) MIME type? Could you maybe forbid/strongly discourage in the spec emails that contain ASCII armoured saltpack messages? I think some clients have struggled in the past with the UI for showing a message that contains partially signed and partially not signed text, as they tend to treat the signedness of a message as a boolean. Formally forbidding mixing of the two can solve that.

The main use case we wanted saltpack for is the problematic one you're
describing, where I paste an encrypted message into GChat or reddit or
email, and the recipient decrypts it by hand. So we don't want to
forbid that. Maybe we could suggest that implementations using
saltpack for something other than decryption-by-a-human-being should
tweak the format name and the nonces to avoid compatibility with the
regular format? Also, as above, I'd worry that applications using
saltpack in an automated, online way might be missing out on forward
secrecy.

> 5) The format appears to be at least partly defined through unversioned reference to a particular library (NaCL). In particular it does not specify what a "NaCL public key" actually is (curve25519 presumably). That seems like it should be fixed for a realistic spec.

Good point. Would explicitly referring to
http://cr.yp.to/highspeed/naclcrypto-20090310.pdf be a reasonable way
to do that?

> 6) It'd be nice if there was a way to embed X.509 cert chains (i.e. signed curve25519 certificate) into the headers, to allow the sender to authenticate themselves with a PKI instead of Keybase. Then it could act as a competitor to CMS.

We designed saltpack with the assumption that the client
implementation would handle the heavy lifting of figuring out what
real-world identity corresponds to a given public key. In our
implementation we use all the Keybase machinery around sigchains and
public proofs, but the saltpack format itself doesn't know anything
about that. Would it be reasonable for a different implementation to
do something similar with X.509? Are there attacks that could come up
if the cert chain isn't embedded directly in the message? Or is the
idea more that you could verify a sender without talking to any
servers?

- Jack