[messaging] MPHFs for public key lookup?

Sat Jul 4 13:50:55 PDT 2015

On 6/29/15 1:30 PM, Jason A. Donenfeld wrote:

> And then, further, is there a clever way of doing this that might
> actually preserve identity hiding in the protocol (not featured
> above)?

I *think* that what you're asking is: how can many senders create
encrypted messages that 1: can be efficiently decrypted by the
recipient, and 2: are indistinguishable from random for an eavesdropper.

I wrote up a scheme[1] for this in my Petmail project. There is a setup
phase which lets the receiver manage a (rotating) current+old pair of
public keys known to each potential sender. In addition, each
sender-to-receiver link has a unique symmetric key called the CIDKey.
Each message has a unique (incrementing) sequence number, and uses
ElGamal-style encryption (ephemeral Curve25519 keypair, the public half
is put into the message). with rotating receiver pubkeys.

The sender puts three items into each message: (some details omitted for
brevity, SecretBox/Box are from NaCl)

1: "CIDToken": HKDF(key=CIDKey+seqnum)
2: "CIDBox": SecretBox(key=CIDKey, data=seqnum+rxpub+..)
3: encrypted message: Box(privkey=ephemeralpriv, pub=rxpub, data=msg)

The recipient looks up the CIDToken against a pre-generated table of
values created for the next-expected sequence number for each potential
sender. In the best case, this is basically O(1), and identifies the
sender (who either used the current pubkey, or the old one).

If that fails (perhaps because a seqnum was dropped/skipped), the
recipient does trial Salsa20 decryptions with the CIDKeys of each
potential sender: 1*S symmetric decryptions. If it succeeded, we only
have to do one Salsa20 decryption. This identifies the sender, and which
pubkey (current-vs-old) they used.

And in the worst case, the recipient does trial Curve25519 decryption
for each potential sender, for both the current and the previous pubkey:
2*S asymmetric decryptions.

(I have a feeling I could drop the last step: there should be no way for
a valid message to fail the CIDBox lookup).

There are some other fields in there to prevent the transport from
mixing and matching the CIDToken/CIDBox/encmsg values from different
sources. And there are a few layers on top of this, to thwart an
eavesdropper between the sender and the mailbox server that buffers
these messages.

But (I think) everything that's exposed at this level is
indistinguishable from random: it's either a random single-use token,
the output of HKDF, or the output of an encryption/MAC function.

Is that kind of what you were looking for?

cheers,
 -Brian

[1]:
https://github.com/warner/petmail/blob/master/docs/mailbox.md#sender-flow