[messaging] Recognizing senders of metadata-hidden messages (was: MPHFs for public key lookup?)

Mon Jul 6 12:00:18 PDT 2015

On Mon, Jul 6, 2015 at 6:45 AM, Michael Rogers <michael at briarproject.org> wrote:
> On 04/07/15 21:50, Brian Warner wrote:
>> The sender puts three items into each message: (some details omitted for
>> brevity, SecretBox/Box are from NaCl)
[...]
> This is similar to what we do in Briar, with the following differences:

Hi Michael, Brian,

Good topic, renaming the thread.

I think the question here is:  if Alice sends Bob a message via some
metadata-hiding system (e.g. anonymous remailers, or Pond), how does
Bob know which key to decrypt with?

This is complicated because it interacts with forward-security and
metadata-hiding.  I think it also depends on whether messages use
public-key or symmetric-key encryption:

Public-key encryption
----------------------
If all messages use public-key encryption, all senders to Bob can
encrypt to the same public key, within some time interval.

This means Bob doesn't have to attribute the message to a sender prior
to decryption.  Bob can take several strategies to forward security:
 (a) Alice encrypts to Bob's long-term public key
 (b) Alice encrypts to Bob's short-term public key signed with his
long-term public key
 (c) Alice encrypts to a long-term public key that supports
"forward-secure encryption", so the private keys corresponding to
older time intervals can be deleted [1,2]
 (d) Alice encrypts to a long-term public key that supports
"puncturable encryption", so the private keys corresponding to older
time intervals and messages can be deleted [3]

Some downsides here are lower granularity of forward secrecy due to
replacing keys on a time basis instead of per-message (a,b,c) and
expensive crypto (c,d).

Symmetric-key encryption
-------------------------
To avoid above downsides, some systems take a "stateful" or
"session-based" approach, and establish pairwise symmetric keys
between parties.  This allows efficient per-message key updating [ k =
H(k) ] in systems like SCIMP, TextSecure, or Pond.  Also, an OTR-like
ping-pong of DH values can be worked in to re-key the symmetric keys
on every round-trip, giving some ability to recover from compromise
("Axolotl ratchet").

This complicates the "recognition" problem in a couple ways:
 - Bob needs to recognize the message's sender to know which session
to use for decryption
 - If the session is using per-message key updating, Bob will also
need to recognize the particular message number *within* that session

Trial decryption is the naive solution, but trying all possible
sessions and messages within each session (considering lost /
out-of-order delivery) could be painful.

In some cases, the metadata-hiding transport might be able to identify
the sender (and maybe even message number) to the recipient.  For
example, Pond initially used BBS signatures which hide the identity of
the sender from the recipient's mailbox, but enable the sender to be
recognized by the recipient.  Pond is replacing this with one-time
delivery tokens where only the recipient keeps track of which tokens
were handed out to which sender (and thus can recognize the senders).

This raises the intriguing possibility that one-time delivery tokens
could be aligned with the ratchet, so that use of a particular
delivery token indicates both the sender and message key.  Pond
currently uses a "header-encryption" variant of the Axolotl ratchet so
that once a particular sender is recognized, only two keys (current
and next header keys) need to be tried.  But it seems vaguely possible
that the ratchet and one-time-delivery token mechanisms could be
integrated, perhaps eliminating the need for header keys.

But if the transport doesn't solve the recognition problem, you have
to solve it yourself:

Brian and Michael suggest a clever "tag" approach:
 - The message contains a one-way function of the message key [tag = PRF(k)]
 - The recipient can then recognize the key with a single lookup

This could still be a lot of key tags to store on the recipient side
(1000s of potential senders multiplied by 10s or 100s of message
numbers?)  I think Brian's Petmail only has a current/next key, which
reduces the number, but Michael's Briar I think tags every message
key.

One alternative would be to tag a "header key" that changes less
frequently than the message keys, and encrypts a header containing the
message number.  E.g. Pond's header key is changed on every
round-trip, but not on every message, so you'd only have to tag 2 keys
(current/next header key), but could still support a large window of
message numbers.  This is accepting reduced forward secrecy for
metadata vs message contents.

Following that reasoning, we could reconsider the public-key
mechanisms.  For example, just encrypt the sender and message number
to the recipient's long-term public key, and accept much-reduced
forward secrecy for the metadata vs contents...

Anyways, complicated design space!  Probably we'd have to look at the
specifics of a system to figure out the best approach.

[1] https://www.cs.umd.edu/~jkatz/papers/forward-enc-full.pdf
[2] https://eprint.iacr.org/2005/015.pdf
[3] http://www.ieee-security.org/TC/SP2015/papers-archived/6949a305.pdf