[messaging] Metadata free instant messaging protocol, with basic spam throttling

Mon Sep 29 18:11:00 PDT 2014

Thanks Mike, this is a great writeup.  This has come up in conversation
with Trevor, Mike Perry, and some other folks over the past few
weeks, so maybe it's something in the water.

In some ways, I think approaching this for calls first is easier than
for messages.  Messages currently carry "linkable" counters and other
metadata in their axolotl headers (axolotl's "header keys" extension
isn't used in TextSecure because of transport space constraints), which
we'd have to deal with first.

Some other constraints we have to deal with:

1) Clients need to be relatively free to re-register at will.  People
uninstall and reinstall all the time, or unregister and then re-register
for push messages.  So if the server signs something at setup time, we
can't assume setup is the only time it will need to do so.

2) I've been really hesitant to introduce any disk writes into the
message/signal delivery path.  Right now rate limiting is done by
building a leaky bucket into a memcache key indexed off the user's
authenticated identifier.  The whole path is read-only at the moment,
and that's pretty important to me.  If we're interested in designing
stuff that larger providers could adopt, I believe they will have
similar concerns.  So I think we need to stay away from the disk if we
can, which means we need something bounded that we can put in memory.

3) Token reuse can cause linkability.  It sounds like you're suggesting
a token assignment and release strategy, but to my understanding, that
means a token could be used to initiate with multiple recipients.  Maybe
that's alright, but it starts to create problems.  If a journalist calls
their mom, their partner, their colleagues, and then their source using
the same token, it might not be a huge leap to figure out who the token
belongs to, and thus who their source is.

4) We have to be careful with signatures.  We probably can't have a
certificate that signs the whole message/signal, because then we lose
deniability.  Maybe that'd be alright for calls, but we have to be
careful with messages. On the other hand, the server can't just sign a
simple statement attesting to the sender's id which gets passed in the
signal/message, because then it could be replayed by the recipient to
spoof a call to someone else.

5) We have to be careful with system-wide periodic tasks.  One could
imagine having every client get a new batch of blinded tokens every 24
hours, timestamped to limit how many could potentially need to be kept
in memory as "used" server-side, but any periodic task that
*all* clients perform frequently is potentially rough when you begin to
consider how few seconds there are in the day.  We're already paying
that price for contact intersection, but the request processing is
pretty simple.
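
For reference, the leaky bucket in (2) can be sketched roughly as
follows.  This is a minimal illustration, not the server's actual code:
the class names, the "rate:" key prefix, the capacity and leak rate are
all assumptions, and InMemoryCache is a hypothetical stand-in for a
memcache client.  The point is only that the per-user state is a couple
of numbers under one key, so nothing needs to touch the disk.

```python
import time


class LeakyBucket:
    """Leaky-bucket state small enough to store as a single cache value."""

    def __init__(self, capacity, leak_per_second, now=None):
        self.capacity = capacity                # max outstanding units
        self.leak_per_second = leak_per_second  # drain rate
        self.level = 0.0
        self.last_update = time.time() if now is None else now

    def allow(self, now=None):
        """Drain the bucket for elapsed time, then try to add one unit."""
        now = time.time() if now is None else now
        elapsed = max(0.0, now - self.last_update)
        self.level = max(0.0, self.level - elapsed * self.leak_per_second)
        self.last_update = now
        if self.level + 1 > self.capacity:
            return False  # bucket is full: reject this request
        self.level += 1
        return True


class InMemoryCache:
    """Hypothetical stand-in for a memcache client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def check_rate_limit(cache, user_id, capacity=5, leak_per_second=1.0, now=None):
    """Look up (or create) the bucket keyed off the authenticated user id."""
    key = "rate:" + user_id  # assumed key layout
    bucket = cache.get(key)
    if bucket is None:
        bucket = LeakyBucket(capacity, leak_per_second, now)
    allowed = bucket.allow(now)
    cache.set(key, bucket)  # memory only, no disk write
    return allowed
```

Note that the bucket is keyed off the sender's authenticated identifier,
which is exactly the piece that goes away once the sender is hidden from
the server.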

I think this kind of thing (partial metadata hiding) is possible if we
can figure something out for the rate limiting, but the rate limiting /
abuse monitoring appears to be the hard part so far.
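
To put rough numbers on the periodic-task concern in (5), here is a
back-of-the-envelope sketch.  The client count, tokens per batch, and
bytes per stored token below are made-up figures for illustration, not
measurements from any real deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(clients, requests_per_client_per_day=1):
    """Average load if every client performs the task once per day."""
    return clients * requests_per_client_per_day / SECONDS_PER_DAY


def worst_case_used_token_bytes(clients, tokens_per_client, bytes_per_token):
    """Upper bound on memory if every issued token is spent and remembered."""
    return clients * tokens_per_client * bytes_per_token


# Hypothetical figures: 10 million clients, 100 tokens per daily batch,
# 32 bytes kept per used token.
clients = 10_000_000
print(round(average_requests_per_second(clients), 2))  # 115.74
print(worst_case_used_token_bytes(clients, 100, 32))   # 32000000000
```

Under these assumptions, one daily fetch per client averages out to
about 116 requests per second before accounting for any peaks, and the
worst-case "used" set is about 32 GB, which is the sort of bound that
has to fit in memory.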

- moxie

-- 
http://www.thoughtcrime.org