[messaging] Best practices (if any) for backing up message key data server-side

Thu Nov 30 11:28:04 PST 2017

Hi all,

We're currently tackling the problem of backing up message keys in 
Matrix.org's end-to-end encryption architecture.  The aim is to give 
users a way to recover their message history if they only have one 
client app (aka 'device') which they then lose.

For context: Matrix's E2EE strategy is that each device in a chatroom 
establishes a 1:1 Double Ratchet session with every other device, 
forming a full mesh
(using the Olm ratchet: https://matrix.org/docs/spec/olm.html).  Each 
device then maintains a simpler hash ratchet (Megolm: 
https://matrix.org/docs/spec/megolm.html) which it uses to encrypt 
sequences of messages it sends to the other devices in the room via 
Matrix (HTTPS+JSON).  The state of each device's megolm ratchet (its 
"megolm key") is sent to all the other devices in the room over the 
secure 1:1 Olm channel, such that they can decrypt the messages and 
message history as long as they have the necessary megolm session keys. 
The sessions are regularly re-established to avoid reusing the same key 
throughout the lifetime of the room (especially as users join/part the 
room).
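
To make the "simpler hash ratchet" idea concrete, here is a minimal 
sketch in Python.  It is deliberately simplified and is NOT the real 
Megolm construction (the spec linked above describes a more elaborate 
ratchet); the function names and the constants 0x01/0x02 are 
assumptions made purely for illustration.  It shows the property that 
matters for this discussion: whoever holds the ratchet state at a given 
index can derive every later message key, but none of the earlier ones.

```python
import hashlib
import hmac


def advance(chain_key: bytes) -> bytes:
    """One-way step: derive the next ratchet state from the current one."""
    return hmac.new(chain_key, b"\x01", hashlib.sha256).digest()


def message_key(chain_key: bytes) -> bytes:
    """Derive the key used for the message at the current ratchet index."""
    return hmac.new(chain_key, b"\x02", hashlib.sha256).digest()


def key_at(shared_state: bytes, shared_index: int, target_index: int) -> bytes:
    """Derive the message key at target_index from a state shared at shared_index.

    Only forward derivation is possible; asking for an earlier index fails,
    because the hash step cannot be inverted.
    """
    if target_index < shared_index:
        raise ValueError("cannot ratchet backwards")
    state = shared_state
    for _ in range(target_index - shared_index):
        state = advance(state)
    return message_key(state)
```

In this toy model, backing up "the megolm key" means backing up one 
(state, index) pair per session: that single value is enough to decrypt 
everything sent in the session from that index onwards.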

So far we let users manually export/import their megolm keys for a given 
device as a passphrase-protected blob (HMAC'd AES-256-CTR, using a key 
derived from the passphrase via PBKDF2).  We've also just added the 
ability for users 
to sync megolm keys on demand between their own trusted devices via 
so-called "keyshare requests" over the Olm channel.
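
As a rough illustration of the passphrased export, the sketch below 
shows only the key-derivation and HMAC parts, using the Python standard 
library.  The AES-256-CTR step is left out (the standard library has no 
AES), so the ciphertext is treated as opaque bytes.  The salt length, 
iteration count, PBKDF2 hash, blob layout and function names are all 
assumptions for the example, not the actual export format.

```python
import hashlib
import hmac

SALT_LEN = 16          # assumed salt size in bytes
TAG_LEN = 32           # HMAC-SHA-256 output size in bytes
ITERATIONS = 100_000   # assumed work factor


def derive_keys(passphrase: str, salt: bytes, iterations: int = ITERATIONS):
    """Stretch the passphrase into two independent 32-byte keys.

    The first half is meant for AES-256-CTR, the second for the HMAC.
    """
    material = hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), salt, iterations, dklen=64
    )
    return material[:32], material[32:]


def add_tag(mac_key: bytes, salt: bytes, ciphertext: bytes) -> bytes:
    """Build the blob: salt || ciphertext || HMAC(salt || ciphertext)."""
    body = salt + ciphertext
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


def check_tag(blob: bytes, passphrase: str, iterations: int = ITERATIONS):
    """Verify the HMAC and return (aes_key, ciphertext) ready for decryption."""
    if len(blob) < SALT_LEN + TAG_LEN:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    salt, ciphertext = body[:SALT_LEN], body[SALT_LEN:]
    aes_key, mac_key = derive_keys(passphrase, salt, iterations)
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong passphrase or corrupted blob")
    return aes_key, ciphertext
```

A fresh random salt (e.g. from os.urandom) would be generated for each 
export.  Checking the HMAC before decrypting means a wrong passphrase 
is reported cleanly rather than yielding garbage keys.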

However, this fails for the scenario where the user is logging into a 
new device but doesn't have any other active devices online (e.g. 
because they've been lost or are turned off).  So we've been trying to 
establish the best approach for *optionally* backing up the keys 
server-side.  The options we've considered so far are:

1. Prompt the user for a passphrase at login (or launch?), which the 
client stores and uses to encrypt the megolm keys before syncing them 
to the server.  If the client is missing any megolm keys for whatever 
reason it can retrieve them from the server.  The disadvantage is the 
bad UX of needing the user to remember and enter a passphrase whenever 
they log in (as well as doing a more normal login/password sign-in), 
and the fact that a passphrase-equivalent needs to hang around on the 
client.

2. Generate a recovery keypair for the account, and give the private key 
to the user as a 'recovery code' to keep safe.  We sync the public key 
between the user's verified devices, and they encrypt the megolm keys 
with the public key and store them on the server.  If the user has a 
disaster and needs to recover the keys, they enter their 'recovery code' 
and sync the keys back to their client.  This has the advantage of not 
storing this master private key anywhere (other than out-of-band by the 
user), and only prompting the user when things are going wrong. 
However, it means the server-side keys can't be used to transparently 
recover missing keys on an ad hoc basis, and the UX of storing and 
entering long 'recovery codes' is perhaps questionable.

3. Same as option 1, but we sync the passphrase-equivalent between the 
user's verified devices over the Olm channel.  This means trusted 
devices magically get access to the history keys stored on the server - 
but means that we are enthusiastically copying an unprotected master key 
between devices (albeit trusted devices), which feels dangerous. 
However, we are effectively doing a subset of this today already when we 
transfer specific megolm keys between devices using keyshare requests.

I've been going around in circles on this, and given that the whole idea 
of "storing private keys server-side" generally rings alarm bells, I 
thought I'd ask for opinions from the wider community before we screw 
something up.  Feedback on the overall scheme would be appreciated too: 
it feels slightly wrong that we're going through all the hassle of Olm 
and Megolm ratchets only to then go and deliberately store message keys 
or master recovery keys in order to decrypt history.  (That said, it's 
worth noting that rooms can theoretically be configured to deliberately 
discard old session keys if PFS is more important than server-side 
history.)

Thoughts welcome!

thanks,

Matthew

-- 
Matthew Hodgson
Matrix.org