[messaging] Best practices (if any) for backing up message key data server-side
matthew at matrix.org
Thu Nov 30 11:28:04 PST 2017
We're currently tackling the problem of backing up message keys in
Matrix.org's end-to-end encryption architecture. The aim is to give
users a way to recover their message history if they only have one
client app (aka 'device') which they then lose.
For context: Matrix's E2EE strategy is that each device in a chatroom
establishes a 1:1 Double Ratchet with every other device, forming a full mesh
(using the Olm ratchet: https://matrix.org/docs/spec/olm.html). Each
device then maintains a simpler hash ratchet (Megolm:
https://matrix.org/docs/spec/megolm.html) which it uses to encrypt
sequences of messages it sends to the other devices in the room via
Matrix (HTTPS+JSON). The state of each device's megolm ratchet (its
"megolm key") is sent to all the other devices in the room over the
secure 1:1 Olm channel, such that they can decrypt the messages and
message history as long as they have the necessary megolm session keys.
The sessions are regularly re-established to avoid reusing the same key
throughout the lifetime of the room (especially as users join/part the
room).
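To make the "simpler hash ratchet" idea concrete, here is a minimal sketch in Python. It is deliberately simplified and is not the actual Megolm construction (see the spec linked above for that); the class name, the helper key_at and the HMAC labels are all assumptions made for illustration. It shows the property that matters for backup: whoever holds the ratchet state at index i can derive every message key from i onwards, but nothing earlier.

```python
import hashlib
import hmac
import os
from typing import Tuple


class HashRatchet:
    """Simplified one-way ratchet (illustrative only, not real Megolm)."""

    def __init__(self, state: bytes, index: int = 0):
        self.state = state
        self.index = index

    def advance(self) -> None:
        # Replace the state with a one-way function of itself, so earlier
        # states cannot be recovered from later ones.
        self.state = hmac.new(self.state, b"advance", hashlib.sha256).digest()
        self.index += 1

    def message_key(self) -> bytes:
        # Derive the key for the current index without exposing the state.
        return hmac.new(self.state, b"message", hashlib.sha256).digest()

    def export(self) -> Tuple[int, bytes]:
        # The (index, state) pair a sender would share over the 1:1 channel.
        return self.index, self.state


def key_at(shared: Tuple[int, bytes], target: int) -> bytes:
    """Derive the message key at `target` from a shared (index, state)."""
    index, state = shared
    if target < index:
        raise ValueError("cannot derive keys from before the shared index")
    ratchet = HashRatchet(state, index)
    while ratchet.index < target:
        ratchet.advance()
    return ratchet.message_key()
```

A recipient given the export at index 2 can call key_at(shared, 5) and obtain the same key as the sender, while key_at(shared, 1) is rejected. Backing up such a state is therefore equivalent to backing up every later message key in that session.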
So far we let users manually export/import their megolm keys for a given
device as a passphrased blob (HMAC'd AES-256-CTR, using a PBKDF2 derived
key from the passphrase). We've also just added the ability for users
to sync megolm keys on demand between their own trusted devices via
so-called "keyshare requests" over the Olm channel.
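As a rough illustration of the passphrased export, the sketch below covers only the PBKDF2 key derivation and the HMAC check, using Python's standard library. The AES-256-CTR step is left out because the standard library has no AES, so the ciphertext is treated as opaque bytes. The blob layout (16-byte salt, ciphertext, 32-byte tag), the iteration count, the use of SHA-512 for PBKDF2 and the function names are assumptions for this example, not the real export format.

```python
import hashlib
import hmac
import os
from typing import Tuple

ROUNDS = 100_000  # assumed iteration count, not taken from the real format
SALT_LEN = 16     # assumed salt length
TAG_LEN = 32      # HMAC-SHA256 output length


def derive_keys(passphrase: str, salt: bytes,
                rounds: int = ROUNDS) -> Tuple[bytes, bytes]:
    # Stretch the passphrase into 64 bytes, then split the result into
    # separate encryption and MAC keys.
    material = hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), salt, rounds, dklen=64)
    return material[:32], material[32:]


def seal(mac_key: bytes, salt: bytes, ciphertext: bytes) -> bytes:
    # Authenticate salt + ciphertext and append the tag.
    body = salt + ciphertext
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


def open_blob(passphrase: str, blob: bytes,
              rounds: int = ROUNDS) -> Tuple[bytes, bytes]:
    # Verify the tag before anything is decrypted; returns the encryption
    # key and the still-encrypted payload.
    if len(blob) < SALT_LEN + TAG_LEN:
        raise ValueError("blob too short")
    salt = blob[:SALT_LEN]
    ciphertext = blob[SALT_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    enc_key, mac_key = derive_keys(passphrase, salt, rounds)
    expected = hmac.new(mac_key, salt + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong passphrase or corrupted blob")
    return enc_key, ciphertext
```

In a real client, enc_key would then be handed to an AES-256-CTR implementation from a cryptography library. Checking the MAC first means a mistyped passphrase is reported as an error rather than silently producing garbage keys.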
However, this fails for the scenario where the user is logging into a
new device but doesn't have any other active devices online (e.g. having
lost them, or because they're turned off, etc). So we've been trying to
establish the best approach for *optionally* backing up the keys
serverside. The options we've considered so far are:
1. Prompt the user for a passphrase at login (or launch?), which is
stored on the client and used to encrypt the megolm keys before syncing
them to the server. If the client is missing any megolm keys for whatever
reason, it can retrieve them from the server. The disadvantage is the bad
UX of needing the user to remember and enter a passphrase whenever they
log in (as well as doing a more normal login/password sign-in), and the
fact that a passphrase-equivalent needs to hang around on the client.
2. Generate a recovery keypair for the account, and give the private key
to the user as a 'recovery code' to keep safe. We sync the public key
between the user's verified devices, and they encrypt the megolm keys
with the public key and store them on the server. If the user has a
disaster and needs to recover the keys, they enter their 'recovery code'
and sync the keys back to their client. This has the advantage of not
storing this master private key anywhere (other than out-of-band by the
user), and only prompting the user when things are going wrong.
However, it means the server-side keys can't be used to transparently
recover missing keys on an ad hoc basis, and the UX of storing and
entering long 'recovery codes' is perhaps questionable.
3. Same as option 1, but we sync the passphrase-equivalent between the
user's verified devices over the Olm channel. This means trusted
devices magically get access to the history keys stored on the server -
but means that we are enthusiastically copying an unprotected master key
between devices (albeit trusted devices), which feels dangerous.
However, we are effectively doing a subset of this today already when we
transfer specific megolm keys between devices using keyshare requests.
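On the UX concern in option 2, one way a 'recovery code' could be made less painful to type is to encode the private key in base32, group it with dashes, and append a short checksum so that typos are caught before any decryption is attempted. The sketch below is purely hypothetical: the encoding, the 2-byte SHA-256 checksum and the function names are assumptions for illustration, not a format we have settled on.

```python
import base64
import hashlib
import secrets

CHECKSUM_LEN = 2  # assumed: two bytes of SHA-256 over the key


def encode_recovery_code(private_key: bytes) -> str:
    # Append a short checksum, base32-encode, drop padding, and group the
    # characters in fours so the code is easier to read and type.
    checksum = hashlib.sha256(private_key).digest()[:CHECKSUM_LEN]
    raw = base64.b32encode(private_key + checksum).decode("ascii").rstrip("=")
    return "-".join(raw[i:i + 4] for i in range(0, len(raw), 4))


def decode_recovery_code(code: str) -> bytes:
    # Accept lower case, spaces and dashes, then restore base32 padding.
    raw = code.replace("-", "").replace(" ", "").upper()
    raw += "=" * (-len(raw) % 8)
    data = base64.b32decode(raw)  # raises a ValueError subclass on bad input
    key, checksum = data[:-CHECKSUM_LEN], data[-CHECKSUM_LEN:]
    if hashlib.sha256(key).digest()[:CHECKSUM_LEN] != checksum:
        raise ValueError("recovery code checksum mismatch (typo?)")
    return key


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # stand-in for a real private key
    code = encode_recovery_code(key)
    print(code)
    assert decode_recovery_code(code.lower()) == key
```

A 32-byte key plus checksum comes out at 55 characters before grouping, which is still long, so this only softens the problem rather than removing it.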
I've been going around in circles on this, and given the whole idea of
"storing private keys serverside" generally rings alarm bells, I thought
I'd ask for opinions from the wider community before we screw something
up. Feedback on the overall scheme would be appreciated too: it feels
slightly wrong that we're going through all the hassle of Olm and Megolm
ratchets only to then go and deliberately store message keys or master
recovery keys in order to decrypt history. (That said, it's worth
noting that rooms can theoretically be configured to deliberately
discard old session keys if PFS is more important than serverside history).