[messaging] Matrix.org: Decentralized group chat

Wed Mar 11 04:22:40 PDT 2015

On Sunday, March 8, 2015 00:13 GMT, infinity0 at pwned.gg (Ximin Luo) wrote:
> For a private dynamic group chat, the merging requirements are not exactly the same as in a collaborative environment like Google Wave:
>
> - for a basic application, one does not *need* to merge arbitrary state. The only thing you need to merge is an unordered set (the membership set).
>
> - merging unordered sets is easy and a solved problem if you have full history. However, the fact that this set represents "who has permission to see what history" makes the problem hard again, in the context of the other constraints (secure, casually-consistent, decentralised). Merge algorithms require history, but if someone can't see past history (e.g. after joining) then they don't have enough information to execute the merge algorithm, nor to verify others' claimed merges. (This is not necessarily an impossible problem, and indeed is the "last missing piece" in the bundle of ideas I haven't yet written down.)
>
> Briefly skimming over their spec: http://matrix.org/docs/spec/ "The state of the room at a given point is calculated by considering all events preceding and including a given event in the graph. Where events describe the same state, a merge conflict algorithm is applied."
>
> My impression too is that they underestimate how hard this will be. Also, their security practises are questionable - they do have a list of threats in the spec, but there are no suggestions or even hints on techniques on how they will defend against these threats. I guess they will add this stuff in later, but as I'm sure everyone on this list knows, you can't just "add security later". It sounds like they just did the equivalent of adding partial ordering to a federated protocol similar to XMPP, using TLS "for security", and hoping that everything will be naturally straightforward to conceptually lock down as they write the code.
>
> X
>
> --
> GPG: 4096R/1318EFAC5FBBDBCE
> git://github.com/infinity0/pubkeys.git
>

I've been working on a lot of the federation side of Matrix since its 
inception, and I can assure everyone that we really do know how hard 
these sorts of generic merge algorithms are. Hence why we've promptly 
not tried to solve them.

What we have done is attempted to solve a much simpler subset of the 
general problem. The basic requirement is along the lines of: "All 
servers in the group chat (that can talk to each other) will 
/eventually/ agree what the state of the room is." Note the careful 
omission of the word "correct".

(The state of the room, or more accurately the state of the room at any 
given event according to a given server, is simply a mapping between 
keys and nodes/events in the directed acyclic graph. The key is included 
in the event.)

The main points to note are:
  - We differentiate "auth events", i.e. events that affect 
authorization of "new" events. There are very few types of auth events, 
and Matrix doesn't support custom auth events.
  - The merge algorithm of non auth events, as Ximin points out, is 
relatively easy.
  - The merge algorithm for auth events is designed so that maliciously 
branching and re-merging does not allow circumvention of auth changes. 
This is the hardest part of merging, and is heavily special cased.
  - Each event points to the auth events that allow that event to 
happen, separately to the full event DAG.
  - All events are signed by the originating server, and all event 
relationships are protected by hashes (à la git).
  - Events can be "redacted" by servers, where all non protocol relevant 
keys are stripped, e.g. message contents. This allows servers to 
effectively "delete" nasty messages without leaving a whole in the 
graph. This is possible since we include a hash of the full event, but 
servers only sign the hash and protocol relevant keys rather than the 
full event.
  - Every server is required to store the full auth chain of the events 
it sends (for at least as long as it stores the original event). The 
auth chain is simply all the auth events for that event, and the auth 
events for those, etc.
  - The auth chain is enough for a server to accept that an event could 
have happened.
  - Servers can tell each other if they think another server shouldn't 
have accepted an event and why. (For example the sender of the event had 
been kicked between, in topological terms, the join event and the 
message event)
  - When a server joins the room, it gets given (by the server it is 
joining via) the other servers view of the current state and associated 
auth chains.
  - Servers can throw away old history/events. They still have to keep 
all the auth events.

Thus, different servers can have different views of what the current 
state is if buggy or malicious servers are involved; for example, when 
joining a room your server can be given an incorrect view of the current 
state. However, the auth chain ensures that these views can only be 
incomplete or stale, servers cannot simply include arbitrary synthesized 
events.
If two servers discover that they disagree on the state (which they 
quickly will), then they will exchange their views and attempt to prove 
the other side wrong; by the end of this process two correctly 
functioning servers will come to an agreement of the current state. In 
effect providing a "healing" mechanism.

(This does mean that servers may initially accept an event, only to be 
subsequently shown that the event should have been rejected.)

The outcome of all this is (hopefully) that the subgroup of correctly 
functioning servers *will* eventually agree on the current state, and so 
which events should accepted and rejected.

Hopefully this clarifies a little what Matrix is trying to do and what 
the federation protocol does and doesn't aim to provide and guarantee.

Erik.