[messaging] Matrix.org: Decentralized group chat

Tue Mar 10 16:51:18 PDT 2015

On Tue, 10 Mar 2015, Ximin Luo wrote:

> On 10/03/15 21:39, Matthew Hodgson wrote:
> > So in terms of state merge resolution: yes, this is obviously a hard 
> > problem to get right.  Hopefully we haven't underestimated it.  The WIP 
> > draft documentation for it is at 
> > https://github.com/matrix-org/matrix-doc/blob/master/drafts/erikj_federation.rst#state-resolution.  
> > Meanwhile the reference python server implementation actually implements 
> > this, and hasn't exploded yet.
> > 
> 
> I haven't yet read the full thing, but this feels a bit naive. In a 
> situation where your state can be said to have a "removal" operation, 
> then the properties stated currently don't work.
> 
>    {abc}
>    /   \
>   /     \
> {ab}   {bc}
> 
> The result of merging the two children should be {b}. Note that the 
> history is important - if the parent state were different in the above 
> case, e.g.:
> 
>     {b}
>    /   \
>   /     \
> {ab}   {bc}
> 
> then the merge result should be {abc} - i.e. the "time-reversed" form of 
> the first case. "Removal" is abstract; I think this results applies for 
> any state that can be said to admit "inverse" operations.
> 
> But perhaps you've architected your state such that "inverse operations" 
> don't exist? (c.f. what Trevor said before about "If group membership 
> can be added but not deleted, then merging is easy (union of group 
> members).")

Correct - currently we don't allow removal.  In the scenario of room 
membership, we simply switch members' state to 'left' when they leave, so 
currently state accumulates increasingly over time.  (For instance, the 
#matrix:matrix.org room currently is aware of 693 users, of which around 
200 are currently in state 'left', as opposed to 'invited' or 'joined').  
We haven't seen this to be a major problem in practice so far, although 
obviously we will need to revisit this in the scenario of join-spam 
attacks.

There is a plan to spec and implement best-effort removal to prune the 
departed users, where deletions are considered the lowest possible 
priority in merge conflict resolution, but it's not fleshed out yet.

> The "hash of event_id" feature also feels a bit ugly. Say an attacker 
> wants to screw with things, then he can generate an event_id with a 
> greater hash, and send it pointing to the older event. Why would he want 
> to do this? I don't know, it depends on your application, but maybe he's 
> not allowed to explicitly reverse an operation *after* it happens. 
> (There is something in there about not admitting certain ban 
> operations.) I don't know if there is a *specific* attack, but the 
> general idea of "conflict resolution via arbitrary tie-breaking" is 
> susceptible to such things. Do you have some arguments why this is N/A 
> for your application?

Events are immutable and irreversible.  The idea of maliciously 
referencing an older event to create a fork in history is an obvious 
attack.  Ways we mitigate this include:

 * Distinguishing causal events (e.g. kicks/bans/ops/joins) from 
non-causal events (e.g. IMs).  Servers must cache causal events for a 
room, but may discard non-causal ones from history.
 * When emitting a causal event, a server must provide the 'auth chain' of 
prior dependent causal events to prove why this user claims to have 
permission to perform this operation.  Other servers verify this against 
their record of events, pulling in auth events from other servers if 
needed, thus stopping people synthesising malicious events at whatever 
point in history.
 * For non causal events, there are heuristics (currently not well 
specified outside of the code) that servers can use to reject particularly 
stale or particularly 'split' events.  After all, it's impossible to 
distinguish a legitimate netsplit from a malicious one when the contents 
is just IMs, so the best we can do is to put arbitrary thresholds on the 
volume and age of unexpected bursts of 'lost' events.  For instance, this 
can help the scenario where a user keeps chatting away on a branch after 
they've been kicked, where a malicious server choses to ignore the kick 
request.
 * We also are planning (but have not remotely specced) reputation 
analysis for both demonstrably bad acting servers and end-users.

> Also, what do you mean by "end-to-end encryption"? When more than 2 
> parties co-operate in a session, there are extra security properties we 
> want like consistency, which is not "automatic" with a partial order 
> since an attacker can always create two events attached to an older 
> parent that "looks authentic" but then (with co-operation of the 
> transport) delivers different events to different recipients.

In the proposed implementation, i believe the sending client encrypts the 
message payload once with a (configurably) ephemeral key and relays it 
through normal group chat to the participants in a room, including copies 
of the key & plaintext hash encrypted pairwise for each participant in the 
room based on their DH keys.  So yes, a malicious party can create 
multiple authentic looking events - but I don't believe this is any 
different to synthesising malicious unencrypted events.  The message graph 
is still signed, and the same merge resolution algorithms kick in.

> Signing also interferes with the deniable-authentication of axolotl. But 
> perhaps this isn't part of your threat model.

Yes, unfortunately axolotl's deniability is lost in Matrix's (ab)use of 
it.  In fact, one of Matrix's biggest design shortcomings (in my opinion) 
is our lack of protection or encryption for metadata.  In the end, if you 
want to avoid metadata analysis the only current solution is to use 
throwaway accounts.

M

-- 
Matthew Hodgson
Matrix.org