[messaging] Merging partially-ordered membership operations was Re: Message order in group chat (attempt at summary)

Sun Apr 20 17:20:56 PDT 2014

On 20/04/14 19:33, Trevor Perrin wrote:
> On Sat, Apr 19, 2014 at 9:16 AM, Ximin Luo <infinity0 at pwned.gg> wrote:
>> On 19/04/14 16:45, Trevor Perrin wrote:
>>> On Sat, Apr 19, 2014 at 6:53 AM, Ximin Luo <infinity0 at pwned.gg> wrote:
>>>> On 19/04/14 00:54, Trevor Perrin wrote:
>>>>>
>>>>> To handle "join" and "part" events, I think we can assume these
>>>>> actions are represented as messages which operate on the "member-set"
>>>>> each participant recognizes.  The question Ximin has thought about
>>>>> earlier is how to handle "merging" partially-ordered changes to the
>>>>> member-set
>>> [...]
>>>>
>>>> I have an algorithm (with source code and a few test cases) for this, plus a semi-formal argument on why it works, and also why it's the unique solution. It would be good to formalise it. It's probably a little too technical to post here but I can explain it in more detail next time we meet.
>>>
>>> Thinking more: if a group chat allows member-add but not
>>> member-delete, then this is is easy, right?  The merged member-set is
>>> just the union of the parent member-sets?
>>>
>>> With delete, there's not a perfect solution: what if one parent branch
>>> adds Bob, and another adds Bob, then deletes him?  Since these
>>> branches aren't ordered, I think it's ambiguous whether the merge
>>> should contain Bob or not.
>>>
>>
>> I'd argue the correct merge is that it should contain Bob. Perhaps I'm biased in my assessment, since this is what my algorithm does :p
> 
> OK.  So a "member-delete" message would cancel all causally-prior
> "member-adds" of that member (though not necessarily all
> temporally-prior ones).
> 
> Seems reasonable.  Does the below work? -
> 
> Clients track sent/received messages, but garbage-collect messages
> more than X seconds older than the latest message (i.e. messages are
> garbage-collected once they are old enough that new messages are
> expected to refer to later ones).
> 
> For each message, clients track:
>  - a member-set M, each member associated with the messageID(s) that added it
>  - a deleted-member-set D, each member associated with the
> causally-prior messageID(s) that added the member being deleted
> 
> The client's view of the current M and D is derived by merging M and D
> from its immediate predecessor messages:
>  - M = union(M from predecessors), D = union(D from predecessors)
>  - if there is a member listed in M and D, delete from the member in M
> all the messageIDs listed for the member in D.  Then, if the member
> has no messageIDs in M left, delete the member from M.
> 
> When messages are garbage-collected, if the last mention of a
> messageID for a member in M is removed, then the same messageID can be
> removed for that member in all D.  (In other words: once a member-add
> has been fully cancelled by a member-delete in all messages that a new
> message could refer to, the member-delete no longer needs to be
> tracked.)
> 

I'm unsure what you're trying to achieve with M, D. If you only want to achieve the cancel behaviour, tracking which message "joined" a member is unnecessary. The cancel behaviour is actually only a by-product of the general merge algorithm, which works by finding the Latest-Common-Ancestor between merge parents. This is what git does, and I assume darcs/hg too, and is what inspired my algorithm.

Come to think of it, I think you will actually never get *any* items in D. If an entry (mId, member) is in D for a message, then it must also be in M, since mId is causally before that message. But then, you ought to delete this entry from both D and M.

If you designed the above specifically to be able to discard old history, I think this is still possible with my merge algorithm - it will never need information before LCA({everyone's latest message}), so you can discard everything before those messages.

For your example above, it would work as follows (same behaviour for git, and my unordered-sets version):

Say r is tip of the first branch (members={A}), and s is the tip of the second branch (members={A,B}). Then p (members={A}) is the LCA of r, s. Then, the merged message m must have member-set:

members(m)
= apply(r, diff(p, s)) (or equivalently) apply(s, diff(p, r))
= apply({A}, (+{B}, -{})) (or equivalently) apply({A, B}, (+{}, -{}))
= {A, B}

So you see, the fact that you added-then-deleted Bob from the first branch, is bypassed by the algorithm. I am pretty certain the LHS == RHS always, which makes the algorithm symmetrical wrt the order of merge-parents, but I haven't yet formally proved this.

If the LCA is multiple sets, you recursively merge these first - this is what git means when it says "merge by recursive". If you merge multiple parents, you fold over the pairwise merge - this is what "git merge-base" does. (I originally had a simpler behaviour that I thought was a short-cut, but it did the wrong thing; I now better understand why git-merge-base works the way it does.)

X

> ---
> 
> This is getting complicated, would be great if someone could create a
> wiki or something and try to list:
>  - threats we're defending against
>  - various assumptions that can be made about the underlying network,
> group semantics, and UI
>  - algorithms and their options
> 
> 
> Trevor
> 

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 880 bytes
Desc: OpenPGP digital signature
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20140421/609e4c03/attachment.sig>