[messaging] Merging partially-ordered membership operations was Re: Message order in group chat (attempt at summary)

Mon Apr 21 06:24:46 PDT 2014

On 21/04/14 04:31, Trevor Perrin wrote:
> On Sun, Apr 20, 2014 at 5:20 PM, Ximin Luo <infinity0 at pwned.gg> wrote:
>>
>> I'm unsure what you're trying to achieve with M, D.
> 
> I'd like to understand the algorithms for tracking causal order and
> group membership.
> 
> There's different membership policies, depending on who can add
> members (anyone? group leader?) and who can delete (anyone? group
> leader? you can delete yourself?).  But I think the "anyone can add
> and delete" policy is the most complex, so if we can handle that we're
> in good shape.
> 
> I think my algorithm works.  Yours probably works too, though I'd like
> to see a fuller writeup.
> 
> 
>> Come to think of it, I think you will actually never get *any* items in D. If an entry (mId, member) is in D for a message, then it must also be in M, since mId is causally before that message.
> 
> No, I meant the following:
>  - A "member-add" message would result in (mId, member) being added to
> that message's M.
>  - A "member-delete" message would move all entries associated with
> member from M to D.
> 
> 
>> But then, you ought to delete this entry from both D and M.
> 
> The only case where an entry in D could be removed (or ignored) was
> once it becomes irrelevant (all references to mId in any message's M
> have been garbage-collected).
> 
> 

I think this does work, but I see a few issues. It leaks some information about previous members in the session - I don't think the garbage-collection would be enough to hide this. Also, I'm generally uncomfortable with relying on redundant information - M and D can be deduced by examining previous history, which is the canonical place where this information is generated.

If you want to support incomplete views of history (e.g. so that new joiners don't see old history) this redundancy may present some problems, since new joiners can't verify that M and D are consistent with the old history.

In a more complex situation, this could lead to unreasonable behaviour (perhaps severe enough to call "an attack"). In the diagram below, we do a merge (C) of 2 branches (A,B) where Bob joined separately in each branch. Then (in C) M should have two entries (A, Bob), (B, Bob). When Alice then adds Carol (in F) she can misrepresent M to only contain one entry (B, Bob). Will the other members raise an alert about this? If not, then Carol is forced to trust Alice. When she deletes Bob (in G), she will only move one entry to D, when she ought to move both entries. Later, a merge (H) of (A,G) will cause Bob to be added back into the conversation, even though this should not be the case, since Bob was removed in a message (G) strictly later than both A and B.

O -- A -----------------
 \       \              \
  -- B -- C -- F -- G -- H

In the situation where everyone can add-and-delete, this can be fixed manually (if the humans notice...). But this may cause problems in more restricted-policy scenarios.

In general, I believe that {minimising reliance on redundant information} is a good principle to follow when designing secure protocols, even if it makes your algorithms more complex. Though, I don't think my algorithm is *too* complex, at least less so than e.g. a block cipher.

To support partial views, my approach is to have a looser concept of "when" someone joins/parts. I believe the general merge algorithm is compatible with subjective incomplete views of the history. For example, using the syntax mId{members}:

A{p} <-- B{p,q} <-- C{p,q,r}.

p thinks that q joined at message B. However, from r's point of view, q only joined at message C (B and A don't exist for them).

I appreciate all of this is getting a bit complicated and I will try to write this up. The algorithm itself is not too bad (being very similar to git; see below), but the explanation on "why it works" is more complex, especially when extending the justification to the incomplete-view scenario above.

>> If the LCA is multiple sets, you recursively merge these first - this is what git means when it says "merge by recursive". If you merge multiple parents, you fold over the pairwise merge - this is what "git merge-base" does.
> 
> Do you mean something like this?
> 
> http://codicesoftware.blogspot.com/2011/09/merge-recursive-strategy.html
> 
> That looks complicated.  Maybe you should try to write out the
> pseudocode for all this, so we can evaluate.
> 

Here is the pseudocode with type annotations, but it will take me a lot longer to write up the justification:

(set[UId] is the type of a member-set, A|B is set union, A-B is set subtraction)

merge_mem(parents: set[MId]) -> set[UId]:
    """The merged members of the given set of messages.

    If parents is empty, return empty set.
    If parents is a singleton, return members(parents[0]).
    Otherwise, same as members(foldr(merge_mem_2, parents)), except that we
    do not actually add any nodes to the DAG, where merge_mem_2, etc. are
    defined as:

    merge_mem_2(a: MId, b: MId) -> MId:
        merged = new Message {
            parents = {a, b}
            members = 3_way_merge(merge_mem(lca(a,b)), members(a), members(b))
        }
        graph.add(merged)
        return merged.mId

    3_way_merge(O: set[UId], A: set[UId], B: set[UId]) -> set[UId]:
        := A | (B-O) - (O-B)
        := A - (O-B) | (B-O)
        := B | (A-O) - (O-A)
        := B - (O-A) | (A-O)
        # all four definitions are equivalent

    lca(messages: set[MId]) -> set[MId]:
        := max(anc(messages)) # latest common ancestors

    anc(messages: set[MId]) -> set[MId]:
        := { p | p <= m for all m in messages }

    merge_mem_2 is commutative and associative, so the order of the fold
    doesn't affect the final result of merge_mem. (TODO: formally prove
    associativity. So far we only do this "by example" in the tests.)

    NOTE: implementations will want a slightly modified version of 
    merge_mem_2 that doesn't add any new nodes, and uses an accumulator. 
    We decided to describe this simpler version instead, so as to not to 
    visually detract from the core purpose of our algorithm.
    """

(I do have the actual implementation as well, but this version is easier to explain and understand. The performance is fine, at least good enough for chatting with; one can optimise the LCA calculation.)

X

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 880 bytes
Desc: OpenPGP digital signature
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20140421/6244581a/attachment.sig>