[messaging] Partial ordering, dynamic groups and event ordering

Fri Mar 14 09:23:44 PDT 2014

On 14/03/14 07:14, Trevor Perrin wrote:
> 
> On Wed, Mar 12, 2014 at 1:42 PM, Ximin Luo <infinity0 at pwned.gg <mailto:infinity0 at pwned.gg>> wrote:
> 
>     Hey guys, I am doing work on partially-ordered transcripts. This was inspired by Old Blue[0], discussions with George Kadianakis and Trevor Perrin, as well as background knowledge on git, and immutable content-addressing schemes as seen in Freenet and Tahoe-LAFS.
> 
>     For everyone's interest, here is what I have so far, followed by a statement of some issues that (AFAICS) have not been discussed before, and directions for solving them.
> 
> 
> Hey Ximin,
> 
> We may need a more gentle introduction to this, and pseudocode - not everyone here saw the discussions on OTR-dev, and I think you've worked out new details I don't fully understand.
> 

Thanks for the input - I'll work on spending more time to explain things at a slower pace, and either re-post or upload it somewhere.

> My take:  You're considering problems that arise when trying to achieve "transcript consistency" for a multiparty conversation.
> 
> The general approach you're considering is for parties to piggyback hash values on the messages they send, referring to the messages they've previously sent and received.  The goal is to protect the context of your message, so that if you send a message saying "yes", an attacker can't perform replay/reorder/deletion to change what it appears you're responding to.
> 

The hashIDs also serve another purpose - they give some level of assurance to the other recipients that they are *seeing the same view* of what I send. Otherwise, I can send different things to different people.

> You're raising a couple questions about what happens when users join / leave the conversation.
> 
> You point out that piggybacked hashes sent to new users might leak information about messages before the new users joined.
> 
> I'm not sure that's a big deal.  But at least in a "pairwise" situation where each message is separately encrypted to each recipient (instead of using a group key and broadcast medium), wouldn't it be easy to omit old hashes to a new member?
> 

Due to hashIDs needing to be the same for each message, we send the same actual ciphertext to all recipients, but they decrypt/read different portions of it. To save space, there is only one copy of the message contents, including the parent pointers - but the path of which parts each recipient needs to decrypt, is different, e.g. roughly ciphertext = {E(K1,K) E(K2, K) E(K,Msg)}.

I could indeed separately-encrypt different parents to different recipients. The difficulty would be to do this in a restricted manner, to *only* allow the case we're describing, to make sure that people can't send arbitrarily-inconsistent parents. It would also make the message format more complex.

I'll explore this idea a bit more - thanks for the tip. It is similar in principle to my other idea ("show the same parents, but add some annotation to tell the new user he shouldn't expect to see a subset of them") and probably not much more complex.

> I think you were also concerned about whether users joining/leaving the conversation could get things out of sync?  It seems you've resolved that concern, but I admit I didn't quite follow.
> 

The stuff about "not-after"/"after" in the first email was maybe not the most direct way of explaining things, but it was what I had in my confused head at the time. Perhaps the following model is easier to follow:

At all times, the chat must have a member-set ("current participants") - this is how you decide, "who to send my next message to". If your transcript only has one "most recent" message, we can define it simply as the member-set of the previous message (i.e. its sender plus recipients, embedded into each message), plus/minus anyone you want to invite/kick.

This definition is also *constant* (wrt who you are in the chat), given a constant transcript. This is important, for consistency between members.

The problem is when we have a fork in our transcript, then who do we send our next message to? It turns out that, we need a merge algorithm. Each node (message) has a member-set, edges represent join/part/NOP operations to this member-set, and we need to merge these together when we have a fork.

Every version-control system has one of these algorithms, but sometimes they result in edit conflicts. With line-by-line content, it is theoretically impossible to automatically resolve some of these (there is a proof floating around somewhere).
This was my concern in the first email. However, I later showed that with unordered sets (with no additional structure that needs to be preserved), like "members in a chat", we can actually automatically-resolve all of these.

However, the merge algorithm in my previous email has a (fixable) flaw, which I describe below. But if you didn't follow the previous few paragraphs, please let me know - the rest of this email will not make sense otherwise.

I explored the area a bit more today, and concluded that the merge_mem I described in my previous email doesn't quite work like we need it to. It is indeed consistent and verifiable, however it does not actually behave intuitively, as I asserted.

In the following graphs, the node labels are the set of members (i.e. "participants in the chat") of each node, e.g. "bc" means "this node/message has users b and c". For brevity, I also use this as an identifier for the node, when describing inputs to merge_mem(). In the diagrams below, this is ok, because each node has a unique member-set that no other node has (which is generally not the case, of course).

~~~~
digraph multiroot {
 x;
 bcd -> bc [label="+d"];
 b -> bc [label="-c"];
 rankdir=BT;
 { rank=same; bc x }
}
~~~~
https://people.torproject.org/~infinity0/res/mcrypt/po-merge-set-multiroot.png

Intuitively, we have {bc}, +d, -c and {x}, so we want merge_mem(bcd, b, x) == bdx. However, the algorithm I outlined in my previous email returns bcdx instead.

At first I thought it was a problem with my generosity of not disallowing multiple roots. But, in the algorithm I defined previously, it simply treats multiple roots *as if they all have one same (invisible) parent*, where that parent's member-set is the empty set.

The problem shows up in the same way, if we add a member {a} to all of the nodes, and show the empty-set (now with {a}) node explicitly:

~~~~
digraph singleroot {
 ax -> a [label="+x"];
 abc -> a [label="+bc"];
 abcd -> abc [label="+d"];
 ab -> abc [label="-c"];
 rankdir=BT;
 { rank=max; a }
}
~~~~
https://people.torproject.org/~infinity0/res/mcrypt/po-merge-set-singleroot.png

Again, we want merge_mem(abcd, ab, ax) == abdx, but the algorithm defined previously returns abcdx.

The problem stems from the fact that our previous algorithm treats all nodes equally. However, in the cases above, {bcd, b} have a more recent parent than x, so we should really merge bcd, b together first, before we try to merge it with x. It will be a bit more complex trying to specify this precisely, though.

X

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 880 bytes
Desc: OpenPGP digital signature
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20140314/1d764888/attachment.sig>