[messaging] Panoramix decryption mixnet messaging spec and design documents

Fri Nov 24 08:09:50 PST 2017

On 21/11/17 02:12, dawuud wrote:
> Katzenpost is delivery by pull as well. Clients retreive messages from their Provider.
> This paper:
> 
>  [DUMMYINTERSECTION] Berthold, O., Langos, H.,
>                      "Dummy Traffic Against Long Term Intersection Attacks",
>                      In the Proceedings of the PETS 2002,
>                      <https://www.freehaven.net/anonbib/cache/langos02.pdf>.
> 
> mentions that there are some constraints needed for the intersection attack to work:
> 
> a. the adversary is able to watch all user lines
> b. the adversary is able to watch all Provider/server lines (see figure 1 in paper)
> c. the adversary knows how long it takes for a message to traverse the mix network
> d. messages belonging to a certain session have to have the same property
>    that allows the adversary to link them each together
> 
> So in Katzenpost I don't think constraint "c" is met since we aren't using
> synchronized batch mixing... but rather the Poisson mix strategy where
> clients select the mix delay for each hop in the route which should be different
> for every sent message, even retransmits... although client route hop delays are sampled
> from an exponential distribution. The attacker can of course compute the
> probability of a certain route delay having been choosen by a client.

Yes, I think the attack would still be possible given a probability
distribution over delays instead of a fixed delay.

My starting point here is the statistical disclosure attack
(https://www.freehaven.net/anonbib/cache/statistical-disclosure.pdf),
which says that if I'm watching the inputs and outputs of a round-based
anonymity system, and I see Alice sending messages into the system, and
for each round where Alice sends a message I see all the messages
exiting the system, then starting from a uniform probability
distribution over all possible recipients I can refine that distribution
with every observation, to eventually discover Alice's communication
partners. The longer I watch the system, the closer my distribution gets
to Alice's real distribution.

I think we can generalise that by replacing the rounds with a
probability distribution over delays. So for each message Alice sends
into the system, instead of having a set of output messages that
definitely belong to the same round, I assign every future output
message a probability of being Alice's message based on the delay. But
the rest of the attack remains the same: I use the probabilities
assigned to output messages to update the probability distribution over
possible recipients, and over time that distribution approaches Alice's
real distribution.

Of course the Loopix authors know this, hence all the dummy traffic to
prevent an attacker from knowing which users are actually sending and
receiving messages. I think the big question is whether that provides a
useful amount of protection at a reasonable level of overhead.

> I also cannot think of a way that "d" is met... unless of course there
> is collusion with the destination Provider.

If I understand right, condition d just means there must be some set of
messages you're interested in, and you must know which messages they are.

In the statistical disclosure paper, the session is all messages sent by
Alice and the goal is to identify the set of recipients. In the Berthold
and Langos paper it's the other way round - the session is all messages
sent by some unknown sender and the goal is to identify the sender. But
either way, there's some set of interesting messages and the goal is to
separate them from the uninteresting messages.

Just as we can generalise from rounds to a probability distribution over
delays, I wonder if we can generalise from a fixed set of interesting
messages to a probability of being interesting.

So in a system that uses dummy traffic, when I see Alice sending a
message into the system, I assign it a probability of being a real
message. Then I assign every future output message a probability of
being the same message based on the delay. And I update the recipient's
probability of being one of Alice's communication partners based on the
product of those two probabilities.

In theory the attack still works, but my distribution converges more
slowly. On the other hand, the system is also more expensive to use. So
it comes down to a practical question: how long can the system conceal
sender-recipient relationships, at what cost in dummy traffic?

I'd love to see some back-of-an-envelope calculations of this for
Katzenpost. If Alice sends and receives n dummy bytes for each real
byte, what does that do to the attacker's probability distribution over
possible communication partners? How much longer does it take the
attacker's distribution to converge?

Cheers,
Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0x9FC527CC.asc
Type: application/pgp-keys
Size: 4660 bytes
Desc: not available
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20171124/cfe6e158/attachment.key>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20171124/cfe6e158/attachment.sig>