[messaging] Panoramix decryption mixnet messaging spec and design documents

Fri Nov 24 22:37:14 PST 2017

Hi Michael,

Thank you for your thoughtful e-mail. I think you are correct... longterm
statistical disclosure cannot be prevented, only delayed or made more
difficult given that users do at times disconnect from the network.
(unrealistic constraints are unrealistic)

That having been said, I think we need to get clear on exactly
how this attack will work for systems like Loopix/Katzenpost
because the literature is describing very different mixnet systems
that are more like peer to peer systems.

There are a couple of points that you didn't mention that
I would like to raise below.

But before diving in... say we were to use a mix network to transport
zcash transactions: from client to blockchain. For this case it's hard
for me to imagine how this attack would work. I mean... there is no receiver
of the message. The zcash blockchain is the receiver.

> Yes, I think the attack would still be possible given a probability
> distribution over delays instead of a fixed delay.
> 
> My starting point here is the statistical disclosure attack
> (https://www.freehaven.net/anonbib/cache/statistical-disclosure.pdf),
> which says that if I'm watching the inputs and outputs of a round-based
> anonymity system, and I see Alice sending messages into the system, and
> for each round where Alice sends a message I see all the messages
> exiting the system, then starting from a uniform probability

In this case, exiting the system means a message travels from the last
mix hop to the destination Provider where the message is
queued. Later, anytime later, the recipient client can retrieve the
message from their "mailbox" on the Provider.

> distribution over all possible recipients I can refine that distribution
> with every observation, to eventually discover Alice's communication
> partners. The longer I watch the system, the closer my distribution gets
> to Alice's real distribution.

Yes but how to discover the set of suspect recipients?
The passive adversary watches to see who connects to the recipient Provider
to receive messages from their mailbox. However it is unclear to me
how the adversary is able to reduce the suspect set size.

> I think we can generalise that by replacing the rounds with a
> probability distribution over delays. So for each message Alice sends
> into the system, instead of having a set of output messages that
> definitely belong to the same round, I assign every future output
> message a probability of being Alice's message based on the delay. But

Yes that sounds correct... but we are still missing the output side
of the attack in this explaination.

> the rest of the attack remains the same: I use the probabilities
> assigned to output messages to update the probability distribution over
> possible recipients, and over time that distribution approaches Alice's
> real distribution.

I should point out that when the Katzenpost design is upgraded to use
decoy traffic the way the Loopix paper describes, the client would
also receive decoy messages from the Provider when retrieving
messages. That is, all clients would receive the same amount of
"stuff" when polling the Provider for queued messages.

> Of course the Loopix authors know this, hence all the dummy traffic to
> prevent an attacker from knowing which users are actually sending and
> receiving messages. I think the big question is whether that provides a
> useful amount of protection at a reasonable level of overhead.
> 
> > I also cannot think of a way that "d" is met... unless of course there
> > is collusion with the destination Provider.
> 
> If I understand right, condition d just means there must be some set of
> messages you're interested in, and you must know which messages they are.
> 
> In the statistical disclosure paper, the session is all messages sent by
> Alice and the goal is to identify the set of recipients. In the Berthold
> and Langos paper it's the other way round - the session is all messages
> sent by some unknown sender and the goal is to identify the sender. But
> either way, there's some set of interesting messages and the goal is to
> separate them from the uninteresting messages.
> 
> Just as we can generalise from rounds to a probability distribution over
> delays, I wonder if we can generalise from a fixed set of interesting
> messages to a probability of being interesting.
> 
> So in a system that uses dummy traffic, when I see Alice sending a
> message into the system, I assign it a probability of being a real
> message. Then I assign every future output message a probability of
> being the same message based on the delay. And I update the recipient's
> probability of being one of Alice's communication partners based on the
> product of those two probabilities.

I see what you mean AND I think the Client to Provider decoy traffic as I
described above would make this a lot more difficult. I don't yet fully
understand how this attack would work because I don't see a way to reduce
the set of suspect output messages. In this case, output messages means
the data that the client retreives from their Provider. It could all be decoy
traffic everytime the client polls their Provider for new messages... and
clients can connect at a much later time to retreive messages as well.

> In theory the attack still works, but my distribution converges more
> slowly. On the other hand, the system is also more expensive to use. So
> it comes down to a practical question: how long can the system conceal
> sender-recipient relationships, at what cost in dummy traffic?
> 
> I'd love to see some back-of-an-envelope calculations of this for
> Katzenpost. If Alice sends and receives n dummy bytes for each real
> byte, what does that do to the attacker's probability distribution over
> possible communication partners? How much longer does it take the
> attacker's distribution to converge?

I also would be interested in this although the academics that Yawning and I
are working with have thus far not spent any time on developing a decoy traffic
specification. The plan was to first deploy katzenpost without decoy traffic,
take measurements and then designs decoy traffic appropriate for the given
real traffic and desired mix entropy.

In the end, the decoy traffic will have to serve at least 3 purposes:

1. detection of n-1 attacks via loops from clients AND mixes
2. reduction of mix latency while maintaining desired mix entropy
   for a given traffic volume
3. prevent short term statistical disclosure attacks and make
   long term statistical disclosure attacks more difficult

Therefore any back-of-an-envelope calculations would have to take into
account these three uses of decoy traffic.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20171125/946d4000/attachment.sig>