[messaging] Modern anti-spam and E2E crypto

Tue Sep 9 22:41:39 PDT 2014

It depends on the construction.  (I've never seen any form of cipher mode
used.  It's more involved.)

The simplest construction I know of is SSE-2:

Take an inverted index of the corpus:

w1 : {id1, id2, id3}
w2 : {id2}
w3 : {id2, id3}

Map it to a set of triples, where the first value is the word, the second
is the position, and the third is the id:

(w1, 0, id1), (w1, 1, id2), ..., (w3, 0, id2), (w3, 1, id3)

Let Pi be a PRP keyed by K.  Generate a table T.  Foreach entry x in the
array:

k = Pi(K, x(0) || x(1))
T[k] = x(2)

And then so many random values are added to T and the dictionary is
shuffled if it has order (to obscure the true number of keywords).

T is then the index and can be published.

A trapdoor to search for w3 is then {Pi(K, w3 || 0), Pi(K, w3 || 1), Pi(K,
w3 || 2)}
With the full index, you can simply look each of those up to see if they're
in T, and if any are, return the associated ids or the file in the
encrypted corpus.

Because the data owner has K, he can produce the trapdoor, which allows the
server to learn which files it should return.  Without the trapdoor, the
server just has a long list of random data and ids.

SSE-1 and its derivatives embed linked lists in an array--one linked list
for each line of the inverted index, one node for each id (with its data
XORed with an analog of Pi(w)).  That leads to a constant-size trapdoor,
but more to explain.

The game for CKA2 captures the notion that an adversary, allowed to produce
the plaintext corpus and query the functions that require a key an
unbounded number of times can't distinguish between a real SSE system and a
random simulator intentionally leaking specific things like file length.
In other words, given everything except the key, an adversary can't learn
anything.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20140910/d4d0b153/attachment.html>