[messaging] Efficiently compressing an exabyte of ciphertext

Wed Jul 23 12:14:24 PDT 2014

# Or, how to make storing ciphertexts cheaper for the NSA.
## (Answer: Keep using stream ciphers.)

Suppose that I worked for the NSA, and were tasked with storing every ECDH
+ symmetric key enciphered communication for eventual decryption.[^how] How
could I save storage bandwidth and space, and store the maximum
*information* possible?

1. Chop the MAC from stored AE ciphertexts. (I trust what I know about
where ciphertexts come from more than I trust the MAC.)

2. For any ciphertext which I believe to be padded, truncate material which
is almost certainly padding. (E.g., for presumed exponentially-padded
messages, truncate the last quarter.)

3. For any message encrypted using a stream cipher or a block cipher in a
chaining mode, use a linear code to compress the message.

3.1. *Stream ciphers*:  a linear code can compress down to the entropy rate
of the underlying plaintext distribution. (Trivial example, rate 8b code:
presumed ASCII, truncate to 7 bits. Less trivial codes necessary for
reaching the entropy rate. Entropy rate of JSON < HTML < English < .5.)

3.2. *Block ciphers in CBC or OFB mode*: use a block-level turbo code. See,
e.g., Klinc's work at T.J. Watson.[^ibm] His constructions achieve -- for
reasonable message lengths -- compression down to the entropy of the
underlying plaintext distribution. (The story for CFB mode is more
complicated.)

This saves me a lot of money; the cost of decrypting everything sent today
by solving discrete-log with a QC isn't the eventual computation. It's the
storage costs until then.

[^how]: Say, with a quantum computer, in twenty years or so.
[^ibm]:  See [Klinc's thesis][klincthesis], p. 99 et seq. Or, the original
paper:  Klinc et al. [On compression of data encrypted with block
ciphers][klincpaper]. (Note that the original *conference proceeding* does
not contain a construction for CFB mode; the thesis and paper do.)

## How to make storing ciphertexts more expensive.

Okay, so taking off my suit, bowler hat, and adorable bowtie for a moment.

### Preventing padding from being stripped

Standard construction, proved secure by [Boyko in his thesis][boykothesis].
OAEP is an all-or-nothing transform:

    len(packey) == len(k)
    packey = randombytes(len(packey))
    ptext = keystream(packey, olen=len(plaintext)) ^ plaintext
    phash = hash(ptext, olen=len(packagekey))
    packagedkey = phash ^ packagekey
    ciphertext =   keystream(k, len(packagekey) + len(plaintext))
                 ^ concat(ptext, packagedkey)

Except: Boyko's model is slightly too weak. The adversary here corresponds
to his fully-adaptive adversary, which he doesn't prove anything about.

Or are people content with the semi-adaptive proof?

### Preventing compression

Something that seems somewhat better than OAEP (more like [Rivest's
original package transform][rivest97]):

    packagekey = randombytes()
    packagetext = stream_xor([c]*32 + message, k=aontkey)
    packagehash = hash(packagetext)
    enckey = packagekey ^ packagehash
    innertext = packagetext + enckey
    ciphertext = aes_ecb(innertext, k=messagekey)

Is this mode's security reducible to OAEP's for message lengths below the
birthday bound? (In particular, if AES-ECB is not a good block-cipher, then
can AES be a good stream cipher?)

As Klinc's paper notes, block ciphers in ECB mode are incompressible (or,
rather, if they can be compressed, they are insecure). This would then
provide a mode of operation that is an AONT with a simple proof of
incompressibility.

### Short messages.

It's not at all clear (at least to me) what to do about tag truncation,
especially in the short-message case, where it's most profitable. (A tag
can be >50% of an average IM message.) Perhaps one of the key-wrap
modes?[^aeskw]

Alternatively: Klinc's construction for CFB reaches the plaintext entropy
distribution very slowly. His construction works on pairs of CFB blocks,
and the compression bound on depends inversely on blocklength. (Though he
doesn't prove, as best I can tell, that his construction is optimal except
for the class of codes he considers.)

So, his construction couldn't achieve any significant compression for the
splendidly inefficient CFB-1. Thus, prefixing the tag to the ciphertext,
and then encrypting both with CFB-1 seems like it would achieve prevent
'unbinding' tags with zero (space) overhead.

References
------------
[coron08]: https://eprint.iacr.org/2008/24
[rivest97]:
http://www.csshl.net/sites/default/files/downloadable/crypto/Rivest_-_All-or-Nothing_encryption_and_the_package_transform.pdf
 "Rivest, Ronald L. *All-or-nothing encryption and the package transform.*
In Fast Software Encryption, pp. 210-218. Springer, 1997."
[boykothesis]: http://groups.csail.mit.edu/cis/theses/victor-phd.pdf
[klincthesis]:
https://smartech.gatech.edu/bitstream/handle/1853/39610/klinc_demijan_201105_phd.pdf
[klincpaper]: http://arxiv.org/pdf/1009.1759.pdf

[^aeskw]: It seems like AESKW prevents something like this, though it is
entirely lacking when it comes to security proofs.

Curious, though: A six-round Luby-Rackoff Feistel network is an ideal
cipher: What's the barrier (I assume there's an obvious one I'm missing) to
assuming keyed AES is an RO, and proving that AESKW is an ideal cipher? Cf.
the well-known [Coron et al.][coron08] ideal cipher construction. (I've
repeatedly read that AESKW has 'resisted' analysis, but haven't found many
publications analyzing it other than that Bellare-Rogaway SIV paper.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20140723/3995dde3/attachment.html>