[messaging] Test Data for the Usability Study

Sun Jun 15 10:26:28 PDT 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 12/06/14 23:17, Tom Ritter wrote:
> It would be great if people could review the code (and math) and
> check for mistakes. Please don't judge me by my code, I committed
> to getting something together for this, and it fell at a time of a
> cross-country move, job change, and other large commitments.
> Re-link: https://github.com/tomrittervg/crypto-usability-study
> Christine: it would also be great if you could clone and confirm
> you can 'make' in the 'pseudoword_testdata' directory and run
> ./genTestData.py successfully.

Thanks for creating this! It works for me too, after chmod as
Christine mentioned. The Perl version of the poem code was lagging
behind the Java version - I've brought it up to date and submitted a
pull request.

I'm trying to understand the 2^80 attack code for the poem generator.
As far as I can tell, it generates a poem, then creates a modified
version by replacing at least 48 bits' worth of words with random
words of the same type (nouns for nouns, adjectives for adjectives,
etc). So approximately 80 bits' worth of words are shared between the
two poems.

There are a couple of problems with this approach. First, it never
modifies the structure of the poem, which encodes some of the bits.
Second, the modified bits appear in clusters, aligned with word
boundaries. If we were to modify 48 bits at independently chosen
positions, more of the words would be affected.
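
To make the second point concrete, here is a rough sketch (not taken from
the repository) that counts how many words are affected when 48 bits are
modified as whole words versus at independently chosen positions. The
128-bit length (80 + 48) and the fixed 8 bits per word are assumptions
made purely for illustration; the real poem encoder uses word lists of
varying sizes, and all function names here are hypothetical.

```python
import random

TOTAL_BITS = 128    # assumed fingerprint length (80 shared + 48 modified)
BITS_PER_WORD = 8   # hypothetical: every word encodes exactly 8 bits
MODIFIED_BITS = 48


def words_touched(positions, bits_per_word=BITS_PER_WORD):
    """Return the number of distinct words containing a modified bit."""
    return len({p // bits_per_word for p in positions})


def clustered_positions(rng):
    """Modify whole words: choose MODIFIED_BITS / BITS_PER_WORD words."""
    n_words = TOTAL_BITS // BITS_PER_WORD
    chosen = rng.sample(range(n_words), MODIFIED_BITS // BITS_PER_WORD)
    return [w * BITS_PER_WORD + i for w in chosen for i in range(BITS_PER_WORD)]


def independent_positions(rng):
    """Modify MODIFIED_BITS distinct bit positions chosen independently."""
    return rng.sample(range(TOTAL_BITS), MODIFIED_BITS)


def average_words_touched(picker, trials=1000, seed=0):
    """Average number of affected words over repeated random trials."""
    rng = random.Random(seed)
    return sum(words_touched(picker(rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("clustered:  ", average_words_touched(clustered_positions))
    print("independent:", average_words_touched(independent_positions))
```

Under these assumptions the clustered strategy always touches exactly 6
of the 16 words, while independently chosen positions spread across
considerably more of them.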

In general I'm not sure that modifying the representation, as opposed
to the fingerprint being represented, is a good way to model the
attacker. I thought we were trying to model an attacker who can
generate 2^80 fingerprints, then pick the one that's closest to the
victim's fingerprint, for some definition of "closest". In the first
stage of the study, we define "closest" as "sharing the most bits". *

If that's what we're trying to model, perhaps we should rewrite the
encoders to separate fingerprint generation (random or pseudo-random)
from encoding (deterministic). We could then generate pairs of
fingerprints (victim and attacker) that differ in the expected number
of bits, and run them through the encoders.
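
A minimal sketch of what that separation might look like, assuming a
128-bit fingerprint held as a Python integer. None of these names exist
in the repository; encode_hex is only a stand-in for a real encoder
(poem, pseudoword, etc.), and the number of differing bits is left as a
parameter because the right value is still an open question.

```python
import random

FINGERPRINT_BITS = 128  # assumed length, inferred from 80 + 48 above


def random_fingerprint(rng, n_bits=FINGERPRINT_BITS):
    """Generation step: draw a pseudo-random fingerprint."""
    return rng.getrandbits(n_bits)


def attacker_fingerprint(victim, n_differing, rng, n_bits=FINGERPRINT_BITS):
    """Flip n_differing independently chosen bit positions of the victim."""
    mask = 0
    for pos in rng.sample(range(n_bits), n_differing):
        mask |= 1 << pos
    return victim ^ mask


def shared_bits(a, b, n_bits=FINGERPRINT_BITS):
    """Number of bit positions at which two fingerprints agree."""
    return n_bits - bin(a ^ b).count("1")


def encode_hex(fingerprint, n_bits=FINGERPRINT_BITS):
    """Encoding step (placeholder): deterministic, no randomness inside."""
    return format(fingerprint, "0{}x".format(n_bits // 4))


def make_pair(encoder, n_differing, seed):
    """Generate a (victim, attacker) pair and run both through one encoder."""
    rng = random.Random(seed)
    victim = random_fingerprint(rng)
    attacker = attacker_fingerprint(victim, n_differing, rng)
    return encoder(victim), encoder(attacker)


if __name__ == "__main__":
    print(make_pair(encode_hex, n_differing=34, seed=1))
```

Because the encoders take a fingerprint and nothing else, the same pair
could be fed to every encoder, which would keep the test data comparable
across representations.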

I'm happy to do this for the poem generator, but I may not be the best
person to do it for the other encoders.

Cheers,
Michael

* I'm not sure how many bits we'd expect the fingerprints to share in
this situation. I *think* it's 94, because the attacker can choose the
values of 80 bits and on average half the remaining bits will match by
chance. But I'm not confident about this...
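
One way to check this kind of estimate is to compare the reasoning above
with a scaled-down simulation of the attacker model (generate 2^k random
fingerprints, keep the one sharing the most bits with the victim). This
is only a sketch with hypothetical function names; 2^80 tries are
obviously out of reach, so the parameters below are toy values. Note
that the result of the heuristic depends on the total length assumed:
for a 128-bit fingerprint (inferred from 80 + 48, not stated explicitly)
it evaluates to 80 + 24 = 104.

```python
import random


def heuristic_shared_bits(total_bits, chosen_bits):
    """The footnote's reasoning: the chosen bits all match, and on
    average half of the remaining bits match by chance."""
    return chosen_bits + (total_bits - chosen_bits) / 2


def simulate_closest(total_bits, log2_tries, trials=200, seed=0):
    """Average number of shared bits when the attacker keeps the best of
    2**log2_tries random fingerprints ("closest" = most equal bits)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        victim = rng.getrandbits(total_bits)
        best = max(
            total_bits - bin(victim ^ rng.getrandbits(total_bits)).count("1")
            for _ in range(2 ** log2_tries)
        )
        total += best
    return total / trials


if __name__ == "__main__":
    # Toy parameters: 16-bit fingerprints, 2^8 attacker tries.
    print("heuristic: ", heuristic_shared_bits(16, 8))
    print("simulation:", simulate_closest(16, 8))
```

If the two numbers disagree at small sizes, that would suggest the
"80 chosen bits plus half the rest" argument does not describe a
best-of-2^80 search exactly, and the expected number of differing bits
for the test data should be derived some other way.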
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iQEcBAEBCAAGBQJTnddEAAoJEBEET9GfxSfMhrAIALten86vADGlVFsEyDt0ZX4S
NxbA25RSXxZwwcnhSArPwgzGiIRy5ea62UBETuRQcpbKPU8pAyNFs5btnszi1vNe
IC23Kj+9vKmWU5paXrqrru8xcNITOU3CmndF89sfGdmxYW9WQP8TfDK/WIAsKntj
8XCWegXg7w2b95sSNWE9PJyKqpD6GBdtP9PQxkVZ+uucEBhhdgZYPN/n74Ipob5L
4VbBnSEo2tisOZW/b0K/yUNZY4I5iYgztktUW1bJtscsmRuptUpNib2T7BUHTUsl
ec+0rq4yjuoQOHIgvW/f4/KZByLC7J0yPytDLsQX7CIYCSWpIR4oFccydMSuXYs=
=EQKs
-----END PGP SIGNATURE-----