[messaging] Test Data for the Usability Study

Sun May 25 17:15:43 PDT 2014

Hey all!

Christine and I have opened 6 issues at
https://github.com/tomrittervg/crypto-usability-study/issues to
produce test data for the usability study.  We have until JULY 15TH to
produce this data.

There are several components here.

First: Collecting source from a few different places to produce the
variants of English-Word, English-Poem, Pseudoword, and Visual
Fingerprints.  Because all of the code we're using should be open
source, let's just copy it into subdirectories, modify it to run
with minimal command-line args, and stick an appropriate LICENSE in
each subdirectory.

Second: Approximate a 2^80 attacker trying to match the English-Word,
Pseudoword, English-Poem, and Visual Fingerprint styles.

Third: Figure out how to approximate an attacker who can perform 2^80
calculations in the 'weird' cases.  For a 32-character hex
fingerprint, a 2^80 attacker can match 20 characters.
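
To make the arithmetic concrete: each hex character carries 4 bits, so
80 bits of work fixes 80 / 4 = 20 characters. Since nobody can actually
run 2^80 tries, test data has to simulate the result. Below is a minimal
sketch of that idea. It is not the code in hexdata.py; the function
names and the seed are made up for illustration.

```python
import random

HEX_DIGITS = "0123456789abcdef"
BITS_PER_HEX_CHAR = 4  # one hex digit encodes 4 bits


def matchable_hex_chars(work_bits):
    """Hex characters an attacker can fix exactly with 2**work_bits tries."""
    return work_bits // BITS_PER_HEX_CHAR


def simulate_match(target, positions, rng):
    """Return a fake fingerprint equal to `target` at `positions`.

    Every other position is drawn at random, so it may still match
    the target by coincidence (probability 1/16 per character).
    """
    fake = [rng.choice(HEX_DIGITS) for _ in target]
    for i in positions:
        fake[i] = target[i]
    return "".join(fake)


rng = random.Random(2014)  # hypothetical fixed seed, for reproducible data
target = "".join(rng.choice(HEX_DIGITS) for _ in range(32))
n = matchable_hex_chars(80)  # 20 characters
fake = simulate_match(target, range(n), rng)
print(target)
print(fake)
```

Passing a different set of positions (for example the first and last
few indices) gives the other attacker styles without changing the
function.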

Weird Case 1: An attacker matches the beginning and end parts of the
fingerprint to try and trick someone doing a visual compare. Clearly,
matching the beginning and ending 10 characters exactly is harder than
matching any 20, but how much harder? Would a match of the beginning
and ending 8 characters correctly characterize a 2^80 attacker?
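
One way to put numbers on "how much harder" is sketched below. It
assumes the attacker brute-forces uniformly random fingerprints, and it
reads "any 20" as "at least 20 of the 32 positions happen to match,
wherever they fall". Both are modelling assumptions, and the function
names are made up for illustration.

```python
from math import comb, log2


def exact_positions_work_bits(num_chars):
    """Work (in bits) to match `num_chars` specific hex positions exactly."""
    return 4 * num_chars


def any_positions_work_bits(length, min_matches):
    """Work (in bits) until a random fingerprint matches the target in
    at least `min_matches` of `length` positions, in any locations."""
    p = 1 / 16  # chance that one random hex character matches
    prob = sum(
        comb(length, k) * p**k * (1 - p) ** (length - k)
        for k in range(min_matches, length + 1)
    )
    return -log2(prob)  # expected tries is about 1 / prob


print(exact_positions_work_bits(10 + 10))  # first 10 and last 10: 80 bits
print(exact_positions_work_bits(8 + 8))    # first 8 and last 8: 64 bits
print(round(any_positions_work_bits(32, 20), 1))
```

Under this model, matching 20 specific positions costs exactly 2^80,
matching the first and last 8 costs 2^64, and matching any 20 comes out
at roughly 2^53, so the gap between "fixed 20" and "any 20" is large.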

Weird Case 2: An attacker tries to match the fingerprint by
pronunciation to try and trick someone doing a vocal compare. Again,
matching 20 characters exactly and making the remaining 12 'sound
alike' is harder than just matching 20. Would an attacker who gets 28
characters to 'sound alike' and the remaining 4 to match exactly
approximate a 2^80 attack?
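
A rough way to estimate this is sketched below. The sound-alike groups
are a placeholder assumption (hex characters whose spoken English names
rhyme), not measured data; real groups would have to come from the
study. The names are hypothetical too.

```python
from math import log2

# ASSUMPTION: placeholder groups of hex characters that might be
# confused when read aloud. Replace with empirically derived groups.
SOUND_GROUPS = [set("bcde3"), set("a8")]


def sound_class(ch):
    """Return the set of characters assumed to sound like `ch`."""
    for group in SOUND_GROUPS:
        if ch in group:
            return group
    return {ch}  # no known sound-alikes: only an exact match works


def work_bits(target, exact_positions):
    """Estimate attacker work (in bits) against `target`.

    Positions in `exact_positions` must match exactly (4 bits each);
    every other position only needs to land in the same sound class.
    """
    exact = set(exact_positions)
    bits = 0.0
    for i, ch in enumerate(target):
        if i in exact:
            bits += 4
        else:
            bits += log2(16 / len(sound_class(ch)))
    return bits


example = "b3a8f0c1d2e49576" * 2  # made-up 32-character fingerprint
print(work_bits(example, range(4)))   # 4 exact, 28 sound-alike
print(work_bits(example, range(20)))  # 20 exact, 12 sound-alike
```

The result depends on the target fingerprint: one full of characters
with many sound-alikes is cheaper to attack than one with few.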

I've committed a rough first pass at the hexadecimal data generator,
which has TODOs and pointers for addressing the word cases:
https://github.com/tomrittervg/crypto-usability-study/blob/master/hexadecimal-testdata/hexdata.py

-tom