[messaging] Test Data for the Usability Study

Thu Jun 26 03:53:56 PDT 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Sorry for the slow reply - I was hoping to work on this last weekend
but didn't get a chance.

On 23/06/14 22:34, Tom Ritter wrote:
> I implemented this on a branch 
> (https://github.com/tomrittervg/crypto-usability-study/commit/9df0e72f15391128b6b067e891323363780cb451
>
> 
), and ran into three issues:
> 
> 1) I also am not sure if, when we flip the bits, they should be 
> flipped at random, or just negated.  My gut says negated...

I'd look at it the other way - we don't flip bits, we match bits. The
attacker can match 80 bits and the rest are random (because if a real
attacker was generating fingerprints by hashing keys and looking for
partial matches, the non-matching bits would be random).

> 2) The 850 word corpus does not translate directly into an even
> number of bits.  I wound making it 14 words, each representing 9
> bits (using 512 of the words)

Makes sense. I did something similar for the Basic English dictionary
IIRC - all the word lists have lengths that are powers of two.

> 3) The more I thought about it, and then verified, the
> fingerprints barely match at all.

That's to be expected, isn't it? This is why I think we should start
by looking at small random differences (e.g. flipping a single bit) -
if we start by testing fingerprints with 80 matching and 48
non-matching bits, I'd expect the differences to be noticeable
regardless of the encoding, so we may not get a meaningful comparison
between encodings.

> While I agree this is a more perfect modeling of a 2^80 attacker,
> the fact remains that if I were an attacker I would not waste my 
> computation trying to match the correct number of _bits_ - I would 
> spend it trying to match the encoding as well as I could - even if 
> that means the actual number of bits I match isn't as close.  I
> know this is the same problem as trying to model an attacker who
> puts more emphasis on a fingerprint match at the beginning or end
> of the string though.

I totally agree that an attacker wouldn't try this. Rather than
spending time on modelling an unrealistic attacker, I think we should
start from the most basic questions - how does each encoding perform
in terms of comparison speed, false negative rate and false positive
rate - then use the data to test our intuitions about which
differences are most noticeable for each encoding, and then, if the
data hasn't led us off down some other interesting path, proceed to
model an attacker who has a 2^80 budget to make least-detectable
partial matches for each encoding.

I'm not saying this because the problem of a 2^80 attacker isn't
interesting, but because even the first step towards the problem is
interesting enough to keep us occupied for a while.

Cheers,
Michael
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iQEcBAEBCAAGBQJTq/vEAAoJEBEET9GfxSfMYasIAKXRtQLTdPIS/RvD8cxD2ML1
Lws2QDDchPDNh+4D3r8BVM6PsGZaG+v+MvYxgNopw9h8kcLwNHBgXxamdW/19I6X
5T9sVasobg6ckK0XmhjBcyomsyDiZLNcXQ3Gw3rzSIIZvd/IFCZ2eubpVmhuVfAX
FHN0FNZAUu8xVHc/4pGh0yDYgLmDwOFuEt48N1rWNLWyXVN+VABDwJAXJ7M+WYlB
wJpB04L1ViVv638FE68JrfgUmYGnWkdnZv+Xl2avjvT48AgShsIXLW3DgGUMvtsL
zQWDPmiEaZ2v4eU/Lb7MXYkGAhzc9mQ9usg2d4J9AKDf2Abv5nTxvupE+iZnbyU=
=sddd
-----END PGP SIGNATURE-----