[messaging] Test Data for the Usability Study

Mon Jun 23 14:34:42 PDT 2014

I implemented this on a branch
(https://github.com/tomrittervg/crypto-usability-study/commit/9df0e72f15391128b6b067e891323363780cb451)
and ran into three issues:

1) I'm not sure whether, when we flip the bits, they should be
flipped at random or simply negated.  My gut says negated...
2) The 850-word corpus does not map onto a whole number of bits per
word (850 is not a power of two).  I wound up making the fingerprint
14 words, each representing 9 bits (using 512 of the words).
3) The more I thought about it, and then verified, the clearer it
became that the fingerprints barely match at all.
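A minimal Python sketch of the setup in the list above, to make the two flipping modes concrete. It assumes a 126-bit fingerprint (14 words x 9 bits) and a hypothetical placeholder WORDLIST standing in for the first 512 words of the 850-word corpus; the function names are illustrative, not taken from the branch. "negate" always changes every touched bit, while "random" re-draws it, so only about half of the touched bits actually change.

```python
import math
import random

BITS_PER_WORD = 9                        # 2**9 = 512 words used from the corpus
NUM_WORDS = 14                           # 14 words per fingerprint
TOTAL_BITS = BITS_PER_WORD * NUM_WORDS   # 126 bits

# Hypothetical placeholder for the first 512 words of the 850-word corpus.
WORDLIST = [f"word{i:03d}" for i in range(2 ** BITS_PER_WORD)]


def random_fingerprint(rng):
    """Draw a uniformly random fingerprint as a list of TOTAL_BITS 0/1 values."""
    return [rng.getrandbits(1) for _ in range(TOTAL_BITS)]


def encode(bits):
    """Deterministically map each successive 9-bit chunk (MSB first) to a word."""
    words = []
    for start in range(0, len(bits), BITS_PER_WORD):
        index = 0
        for b in bits[start:start + BITS_PER_WORD]:
            index = (index << 1) | b
        words.append(WORDLIST[index])
    return words


def perturb(bits, k, mode, rng):
    """Touch k distinct random positions.

    mode="negate": each touched bit is inverted, so exactly k bits differ.
    mode="random": each touched bit is re-drawn, so about k/2 bits differ.
    """
    out = list(bits)
    for pos in rng.sample(range(len(bits)), k):
        if mode == "negate":
            out[pos] ^= 1
        else:
            out[pos] = rng.getrandbits(1)
    return out


def matching_words(a, b):
    """Count word positions where the two encodings agree."""
    return sum(x == y for x, y in zip(a, b))


def expected_negation_matches(k):
    """Expected matching words when k of TOTAL_BITS bits are negated: a word
    survives only if none of its BITS_PER_WORD bits were chosen."""
    survive = math.comb(TOTAL_BITS - BITS_PER_WORD, k) / math.comb(TOTAL_BITS, k)
    return NUM_WORDS * survive


if __name__ == "__main__":
    rng = random.Random(2014)
    victim = random_fingerprint(rng)
    for mode in ("negate", "random"):
        other = perturb(victim, 40, mode, rng)
        print(mode, matching_words(encode(victim), encode(other)), "of", NUM_WORDS)
    print("expected under negation:", round(expected_negation_matches(40), 2))
```

Assuming 40 bits are touched (the figure from the plan quoted below), a 9-bit word survives negation only if none of its bits are chosen, which happens with probability C(117,40)/C(126,40), roughly 0.028. That works out to about 0.4 matching words out of 14 on average, in line with point 3.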

Negation:
wood - be - jump - though - punishment - for - company - animal - far - you - unit - snow - cover - father
disease - society - wool - punishment - to - even - edge - again - hour - base - wood - as - amusement - daughter

Random:
attention - smell - behavior - smile - rain - the - wood - food - stage - get - almost - competition - increase - earth
birth - cough - apparatus - soap - knowledge - of - band - friend - snow - get - then - stretch - belief - earth

While I agree this is a more faithful model of a 2^80 attacker, the
fact remains that if I were an attacker I would not waste my
computation trying to match the correct number of _bits_.  I would
spend it trying to match the encoding as closely as I could, even if
that means the actual number of bits I match isn't as close.  That
said, I know this is the same problem as trying to model an attacker
who puts more emphasis on matching the beginning or end of the
fingerprint string.
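To illustrate the point about spending computation on the encoding rather than on the bits, here is a toy Python sketch, assuming the same 14 x 9-bit layout. A small pool of random candidates stands in for the 2^80 attempts, and the attacker picks the best candidate under one of two scores: matching bits or matching words. All names are hypothetical.

```python
import random

BITS_PER_WORD = 9
NUM_WORDS = 14
TOTAL_BITS = BITS_PER_WORD * NUM_WORDS   # 126-bit fingerprints as integers
WORD_MASK = (1 << BITS_PER_WORD) - 1


def word_indices(fp):
    """Split a fingerprint into its NUM_WORDS 9-bit word indices."""
    return [(fp >> (BITS_PER_WORD * i)) & WORD_MASK for i in range(NUM_WORDS)]


def bit_score(victim, cand):
    """Number of bit positions on which the two fingerprints agree."""
    return TOTAL_BITS - bin(victim ^ cand).count("1")


def word_score(victim, cand):
    """Number of word positions that encode to the same word."""
    return sum(a == b for a, b in zip(word_indices(victim), word_indices(cand)))


def best_candidate(victim, candidates, score):
    """Brute-force attacker: keep whichever candidate maximizes `score`."""
    return max(candidates, key=lambda c: score(victim, c))


if __name__ == "__main__":
    rng = random.Random(1)
    victim = rng.getrandbits(TOTAL_BITS)
    # Toy stand-in for 2**80 attempts: a small pool of random fingerprints.
    pool = [rng.getrandbits(TOTAL_BITS) for _ in range(20000)]
    by_bits = best_candidate(victim, pool, bit_score)
    by_words = best_candidate(victim, pool, word_score)
    for label, pick in (("optimize bits", by_bits), ("optimize words", by_words)):
        print(label, bit_score(victim, pick), "bits,",
              word_score(victim, pick), "words")
```

Because both picks come from the same pool, the word-optimizing choice never matches fewer words than the bit-optimizing one, although it may match fewer bits.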

-tom

On 15 June 2014 13:41, Tom Ritter <tom at ritter.vg> wrote:
> On 15 June 2014 13:26, Michael Rogers <michael at briarproject.org> wrote:
>> In general I'm not sure that modifying the representation, as opposed
>> to the fingerprint being represented, is a good way to model the
>> attacker. I thought we were trying to model an attacker who can
>> generate 2^80 fingerprints, then pick the one that's closest to the
>> victim's fingerprint, for some definition of "closest". In the first
>> stage of the study, we define "closest" as "sharing the most bits". *
>>
>> If that's what we're trying to model, perhaps we should rewrite the
>> encoders to separate fingerprint generation (random or pseudo-random)
>> from encoding (deterministic). We could then generate pairs of
>> fingerprints (victim and attacker) that differ in the expected number
>> of bits, and run them through the encoders.
>>
>> I'm happy to do this for the poem generator, but I may not be the best
>> person to do it for the other encoders.
>
>
> I agree, and that would be great!
>
> English words should be relatively easy to do in that model.  I'll
> generate a 130-bit fingerprint and let each successive 10 bits
> determine the next word.  Then I'd flip 40 of the bits at random and
> repeat the encoding.
>
> English pseudowords would be slightly more difficult.  Trevor would be
> the best person to change that code, but if he doesn't have time I
> will investigate.
>
> -tom
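A minimal Python sketch of the split proposed in the quoted thread: random generation of a victim/attacker pair that differs in exactly k bits, followed by a deterministic encoder. It follows the quoted plan of a 130-bit fingerprint where each successive 10 bits selects the next word, which needs a 1024-entry list; WORDLIST is a hypothetical placeholder, and "flip 40 of the bits at random" is interpreted here as negating 40 distinct, randomly chosen positions.

```python
import random

FP_BITS = 130
BITS_PER_WORD = 10
# Hypothetical placeholder: this encoder needs 2**10 = 1024 distinct words.
WORDLIST = [f"w{i:04d}" for i in range(2 ** BITS_PER_WORD)]


def generate_pair(k, rng):
    """Fingerprint generation (random): a victim fingerprint and an attacker
    fingerprint that differ in exactly k randomly chosen bit positions."""
    victim = rng.getrandbits(FP_BITS)
    flip_mask = 0
    for pos in rng.sample(range(FP_BITS), k):
        flip_mask |= 1 << pos
    return victim, victim ^ flip_mask


def encode_words(fp):
    """Encoding (deterministic): each successive 10 bits, most significant
    first, selects the next word, giving 130 / 10 = 13 words."""
    words = []
    for i in range(FP_BITS // BITS_PER_WORD):
        shift = FP_BITS - BITS_PER_WORD * (i + 1)
        words.append(WORDLIST[(fp >> shift) & (2 ** BITS_PER_WORD - 1)])
    return words


if __name__ == "__main__":
    rng = random.Random(15)
    victim, attacker = generate_pair(40, rng)
    print(" - ".join(encode_words(victim)))
    print(" - ".join(encode_words(attacker)))
```

Keeping generation separate means the other encoders (the poem generator, pseudowords) could consume the same pairs by swapping out encode_words.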