[messaging] Test Data for the Usability Study

Tue May 27 19:55:32 PDT 2014

On 27 May 2014 09:08, Michael Rogers <michael at briarproject.org> wrote:
> Nevertheless, we still don't have a way to compare the noticeability
> of modifications across representations. How much phonic edit distance
> is equivalent to, say, the difference between modifying a character at
> the start of a fingerprint and modifying a character in the middle?

The data can be sliced in many ways. We don't have to try to compare
close-phonetic-matching fingerprints to end-matching hexadecimal
fingerprints.  But if we do, we haven't made a conclusive statement;
we've just decided that this data is interesting and should be
studied more closely.

The 75% hexadecimal can be compared to the 75% phonetic, and the 75%
phonetic to the 25% phonetic. I think conclusions that we may be able
to draw are "It seems like fingerprints that match phonetically
closely do in fact confuse users" and "It does not appear that
matching only on the ends is sufficient to confuse users."

> It seems to me that the only credible way to answer such questions is
> empirically. We should start by making random modifications to the
> data to be compared, and measuring the error rate (false positives and
> false negatives) for each representation. Then we can come up with
> some hypotheses for which modifications are more or less noticeable
> for each representation, and test them against the data.
>
> *Then* we may be able to say that this modification to this
> representation is equally as noticeable as that modification to that
> representation - and if so, we can then ask which representation
> offers the most noticeability given an adversary with a computational
> budget for making least-noticeable modifications.
>
> Trying to guess which modifications will be least noticeable for each
> representation before we have any data is trying to run before we can
> walk, in my always humble opinion. ;-)

I don't entirely disagree, but my aim with this is more to get a
sample of data at walking, jogging, and running to demonstrate to
people that there is some interesting stuff here that should be the
subject of more involved, professional study.

We're testing several different variables, and it doesn't make sense
to cross-compare everything. But with the complete dataset obtained,
I'm more than happy to let people compare it in the ways they're
comfortable with and follow up on different avenues.

On 27 May 2014 09:53, Christine Corbett Moran
<christine.corbett at gmail.com> wrote:
> the choice is between
> 1. doing nothing with pronunciation because it too hard
> 2. doing something fuzzy with pronunciation because it is too hard to
> approach quantitatively
> 3. doing something well defined, but certainly approximate, yet well defined
> enough to be scientific.
> 4. doing something closer to exact, but difficult.
>
> I think your approach is definitely better, it's just approaching being a
> linguistics question that is outside the scope of this small 30 person pilot
> study. what I would propose is that we do 3. then if we found out that e.g.
> the poems/natural language really is some frontrunner for amazing (in
> reality I think non of these schemes will be a standout, but that the study
> may inform further design), we could try to convince a linguist to run 4.

I agree.  My intention was the following:

1) We make random changes (that is, no trickery such as putting the
changes in the middle rather than randomly spaced, and no trickery
trying to get phonetically similar words) 75% of the time. We do the
tricky bits the other 25% of the time.

2) To get the phonetic tricky bit, we do something like the following
simplified example:
a) Take your corpus of English words: [adam, bob, carol, madam, cobb, rob, lobb]
b) Separate them into phonetic groups: [adam, madam], [bob, rob, cobb], [carol]
c) Generate a random fingerprint: carol, cobb, bob, adam, carol
d) A 2^80 attacker (scaled down) could match about 3 of those tokens
   exactly. But for this fingerprint, a 2^80 attacker wants a
   fingerprint that hits any combination of {[carol], [bob, rob, cobb],
   [bob, rob, cobb], [adam, madam], [carol]}.
   I need to (learn how to) do the math, but I think we should be
   able to calculate how many of those tokens they could hit in 2^80
   random guesses.

Is this process perfect? I doubt it. But I think it falls into
category #3.  And, this is only 1/4 of the attempts we make.
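
For the 75/25 split in step 1, something as simple as the following
would do. The labels "random" and "targeted" and the fixed seed are
placeholders of mine, not anything we've agreed on.

```python
import random


def pick_modification(rng):
    """Return "targeted" about 25% of the time, "random" otherwise."""
    return "targeted" if rng.random() < 0.25 else "random"


# A fixed seed keeps a participant's trial plan reproducible.
rng = random.Random(2014)
plan = [pick_modification(rng) for _ in range(20)]
print(plan)
```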

-tom