[messaging] Test Data for the Usability Study
Christine Corbett Moran
corbett at alum.mit.edu
Tue May 27 06:55:27 PDT 2014
I think the question is well defined scientifically if we use, e.g., two words
with a close "edit distance" for a given metric (proposed: use nltk +
cmudict to get phonemes and then run a standard edit distance algorithm).
Then the question could be, in a separate study or speculation or
commentary, how closely an "edit distance" of one, two, etc. between
words actually corresponds to being "hard to distinguish", and whether you
have to go finer grained (some phonemes are probably closer than others, for
example, and as you mention the position in the word is probably a big
effect). You are totally right that that is a separate and harder question.
However, it is one that could be answered after the fact if there is
interest. E.g., with the results of that follow-up study, you could, without
rerunning the original study, multiply by some additional factors and
reinterpret the results. There is probably existing research on this.
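A minimal sketch of the phoneme edit distance idea, under stated assumptions: the small PRONUNCIATIONS table below is hand-written in the ARPAbet style that cmudict uses and is only a stand-in for the real dictionary (with nltk installed and the corpus downloaded, the lookup could come from nltk.corpus.cmudict.dict() instead). The function names are made up for illustration, and every substitution, insertion, or deletion of a phoneme is given the same cost of 1, which is exactly the approximation discussed above.

```python
# Illustrative stand-in for cmudict: word -> list of ARPAbet-style phonemes.
# These few entries are hand-written assumptions, not loaded from the corpus.
PRONUNCIATIONS = {
    "cat": ["K", "AE1", "T"],
    "bat": ["B", "AE1", "T"],
    "cab": ["K", "AE1", "B"],
    "cattle": ["K", "AE1", "T", "AH0", "L"],
}


def edit_distance(a, b):
    """Standard Levenshtein distance between two sequences of symbols."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x -> y (or match)
            ))
        prev = curr
    return prev[-1]


def phonetic_distance(word1, word2, table=PRONUNCIATIONS):
    """Edit distance between the phoneme sequences of two known words."""
    return edit_distance(table[word1], table[word2])


if __name__ == "__main__":
    for pair in [("cat", "bat"), ("cat", "cab"), ("cat", "cattle")]:
        print(pair, phonetic_distance(*pair))
```

Under this uniform-cost metric "cat"/"bat" and "cat"/"cab" both come out one phone apart, so it cannot tell a change at the start of a word from a change at the end. That is the finer-grained question that would need the follow-up study.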
the choice is between
1. doing nothing with pronunciation because it is too hard
2. doing something fuzzy with pronunciation because it is too hard to
3. doing something well defined, but certainly approximate, yet well
defined enough to be scientific.
4. doing something closer to exact, but difficult.
I think your approach is definitely better; it's just approaching being a
linguistics question that is outside the scope of this small 30-person
pilot study. What I would propose is that we do 3. Then if we found out
that e.g. the poems/natural language really is some frontrunner for amazing
(in reality I think none of these schemes will be a standout, but that the
study may inform further design), we could try to convince a linguist to
On Tue, May 27, 2014 at 3:08 PM, Michael Rogers <michael at briarproject.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> Thanks for the correction - I didn't know there was a concept of edit
> distance for pronunciation.
> Nevertheless, we still don't have a way to compare the noticeability
> of modifications across representations. How much phonic edit distance
> is equivalent to, say, the difference between modifying a character at
> the start of a fingerprint and modifying a character in the middle?
> It seems to me that the only credible way to answer such questions is
> empirically. We should start by making random modifications to the
> data to be compared, and measuring the error rate (false positives and
> false negatives) for each representation. Then we can come up with
> some hypotheses for which modifications are more or less noticeable
> for each representation, and test them against the data.
> *Then* we may be able to say that this modification to this
> representation is as noticeable as that modification to that
> representation - and if so, we can then ask which representation
> offers the most noticeability given an adversary with a computational
> budget for making least-noticeable modifications.
> Trying to guess which modifications will be least noticeable for each
> representation before we have any data is trying to run before we can
> walk, in my always humble opinion. ;-)
> On 26/05/14 11:12, Christine Corbett Moran wrote:
> > Actually we can have a metric for "sound alike"
> > it's a bit hackish but a simple pass would be to use nltk here's an
> > example gist out there on getting pronunciation
> > https://gist.github.com/ConstantineLignos/1219749
> > two words "sound alike" if they have some specified edit distance
> > between their two pronunciations. e.g. one phone apart, or some
> > more complicated measure.
> > C
> > On Mon, May 26, 2014 at 11:55 AM, Michael Rogers
> > <michael at briarproject.org <mailto:michael at briarproject.org>>
> > wrote:
> > On 26/05/14 01:15, Tom Ritter wrote:
> >> Third: Figure out how to approximate an attacker who can perform
> >> 2^80 calculations in the 'weird' cases. For a 32-character hex
> >> fingerprint, a 2^80 attacker can match 20 characters.
> >> Weird Case 1: An attacker matches the beginning and end parts of
> >> the fingerprint to try and trick someone doing a visual compare.
> >> Clearly, matching the beginning and ending 10 characters exactly
> >> is harder than matching any 20. but how much harder? Would a
> >> match of the beginning and ending 8 characters correctly
> >> characterize a 2^80 attacker?
> > As I've mentioned before, I don't think we can make a fair
> > comparison of 'weird' attacks across fingerprint representations.
> > Having said that... a 2^80 attacker can match 20 characters at
> > chosen positions. I don't know how to calculate how many characters
> > a 2^80 attacker could match at unchosen positions, but it seems to
> > me that it would depend on the number of positions, i.e. the length
> > of the fingerprint.
> >> Weird Case 2: An attacker tries to match the fingerprint by
> >> pronunciation to try and trick someone doing a vocal compare.
> >> Again, matching 20 characters exactly and making the remaining
> >> 12 'sound alike' is harder than just matching 20. Would an
> >> attacker getting 28 characters to 'sound alike' and have the rest
> >> match exactly approximate a 2^80 attack?
> > We don't even have a metric for 'sound alike', so this question
> > isn't well-founded.
> > Cheers, Michael
> > -- Christine Corbett Moran christine.corbett at gmail.com
> > <mailto:christine.corbett at gmail.com> Physics @ ICS.uzh.ch
> > <http://ICS.uzh.ch> // Zurich: +41 79 962 4499 Dev @
> > http://circleof6app.com // Boston: +1 (617) 398-0452 Dev @
> > https://whispersystems.org // SF: +1 (415) 670 9629
> > www.christinecorbettmoran.com
> > <http://www.christinecorbettmoran.com/>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.12 (GNU/Linux)
> -----END PGP SIGNATURE-----