<div dir="ltr"><div dir="ltr" style="font-family:arial,sans-serif;font-size:12.727272033691406px">I think the question is well defined scientifically if we frame it as e.g. two words with a small "edit distance" under a given metric (proposed: use nltk + cmudict to get phonemes, then run a standard edit distance algorithm).<div>
<br></div><div>then the question could be, in a separate study or speculation or commentary, how closely an "edit distance" of one, two, etc. between words actually corresponds to being "hard to distinguish", and whether you have to go finer grained (some phonemes are probably closer than others, for example, and as you mention, the position in the word probably has a big effect). you are totally right that that is a separate and harder question. however, it is one that could be answered after the fact if there is interest: with the results of that follow-up study, you could multiply by some additional factors and reinterpret the results without rerunning the original study. there is probably existing research on this.</div>
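<div><br></div><div>as a rough sketch of the kind of metric proposed above (an illustration under stated assumptions, not a tested pipeline): the snippet below hard-codes a few ARPAbet transcriptions, written by hand in cmudict's style, as a hypothetical stand-in for the real dictionary so that it runs without any downloads. all function names are made up for illustration, and the 0.5 vowel weight is an arbitrary placeholder for the "additional factors" a follow-up study might supply, not a measured value.</div>
<pre>
```python
# Hypothetical stand-in for nltk's cmudict: a few hand-written ARPAbet
# transcriptions (stress digits included), so the sketch runs without
# downloading the corpus. With nltk and the corpus installed,
# nltk.corpus.cmudict.dict() gives a word-to-pronunciations mapping instead.
SAMPLE_PRONUNCIATIONS = {
    "cat": ["K", "AE1", "T"],
    "bat": ["B", "AE1", "T"],
    "cut": ["K", "AH1", "T"],
    "dog": ["D", "AO1", "G"],
}


def phonemes(word):
    """Look up a word and strip the stress digits (AE1 becomes AE)."""
    return [p.rstrip("012") for p in SAMPLE_PRONUNCIATIONS[word.lower()]]


def unit_cost(p, q):
    """Standard Levenshtein substitution cost: every mismatch costs 1."""
    return 0 if p == q else 1


def vowel_aware_cost(p, q):
    """Made-up weighting: swapping one vowel for another costs only 0.5."""
    if p == q:
        return 0
    both_vowels = p[0] in "AEIOU" and q[0] in "AEIOU"
    return 0.5 if both_vowels else 1


def phoneme_edit_distance(a, b, sub_cost=unit_cost):
    """Edit distance between two phoneme lists (insert/delete cost 1)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                     # delete pa
                curr[j - 1] + 1,                 # insert pb
                prev[j - 1] + sub_cost(pa, pb),  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def word_distance(w1, w2, sub_cost=unit_cost):
    """Edit distance between the pronunciations of two words."""
    return phoneme_edit_distance(phonemes(w1), phonemes(w2), sub_cost)


def sounds_alike(w1, w2, max_distance=1):
    """Two words 'sound alike' if at most max_distance phones apart."""
    return max_distance >= word_distance(w1, w2)


print(word_distance("cat", "bat"))                    # 1
print(word_distance("cat", "dog"))                    # 3
print(word_distance("cat", "cut", vowel_aware_cost))  # 0.5
```
</pre>
<div>the point of passing the substitution cost in as a parameter is that the raw distances from a first study stay fixed, while a later weighting (by phoneme similarity, or by position if the cost function is extended) can be swapped in afterwards.</div>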
<div><br></div><div>the choice is between</div><div>1. doing nothing with pronunciation because it is too hard</div><div>2. doing something fuzzy with pronunciation because it is too hard to approach quantitatively</div><div>
3. doing something approximate, but well defined enough to be scientific.</div><div>4. doing something closer to exact, but difficult.</div><div><br></div><div>I think your approach is definitely better; it's just approaching a linguistics question that is outside the scope of this small 30-person pilot study. what I would propose is that we do 3. then, if we find that e.g. the poems/natural language really is a clear frontrunner (in reality I think none of these schemes will be a standout, but the study may inform further design), we could try to convince a linguist to run 4. <br>
<div><br></div><div>C</div></div><div><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, May 27, 2014 at 3:08 PM, Michael Rogers <span dir="ltr"><<a href="mailto:michael@briarproject.org" target="_blank">michael@briarproject.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">-----BEGIN PGP SIGNED MESSAGE-----<br>
Hash: SHA256<br>
<br>
</div>Thanks for the correction - I didn't know there was a concept of edit<br>
distance for pronunciation.<br>
<br>
Nevertheless, we still don't have a way to compare the noticeability<br>
of modifications across representations. How much phonic edit distance<br>
is equivalent to, say, the difference between modifying a character at<br>
the start of a fingerprint and modifying a character in the middle?<br>
<br>
It seems to me that the only credible way to answer such questions is<br>
empirically. We should start by making random modifications to the<br>
data to be compared, and measuring the error rate (false positives and<br>
false negatives) for each representation. Then we can come up with<br>
some hypotheses for which modifications are more or less noticeable<br>
for each representation, and test them against the data.<br>
<br>
*Then* we may be able to say that this modification to this<br>
representation is equally as noticeable as that modification to that<br>
representation - and if so, we can then ask which representation<br>
offers the most noticeability given an adversary with a computational<br>
budget for making least-noticeable modifications.<br>
<br>
Trying to guess which modifications will be least noticeable for each<br>
representation before we have any data is trying to run before we can<br>
walk, in my always humble opinion. ;-)<br>
<br>
Cheers,<br>
Michael<br>
<div class=""><br>
On 26/05/14 11:12, Christine Corbett Moran wrote:<br>
> Actually we can have a metric for "sound alike"<br>
><br>
> it's a bit hackish but a simple pass would be to use nltk here's an<br>
> example gist out there on getting pronunciation<br>
> <a href="https://gist.github.com/ConstantineLignos/1219749" target="_blank">https://gist.github.com/ConstantineLignos/1219749</a><br>
><br>
> two words "sound alike" if they have some specified edit distance<br>
> between their two pronunciations. e.g. one phone apart, or some<br>
> more complicated measure.<br>
><br>
> C<br>
><br>
><br>
> On Mon, May 26, 2014 at 11:55 AM, Michael Rogers<br>
</div>> <<a href="mailto:michael@briarproject.org">michael@briarproject.org</a> <mailto:<a href="mailto:michael@briarproject.org">michael@briarproject.org</a>>><br>
<div><div class="h5">> wrote:<br>
><br>
> On 26/05/14 01:15, Tom Ritter wrote:<br>
>> Third: Figure out how to approximate an attacker who can perform<br>
>> 2^80 calculations in the 'weird' cases. For a 32-character hex<br>
>> fingerprint, a 2^80 attacker can match 20 characters.<br>
><br>
>> Weird Case 1: An attacker matches the beginning and end parts of<br>
>> the fingerprint to try and trick someone doing a visual compare.<br>
>> Clearly, matching the beginning and ending 10 characters exactly<br>
>> is harder than matching any 20. but how much harder? Would a<br>
>> match of the beginning and ending 8 characters correctly<br>
>> characterize a 2^80 attacker?<br>
><br>
> As I've mentioned before, I don't think we can make a fair<br>
> comparison of 'weird' attacks across fingerprint representations.<br>
><br>
> Having said that... a 2^80 attacker can match 20 characters at<br>
> chosen positions. I don't know how to calculate how many characters<br>
> a 2^80 attacker could match at unchosen positions, but it seems to<br>
> me that it would depend on the number of positions, i.e. the length<br>
> of the fingerprint.<br>
><br>
>> Weird Case 2: An attacker tries to match the fingerprint by<br>
>> pronunciation to try and trick someone doing a vocal compare.<br>
>> Again, matching 20 characters exactly and making the remaining<br>
>> 12 'sound alike' is harder than just matching 20. Would an<br>
>> attacker getting 28 characters to 'sound alike' and have the rest<br>
>> match exactly approximate a 2^80 attack?<br>
><br>
> We don't even have a metric for 'sound alike', so this question<br>
> isn't well-founded.<br>
><br>
</div></div>> Cheers, Michael _______________________________________________<br>
> Messaging mailing list <a href="mailto:Messaging@moderncrypto.org">Messaging@moderncrypto.org</a><br>
> <mailto:<a href="mailto:Messaging@moderncrypto.org">Messaging@moderncrypto.org</a>><br>
<div class="">> <a href="https://moderncrypto.org/mailman/listinfo/messaging" target="_blank">https://moderncrypto.org/mailman/listinfo/messaging</a><br>
><br>
><br>
><br>
><br>
> -- Christine Corbett Moran <a href="mailto:christine.corbett@gmail.com">christine.corbett@gmail.com</a><br>
</div>> <mailto:<a href="mailto:christine.corbett@gmail.com">christine.corbett@gmail.com</a>> Physics @ <a href="http://ICS.uzh.ch" target="_blank">ICS.uzh.ch</a><br>
> <<a href="http://ICS.uzh.ch" target="_blank">http://ICS.uzh.ch</a>> // Zurich: <a href="tel:%2B41%2079%20962%204499" value="+41799624499">+41 79 962 4499</a> Dev @<br>
<div class="">> <a href="http://circleof6app.com" target="_blank">http://circleof6app.com</a> // Boston: <a href="tel:%2B1%20%28617%29%20398-0452" value="+16173980452">+1 (617) 398-0452</a> Dev @<br>
> <a href="https://whispersystems.org" target="_blank">https://whispersystems.org</a> // SF: <a href="tel:%2B1%20%28415%29%20670%209629" value="+14156709629">+1 (415) 670 9629</a><br>
> <a href="http://www.christinecorbettmoran.com" target="_blank">www.christinecorbettmoran.com</a><br>
</div>> <<a href="http://www.christinecorbettmoran.com/" target="_blank">http://www.christinecorbettmoran.com/</a>><br>
<div class="">><br>
-----BEGIN PGP SIGNATURE-----<br>
Version: GnuPG v1.4.12 (GNU/Linux)<br>
<br>
</div>iQEcBAEBCAAGBQJThI4/AAoJEBEET9GfxSfMwj0H/iLAxsPk6AS9gse3dQx+1c+N<br>
cAieLME58d63QjklQgVr67l9nMFSsJkSci3WelzJluJuf8xcFX+v/2X2nrWuZzfW<br>
ALm4AQLM5mKlKCEyhGlFOHFgN5X03NXN8PriSsQpJuytfiWQnt/2gpSpWcNUkvNY<br>
pkjOqvbC5t8xVEGudkoreNw53L+//JMcNjNFOWrX5qNQdawdWqZc6PXq1+0nFd1d<br>
31uFGus2taxka34v6YM/8vzhhzsJMze58RRna+S+kui1MnBCJi3q43vYCVUMuCAw<br>
4AqhfZZMw/BJn3JQHKZAuVzjXUh8IxFtL0NwC7Xv84sL2nVkeBh4iY13b6udFvk=<br>
=THuh<br>
-----END PGP SIGNATURE-----<br>
</blockquote></div><br></div>