<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Mar 1, 2015 at 9:15 AM, Joseph Bonneau <span dir="ltr"><<a href="mailto:jbonneau@gmail.com" target="_blank">jbonneau@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Sat, Feb 28, 2015 at 11:46 AM, Trevor Perrin <span dir="ltr"><<a href="mailto:trevp@trevp.net" target="_blank">trevp@trevp.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">On Fri, Feb 27, 2015 at 7:26 AM, Daniel Kahn Gillmor<br> <span><<a href="mailto:dkg@fifthhorseman.net" target="_blank">dkg@fifthhorseman.net</a>> wrote:<br> > On Fri 2015-02-27 04:50:19 -0500, Nadim Kobeissi wrote:<br> >> On Thu, Feb 26, 2015 at 11:55 PM, Daniel Kahn Gillmor <<a href="mailto:dkg@fifthhorseman.net" target="_blank">dkg@fifthhorseman.net</a>> wrote:<br> >><br> >>> I agree that this part of the peerio/minilock approach is pretty<br> >>> disconcerting, and not just because it goes against years of practice<br> >>> and convention. it opens an obvious hole (offline dictionary attacks<br> >>> for high-value key material) and i'd love to see some more analysis of<br> >>> the underlying tradeoffs involved.<br> >><br> >> My understanding is that any search would be currently simply too expensive.<br> ><br> > I'm glad to hear that. Do you have pointers to details of your<br> > analysis? 
I'd love to read those thoughts.<br> <br> <br> </span>I echo dkg - I'd really like to see more analysis, it's not obvious<br> the attack cost is that high.<br> <br> Back of envelope:<br> <br> The peerio scrypt parameters (N=2^14, r=8) have been estimated to take<br> < 100 milliseconds on a single core of a 2009 Intel processor [1].<br> Assuming I can rent cores at ~$0.04/hr [2] = $1/day, that means:<br> - about $1 per 2^20 (~1 million) guesses<br> - about $1K per 2^30 guesses<br> - about $1M per 2^40 guesses<br> <br> How much entropy is in peerio passphrases? The tutorial video [3]<br> suggests choosing a sentence "that is unique to you, like moments<br> shared with friends, or childhood memories", and gives a couple<br> examples:<br> "My mother makes the best cheesecake." (36 chars)<br> "Waffles the cat had blue eyes" (29 chars)<br> <br> You'll find various estimates for entropy-per-English character, but 1<br> to 1.5 bits per character seems common [4]. This is very crude, but<br> that would put sentences like above in the 30-50 bit range. So it<br> seems plausible that a million-dollar 2^40 attacker might have a good<br> chance of success targeting a single account.<br> <br> (I guess the zxcvbn password-strength-checker is estimating these as<br> >100 bits entropy? That seems high. Maybe zxcvbn is tuned for<br> passwords, not sentences?).<br></blockquote><div><br></div></div></div><div>There are some serious problems with this type of analysis and I would like to permanently retire it from discussions about security.</div><div><br></div><div>Problem 1: Shannon entropy is not (and was never intended to be) a measure of how difficult it is to guess something (i.e., to search for an unknown item by individual queries). It is a measure of how much something can be compressed and is an average-case metric. </div><div><br></div><div>Problem 2: The estimate of 1.5 bits of Shannon entropy per character of English is useless for security purposes. 
There are a few places these estimates come from: (a) Shannon's original 1950 paper which used an 8-character Markov model with inadequate statistical support (although it was an admirable effort for the pre-computer era) or (b) modern experiments where people compress English text with generic compression schemes. 1.5 bits comes from PPM. These are both character-based approaches which don't leverage any NLP to look at word-level and sentence-level influences, for example the existence of proper English grammar or even bigram patterns like the fact that speakers rarely use the word "inclement" before anything but "weather." Essentially, forget these numbers.</div><div><br></div><div>Min-entropy is the simplest metric that is mathematically appropriate for guessing and is a worst-case metric, which is usually what we want. In addition to min-entropy there are more specific metrics for guessing difficulty in my PhD thesis and 2012 IEEE Oakland paper, but the essential question to ask is pretty simple: </div><div><br></div><div>Assume an adversary will work hard to come up with a dictionary of somewhere between 2^40 and 2^60 likely passphrases to try. What percentage of users will pick something in that large set? I would expect that a very significant percentage will, and that Peerio will burn these users.</div><div><br></div><div>The closest data point (and it's not perfect) is a 2006 Kuo et al. paper on phrase-based mnemonic passwords. Users were asked to pick a phrase-based password. With a dictionary of 400,000 phrases drawn from books, movie titles, etc. they cracked 4% of users in that study. This was a very limited effort of course, and we don't know exactly how to build a dictionary of even 2^40 sentences for this purpose, so we don't know what percentage can be expected to fall. With a gun to my head I would estimate 25-50%. 
Again, this has (to my knowledge) never been publicly tested.</div><div><br></div><div>The bottom line is: Peerio's security model is based on a critical and completely untested assumption about how users will pick passphrases. Nadim seems to suggest "if there is evidence that this isn't secure, Peerio will change it". I would turn the onus around and say that I know of no example, under any conditions, of a human-chosen distribution of secrets resisting serious attack at a rate acceptable for a widespread tool.</div></div></div></div></blockquote><div><br></div><div>Yes, it all boils down to this. Waiting for evidence of insecurity is silly. Joseph and Trevor's insights have been convincing. I think I need to act now -- going beyond public beta without making key derivation stronger would be a mistake.</div><div><br></div><div>I'm actually very interested in the solution that Michael Hamburg just outlined. Link for convenience:</div><div><a href="http://moderncrypto.org/mail-archive/messaging/2015/001574.html">http://moderncrypto.org/mail-archive/messaging/2015/001574.html</a><br></div><div><br></div><div>This strikes me as the best idea right now. 
Would be happy to hear thoughts.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> Given the utter lack of evidence Peerio's approach will be secure, I think getting behind this security model is a mistake.</div><div><br></div><div>Personally I would advocate focusing on training users to memorize machine-chosen 60-70 bit passwords, strengthen them to 80-90 bits and then worry about all the other ways users can lose their passwords.</div></div></div></div> <br>_______________________________________________<br> Messaging mailing list<br> <a href="mailto:Messaging@moderncrypto.org">Messaging@moderncrypto.org</a><br> <a href="https://moderncrypto.org/mailman/listinfo/messaging" target="_blank">https://moderncrypto.org/mailman/listinfo/messaging</a><br> <br></blockquote></div><br></div></div>