[messaging] Peerio

Sun Mar 1 00:15:21 PST 2015

On Sat, Feb 28, 2015 at 11:46 AM, Trevor Perrin <trevp at trevp.net> wrote:

> On Fri, Feb 27, 2015 at 7:26 AM, Daniel Kahn Gillmor
> <dkg at fifthhorseman.net> wrote:
> > On Fri 2015-02-27 04:50:19 -0500, Nadim Kobeissi wrote:
> >> On Thu, Feb 26, 2015 at 11:55 PM, Daniel Kahn Gillmor <
> dkg at fifthhorseman.net> wrote:
> >>
> >>> I agree that this part of the peerio/minilock approach is pretty
> >>> disconcerting, and not just because it goes against years of practice
> >>> and convention.  it opens an obvious hole (offline dictionary attacks
> >>> for high-value key material) and i'd love to see some more analysis of
> >>> the underlying tradeoffs involved.
> >>
> >> My understanding is that any search would be currently simply too
> expensive.
> >
> > I'm glad to hear that.  Do you have pointers to details of your
> > analysis?  I'd love to read those thoughts.
>
>
> I echo dkg - I'd really like to see more analysis, it's not obvious
> the attack cost is that high.
>
> Back of envelope:
>
> The peerio scrypt parameters (N=2^14, r=8) have been estimated to take
> < 100 milliseconds on a single core of a 2009 Intel processor [1].
> Assuming I can rent cores at ~$0.04/hr [2] = $1/day, that means:
>  - about $1 per 2^20 (~1 million) guesses
>  - about $1K per 2^30 guesses
>  - about $1M per 2^40 guesses
>
> How much entropy is in peerio passphrases?  The tutorial video [3]
> suggests choosing a sentence "that is unique to you, like moments
> shared with friends, or childhood memories", and gives a couple
> examples:
>  "My mother makes the best cheesecake." (36 chars)
>  "Waffles the cat had blue eyes" (29 chars)
>
> You'll find various estimates for entropy-per-English character, but 1
> to 1.5 bits per character seems common [4].  This is very crude, but
> that would put sentences like above in the 30-50 bit range.  So it
> seems plausible that a million-dollar 2^40 attacker might have a good
> chance of success targeting a single account.
>
> (I guess the zxcvbn password-strength-checker is estimating these as
> >100 bits entropy?  That seems high.  Maybe zxcvbn is tuned for
> passwords, not sentences?).
>

There are some serious problems with this type of analysis and I would like
to permanently retire it from discussions about security.

Problem 1: Shannon entropy is not (and was never intended to be) a measure
of how difficult it is to guess something (ie search for an unknown item by
individual queries). It is a measure of how much something can be
compressed and is an average-case metric.

Problem 2: The estimate of 1.5 bits of Shannon Entropy per character in
English estimate is useless for security purposes. There are a few places
these estimates come from: (a) Shannon's original 1950 paper which used an
8-character Markov model with inadequate statistical support (although it
was an admiral effort for the pre-computer era) or (b) modern experiments
where people compress English text with generic compression schemes. 1.5
bits comes from PPM. These are both character-based approaches which don't
leverage any NLP to look at word-level and sentence-level influences, for
example the existence of proper English grammar or even bigram patterns
like the fact that speakers rarely use the word "inclement" before anything
but "weather." Essentially, forget these numbers.

Min-entropy is the simplest metric that is mathematically appropriate for
guessing and is a worst-case metric, which is usually what we want. In
addition to min-entropy there are more specific metrics for guessing
difficulty in my PhD thesis and 2012 IEEE Oakland paper, but the eseential
question to ask is pretty simple:

Assume an adversary will work hard to come up with a dictionary of
somewhere between 2^40 and 2^60 likely passphrases to try. What percentage
of users will pick something in that large set? I would expect a very
significant percentage will and the Peerio will burn these users.

The closest data point (and it's not perfect) is a 2006 Kuo et al. paper on
phrase-based mnemonic passwords. Users were asked to pick a phrase-based
password. With a dictionary of 400,000 phrases drawn from books, movie
titles, etc. they cracked 4% of users in that study.  This was a very
limited effort of course and we don't know exactly how to build a
dictionary of even 2^40 sentences to this purpose so we don't know what
percent can be expected to fall. With a gun to my head I would estimate
25-50%. Again, this has (to my knowledge) never been publicly tested.

The bottom line is: Peerio's security model is based on a critical and
completely untested assumption about how users will pick passphrases. Nadim
seems to suggest "if there is evidence that this isn't secure, Peerio will
change it". I would turn the onus around and say that there is no example
that I know of of a human-chosen distribution of secrets under any
conditions resisting serious attack at a rate acceptable for a widespread
tool. Given the utter lack of evidence Peerio's approach will be secure, I
think getting behind this security model is a mistake.

Personally I would advocate focusing on training users to memorize
machine-chosen 60-70 bit passwords, strengthen them to 80-90 bits and then
worry about all the other ways users can lose their passwords.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://moderncrypto.org/mail-archive/messaging/attachments/20150301/cfc1cdfb/attachment.html>