[messaging] Let's run a usability study (was Useability of public-key fingerprints)

Wed Mar 12 23:18:43 PDT 2014

On 11 March 2014 00:41, Trevor Perrin <trevp at trevp.net> wrote:
> Fingerprint Types
>  - Visual and poetry fingerprints seem worth including.

Does anyone have a preference for type of visual fingerprint?  Some of
the implementations I know of are:
 - Identicons: http://haacked.com/archive/2007/01/22/Identicons_as_Visual_Fingerprints.aspx/
 - Monsters: http://www.splitbrain.org/projects/monsterid
 - Wavatars: http://www.shamusyoung.com/twentysidedtale/?p=1462
 - Unicorns (really): http://meta.stackoverflow.com/questions/37328/my-godits-full-of-unicorns

I think I will go with identicons unless anyone really thinks unicorns
are better ;)
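To make the comparison concrete, here is a minimal sketch of deriving a mirrored-grid identicon from a fingerprint. It is an illustrative assumption, not the exact scheme on the haacked.com page: hash the fingerprint with SHA-256, read one bit per cell, and mirror the left columns onto the right so the shape is symmetric. `identicon_grid` and `render` are hypothetical names.

```python
import hashlib

def identicon_grid(fingerprint: bytes, size: int = 5) -> list[list[bool]]:
    """Hypothetical helper: map a fingerprint to a left-right symmetric grid."""
    digest = hashlib.sha256(fingerprint).digest()
    half = (size + 1) // 2  # columns filled from the hash; the rest are mirrored
    if size * half > len(digest) * 8:
        raise ValueError("grid needs more bits than the digest provides")
    grid = [[False] * size for _ in range(size)]
    bit = 0
    for row in range(size):
        for col in range(half):
            on = bool((digest[bit // 8] >> (bit % 8)) & 1)
            grid[row][col] = on
            grid[row][size - 1 - col] = on  # mirror onto the right-hand side
            bit += 1
    return grid

def render(grid: list[list[bool]]) -> str:
    """Draw filled cells as '#' and empty cells as '.', one row per line."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)
```

A real identicon would also derive a color from the hash; this sketch only covers the shape, which is the part a user compares.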

As for poetry, I think I missed that discussion, and I couldn't find it
in the archives either.  Is there a reference showing what poetry
fingerprints would look like?  Are they significantly different from
English words?

> Comparison Method
>  - Business cards can only fit a small amount of text (since most of the
> space is taken up with other stuff), and don't typically contain
> high-resolution images.  So I'm not sure that comparing things between
> screens can be reduced to comparing things on business cards.

My work business card has a fairly high-resolution QR code on the back,
and the PGP fingerprint is printed along the bottom.  The font is
smallish, but everything still fits.  Most people I know who care about
putting a fingerprint on their card treat it as a design element that
isn't sacrificed to make room for other things.

> Approaches
>  - I suggest giving users X seconds to perform a comparison between a pair
> of values that are either identical or close, then seeing whether they
> correctly distinguish these cases.  X can be calibrated by performing some
> preliminary tests, then choosing a number that's likely to produce a
> variable error rate (i.e. not so low that subjects are always guessing
> randomly, not so high that they're always getting it right).  This is
> modeled after "character legibility" studies, e.g.
>
>  http://psychology.wichita.edu/surl/usabilitynews/81/legibility.asp

Sounds good.
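A minimal sketch of the timed-comparison setup, assuming lowercase hex fingerprints and taking "close" to mean exactly one hex character changed (both are assumptions; the study may define near-miss pairs differently). `make_trial`, `error_rate`, and `pick_x` are hypothetical helpers, and the 0.25 target error rate is only a placeholder between "always right" (0) and "guessing randomly" (about 0.5).

```python
import random
from dataclasses import dataclass

HEX = "0123456789abcdef"

@dataclass
class Trial:
    left: str
    right: str
    identical: bool

def make_trial(fingerprint: str, rng: random.Random, p_identical: float = 0.5) -> Trial:
    """Hypothetical: return an identical pair or a pair differing in one hex char."""
    if rng.random() < p_identical:
        return Trial(fingerprint, fingerprint, True)
    i = rng.randrange(len(fingerprint))
    replacement = rng.choice([c for c in HEX if c != fingerprint[i]])
    altered = fingerprint[:i] + replacement + fingerprint[i + 1:]
    return Trial(fingerprint, altered, False)

def error_rate(trials: list[Trial], answers: list[bool]) -> float:
    """Fraction of trials where the subject's 'identical?' answer was wrong."""
    wrong = sum(1 for t, a in zip(trials, answers) if t.identical != a)
    return wrong / len(trials)

def pick_x(pilot: dict[float, float], target: float = 0.25) -> float:
    """Choose the time limit X (seconds) whose pilot error rate is closest to target."""
    return min(pilot, key=lambda x: abs(pilot[x] - target))
```

The mapping from X to error rate passed to `pick_x` would come from the preliminary tests; no such data exists yet.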

> Modulating Speed
>  - For the "Spoken Aloud" test, why not just have pairs of subjects compare
> the fingerprints by speaking to each other?

Is the idea here to make the speed at which fingerprints are read
variable, but outside the experiment conductor's control, so that it
varies the way it would in the real world?

> Error Rates
>  - I'm not sure about the "One Subtle Flaw" case, because the fingerprints
> have different notions of "tokens" so this will be hard to compare between
> formats.  Also, it doesn't model a realistic attacker.

I agree it doesn't model a real attacker, but I thought it might help
us draw better conclusions.  Instead of just saying "Most users are
not fooled by a 2^80 match", perhaps we can say "If users actually
verify fingerprints, most are not fooled by any mismatched bytes."
The idea is to test points along the spectrum of mismatched bytes
(from every byte mismatched to none mismatched) and see whether there's
a dropoff.  Granted, we're only testing a couple of points, but this
seemed like a useful one.

I'm not married to it though.
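The test points along that spectrum could be generated roughly like this. It is a sketch assuming raw fingerprint bytes (e.g. 20 bytes for a 160-bit PGP v4 fingerprint); `with_mismatched_bytes` and `spectrum` are hypothetical names. XORing each chosen byte with a nonzero value guarantees that exactly k bytes differ.

```python
import random

def with_mismatched_bytes(fp: bytes, k: int, rng: random.Random) -> bytes:
    """Hypothetical: copy fp with exactly k randomly chosen bytes altered."""
    if not 0 <= k <= len(fp):
        raise ValueError("k must be between 0 and len(fp)")
    out = bytearray(fp)
    for i in rng.sample(range(len(fp)), k):
        out[i] ^= rng.randrange(1, 256)  # nonzero XOR always changes the byte
    return bytes(out)

def spectrum(fp: bytes, points: list[int], rng: random.Random) -> dict[int, bytes]:
    """One altered fingerprint per test point (number of mismatched bytes)."""
    return {k: with_mismatched_bytes(fp, k, rng) for k in points}
```

Choosing which k values to test (for example 0, 1, half, and all bytes) is left open, as in the discussion above.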

>  - For the computationally-chosen flaw, I think you should just assume an
> attacker that can consider 2^80 random candidate fingerprints, and choose
> the closest-matched fingerprint this attacker could find (but of course
> don't actually do 2^80 hashes, just set 80 bits of the fingerprint equal).

I do like this idea better, as it requires a lot less work. =)
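A sketch of the "set 80 bits equal" shortcut: copy chosen bit positions from the target fingerprint and fill the rest randomly, standing in for the best of 2^80 random candidates without doing the hashing. Which 80 bits an attacker would target isn't settled here, so placement is a parameter; `ends_first`, which splits the matched bits between the start and end of the fingerprint, is one hypothetical choice, and both function names are illustrative.

```python
import random

def simulated_attack(target: bytes, match_positions: set[int], rng: random.Random) -> bytes:
    """Hypothetical: random fingerprint that agrees with target at the given bits.

    Bit 0 is the most significant bit of byte 0.
    """
    nbits = len(target) * 8
    if any(not 0 <= p < nbits for p in match_positions):
        raise ValueError("bit position out of range")
    if len(match_positions) >= nbits:
        raise ValueError("matching every bit would reproduce the target")
    while True:
        cand = bytearray(rng.getrandbits(8) for _ in range(len(target)))
        for p in match_positions:
            byte, bit = divmod(p, 8)
            mask = 1 << (7 - bit)
            cand[byte] = (cand[byte] & ~mask) | (target[byte] & mask)
        if bytes(cand) != target:  # a real near-collision must still differ
            return bytes(cand)

def ends_first(nbits: int, k: int) -> set[int]:
    """Hypothetical placement: match k bits split between the start and the end."""
    head = k // 2
    tail = k - head
    return set(range(head)) | set(range(nbits - tail, nbits))
```

For a 160-bit fingerprint, `ends_first(160, 80)` matches the first and last 40 bits (5 bytes each), which is where a hurried reader might look.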

I will try to update the document this weekend with this and any
other feedback I get.

-tom