[messaging] Let's run a usability study (was Useability of public-key fingerprints)

Thu Feb 13 06:06:49 PST 2014

On 12 February 2014 04:19, Trevor Perrin <trevp at trevp.net> wrote:
>> Some of the places where I see fingerprints continuing to be useful
>> far into the future:
>>  - Business Cards. My key is at <this url> and the fingerprint is [field]
>>  - You are handed a new device and {don't have access to your database
>> of trusted fingerprints/need to accomplish work, quickly}. Your
>> contact sends you a <OTR message/pgp-signed message/whatever>. Do you
>> recognize this fingerprint?
>>  - "Here, SSH into my server, I set you up an account." "Is <this> the
>> SSH fingerprint?" "Uh.... yea I recognize that."
>
> Interesting that your 2nd and 3rd cases involve recognizing a
> fingerprint from long-term memory.  That's a hard use case for
> fingerprints.  At most I'd expect a user to remember a few bytes,
> which isn't a secure check.

It is a hard use case - but it happens to me regularly. I doubt I'm
the only one who in the real world is pressured to complete work
quickly and not delay by forcing either colleagues or clients to walk
back to their desk/pick up the phone and verify a fingerprint.
Especially when said client is in a meeting and I can't actually get
in touch with them for hours.

> The main use case I see is comparing two fingerprints, one on your
> screen for your communication partner, and one you're checking against
> (from a phone call, a slip of paper, a friend's screen, a webpage,
> etc.)

I think that's a common use case, but the one I would say is slightly
more common is needing to check a fingerprint against nothing. I have a
signed email, or I have an email I'm about to encrypt, or I have a
new instant message and I need to figure out if this fingerprint is
genuine.  I *could* go google for a web page to assert validity that
way (or query that PGP-in-DNS thing) - but that's asking too much of a
regular user.  We should remember this use case.

>> The THC tool that brute forces look-alikes, with the tricks you'd
>> expect: it weighted fingerprints that matched better at the beginning
>> and end higher, it weighted '3' as somewhat close to '8' but not
>> as close as 'B', etc. Such a tool can be built for any field we come
>> up with.
>
> I think the THC "fuzzy fingerprint" tool isn't actually that
> effective.  Here's its "high score" outputs (
> https://www.thc.org/thc-ffp ):
>
>   08:54:5d:27:f8:e9:47:4e:49:8a:87:7e:03:cc:98:73   (target)
>
>   08:54:5d:27:a1:5b:82:39:f6:ba:79:df:67:6d:78:73   (14 hour run)
>
>   08:54:56:2c:28:d6:87:89:5e:02:a6:fd:43:c9:d8:73   (109 day run)
>
>   08:54:5d:39:d6:20:58:b3:f0:99:39:2d:7d:2c:98:73   (63 day run)
>
>
> You'd need a lot more computation to fool anyone who's doing more than
> a cursory check.

I want to believe you, but I'm going to withhold doing so until I see
a study that says it didn't fool users. :)
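One rough way to see what a cursory check would catch is to count how many leading and trailing hex digits each THC output shares with the target. This is not THC's own scoring (which also weights visually similar digits); it's a sketch of what a reader who only checks the ends of the string would see:

```python
def hex_nibbles(fp: str) -> str:
    """Normalize a colon-separated fingerprint to lowercase hex digits."""
    return fp.replace(":", "").lower()


def prefix_suffix_match(target: str, candidate: str) -> tuple[int, int]:
    """Count how many leading and trailing hex digits two fingerprints share."""
    a, b = hex_nibbles(target), hex_nibbles(candidate)
    n = min(len(a), len(b))
    prefix = 0
    while prefix < n and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < n and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, suffix


TARGET = "08:54:5d:27:f8:e9:47:4e:49:8a:87:7e:03:cc:98:73"
RUNS = {
    "14 hour run": "08:54:5d:27:a1:5b:82:39:f6:ba:79:df:67:6d:78:73",
    "109 day run": "08:54:56:2c:28:d6:87:89:5e:02:a6:fd:43:c9:d8:73",
    "63 day run": "08:54:5d:39:d6:20:58:b3:f0:99:39:2d:7d:2c:98:73",
}

if __name__ == "__main__":
    for label, fp in RUNS.items():
        p, s = prefix_suffix_match(TARGET, fp)
        print(f"{label}: first {p} and last {s} hex digits match")
```

This reports 8/3, 5/3 and 6/5 matching leading/trailing digits for the 14 hour, 109 day and 63 day runs, so only the 63 day run would survive someone who checks just the first four and last four digits.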

>> Besides Trevor's examples:
>> Here are some already made:
>>  - Identicons:
>> http://haacked.com/archive/2007/01/22/Identicons_as_Visual_Fingerprints.aspx/
>>  - Monsters: http://www.splitbrain.org/projects/monsterid
>>  - Wavatars: http://www.shamusyoung.com/twentysidedtale/?p=1462
>>  - Unicorns (really)
>> http://meta.stackoverflow.com/questions/37328/my-godits-full-of-unicorns
>> Here are some more ideas:
>>  - A spirograph
>>  - A color pattern, a gradient
>>  - A floral pattern, or flannel, etc
>>  - A geometric plot on a Cartesian graph
>>  - A geometric plot on a globe (potentially limited to landmasses and
>> ignoring oceans)
>
> I agree with dkg's skepticism of visual fingerprints [1].  They're
> less flexible than text which can be handwritten or verbalized.  I
> don't know any studies showing they're effective.

Do we know studies saying that PGP fingerprints are effective?

> And the main
> argument seems to be that they make it easy to notice that unrelated
> fingerprints are different.  But that's easy even with text!

I disagree. In the context that visual fingerprints would be used
(that is, every time you X, you see the fingerprint) I'm going to
suggest that changes _can_ be detected by ordinary users, while a
textual fingerprint in the same situation would not.

> The test should be:
>
>  * Does the format resist false-accept errors even when an attacker
> expends significant computational effort on a "fuzzy match"?  I don't
> think we know the answer for visual fingerprints.

Or for text ones.
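To put rough numbers on "significant computational effort": if a user really checks k hex digits and the attacker needs all of them to match exactly, each random key succeeds with probability 16^-k, so the expected number of keys to try is 16^k. A quick sketch, assuming fingerprint digits behave like independent uniform random values:

```python
def expected_trials(checked_hex_digits: int) -> int:
    """Expected number of random keys an attacker must generate before one
    matches the target on `checked_hex_digits` positions, assuming each hex
    digit is uniform and independent (a geometric distribution with success
    probability 16**-k has mean 16**k)."""
    return 16 ** checked_hex_digits


if __name__ == "__main__":
    for k in (4, 8, 12, 32):
        print(f"{k:2d} hex digits checked: 2^{4 * k} = {expected_trials(k):,} trials")
```

Checking 8 digits (say, the first 4 and last 4) comes to 2^32, roughly 4.3 billion keys. Fuzzy matching, where "close enough" digits also pass, lowers these numbers, which is why we'd need to measure false-accept rates with real users for both text and visual formats.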

>> Take a step back forward: let's say we magically have a geometric map
>> plot that people recognized with pretty good confidence.  What would
>> this get us? I keep imagining my OTR conversation having their
>> map-fingerprint visible so I would get used to seeing it. And seeing
>> their map-fingerprint in my mail client.  Are we gaining anything by
>> getting a user to recognize their friend's fingerprint?
>>
>> I'm not sure I see a context where we are, honestly.
>
> Yeah, I think trying to remember fingerprints and recognize them on
> every conversation is the wrong way to get value out of fingerprints.
> It makes more sense to just have them as an option for users to
> perform infrequent checks (on new or changed keys, or to double-check
> an important key), and then have software remember that key.

I'm arguing half with you and half against you. I'm not 100% against, yet.
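A minimal sketch of the "check infrequently, then have software remember the key" model, similar in spirit to SSH's known_hosts file. `PinStore` and its return values are hypothetical names for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class PinStore:
    """Hypothetical trust-on-first-use store: remember the first key seen for
    each contact and flag any later change for a manual fingerprint check."""
    pins: dict[str, str] = field(default_factory=dict)

    def check(self, contact: str, fingerprint: str) -> str:
        fp = fingerprint.replace(":", "").lower()
        known = self.pins.get(contact)
        if known is None:
            self.pins[contact] = fp
            return "new"      # first contact: offer an optional check
        if known == fp:
            return "match"    # remembered key: no check needed
        return "changed"      # key changed: prompt the user to verify (pin kept)
```

Only the "new" and "changed" results would ever ask the user to compare a fingerprint, which keeps those checks rare.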

>> RE: Pseudowords. Maaaaybe.  Just like THC's tool, I could create one
>> that aims to produce similarly-looking pseudowords. And since the
>> words are not words, I think people's pattern recognition will be
>> tricked more easily than with an actual wordlist.  I think a usability
>> study could be in order. ;)
>
> Yeah, we need some studies here...

I agree.  Let's run one.  I've participated in them - it's really not
that hard, especially if we can find a professor in the field who's
willing to advise/review our proposal.

In addition to figuring out what we want to test, my thinking is we
need to find a way to test a hypothesis without influencing the
subject under test. That is, we can't tell them we want to test how
well they compare text fingerprints - because then they'll do it
really well. That part will be tricky...

But as far as what we want to test I think the following are useful
experiments to run. (They need to have that 'trick' developed into
them though.)

-----------------------------
Testing visual fingerprints via SSH

1) A subject SSHes into a machine with some regularity. More than once
a year, less than once a day. Upon every SSH, the server will print
the visual SSH fingerprint to the console, which the user will see but
eventually filter out. We record how often they SSH into the machine.
We bring them into a test facility and they SSH from a fresh machine
without known_hosts into a machine with a 'fuzzy' fingerprint. We
examine the detection rate, plotted against how often they connect.

2) We do the identical test using the current text fingerprints.

3) We do the identical test using pseudoword fingerprints.

4) We do the identical test using actual English words.

5) We do the identical test using no change in login, using only
known_fingerprints as a control.

This way has the advantage of not requiring much programming. :)
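OpenSSH already has a visual fingerprint: with `VisualHostKey` enabled (e.g. `ssh -o VisualHostKey=yes host`) it prints "randomart" alongside the host key. A sketch of how that picture is derived, assuming I have the "drunken bishop" walk right (17x9 board, four 2-bit moves per byte starting from the low bits, visit counts mapped to a fixed symbol ramp), with the border omitted:

```python
def randomart(digest: bytes) -> list[str]:
    """Render a digest as a 17x9 'drunken bishop' grid, following the walk
    OpenSSH uses for its VisualHostKey output (border omitted)."""
    symbols = " .o+=*BOX@%&#/^SE"
    width, height = 17, 9
    grid = [[0] * width for _ in range(height)]
    x, y = width // 2, height // 2        # the bishop starts in the centre
    start = (x, y)
    for byte in digest:
        for _ in range(4):                # four 2-bit moves per byte, low bits first
            x += 1 if byte & 0x1 else -1  # bit 0: right or left
            y += 1 if byte & 0x2 else -1  # bit 1: down or up
            x = max(0, min(x, width - 1))  # the bishop stays on the board
            y = max(0, min(y, height - 1))
            if grid[y][x] < len(symbols) - 3:
                grid[y][x] += 1           # count visits, capped at '^'
            byte >>= 2
    grid[start[1]][start[0]] = len(symbols) - 2  # 'S' marks the start
    grid[y][x] = len(symbols) - 1                # 'E' marks the end
    return ["".join(symbols[v] for v in row) for row in grid]


if __name__ == "__main__":
    print("\n".join(randomart(bytes.fromhex("08545d27f8e9474e498a877e03cc9873"))))
```

The usage line draws the THC target fingerprint from earlier in the thread; feeding it one of the fuzzy matches instead would give a feel for how different the two pictures look.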

-----------------------------
Testing visual fingerprints via IM

1) A subject communicates with a partner via IM, and their icon is an
identicon calculated from a long term OTR identity key. No other OTR
messages are shown - hell maybe it's not even OTR at all. (It is still
a bit buggy depending on communication habits.) They communicate with
some frequency, the icon never changes.  We bring them into the
facility, they receive a message from someone with the same name but a
fuzzy fingerprint. We try to figure out whether or not they notice.

2) We do the same, but instead of an icon, there's a textual fingerprint.

3) There is no control.
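The IM test needs an icon derived deterministically from the long-term key. This sketch uses a hypothetical scheme (SHA-256 of the key bytes, one bit per cell, mirrored left to right like common 5x5 identicons); it is not the format of any particular IM client:

```python
import hashlib

SIZE = 5  # 5x5 grid: the left 3 columns are chosen, the right 2 mirror them


def identicon(public_key: bytes) -> list[str]:
    """Hypothetical identicon: a left-right symmetric SIZE x SIZE pattern
    derived from SHA-256 of the key, '#' for filled cells, '.' for empty."""
    digest = hashlib.sha256(public_key).digest()
    half = (SIZE + 1) // 2                       # 3 free columns per row
    rows = []
    for r in range(SIZE):
        left = [digest[r * half + c] & 1 for c in range(half)]  # one bit per cell
        full = left + left[: SIZE - half][::-1]  # mirror: c0 c1 c2 c1 c0
        rows.append("".join("#" if bit else "." for bit in full))
    return rows


if __name__ == "__main__":
    print("\n".join(identicon(b"example long-term public key")))
```

This toy version encodes only 15 bits (5 rows x 3 chosen columns), so an exact look-alike would take about 2^15 tries; a real scheme would need to pack in far more bits through colour and shape, and how many of those bits people actually notice is what the test would measure.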

-----------------------------
Testing oral comparison

1) We bring someone into a test facility, we have them call someone,
we get them to compare full-hexadecimal fingerprints orally. We record
the time taken for the comparison and the number of needed repeats.

2) We repeat for restricted hexadecimal

3) We repeat for pseudoword

4) We repeat for English words

5..N) We repeat as needed for other algorithms we devise
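To make the oral-comparison conditions concrete, here is a sketch of two renderings: the usual colon-separated hex, and a hypothetical pseudoword encoding (one consonant-vowel-consonant syllable per byte, two syllables per word). The letter tables are an illustrative choice, not an existing standard; an English-word encoding such as the PGP word list would plug in the same way:

```python
CONSONANTS = "bdfghjklmnprstvz"  # 16 choices: the high nibble
VOWELS = "aiou"                  # 4 choices: bits 3-2 of the low nibble
FINALS = "nrsl"                  # 4 choices: bits 1-0 of the low nibble


def as_hex(digest: bytes) -> str:
    """Conventional colon-separated hex, two digits per byte."""
    return ":".join(f"{b:02x}" for b in digest)


def syllable(b: int) -> str:
    """Map one byte to one of 256 distinct syllables (reversible)."""
    return CONSONANTS[b >> 4] + VOWELS[(b >> 2) & 0x3] + FINALS[b & 0x3]


def as_pseudowords(digest: bytes) -> str:
    """Hypothetical pseudoword rendering: two syllables per word."""
    syllables = [syllable(b) for b in digest]
    words = ["".join(syllables[i:i + 2]) for i in range(0, len(syllables), 2)]
    return " ".join(words)


if __name__ == "__main__":
    fp = bytes.fromhex("08545d27f8e9474e498a877e03cc9873")
    print(as_hex(fp))
    print(as_pseudowords(fp))
```

For instance, the first two bytes 08:54 become "bonjin". The timing and repeat counts from the test would tell us whether that is actually easier to read aloud than hex.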

-----------------------------
Testing visual comparison

1) We get someone to send an encrypted email to someone, given a
business card. The business card has a fingerprint. We observe how
often they accept a mismatched fingerprint, for each of the following
types of change to the fingerprint:
a) completely different
b) matching first 4 and last 4
c) matching all but a middle single digit
d) matching completely

2) Repeated but with an identicon, with photoshopped differences
approximating the above.
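The four business-card conditions could be generated mechanically from one real fingerprint. A sketch, reading "first 4 and last 4" as hex digits and using a seeded random generator so every subject sees the same stimuli (both assumptions; `card_variants` is a hypothetical helper, and a THC-style search would produce far more convincing look-alikes than these random edits):

```python
import random

HEX = "0123456789abcdef"


def other_digit(d: str, rng: random.Random) -> str:
    """Pick a random hex digit that is guaranteed to differ from d."""
    return rng.choice([h for h in HEX if h != d])


def card_variants(fp: str, seed: int = 0) -> dict[str, str]:
    """Build conditions a) to d) from a bare hex fingerprint (no colons)."""
    rng = random.Random(seed)  # reproducible stimuli across subjects
    fp = fp.lower()
    mid = len(fp) // 2
    # a) completely different: every digit changed
    a = "".join(other_digit(d, rng) for d in fp)
    # b) first 4 and last 4 digits kept, every digit in between changed
    b = fp[:4] + "".join(other_digit(d, rng) for d in fp[4:-4]) + fp[-4:]
    # c) a single middle digit changed
    c = fp[:mid] + other_digit(fp[mid], rng) + fp[mid + 1:]
    # d) matching completely
    return {"a": a, "b": b, "c": c, "d": fp}


if __name__ == "__main__":
    for label, value in card_variants("08545d27f8e9474e498a877e03cc9873").items():
        print(label, value)
```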

-----------------------------
Things we would need to conduct these tests, as I see it:

1) Rough consensus that the experiments we design collectively will
actually address (some of) the questions we have.
2) For Test #2: someone willing to do the development
3) University professor in this field willing to shepherd and advise us
4) Large body of humans to experiment on. (Hopefully the
university professor can help there.)

-tom