[messaging] Let's run a usability study (was Useability of public-key fingerprints)

Mon Apr 7 08:48:09 PDT 2014

On 27 March 2014 21:24, Trevor Perrin <trevp at trevp.net> wrote:
>>  - We have two participants speaking fingerprints aloud to each other.
>> Do we want them to do it over a cell phone to add difficulty, or just
>> omit that bit?
>
> I think speaking over a phone is a good case to test, because in
> person you could also use QR codes, or just look at each other's
> screens.  A landline might provide more consistent voice quality than
> cellphone.

It's true that two tests over cellphones may have different voice
quality. I tend to think the variation between tests would be
acceptable here, because it's a real-world factor people have to deal
with... but I could go either way.

> Other:
>
>  *  I think a printed biz card may not work well w/high-resolution
> visual fingerprints, so maybe doing it on a small phone screen is
> better?

I actually wasn't planning on doing a printed visual fingerprint... I
suppose I could, though.  I don't think the phone screen would work,
because it's not terribly common to have someone's fingerprint on your
phone and then try to verify it on your desktop.

>  * When comparing aloud, I would suggest also having a time limit
> designed to provoke a fairly high error rate, so the same methodology
> is applied to all modes of use.  Otherwise, there's two variables in
> the read-aloud case (time, error rate), so not as easy to compare
> different formats.  Having the tester record "how many times the
> participant asks the other to repeat the last token, slow down, or
> otherwise change how they're reciting it" seems subjective and
> unnecessary.

I added the time limit. I don't think the tally would be subjective (it
seems pretty clear to me that if someone asks for a repeat, we record
it; if someone asks the other to slow down, we record it; etc.).  As
for necessity, we could of course skip capturing that information, but
I feel it's relevant.  For example: if English words succeed with no
repetitions, but pseudowords succeed only with tons of repetitions,
aren't English words the better format?
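A minimal sketch of how the tester's tally could be recorded and
summarized per fingerprint format, so that success within the time
limit and the number of repeat/slow-down requests are reported side by
side. The class, function, and event names (ReadAloudTrial, summarize,
"repeat", "slow_down", "other_change") are hypothetical and not part of
any agreed study protocol.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical event categories the tester tallies during a read-aloud trial.
EVENT_KINDS = {"repeat", "slow_down", "other_change"}


@dataclass
class ReadAloudTrial:
    fmt: str        # fingerprint format, e.g. "english_words" or "pseudowords"
    matched: bool   # did the pair reach the correct match/mismatch verdict?
    seconds: float  # elapsed time for the comparison
    events: Counter = field(default_factory=Counter)

    def record(self, kind: str) -> None:
        """Tally one request to repeat, slow down, or otherwise change recitation."""
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self.events[kind] += 1


def summarize(trials, time_limit):
    """Per format: success rate within the time limit and mean interventions per trial."""
    by_fmt = {}
    for t in trials:
        by_fmt.setdefault(t.fmt, []).append(t)
    summary = {}
    for fmt, ts in by_fmt.items():
        # A trial only counts as a success if it was correct AND within the limit.
        ok = [t.matched and t.seconds <= time_limit for t in ts]
        summary[fmt] = {
            "success_rate": sum(ok) / len(ts),
            "mean_interventions": mean(sum(t.events.values()) for t in ts),
        }
    return summary
```

With this shape, two formats can show the same success rate while
differing sharply in mean_interventions, which is exactly the case the
English-words-versus-pseudowords example is meant to catch.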

>  * Should there be a handwritten test, i.e. one user handwrites the
> fingerprint for the other?

I don't think it's worth the additional testing...

>  * Regarding the "computationally chosen flaws" - I think you should
> randomize where the error goes, otherwise users will figure out it's
> always in the middle/inner chars, and game the test.

I want to make sure we test cases where just the middle N are
different, but I agree.  I put in a 25/75% mix.
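A minimal sketch of the randomized flaw placement, assuming the
fingerprint is a lowercase hex string and that a "flaw" is a single
changed character. With probability p_middle the change lands in the
middle n_middle positions; otherwise its position is uniform over the
whole fingerprint, so participants can't learn to check only the
middle. The function names and the defaults n_middle=8 and
p_middle=0.25 are hypothetical; which side of the percentage split
goes to the middle-only case is an assumption here. The real study's
"computationally chosen" flaws may also be picked to look similar to
the original, which this sketch does not attempt.

```python
import random

HEX = "0123456789abcdef"


def middle_range(length, n):
    """Indices of the middle n positions of a fingerprint of the given length."""
    start = (length - n) // 2
    return range(start, start + n)


def flaw_fingerprint(fp, n_middle=8, p_middle=0.25, rng=random):
    """Return (flawed_fp, index): fp with exactly one hex digit changed.

    Assumes fp is lowercase hex. With probability p_middle the changed digit
    lies in the middle n_middle positions; otherwise its position is uniform.
    """
    if not 0 < n_middle <= len(fp):
        raise ValueError("n_middle must be between 1 and len(fp)")
    if rng.random() < p_middle:
        idx = rng.choice(middle_range(len(fp), n_middle))
    else:
        idx = rng.randrange(len(fp))
    # Pick any hex digit other than the current one, so the result always differs.
    replacement = rng.choice([c for c in HEX if c != fp[idx]])
    return fp[:idx] + replacement + fp[idx + 1:], idx
```

Passing a seeded random.Random instance as rng makes the set of flawed
fingerprints reproducible across study sessions.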

-tom