[messaging] Let's run a usability study (was Useability of public-key fingerprints)

Trevor Perrin trevp at trevp.net
Mon Apr 7 10:36:49 PDT 2014

On Mon, Apr 7, 2014 at 8:48 AM, Tom Ritter <tom at ritter.vg> wrote:
> On 27 March 2014 21:24, Trevor Perrin <trevp at trevp.net> wrote:
>>>  - We have two participants speaking fingerprints aloud to each other.
>>> Do we want them to do it over a cell phone to add difficulty, or just
>>> omit that bit?
>> I think speaking over a phone is a good case to test, because in
>> person you could also use QR codes, or just look at each other's
>> screens.  A landline might provide more consistent voice quality than
>> cellphone.
> It's true two tests over cellphones may get different voice quality.
> I tend to think the difference between tests would in this case be
> okay, because it's a real-world factor people have to deal with... but
> I could go either way.

My concern is that if you do a bunch of base32 tests one day and
sentence tests the next, differences in the phones used or cell
reception might bias the results.
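
For concreteness, here's a minimal sketch of the two rendering styles under discussion: the same 20-byte fingerprint as base32 groups versus pseudowords. The syllable table and the byte-to-word mapping are made up for illustration, not any scheme proposed in this thread.

```python
import base64
import hashlib

# A stand-in 20-byte fingerprint (e.g. a SHA-1 of a public key)
fingerprint = hashlib.sha1(b"example public key").digest()

# Base32 rendering, split into 4-character groups for reading aloud
b32 = base64.b32encode(fingerprint).decode().rstrip("=")
groups = " ".join(b32[i:i + 4] for i in range(0, len(b32), 4))

# Hypothetical pseudoword rendering: each byte maps to two
# consonant-vowel syllables (16 syllables cover one nibble each)
SYLLABLES = ["ba", "be", "bi", "bo", "bu", "da", "de", "di",
             "do", "du", "ka", "ke", "ki", "ko", "ku", "la"]

def byte_to_pseudoword(b):
    # High nibble picks the first syllable, low nibble the second
    return SYLLABLES[b >> 4] + SYLLABLES[b & 0x0F]

words = " ".join(byte_to_pseudoword(b) for b in fingerprint)

print(groups)  # e.g. "MZXW 6YTB ..." style groups
print(words)   # e.g. "kodu bala ..." style pseudowords
```

Either string is what a participant would actually recite over the phone, which is why per-format timing and error rates are comparable.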

Landlines might make it easier to provide a constant voice-quality
experience.  Alternatively, if you mixed the tests across days and
across different cellphones, maybe you could argue that the
voice-quality variability is the same for every format, though that
seems more complicated.

Perhaps just flag this debate, and Christine's UI researchers could
decide, since I think this is getting close to the point at which
professionals should be involved.

>> Other:
>>  *  I think a printed biz card may not work well w/high-resolution
>> visual fingerprints, so maybe doing it on a small phone screen is
>> better?
> I actually wasn't planning on doing a printed visual fingerprint... I
> suppose I could though.  I don't think the phone screen would work,
> because it's not terribly common to have someone's fingerprint on your
> phone and then try to verify it on your desktop.
>>  * When comparing aloud, I would suggest also having a time limit
>> designed to provoke a fairly high error rate, so the same methodology
>> is applied to all modes of use.  Otherwise, there are two variables in
>> the read-aloud case (time, error rate), so it's not as easy to compare
>> different formats.  Having the tester record "how many times the
>> participant asks the other to repeat the last token, slow down, or
>> otherwise change how they're reciting it" seems subjective and
>> unnecessary.
> I added in the time limit. I don't think it would be subjective (it
> seems pretty clear to me that if someone asks for a repeat we record
> it, if someone asks for them to slow down, we record it, etc).

What if I say "uhhh...ok" to confirm I've heard the last group, but
also subtly slow down the rate at which you're saying things?  What if
I say "that was an 'A', right?"  What if I repeat every group after
you say it?

>  As for
> necessity, we could of course not capture that information, but I
> feel like it's relevant.  For example: if we get successful results
> for English words with no repetitions, but successful results on
> pseudowords with tons of repetitions - are English words not better?

I'd argue that success rate on the fixed-time distinguishing test is a
better metric.

I.e. if given 12 seconds to talk, people can distinguish
matching/not-matching pseudowords 80% of the time, but only 30% of
the time for English words, that seems more meaningful, even if more
repetitions / interactivity are used for the pseudowords.
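
Scoring that metric could be sketched like this; the trial records and the numbers in them are purely hypothetical, just to show that one success rate per format falls out regardless of how many repetitions happened within the time limit.

```python
# Hypothetical trial log: (format, strings_actually_match, participant_verdict).
# A trial is a success when the verdict matches the ground truth
# within the fixed time limit.
trials = [
    ("pseudowords", True, True), ("pseudowords", False, False),
    ("pseudowords", True, True), ("pseudowords", False, True),
    ("english", True, True), ("english", False, True),
    ("english", True, False), ("english", False, True),
]

def success_rate(trials, fmt):
    # Fraction of trials for this format where the verdict was correct
    scored = [truth == verdict for f, truth, verdict in trials if f == fmt]
    return sum(scored) / len(scored)

print(success_rate(trials, "pseudowords"))  # 0.75
print(success_rate(trials, "english"))      # 0.25
```

With this scoring, "how many repeats were asked for" becomes a property of how participants spend their fixed time, not a separate outcome variable to interpret.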

>>  * Should there be a handwritten test, i.e. one user handwrites the
>> fingerprint for the other?
> I don't think it's worth the additional testing...

It may be too much given resources, but perhaps worth mentioning as
another test which might bring different things to light.
