[messaging] "Pseudoword" base32 fingerprints

Trevor Perrin trevp at trevp.net
Thu Feb 6 22:48:40 PST 2014


On Thu, Feb 6, 2014 at 1:40 AM, Daniel Thomas <drt24+messaging at cam.ac.uk> wrote:
> On 06/02/14 01:35, Trevor Perrin wrote:
>> I like the smaller size of the pseudowords, particularly for
>> transcribing these things, spelling out the characters over the phone,
>> or viewing on a small screen.  And a lot of the words are unusual so
>> are going to need to be spelled out.
>>
>> But it would be interesting to see what a better wordlist looks like.
>
> Diceware[0] has a fairly short (7776-word) list in multiple languages
> for the purpose of generating easy to remember passphrases.


Hi Daniel,

That's about a 13-bit list (log2 7776 ≈ 12.9 bits per word), which
seems like a good middle ground between an 8-bit list (like PGPfone's)
and a 16-bit list.  An 8-bit list necessitates 16-word fingerprints for
128-bit security, which feels like too many words.  A 16-bit list
contains 65,536 words, which is more than most people's vocabulary,
meaning a lot of unusual words that would have to be spelled out.
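The arithmetic behind this trade-off: each word drawn uniformly from an N-word list carries log2(N) bits, so reaching 128 bits takes ceil(128 / log2 N) words. A small sketch (the function name is just for illustration):

```python
import math


def words_needed(list_size: int, security_bits: int = 128) -> int:
    """Words drawn uniformly from a `list_size`-word list needed to reach
    at least `security_bits` bits of entropy."""
    bits_per_word = math.log2(list_size)
    return math.ceil(security_bits / bits_per_word)


# 8-bit (PGPfone-style), Diceware (6^5 = 7776), and 16-bit lists
for size in (256, 7776, 65536):
    print(f"{size} words: {math.log2(size):.2f} bits/word, "
          f"{words_needed(size)} words for 128 bits")
```

This gives 16, 10 and 8 words respectively, which is why a Diceware-sized list lands at 10-word fingerprints.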

The Diceware dictionary is designed around short words and word
fragments (it includes numbers, punctuation, and non-words, which is a
bit weird IMO).  I wrote a script that generates fingerprints of 10
random Diceware words each (10 × 12.9 ≈ 129 bits) to see what they
might look like:

https://github.com/trevp/keyname


hop - flu - urn - belie - gogo - gravy - mayor - avow - plush - enter

bump - seem - soft - lm - plane - exit - plus - stilt - behind - malta

tract - rude - rhine - ready - climb - fell - fell - reek - cody - kudzu

bunch - sound - adler - galt - signor - glom - soup - on - lund - juju

essay - eave - ef - pro - stung - gn - smash - josef - vetch - busy

dawson - tic - vy - cake - rock - sr - store - ice - plunk - gp

old - swept - win - mike - xy - chill - seethe - allow - alva - jh

grace - curia - coke - rebut - 15 - foray - jaw - weco - anvil - buenos

pn - adair - swelt - faith - slash - berlin - watch - blood - start - santa

grow - del - bon - 99th - kepler - cam - fun - 37th - dryad - prone
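The script linked above draws random words to preview what fingerprints look like. For a real fingerprint the words would instead be derived from a hash of the key. A minimal sketch, assuming SHA-256 and a base-7776 split of the digest (both hypothetical choices, not taken from the keyname script); the caller must supply the actual Diceware list:

```python
import hashlib


def diceware_fingerprint(public_key: bytes, wordlist: list[str],
                         num_words: int = 10) -> str:
    """Render a hash of `public_key` as `num_words` words from `wordlist`.

    Hypothetical encoding: the SHA-256 digest is read as a big-endian
    integer and peeled apart into base-len(wordlist) digits, each digit
    selecting one word. With 7776 words and 10 words, about 129 of the
    digest's 256 bits are used.
    """
    if len(wordlist) < 2:
        raise ValueError("wordlist needs at least 2 entries")
    n = int.from_bytes(hashlib.sha256(public_key).digest(), "big")
    words = []
    for _ in range(num_words):
        n, index = divmod(n, len(wordlist))
        words.append(wordlist[index])
    return " - ".join(words)


# Placeholder list standing in for the real 7776-word Diceware list
placeholder = [f"w{i}" for i in range(7776)]
print(diceware_fingerprint(b"example key", placeholder))
```

Because 2^256 is vastly larger than 7776^10, the selected words are very close to uniformly distributed.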


Below, 5 Diceware fingerprints are compared side-by-side with 5
pseudoword fingerprints of score 18.  (The max possible score is 20; a
score of 18 means 2 deviations from vowel/consonant alternation.)  The
pseudoword fingerprints took an average of ~30 seconds apiece to
generate on a single core of my MacBook Air.


oman - swath - haze - elmer - gouda - admix - feat - afar - reel - for

ukigex - 3kiw - jejod - yvak - rewupa


blitz - teal - emma - bambi - queen - 92 - mecum - om - derek - twa

lijuv7 - woxm - pokoj - cixa - ehajen


op - zomba - 84th - soy - oval - evolve - spook - fk - ghi - magog

syivoh - upim - leewo - hoda - madeso


piotr - vain - david - mk - gasp - buoy - malt - az - hang - rena

bewora - zutm - hirub - ugux - tlezeb


perk - fate - cinch - gulf - jb - marks - wag - canoe - sprig - maw

ripoyu - ime2 - fenef - aqos - lehnof


Both approaches seem pretty decent; I'm not sure which is best.
Choosing 13-bit wordlists for different languages and dealing with
cross-language compatibility seems like a hassle, but so is computing
tens of millions of hashes for each fingerprint.
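The cost comes from brute-force search: hash, render, score, repeat until the score clears the threshold. A minimal sketch of that loop, with several loudly hypothetical choices: SHA-256, RFC 4648 base32 lowercased and truncated to 25 characters, the 6-4-5-4-6 grouping seen in the samples, a counter appended to the key as the thing varied between attempts, and the same reconstructed vowel/consonant alternation rule (none of these are confirmed by the post):

```python
import base64
import hashlib
import itertools

VOWELS = frozenset("aeiou")
GROUP_SIZES = (6, 4, 5, 4, 6)  # layout of the samples: 25 chars, 20 pairs


def alternation_score(groups: list[str]) -> int:
    """Adjacent vowel/consonant pairs within each group ('y' counts as a
    consonant; digits break alternation). Reconstructed rule, an assumption."""
    def kind(c: str) -> str:
        if c in VOWELS:
            return "vowel"
        return "consonant" if c.isalpha() else "digit"

    return sum(
        1
        for g in groups
        for a, b in zip(g, g[1:])
        if {kind(a), kind(b)} == {"vowel", "consonant"}
    )


def render(digest: bytes) -> list[str]:
    """Base32-encode, lowercase, keep 25 chars (125 bits), split into groups."""
    text = base64.b32encode(digest).decode("ascii").lower()
    groups, pos = [], 0
    for size in GROUP_SIZES:
        groups.append(text[pos:pos + size])
        pos += size
    return groups


def search(public_key: bytes, min_score: int = 18) -> tuple[int, str]:
    """Hash public_key || counter until the rendered fingerprint scores at
    least min_score; returns (counter, fingerprint). Appending a counter is
    a hypothetical choice: a verifier would also need the counter."""
    for counter in itertools.count():
        digest = hashlib.sha256(public_key + counter.to_bytes(8, "big")).digest()
        groups = render(digest)
        if alternation_score(groups) >= min_score:
            return counter, " - ".join(groups)


counter, fingerprint = search(b"example key", min_score=8)
print(counter + 1, "attempts:", fingerprint)
```

A low threshold such as 8 returns almost immediately; the threshold of 18 used for the samples takes far more attempts, which is where an optimized search and better scoring would pay off.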

There's a lot more that could be done here:  e.g. make a better
wordlist than Diceware, or optimize the pseudoword search and do
better scoring.

If anyone wants to do UX research, these would be great projects...


Trevor

