[messaging] "Pseudoword" base32 fingerprints
Trevor Perrin
trevp at trevp.net
Thu Feb 6 22:48:40 PST 2014
On Thu, Feb 6, 2014 at 1:40 AM, Daniel Thomas <drt24+messaging at cam.ac.uk> wrote:
> On 06/02/14 01:35, Trevor Perrin wrote:
>> I like the smaller size of the pseudowords, particularly for
>> transcribing these things, spelling out the characters over the phone,
>> or viewing on a small screen. And a lot of the words are unusual so
>> are going to need to be spelled out.
>>
>> But it would be interesting to see what a better wordlist looks like.
>
> Diceware[0] has a (fairly short, 7776) word list in multiple languages
> for the purpose of generating easy to remember passphrases.
Hi Daniel,
That's about a 13-bit list (log2(7776) is roughly 12.9), which seems like a
good middle ground between an 8-bit list (like PGPfone) and a 16-bit list. An 8-bit list
necessitates 16 word fingerprints for 128-bit security, which feels
like too many words. A 16-bit list contains 65K words, which is more
than most people's vocabulary, meaning a lot of unusual words that
would have to be spelled out.
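
To make the arithmetic above concrete, here is a minimal sketch that computes
the bits carried by each word and the number of words needed for 128-bit
security. The function name words_needed is made up for illustration.

```python
import math

def words_needed(list_size, security_bits=128):
    """Return (bits per word, words needed to reach security_bits)."""
    bits_per_word = math.log2(list_size)
    return bits_per_word, math.ceil(security_bits / bits_per_word)

# 8-bit list (PGPfone-sized), the 7776-word Diceware list, and a 16-bit list
for size in (256, 7776, 65536):
    bits, n = words_needed(size)
    print(f"{size:>6} words: {bits:5.2f} bits/word, {n} words for 128 bits")
```

This gives 16 words for the 8-bit list, 10 words for Diceware, and 8 words for
the 16-bit list.
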
The Diceware dictionary is designed around short words and word
fragments (it includes numbers, punctuation, and non-words, which is a
bit weird IMO). I wrote a script to generate 10 random Diceware words
to see what fingerprints might look like:
https://github.com/trevp/keyname
hop - flu - urn - belie - gogo - gravy - mayor - avow - plush - enter
bump - seem - soft - lm - plane - exit - plus - stilt - behind - malta
tract - rude - rhine - ready - climb - fell - fell - reek - cody - kudzu
bunch - sound - adler - galt - signor - glom - soup - on - lund - juju
essay - eave - ef - pro - stung - gn - smash - josef - vetch - busy
dawson - tic - vy - cake - rock - sr - store - ice - plunk - gp
old - swept - win - mike - xy - chill - seethe - allow - alva - jh
grace - curia - coke - rebut - 15 - foray - jaw - weco - anvil - buenos
pn - adair - swelt - faith - slash - berlin - watch - blood - start - santa
grow - del - bon - 99th - kepler - cam - fun - 37th - dryad - prone
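
The examples above are just randomly chosen words. As a rough sketch of how a
real fingerprint might map onto such a list, the code below hashes a key and
reads the digest as a number in base 7776. This mapping is an assumption for
illustration, not necessarily what the keyname script does, and demo_list is a
placeholder standing in for the actual Diceware list.

```python
import hashlib

def fingerprint_words(key_bytes, wordlist, n_words=10):
    """Map a key to n_words list entries (hypothetical encoding)."""
    # Treat the SHA-256 digest as one big integer and peel off
    # base-len(wordlist) digits, one word per digit.
    value = int.from_bytes(hashlib.sha256(key_bytes).digest(), "big")
    words = []
    for _ in range(n_words):
        value, index = divmod(value, len(wordlist))
        words.append(wordlist[index])
    return " - ".join(words)

# Placeholder list of 7776 entries; a real run would load the Diceware words.
demo_list = [f"w{i:04d}" for i in range(7776)]
print(fingerprint_words(b"example public key", demo_list))
```

Ten words from a 7776-entry list consume about 129 bits of the 256-bit digest.
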
Below, 5 Diceware fingerprints alternate with 5 pseudoword fingerprints of
score=18 for side-by-side comparison. The pseudoword fingerprints took an
average of ~30 seconds apiece to generate on a single core of my MacBook Air.
(The max possible score is 20; a score of 18 means 2 deviations from
vowel/consonant alternation.)
oman - swath - haze - elmer - gouda - admix - feat - afar - reel - for
ukigex - 3kiw - jejod - yvak - rewupa
blitz - teal - emma - bambi - queen - 92 - mecum - om - derek - twa
lijuv7 - woxm - pokoj - cixa - ehajen
op - zomba - 84th - soy - oval - evolve - spook - fk - ghi - magog
syivoh - upim - leewo - hoda - madeso
piotr - vain - david - mk - gasp - buoy - malt - az - hang - rena
bewora - zutm - hirub - ugux - tlezeb
perk - fate - cinch - gulf - jb - marks - wag - canoe - sprig - maw
ripoyu - ime2 - fenef - aqos - lehnof
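
The scoring rule mentioned above can be sketched as follows. The details are
inferred from the examples rather than taken from the keyname script: one
point per adjacent pair of characters within a group that alternates between
vowel and consonant, with "y" counted as a consonant and digits never
counting as either. With groups of 6-4-5-4-6 characters there are 20 such
pairs, which matches the stated maximum.

```python
VOWELS = set("aeiou")  # assumption: "y" is treated as a consonant

def char_class(c):
    """Return 'V' for vowels, 'C' for other letters, None for digits."""
    if c in VOWELS:
        return "V"
    if c.isalpha():
        return "C"
    return None

def score(fingerprint):
    """Count alternating vowel/consonant pairs within each group."""
    total = 0
    for group in (g.strip() for g in fingerprint.split("-")):
        for a, b in zip(group, group[1:]):
            ca, cb = char_class(a), char_class(b)
            if ca is not None and cb is not None and ca != cb:
                total += 1
    return total

print(score("ukigex - 3kiw - jejod - yvak - rewupa"))
```

Under these assumptions, each of the five pseudoword fingerprints above scores
18.
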
Both approaches seem pretty decent; I'm not sure which is best. Choosing
13-bit wordlists for different languages and dealing with
cross-language compatibility seems a hassle, but so is computing tens
of millions of hashes for a fingerprint.
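
To show where the hashing cost comes from, here is a minimal sketch of a
pseudoword search: hash the key together with a counter, base32-encode the
result, and keep trying until the score reaches a target. Appending a counter
is a hypothetical way to vary the input, and the 6-4-5-4-6 grouping and the
scoring rule are inferred from the examples, so treat this as an illustration
rather than the keyname algorithm.

```python
import base64
import hashlib

VOWELS = set("aeiou")     # assumption: "y" counts as a consonant
GROUPS = (6, 4, 5, 4, 6)  # 25 base32 characters = 125 bits

def split_groups(chars):
    """Cut a 25-character string into the assumed 6-4-5-4-6 groups."""
    out, pos = [], 0
    for size in GROUPS:
        out.append(chars[pos:pos + size])
        pos += size
    return out

def score(groups):
    """One point per adjacent letter pair that alternates vowel/consonant."""
    total = 0
    for g in groups:
        for a, b in zip(g, g[1:]):
            if a.isalpha() and b.isalpha() and (a in VOWELS) != (b in VOWELS):
                total += 1
    return total

def search(key_bytes, target, max_tries=1_000_000):
    """Return (counter, fingerprint) for the first hash scoring >= target."""
    for counter in range(max_tries):
        digest = hashlib.sha256(key_bytes + counter.to_bytes(8, "big")).digest()
        chars = base64.b32encode(digest).decode("ascii").lower()[:25]
        groups = split_groups(chars)
        if score(groups) >= target:
            return counter, " - ".join(groups)
    return None  # gave up without reaching the target

# A low target keeps the demo fast; a target of 18 needs far more hashes.
print(search(b"example public key", target=8))
```

A random base32 string only alternates on about one pair in five, so each
extra point in the target multiplies the number of hashes to try.
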
There's a lot more that could be done here: e.g. make a better
wordlist than Diceware, or optimize the pseudoword search and do
better scoring.
If anyone wants to do UX research, these would be great projects...
Trevor