[messaging] "Pseudoword" base32 fingerprints
trevp at trevp.net
Thu Feb 6 22:48:40 PST 2014
On Thu, Feb 6, 2014 at 1:40 AM, Daniel Thomas <drt24+messaging at cam.ac.uk> wrote:
> On 06/02/14 01:35, Trevor Perrin wrote:
>> I like the smaller size of the pseudowords, particularly for
>> transcribing these things, spelling out the characters over the phone,
>> or viewing on a small screen. And a lot of the words are unusual so
>> are going to need to be spelled out.
>> But it would be interesting to see what a better wordlist looks like.
> Diceware has a (fairly short, 7776-word) word list in multiple languages
> for the purpose of generating easy to remember passphrases.
That's about a 13-bit list, which seems like a good middle ground
between an 8-bit list (like PGPfone's) and a 16-bit list. An 8-bit list
necessitates 16-word fingerprints for 128-bit security, which feels
like too many words. A 16-bit list contains 65K words, which is more
than most people's vocabulary, meaning a lot of unusual words that
would have to be spelled out.
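For reference, the arithmetic is just words = ceil(128 / log2(list size)). Diceware's 7776 = 6^5 comes from rolling five dice per word, so each word carries about 12.9 bits. The quick Python sketch below reproduces the numbers above; the function names are made up for illustration and aren't from any existing tool.

```python
import math

def bits_per_word(list_size: int) -> float:
    """Entropy contributed by one word chosen uniformly from a list of this size."""
    return math.log2(list_size)

def words_needed(list_size: int, security_bits: int = 128) -> int:
    """Smallest number of words whose combined entropy reaches security_bits."""
    return math.ceil(security_bits / bits_per_word(list_size))

# Compare the three list sizes discussed above.
for name, size in [("8-bit (PGPfone-style)", 2**8),
                   ("Diceware", 7776),
                   ("16-bit", 2**16)]:
    print(f"{name}: {bits_per_word(size):.2f} bits/word, "
          f"{words_needed(size)} words for 128 bits")
```

This gives 16 words for an 8-bit list, 10 for Diceware, and 8 for a 16-bit list. That's why the Diceware examples use 10 words.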
The Diceware dictionary is designed around short words and word
fragments (it includes numbers, punctuation, and non-words, which is a
bit weird IMO). I wrote a script to generate 10 random Diceware words
to see what fingerprints might look like:
hop - flu - urn - belie - gogo - gravy - mayor - avow - plush - enter
bump - seem - soft - lm - plane - exit - plus - stilt - behind - malta
tract - rude - rhine - ready - climb - fell - fell - reek - cody - kudzu
bunch - sound - adler - galt - signor - glom - soup - on - lund - juju
essay - eave - ef - pro - stung - gn - smash - josef - vetch - busy
dawson - tic - vy - cake - rock - sr - store - ice - plunk - gp
old - swept - win - mike - xy - chill - seethe - allow - alva - jh
grace - curia - coke - rebut - 15 - foray - jaw - weco - anvil - buenos
pn - adair - swelt - faith - slash - berlin - watch - blood - start - santa
grow - del - bon - 99th - kepler - cam - fun - 37th - dryad - prone
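The script above just drew random words. A real fingerprint would need to derive the words from a hash of the key. One way to do that, assuming the digest is treated as a big integer and repeatedly divided by the list size, is sketched below in Python. This isn't a proposed spec. The actual Diceware list isn't reproduced here, so a placeholder list of 7776 synthetic tokens stands in for it (hypothetical, for illustration only).

```python
import hashlib

def digest_to_words(digest: bytes, wordlist: list[str], n_words: int) -> list[str]:
    """Encode a digest as n_words entries of wordlist via repeated divmod.

    Each word consumes log2(len(wordlist)) bits of the digest (read as a
    big-endian integer); any higher-order bits beyond n_words words are dropped.
    """
    value = int.from_bytes(digest, "big")
    words = []
    for _ in range(n_words):
        value, index = divmod(value, len(wordlist))
        words.append(wordlist[index])
    return words

# Placeholder standing in for the real 7776-word Diceware list (not shown).
wordlist = [f"word{i:04d}" for i in range(7776)]

key = b"example public key bytes"  # hypothetical key material
fp = digest_to_words(hashlib.sha256(key).digest(), wordlist, 10)
print(" - ".join(fp))
```

Ten Diceware words cover about 129 bits of the SHA-256 output. The divmod approach also works for list sizes that aren't a power of two, like 7776.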
Below, 5 Diceware fingerprints are compared side-by-side with 5 pseudoword
fingerprints of score=18. The pseudoword fingerprints took an average
of ~30 seconds apiece to generate on a single core of my MacBook Air.
(The max possible score is 20; a score of 18 means 2 deviations from
oman - swath - haze - elmer - gouda - admix - feat - afar - reel - for
ukigex - 3kiw - jejod - yvak - rewupa
blitz - teal - emma - bambi - queen - 92 - mecum - om - derek - twa
lijuv7 - woxm - pokoj - cixa - ehajen
op - zomba - 84th - soy - oval - evolve - spook - fk - ghi - magog
syivoh - upim - leewo - hoda - madeso
piotr - vain - david - mk - gasp - buoy - malt - az - hang - rena
bewora - zutm - hirub - ugux - tlezeb
perk - fate - cinch - gulf - jb - marks - wag - canoe - sprig - maw
ripoyu - ime2 - fenef - aqos - lehnof
Both approaches seem pretty decent; I'm not sure which is best. Choosing
13-bit wordlists for different languages and dealing with
cross-language compatibility seems a hassle, but so is computing tens
of millions of hashes for a fingerprint.
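The cost on the pseudoword side comes from a search loop: keep hashing the key with a counter until the base32 output scores high enough. The scoring function used for the examples above isn't spelled out here. The Python sketch below therefore substitutes a made-up one: it counts how many adjacent letter pairs in a 21-character base32 prefix alternate consonant/vowel, which gives a max score of 20. The `fingerprint`, `score`, and `search` helpers and the key||counter hashing are all hypothetical. They only illustrate why raising the target score multiplies the number of hashes needed.

```python
import base64
import hashlib

VOWELS = set("aeiouy")  # hypothetical choice of "vowel" letters

def fingerprint(key: bytes, counter: int, length: int = 21) -> str:
    """Hypothetical: lowercase base32 prefix of SHA-256(key || counter)."""
    digest = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    return base64.b32encode(digest).decode("ascii").lower()[:length]

def score(fp: str) -> int:
    """Made-up score: number of adjacent letter pairs alternating consonant/vowel.

    Digits (2-7 in base32) never count, so a 21-char string scores at most 20.
    """
    total = 0
    for a, b in zip(fp, fp[1:]):
        if a.isalpha() and b.isalpha() and ((a in VOWELS) != (b in VOWELS)):
            total += 1
    return total

def search(key: bytes, target: int, max_tries: int = 10**6):
    """Try counters in order; return (counter, fingerprint, score) or None."""
    for counter in range(max_tries):
        fp = fingerprint(key, counter)
        s = score(fp)
        if s >= target:
            return counter, fp, s
    return None

# A low target is found almost immediately; each extra point of score
# makes a hit much rarer, so targets near the max need far more hashes.
print(search(b"demo key", target=8))
```

Presumably the winning counter (or whatever input was varied) would have to travel with the key so others can recompute the same fingerprint. That's an assumption of this sketch, not something settled here.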
There's a lot more that could be done here: e.g. make a better
wordlist than Diceware, or optimize the pseudoword search and do
If anyone wants to do UX research, these would be great projects...