[messaging] "Pseudoword" base32 fingerprints
jbonneau at gmail.com
Fri Feb 7 16:54:02 PST 2014
An attempt to summarize two problem areas being discussed in this thread:
A) Long-term key fingerprints probably need to be a bijection from 128-bit
bitstrings to a human-friendly form. We can shrink this space (as Trevor
mentioned) by key stretching: searching for a random nonce (or random
public key) which happens to produce a 128-bit hash starting with N zero
bits, which are then truncated. One can trade off generation cost against
verification cost here, so if you can do 28 bits of work at generation time,
then effectively you only need to transmit and check a 100-bit value.
Long-term key fingerprints should be as easy as possible to display
(business cards), typeable, writeable, checkable, and (as a bonus)
pronounceable.
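
A rough sketch of the key-stretching search in (A), under some assumptions
that are mine and not from the thread: SHA-256 truncated to 128 bits stands
in for the 128-bit hash, the nonce is a simple counter, not a random value,
and the names stretch_fingerprint and verify_fingerprint are made up for
illustration. The verifier is assumed to receive the nonce along with the key.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    n = int.from_bytes(digest, "big")
    return len(digest) * 8 - n.bit_length()

def truncated_hash(public_key: bytes, nonce: bytes, total_bits: int) -> bytes:
    # Assumption: SHA-256 cut down to total_bits stands in for the hash.
    return hashlib.sha256(nonce + public_key).digest()[: total_bits // 8]

def stretch_fingerprint(public_key: bytes, work_bits: int, total_bits: int = 128):
    """Search for a nonce whose hash starts with work_bits zero bits.

    Returns (nonce, short_fp). Because the top work_bits are zero, short_fp
    fits in total_bits - work_bits bits. Expected cost: ~2**work_bits hashes.
    """
    counter = 0
    while True:
        nonce = counter.to_bytes(8, "big")  # counter nonce, for simplicity
        digest = truncated_hash(public_key, nonce, total_bits)
        if leading_zero_bits(digest) >= work_bits:
            return nonce, int.from_bytes(digest, "big")
        counter += 1

def verify_fingerprint(public_key: bytes, nonce: bytes, short_fp: int,
                       work_bits: int, total_bits: int = 128) -> bool:
    """Verification costs one hash: recompute, check the zeros, compare."""
    digest = truncated_hash(public_key, nonce, total_bits)
    return (leading_zero_bits(digest) >= work_bits
            and int.from_bytes(digest, "big") == short_fp)
```

With work_bits=28 this would take on the order of 2^28 hashes to generate and
leave a 100-bit value to display; a small value such as work_bits=8 is enough
to try the code out.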
B) The requirements for ephemeral authentication secrets vary by protocol,
but in the simplest case (e.g. Socialist Millionaire) they can be anything and
only need to be about 30-40 bits. In that case all we really need is an
invertible function from 30-40 random bits to a value that is easy to
recognize and (as a bonus) pronounce. In other cases they'll have to be
generated as a hash of a longer value. The best setup is probably an
invertible function E applied to a truncated hash.
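
To make (B) concrete, here is one possible shape for E, as a sketch only. The
alphabet, the 12-bits-per-pseudoword layout, and the names encode, decode and
short_auth_string are all my own choices for illustration; a real scheme
could just as well index into a word list.

```python
import hashlib

CONSONANTS = "bdfghjklmnprstvz"  # 16 letters -> 4 bits
VOWELS = "aiou"                  # 4 letters  -> 2 bits
# One consonant+vowel syllable carries 6 bits; a two-syllable word carries 12.

def encode(value: int, bits: int = 36) -> str:
    """Map a bits-wide integer to hyphen-separated 4-letter pseudowords."""
    if bits % 12 or not 0 <= value < (1 << bits):
        raise ValueError("value must fit in a multiple of 12 bits")
    words = []
    for shift in range(bits - 12, -1, -12):
        chunk = (value >> shift) & 0xFFF
        word = ""
        for syl_shift in (6, 0):
            syl = (chunk >> syl_shift) & 0x3F
            word += CONSONANTS[syl >> 2] + VOWELS[syl & 3]
        words.append(word)
    return "-".join(words)

def decode(text: str) -> int:
    """Inverse of encode: recover the integer from the pseudowords."""
    value = 0
    for word in text.split("-"):
        if len(word) != 4:
            raise ValueError("each pseudoword must be 4 letters")
        for i in (0, 2):
            syl = (CONSONANTS.index(word[i]) << 2) | VOWELS.index(word[i + 1])
            value = (value << 6) | syl
    return value

def short_auth_string(data: bytes) -> str:
    """E applied to a truncated hash: keep the top 36 bits of SHA-256."""
    top40 = int.from_bytes(hashlib.sha256(data).digest()[:5], "big")
    return encode(top40 >> 4, 36)
```

E itself is a bijection on 36-bit values, so random secrets can be encoded
and decoded directly; short_auth_string covers the case where the value has
to be derived from something longer.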
There are possibly other use cases (including keys that can actually be
committed to memory), but these two seem to be the main areas of interest.
There is probably quite a bit of overlap between them. I like the word list
approach, especially for (B), though I'd sound a note of caution on
Diceware: the list contains a lot of words that aren't easy to recognize or
type. I'd advocate for words of consistent length (4-6 characters) with
edit distance at least 2 between any pair, which would be nice so that
smartphone typing works. I whipped up one of these a while back, and ended
up with about 750 words. I'd guess around 1k is the limit.
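
For anyone who wants to try building such a list, a greedy filter along these
lines is one way to do it. This is a sketch and not the script I used; the
function names are made up, and "edit distance" is taken here to mean plain
Levenshtein distance. The result depends on the order of the candidates, so
putting the most recognizable words first seems sensible.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def filter_wordlist(candidates, min_len=4, max_len=6, min_distance=2):
    """Keep alphabetic words of consistent length that are at least
    min_distance edits away from every word already kept."""
    kept = []
    for word in candidates:
        if not (min_len <= len(word) <= max_len and word.isalpha()):
            continue
        if all(edit_distance(word, k) >= min_distance for k in kept):
            kept.append(word)
    return kept
```

Each candidate is compared against every kept word, so the cost is quadratic
in the list size, which is fine for lists of a few thousand words.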
On Thu, Feb 6, 2014 at 10:48 PM, Trevor Perrin <trevp at trevp.net> wrote:
> On Thu, Feb 6, 2014 at 1:40 AM, Daniel Thomas <drt24+messaging at cam.ac.uk> wrote:
> > On 06/02/14 01:35, Trevor Perrin wrote:
> >> I like the smaller size of the pseudowords, particularly for
> >> transcribing these things, spelling out the characters over the phone,
> >> or viewing on a small screen. And a lot of the words are unusual so
> >> are going to need to be spelled out.
> >> But it would be interesting to see what a better wordlist looks like.
> > Diceware has a (fairly short, 7776-entry) word list in multiple languages
> > for the purpose of generating easy to remember passphrases.
> Hi Daniel,
> That's about a 13-bit list, which seems like a good middle ground
> between an 8-bit list (like PGPfone) or a 16-bit list. An 8-bit list
> necessitates 16 word fingerprints for 128-bit security, which feels
> like too many words. A 16-bit list contains 65K words, which is more
> than most people's vocabulary, meaning a lot of unusual words that
> would have to be spelled out.
> The Diceware dictionary is designed around short words and word
> fragments (it includes numbers, punctuation, and non-words, which is a
> bit weird IMO). I wrote a script to generate 10 random Diceware words
> to see what fingerprints might look like:
> hop - flu - urn - belie - gogo - gravy - mayor - avow - plush - enter
> bump - seem - soft - lm - plane - exit - plus - stilt - behind - malta
> tract - rude - rhine - ready - climb - fell - fell - reek - cody - kudzu
> bunch - sound - adler - galt - signor - glom - soup - on - lund - juju
> essay - eave - ef - pro - stung - gn - smash - josef - vetch - busy
> dawson - tic - vy - cake - rock - sr - store - ice - plunk - gp
> old - swept - win - mike - xy - chill - seethe - allow - alva - jh
> grace - curia - coke - rebut - 15 - foray - jaw - weco - anvil - buenos
> pn - adair - swelt - faith - slash - berlin - watch - blood - start - santa
> grow - del - bon - 99th - kepler - cam - fun - 37th - dryad - prone
> Below compares 5 diceware fingerprints side-by-side with 5 pseudoword
> fingerprints of score=18. The pseudoword fingerprints took an average
> of ~30 seconds apiece to generate on a single core of my Macbook Air.
> (The max possible score is 20, a score of 18 means 2 deviations from
> vowel/consonant alternation):
> oman - swath - haze - elmer - gouda - admix - feat - afar - reel - for
> ukigex - 3kiw - jejod - yvak - rewupa
> blitz - teal - emma - bambi - queen - 92 - mecum - om - derek - twa
> lijuv7 - woxm - pokoj - cixa - ehajen
> op - zomba - 84th - soy - oval - evolve - spook - fk - ghi - magog
> syivoh - upim - leewo - hoda - madeso
> piotr - vain - david - mk - gasp - buoy - malt - az - hang - rena
> bewora - zutm - hirub - ugux - tlezeb
> perk - fate - cinch - gulf - jb - marks - wag - canoe - sprig - maw
> ripoyu - ime2 - fenef - aqos - lehnof
> Both approaches seem pretty decent, not sure which is best. Choosing
> 13-bit wordlists for different languages and dealing with
> cross-language compatibility seems a hassle, but so is computing tens
> of millions of hashes for a fingerprint.
> There's a lot more that could be done here: e.g. make a better
> wordlist than Diceware, or optimize the pseudoword search and do
> better scoring.
> If anyone wants to do UX research, these would be great projects...