<div dir="ltr">Hi Andy. I happened to run across a mnemonic word list which was online a few years ago:<div><div style="font-family:arial,sans-serif;font-size:13px"><a href="http://web.archive.org/web/20090918202746/http://tothink.com/mnemonic/wordlist.html" target="_blank">http://web.archive.org/web/20090918202746/http://tothink.com/mnemonic/wordlist.html</a></div> <div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><div>C encoder / decoder:</div><div><a href="https://github.com/singpolyma/mnemonicode" target="_blank">https://github.com/singpolyma/mnemonicode</a><br> </div><div><br></div><div>JS encoder / decoder:</div><div><a href="https://github.com/mbrubeck/mnemonic.js" target="_blank">https://github.com/mbrubeck/mnemonic.js</a><br></div><div><br></div></div><div style="font-family:arial,sans-serif;font-size:13px"> Words are all 4-7 letters, with no common prefixes. The author manually removed similar sounding words. Encoding an 8-byte input will output two 3-word triplets like:<br></div><div style="font-family:arial,sans-serif;font-size:13px"> <div><div><i>bonjour orient random. 
</i><i>acrobat market crystal</i></div></div></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">The authors compare it to similar functionality in PGPfone and OTP, so this seems to be well-trod territory.</div> <div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jul 10, 2014 at 9:23 AM, Andy Isaacson <span dir="ltr"><<a href="mailto:adi@hexapodia.org" target="_blank">adi@hexapodia.org</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, Jul 08, 2014 at 12:41:36PM -0700, Tony Arcieri wrote:<br> > On Tuesday, July 8, 2014, Steve Weis <<a href="mailto:steveweis@gmail.com">steveweis@gmail.com</a>> wrote:<br> > > To make it a bit more memorable<br> ><br> > I'm actually optimizing for forgettable, single-use strings which<br> > authenticate public keys which are then added to a local (encrypted)<br> > keystore. In that regard, I'm optimizing for a short length.<br> ><br> > I think the wordlist could be further improved, for example by filtering<br> > out longer words and choosing shorter-but-less-popular words.<br> <br> > shared metaphor property sigh capture<br> > yeah gravity cycle struggle parental<br> > recipient briefly payment schedule target<br> > stare educator ally peak employ<br> <br> For this particular application (reading words that have no semantic<br> redundancy over a lossy voice line) you'd want to ensure there are no<br> homophones in your dictionary (or rather, you want to *track* homophones<br> as the same word and converge them).<br> <br> Hmmm, I guess it depends on the detail of the protocol -- does Alice<br> type in what Bob reads to her, or does she match what Bob says to what's<br> on her screen? 
The latter doesn't care about homophones so much.<br> <br> I'd find it hard to reliably say "property sigh capture" such that the<br> second word is not mistakable for "sign" over a GSM voice line.<br> Similarly "be" / "me", and confusions between dialects for some simple<br> words (Queen's English vs New England vs Ohio vs California vs NZ vs<br> Scots). But words like "schedule" I'll get right even if an RP speaker<br> uses their adorable "shedule" pronunciation. So in the absence of<br> grammatical and semantic redundancy, phonological redundancy within the<br> word can help to disambiguate, leading to *longer* words being more<br> usable!<br> <br> To help non-native speakers, choose words with non-surprising spelling<br> and avoid confusion like the 7 pronunciations of "ough".<br> <br> ----<br> <br> Note that real-time voice impersonation is a rapidly developing field,<br> which allows MITM to simply substitute their preferred fingerprint in<br> the conversation. A researcher said they're getting good results with<br> realtime *video* impersonation, and that anything short of an HD face<br> closeup is already convincingly fakeable in realtime in the lab. The<br> hard part is getting the flow within a conversation right, but reading a<br> string of nonsense words is in some sense the best possible deployment<br> scenario for voice impersonation.<br> <br> The US IC is, of course, funding development of this technology for<br> psyops and disinformation campaigns. (Imagine how useful it would be to<br> release video of your chosen enemy saying outlandish things repugnant to<br> their supporters.)<br> <span class="HOEnZb"><font color="#888888"><br> -andy<br> </font></span></blockquote></div><br></div>
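<div>As a rough illustration of the "track homophones as the same word and converge them" idea from the quoted message, here is a minimal Python sketch. The homophone classes, function names, and matching rule are all hypothetical and not taken from any of the linked projects; a real list would need curating per dialect.</div>

```python
# Hypothetical homophone classes, for illustration only.
HOMOPHONE_CLASSES = [
    {"be", "bee"},
    {"peak", "peek"},
]


def build_canonical_map(classes):
    """Map every word in a homophone class to one representative."""
    canonical = {}
    for group in classes:
        representative = min(group)  # deterministic choice within the class
        for word in group:
            canonical[word] = representative
    return canonical


CANONICAL = build_canonical_map(HOMOPHONE_CLASSES)


def normalize(phrase):
    """Lowercase, split on whitespace, and converge known homophones."""
    return [CANONICAL.get(word, word) for word in phrase.lower().split()]


def phrases_match(typed, expected):
    """Compare two word strings after converging homophones."""
    return normalize(typed) == normalize(expected)
```

<div>With this sketch, phrases_match("stare educator ally peek employ", "stare educator ally peak employ") holds, while "property sign capture" still fails to match "property sigh capture", since near-homophones are not covered by the table. Converging words this way also shrinks the effective dictionary, so each word carries slightly less entropy.</div>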