[messaging] unambiguous transcription dictionary (was Re: "Short" authentication strings)

Thu Jul 10 10:11:25 PDT 2014

Hi Andy. I happened to run across a mnemonic word list which was online a
few years ago:
http://web.archive.org/web/20090918202746/http://tothink.com/mnemonic/wordlist.html

C encoder / decoder:
https://github.com/singpolyma/mnemonicode

JS encoder / decoder:
https://github.com/mbrubeck/mnemonic.js

Words are all 4-7 letters, with no common prefixes. The author manually
removed similar sounding words. Encoding an 8-byte input will output two
3-word triplets like:
    bonjour orient random. acrobat market crystal

The authors compare it to similar functionality in PGPfone and OTP, so this
seems to be well-trod territory.
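
For anyone who wants to see the arithmetic, here is a rough sketch of how 8 bytes can turn into six words. It is not taken from either repository above. It assumes each 4-byte group is read as a 32-bit number and written as three base-1626 digits (1626 is the smallest N with N^3 >= 2^32). The word list is a placeholder ("w0000" ... "w1625"), and the byte order and digit order are my own assumptions, so the output will not match the real encoders.

```python
# Sketch only: WORDS is a placeholder, not the real mnemonic word list.
BASE = 1626  # smallest N such that N**3 >= 2**32
WORDS = [f"w{i:04d}" for i in range(BASE)]
INDEX = {w: i for i, w in enumerate(WORDS)}


def encode(data: bytes) -> list[str]:
    """Map each 4-byte group to three words (three base-1626 digits)."""
    if len(data) % 4:
        raise ValueError("this sketch only handles multiples of 4 bytes")
    out = []
    for i in range(0, len(data), 4):
        # Little-endian is an assumption made for this sketch.
        n = int.from_bytes(data[i:i + 4], "little")
        for _ in range(3):
            n, digit = divmod(n, BASE)  # least significant digit first
            out.append(WORDS[digit])
    return out


def decode(words: list[str]) -> bytes:
    """Inverse of encode(); rejects unknown words and out-of-range triplets."""
    if len(words) % 3:
        raise ValueError("expected a multiple of 3 words")
    out = bytearray()
    for i in range(0, len(words), 3):
        d0, d1, d2 = (INDEX[w] for w in words[i:i + 3])
        n = d0 + BASE * d1 + BASE * BASE * d2
        if n >= 2 ** 32:
            raise ValueError("triplet does not encode a 32-bit value")
        out += n.to_bytes(4, "little")
    return bytes(out)


if __name__ == "__main__":
    words = encode(bytes(range(8)))
    print(" ".join(words))          # six placeholder words, two triplets
    assert decode(words) == bytes(range(8))
```

Since 1626^3 is only slightly larger than 2^32, almost every triplet is a valid encoding, which means a misheard word will usually decode to a different value rather than be rejected. Any error detection has to come from the protocol, not the word encoding.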

On Thu, Jul 10, 2014 at 9:23 AM, Andy Isaacson <adi at hexapodia.org> wrote:

> On Tue, Jul 08, 2014 at 12:41:36PM -0700, Tony Arcieri wrote:
> > On Tuesday, July 8, 2014, Steve Weis <steveweis at gmail.com> wrote:
> > > To make it a bit more memorable
> >
> > I'm actually optimizing for forgettable, single-use strings which
> > authenticate public keys which are then added to a local (encrypted)
> > keystore. In that regard, I'm optimizing for a short length.
> >
> > I think the wordlist could be further improved, for example by filtering
> > out longer words and choosing shorter-but-less-popular words.
>
> > shared metaphor property sigh capture
> > yeah gravity cycle struggle parental
> > recipient briefly payment schedule target
> > stare educator ally peak employ
>
> For this particular application (reading words that have no semantic
> redundancy over a lossy voice line) you'd want to ensure there are no
> homophones in your dictionary (or rather, you want to *track* homophones
> as the same word and converge them).
>
> Hmmm, I guess it depends on the detail of the protocol -- does Alice
> type in what Bob reads to her, or does she match what Bob says to what's
> on her screen?  The latter doesn't care about homophones so much.
>
> I'd find it hard to reliably say "property sigh capture" such that the
> second word is not mistakable for "sign" over a GSM voice line.
> Similarly "be" / "me", and confusions between dialects for some simple
> words (Queen's English vs New England vs Ohio vs California vs NZ vs
> Scots).  But words like "schedule" I'll get right even if an RP speaker
> uses their adorable "shedule" pronunciation.  So in the absence of
> grammatical and semantic redundancy, phonological redundancy within the
> word can help to disambiguate, leading to *longer* words being more
> usable!
>
> To help non-native speakers, choose words with non-surprising spelling
> and avoid confusion like the 7 pronunciations of "ough".
>
> ----
>
> Note that real-time voice impersonation is a rapidly developing field,
> which allows MITM to simply substitute their preferred fingerprint in
> the conversation.  A researcher said they're getting good results with
> realtime *video* impersonation, and that anything short of an HD face
> closeup is already convincingly fakeable in realtime in the lab.  The
> hard part is getting the flow within a conversation right, but reading a
> string of nonsense words is in some sense the best possible deployment
> scenario for voice impersonation.
>
> The US IC is, of course, funding development of this technology for
> psyops and disinformation campaigns.  (Imagine how useful it would be to
> release video of your chosen enemy saying outlandish things repugnant to
> their supporters.)
>
> -andy
>
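
On the point above about tracking homophones as the same word and converging them: a minimal sketch of what that could look like, assuming the typed-in variant of the protocol. The homophone classes here are a tiny illustrative table I made up. A real one would have to be derived from a pronunciation dictionary and would differ by dialect.

```python
# Illustrative homophone classes only; not derived from any real word list.
HOMOPHONE_CLASSES = [
    {"peak", "peek", "pique"},
    {"right", "write", "rite"},
    {"sea", "see"},
]

# Map every word in a class to one canonical representative (the
# alphabetically smallest member of its class).
CANONICAL = {
    word: min(cls) for cls in HOMOPHONE_CLASSES for word in cls
}


def canonical(word: str) -> str:
    """Return the representative of the word's homophone class."""
    w = word.lower()
    return CANONICAL.get(w, w)


def same_string(spoken: str, typed: str) -> bool:
    """Compare two word strings, treating homophones as equal."""
    a = [canonical(w) for w in spoken.split()]
    b = [canonical(w) for w in typed.split()]
    return a == b


def collisions(wordlist: list[str]) -> list[list[str]]:
    """List groups of dictionary words that converge to the same class."""
    groups: dict[str, list[str]] = {}
    for w in wordlist:
        groups.setdefault(canonical(w), []).append(w)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    print(same_string("stare educator ally peak employ",
                      "stare educator ally peek employ"))
    print(collisions(["peak", "peek", "stare", "sea"]))
```

Converging classes costs entropy: every group reported by collisions() counts as a single symbol, so the dictionary should either drop all but one member of each group or account for the smaller effective size.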