[messaging] unambiguous transcription dictionary (was Re: "Short" authentication strings)
steveweis at gmail.com
Thu Jul 10 10:11:25 PDT 2014
Hi Andy. I happened to run across a mnemonic word list which was online a
few years ago:
C encoder / decoder:
JS encoder / decoder:
Words are all 4-7 letters, with no common prefixes. The author manually
removed similar sounding words. Encoding an 8-byte input will output two
3-word triplets like:
*bonjour orient random. acrobat market crystal*
The authors compare it to similar functionality in PGPfone and OTP, so this
seems to be well-trod territory.
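
To make the arithmetic concrete: if 8 bytes become two 3-word triplets, each triplet carries 4 bytes, so the list needs at least 1626 words (1625^3 < 2^32 <= 1626^3). Below is a minimal sketch of that kind of encoding, not the linked encoder itself. The placeholder vocabulary, the little-endian byte order, the digit order within a triplet, and the names `encode` and `decode` are all assumptions made for illustration.

```python
BASE = 1626  # smallest N with N**3 >= 2**32, so 3 words can carry 4 bytes

# Placeholder vocabulary (assumption): a real list would hold curated words.
WORDS = [f"w{i:04d}" for i in range(BASE)]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data, words=WORDS):
    """Map each 4-byte group to a triplet of words (base-len(words) digits)."""
    n = len(words)
    if n ** 3 < 2 ** 32:
        raise ValueError("word list too small to cover 32 bits in 3 words")
    if len(data) % 4:
        raise ValueError("this sketch only handles multiples of 4 bytes")
    triplets = []
    for off in range(0, len(data), 4):
        # Byte order is an assumption; any fixed convention works.
        value = int.from_bytes(data[off:off + 4], "little")
        triplet = []
        for _ in range(3):
            value, digit = divmod(value, n)  # least significant digit first
            triplet.append(words[digit])
        triplets.append(tuple(triplet))
    return triplets


def decode(triplets, index=INDEX):
    """Invert encode(); rejects triplets that do not fit in 32 bits."""
    n = len(index)
    out = bytearray()
    for triplet in triplets:
        value = 0
        for word in reversed(triplet):  # most significant digit first
            value = value * n + index[word]
        if value >= 2 ** 32:
            raise ValueError("triplet does not correspond to 4 bytes")
        out += value.to_bytes(4, "little")
    return bytes(out)
```

With a list of exactly 1626 words, about 0.1% of possible triplets decode to nothing, which gives a small amount of error detection for free.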
On Thu, Jul 10, 2014 at 9:23 AM, Andy Isaacson <adi at hexapodia.org> wrote:
> On Tue, Jul 08, 2014 at 12:41:36PM -0700, Tony Arcieri wrote:
> > On Tuesday, July 8, 2014, Steve Weis <steveweis at gmail.com> wrote:
> > > To make it a bit more memorable
> > I'm actually optimizing for forgettable, single-use strings which
> > authenticate public keys which are then added to a local (encrypted)
> > keystore. In that regard, I'm optimizing for a short length.
> > I think the wordlist could be further improved, for example by filtering
> > out longer words and choosing shorter-but-less-popular words.
> > shared metaphor property sigh capture
> > yeah gravity cycle struggle parental
> > recipient briefly payment schedule target
> > stare educator ally peak employ
> For this particular application (reading words that have no semantic
> redundancy over a lossy voice line) you'd want to ensure there are no
> homophones in your dictionary (or rather, you want to *track* homophones
> as the same word and converge them).
> Hmmm, I guess it depends on the detail of the protocol -- does Alice
> type in what Bob reads to her, or does she match what Bob says to what's
> on her screen? The latter doesn't care about homophones so much.
> I'd find it hard to reliably say "property sigh capture" such that the
> second word is not mistakable for "sign" over a GSM voice line.
> Similarly "be" / "me", and confusions between dialects for some simple
> words (Queen's English vs New England vs Ohio vs California vs NZ vs
> Scots). But words like "schedule" I'll get right even if a RP speaker
> uses their adorable "shedule" pronunciation. So in the absence of
> grammatical and semantic redundancy, phonological redundancy within the
> word can help to disambiguate, leading to *longer* words being more
> To help non-native speakers, choose words with non-surprising spelling
> and avoid confusion like the 7 pronunciations of "ough".
> Note that real-time voice impersonation is a rapidly developing field,
> which allows MITM to simply substitute their preferred fingerprint in
> the conversation. A researcher said they're getting good results with
> realtime *video* impersonation, and that anything short of an HD face
> closeup is already convincingly fakeable in realtime in the lab. The
> hard part is getting the flow within a conversation right, but reading a
> string of nonsense words is in some sense the best possible deployment
> scenario for voice impersonation.
> The US IC is, of course, funding development of this technology for
> psyops and disinformation campaigns. (Imagine how useful it would be to
> release video of your chosen enemy saying outlandish things repugnant to
> their supporters.)
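
Andy's suggestion to track homophones as the same word and converge them could be sketched roughly as follows, for the case where Alice types what Bob reads to her. The homophone groups here are a tiny hand-written assumption (a real dictionary would need a pronunciation source), and the function names are hypothetical.

```python
# Hand-written homophone groups (assumption, for illustration only).
HOMOPHONE_GROUPS = [
    {"peak", "peek"},
    {"stare", "stair"},
]


def build_canonical(groups):
    """Map every word in a group to one representative spelling."""
    canon = {}
    for group in groups:
        representative = min(group)  # any deterministic choice works
        for word in group:
            canon[word] = representative
    return canon


CANON = build_canonical(HOMOPHONE_GROUPS)


def normalize(phrase, canon=CANON):
    """Lower-case, split on whitespace, and converge homophones."""
    return [canon.get(word, word) for word in phrase.lower().split()]


def same_when_spoken(typed, expected, canon=CANON):
    """True if the two phrases differ only by homophone spellings."""
    return normalize(typed, canon) == normalize(expected, canon)


def filter_wordlist(words, canon=CANON):
    """Keep only the first word seen from each homophone group."""
    seen = set()
    kept = []
    for word in words:
        key = canon.get(word.lower(), word.lower())
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return kept
```

Here `same_when_spoken("stair educator ally peek employ", "stare educator ally peak employ")` is true, while "sigh" versus "sign" still counts as a mismatch, since those are near-misses on a bad line, not homophones. Converged words count as one symbol, so the effective dictionary size (and the entropy per word) drops accordingly.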