[messaging] unambiguous transcription dictionary (was Re: "Short" authentication strings)

Andy Isaacson adi at hexapodia.org
Thu Jul 10 09:23:07 PDT 2014

On Tue, Jul 08, 2014 at 12:41:36PM -0700, Tony Arcieri wrote:
> On Tuesday, July 8, 2014, Steve Weis <steveweis at gmail.com> wrote:
> > To make it a bit more memorable
> I'm actually optimizing for forgettable, single-use strings which
> authenticate public keys which are then added to a local (encrypted)
> keystore. In that regard, I'm optimizing for a short length.
> I think the wordlist could be further improved, for example by filtering
> out longer words and choosing shorter-but-less-popular words.

> shared metaphor property sigh capture
> yeah gravity cycle struggle parental
> recipient briefly payment schedule target
> stare educator ally peak employ

For this particular application (reading words that have no semantic
redundancy over a lossy voice line) you'd want to ensure there are no
homophones in your dictionary (or rather, you want to *track* homophones
as the same word and converge them).
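
A minimal sketch of what "converging" homophones could look like, assuming
a hand-made table of homophone groups. The groups, function names and the
choice of representative are all hypothetical; a real table would have to
be derived from a pronouncing dictionary, which is not shown here.

```python
# Hypothetical homophone groups, for illustration only.
HOMOPHONE_GROUPS = [
    {"sea", "see"},
    {"peak", "peek", "pique"},
    {"right", "write", "rite"},
]


def build_canonical_map(groups):
    """Map every word in a group to one deterministic representative."""
    canon = {}
    for group in groups:
        rep = min(group)  # alphabetically first member stands for the group
        for word in group:
            canon[word] = rep
    return canon


def canonicalize(words, canon):
    """Lower-case each word and replace it with its group representative."""
    return [canon.get(w.lower(), w.lower()) for w in words]


def dedupe_wordlist(wordlist, canon):
    """Keep one entry per homophone group, preserving first-seen order."""
    seen = set()
    out = []
    for w in wordlist:
        c = canon.get(w.lower(), w.lower())
        if c not in seen:
            seen.add(c)
            out.append(c)
    return out


def strings_match(typed, expected, canon):
    """Compare two word sequences, treating homophones as the same word."""
    return canonicalize(typed, canon) == canonicalize(expected, canon)
```

With this, a dictionary passed through dedupe_wordlist() never contains two
words from the same group, and strings_match() accepts "peek" where "peak"
was expected.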

Hmmm, I guess it depends on the detail of the protocol -- does Alice
type in what Bob reads to her, or does she match what Bob says to what's
on her screen?  The latter doesn't care about homophones so much.

I'd find it hard to reliably say "property sigh capture" such that the
second word is not mistakable for "sign" over a GSM voice line.
Similarly "be" / "me", and confusions between dialects for some simple
words (Queen's English vs New England vs Ohio vs California vs NZ vs
Scots).  But words like "schedule" I'll get right even if an RP speaker
uses their adorable "shedule" pronunciation.  So in the absence of
grammatical and semantic redundancy, phonological redundancy within the
word can help to disambiguate, leading to *longer* words being more

To help non-native speakers, choose words with non-surprising spelling
and avoid confusion like the 7 pronunciations of "ough".
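
One rough way to weed out pairs like "sigh" / "sign" or "be" / "me" is to
drop any word whose spelling is within a small edit distance of a word
already kept. This is only a sketch: spelling distance is a crude stand-in
for how words sound over a voice line, and the function names and the
min_distance threshold are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def filter_confusable(words, min_distance=2):
    """Greedily keep words at least min_distance edits from all kept words."""
    kept = []
    for w in words:
        if all(edit_distance(w, k) >= min_distance for k in kept):
            kept.append(w)
    return kept
```

The filter is greedy, so which word of a close pair survives depends on
input order; sorting candidates by preference first is one option.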


Note that real-time voice impersonation is a rapidly developing field,
which allows MITM to simply substitute their preferred fingerprint in
the conversation.  A researcher said they're getting good results with
realtime *video* impersonation, and that anything short of an HD face
closeup is already convincingly fakeable in realtime in the lab.  The
hard part is getting the flow within a conversation right, but reading a
string of nonsense words is in some sense the best possible deployment
scenario for voice impersonation.

The US IC is, of course, funding development of this technology for
psyops and disinformation campaigns.  (Imagine how useful it would be to
release video of your chosen enemy saying outlandish things repugnant to
their supporters.)

