[messaging] Pour one out for "voice authentication"
jbonneau at gmail.com
Sat Jan 3 09:10:40 PST 2015
Taking a step back, what we want is for users to communicate a small amount
of information over a channel which is embeddable in an audio (and/or
video) channel that is both "AI-hard" and "Impersonation-hard". That is,
human users can signal over this channel (possibly at a very low rate) but
it is hard for AI or another human to simulate communication over this
channel. Voice synthesis was assumed to be AI-hard, but it turns out it
probably already isn't (and it's probably not all that impersonation-hard
either).
Jeff is proposing a more complex channel where the information is buried in
a long sequence of text. The hope here is that NLP is still sufficiently
poor that this channel is AI-hard to replicate. This is an interesting
idea; it reminds me of "The Dining Freemasons" paper.
One hangup is the possibility of a mixed AI/impersonation attack where a
human types/speaks the "long sequence of text" and the AI then synthesizes
voice/video. The problem here is that the "foreshadowing" or other
conversation-embedding is probably not impersonation-hard: with the voice
taken away, Mallory can probably simulate how Alice would work the required
information into a conversation.
The bigger problem is that this is probably vastly too complicated and
time-consuming for most users to do successfully. The appeal of SAS
protocols is that humans need only say a few words and not think much. If
they need to not only speak but also come up with whole sentences/paragraphs
and think critically about how believably the other side is doing the same,
you're limited to a pretty small group of users who are up for doing that.
On Fri, Jan 2, 2015 at 5:26 PM, Jeff Burdges <burdges at gmail.com> wrote:
> An important question here is :
> What happens if users attempt to communicate the information before
> explaining its purpose, aka foreshadowing?
> Algorithm :
> - Let Words() be a function that returns a list of dictionary words, and
> ideally corresponding images, based upon a sha256.
> - Assign the two parties roles named Alice and Bob based upon the session
> - Let X be the session information, let X_a = sha256("Alice" + X), and let
> X_b = sha256("Bob" + X)
> - Alice's device tells her to communicate Words(X_a) in the conversation,
> and expect Bob to communicate Words(X_b).
> - Bob’s device does the same swapping X_a and X_b.
> Both devices explain that :
> - the words should be used or foreshadowed in the conversation in close
> proximity in a context that makes using another word difficult,
> - ideally any variation in the order in which they appear in the
> conversation should be explained later, and
> - Alice and Bob should discuss when they think they’ve finished the
> exchange, citing when they believe they referenced the words in the
> conversation.
> Example : Alice does not need to say openly that her first word is
> elephant, but could instead mention seeing a zoo animal eating hay in a
> strange place, and then elaborate later in the conversation. Alice could
> foreshadow the appearance of an elephant in the conversation.
> There is an issue here: merely the appearance of words in the conversation
> is not enough because our hostile AI could insert words where they only
> kinda make sense, like adding “like an apple” onto a sentence that trailed
> off. That’s why we ask that the words all be foreshadowed in close
> proximity, and then later discuss that part of the conversation that
> contained the key exchange.
> We’ll eventually have AIs who could defeat such a key exchange of course,
> but at the same time some humans are extremely witty, so the risk of
> exposure for an attacker could be kept high. Also, wit might remain pretty
> hard for machines to grasp for quite some time, partially because industry
> probably lacks the economic incentives to dump a bunch of resources into
> understanding wit.
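For concreteness, here is a minimal Python sketch of the word derivation in
the quoted algorithm. It is an illustration under assumptions, not something
from Jeff's proposal: the 16-entry word list, the choice of three words per
party, the byte-to-word mapping inside words(), and the helper names
role_digest() and instructions() are all hypothetical. The role labels are
simply used as hash prefixes, and X is treated as an opaque byte string.

```python
import hashlib
from typing import Dict, List

# Hypothetical word list; a real one would be far larger (and paired with images).
# Its length (16) divides 256, so taking a digest byte modulo 16 is unbiased.
WORDLIST = [
    "elephant", "apple", "river", "candle", "violin", "harbor", "pepper", "lantern",
    "glacier", "saddle", "orchid", "hammer", "pigeon", "marble", "tunnel", "walnut",
]


def words(digest: bytes, count: int = 3) -> List[str]:
    """Map the first `count` bytes of a digest to dictionary words (assumed mapping)."""
    return [WORDLIST[b % len(WORDLIST)] for b in digest[:count]]


def role_digest(role: str, session_info: bytes) -> bytes:
    """Compute sha256(role + X), with the role label acting as a prefix."""
    return hashlib.sha256(role.encode("utf-8") + session_info).digest()


def instructions(role: str, session_info: bytes) -> Dict[str, List[str]]:
    """Return the words a party should say and the words it should expect to hear."""
    x_a = role_digest("Alice", session_info)
    x_b = role_digest("Bob", session_info)
    if role == "Alice":
        return {"say": words(x_a), "expect": words(x_b)}
    # Bob's device does the same with X_a and X_b swapped.
    return {"say": words(x_b), "expect": words(x_a)}


if __name__ == "__main__":
    x = b"example session information"  # stand-in for the real session data
    print("Alice:", instructions("Alice", x))
    print("Bob:  ", instructions("Bob", x))
```

Both devices derive the same two word lists from X, so what Alice is told to
say is exactly what Bob is told to expect. With only 16 words and three
picks, this toy version carries 12 bits per direction; the hard part the
thread is about (working the words into a conversation believably) is of
course not captured by the code.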