[messaging] Hashing entries in a transparency log

Thu Sep 4 06:52:54 PDT 2014

On 04/09/14 02:54, Joseph Bonneau wrote:
> On Wed, Sep 3, 2014 at 2:26 PM, Trevor Perrin <trevp at trevp.net> wrote:
> 
>> People would probably reverse most of the addresses,
>> so this means the difference between publishing, I dunno, 90% of email
>> addresses versus 100%? (though for targeted users - political
>> candidates, celebrities, etc, people would tune the searches and have
>> a higher success rate.)
>>
> 
> A bit more formally stated, after hashing an attacker willing to check X
> trial hashes will get Y% of email addresses. By "strengthening" the hash
> (multiple iterations, memory-hard functions, etc.) you can try to limit the
> value of X for a given attacker.
> 
> We have no hard numbers on what the X/Y curve would look like for email
> addresses, but based on the distributions of passwords and human names which I
> studied extensively in my thesis [1], it's probably safe to say that for X
> < 2^30 you would get at least 50% of the email addresses and for 2^40 or
> 2^50 you'd hit the 90% range.

I tried to think of a way to publish the full log of hash-to-key
mappings while handing out the salts needed to check email addresses
only through a rate-limitable channel, without allowing a bad provider
to issue multiple salts for the same account.
I thought that perhaps the salt could be the provider's signature over
the hash of the email address (hence if two different salts were
produced for one address, the provider could be proved to be
misbehaving).

However, if the attacker has 100 million hosts then the rate limiter
needs to be able to block a host after far fewer than 10 malicious
requests, ever (at the 2^30 level). At the same time it must not block
large providers, which legitimately send thousands of invalid requests a
day due to mistyped email addresses. No amount of proofs of work or
multi-request protocols is going to solve that. The only real advantage
of storing the hash rather than the email address in the log is the
fixed size of the hash output.
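
The arithmetic behind that threshold, using the 2^30 figure from the
quoted estimate and the 100 million hosts assumed here (the variable
names are only for illustration):

```python
trial_budget = 2**30          # trial hashes for ~50% of addresses (quoted estimate)
attacker_hosts = 100_000_000  # botnet size assumed in this message

requests_per_host = trial_budget / attacker_hosts
print(f"{requests_per_host:.1f} requests per host")  # prints "10.7 requests per host"
```

So each host needs only about 11 lookups in total, and a limit that
actually protects the log has to sit below that.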

Any solution putting all the email addresses in the world in
transparency logs probably also needs to solve spam at the same time.
That is more plausible than it might seem, as I think that a lot of spam
filtering is done based on the reputation of the sender. Senders using
an authenticated encryption system could have their reputation
determined more tightly than is possible at present. However,
discriminating between new legitimate users and new spam accounts would
remain difficult.
Unfortunately deployment is difficult: early adopters get more spam, and
only when most people are using the system does it become possible to
penalise people who don't.

However, even without a transparency log, if the provider holding the
public keys acts as a user existence oracle then this problem remains,
as client machines would need to make those requests for public keys.
Client machines are indistinguishable from bots, which makes blocking
them difficult. Hence a provider would need to always return keys for
any guessed email address (in constant time).
I guess email providers currently do spam filtering before returning
'Mail delivery failed - no such address' messages so that attackers
don't know if they guessed correctly.
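
One way a provider might avoid acting as an existence oracle is to answer
unknown addresses with a stable decoy key. A rough sketch, assuming keys
are opaque 32-byte strings, a hypothetical in-memory key store and a
server-side secret (KEY_STORE, DECOY_SECRET and lookup_key are names made
up here). Python gives no real constant-time guarantee, so the timing
aspect is only approximated by always doing the same HMAC work.

```python
import hashlib
import hmac

# Hypothetical server state: real keys by address, plus a secret for decoys.
KEY_STORE = {"alice@example.com": bytes.fromhex("11" * 32)}
DECOY_SECRET = b"server-side secret, never published"


def lookup_key(email: str) -> bytes:
    """Return a 32-byte key for every address, registered or not."""
    # Always derive the decoy so both paths perform the same HMAC computation.
    decoy = hmac.new(DECOY_SECRET, email.encode("utf-8"), hashlib.sha256).digest()
    real = KEY_STORE.get(email)
    return real if real is not None else decoy
```

The decoy is deterministic, so asking twice about the same unknown
address returns the same answer and does not reveal that the account is
missing.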

Daniel
