[messaging] Hashing entries in a transparency log

Wed Sep 3 18:54:14 PDT 2014

On Wed, Sep 3, 2014 at 2:26 PM, Trevor Perrin <trevp at trevp.net> wrote:

> People would probably reverse most of the addresses,
> so this means the difference between publishing, I dunno, 90% of email
> addresses versus 100%? (though for targeted users - political
> candidates, celebrities, etc, people would tune the searches and have
> a higher success rate.)
>

Stated a bit more formally: after hashing, an attacker willing to check X
trial hashes will recover Y% of the email addresses. By "strengthening" the hash
(multiple iterations, memory-hard functions, etc.) you can try to limit the
value of X for a given attacker.
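
As a rough illustration of what "strengthening" could look like, here is a
minimal sketch using PBKDF2 from Python's standard hashlib. The function
name, the fixed public salt, and the iteration count are all assumptions
made for the example, not something proposed in this thread. The salt is
fixed because log lookups need the same address to always map to the same
entry.

```python
import hashlib

# Hypothetical public, log-wide salt. It is not secret; it only separates
# this log's hashes from other uses of the same function.
LOG_SALT = b"example-transparency-log"


def strengthened_hash(email: str, iterations: int = 100_000) -> str:
    """Return an iterated (PBKDF2-HMAC-SHA256) hash of an email address.

    Each trial guess costs the attacker `iterations` HMAC computations,
    which lowers the number of guesses X affordable on a fixed budget.
    """
    normalized = email.strip().lower().encode("utf-8")
    digest = hashlib.pbkdf2_hmac("sha256", normalized, LOG_SALT, iterations)
    return digest.hex()
```

Raising the iteration count slows legitimate lookups by the same factor it
slows the attacker, so this shifts the X/Y curve rather than eliminating it.
hashlib.scrypt would be the corresponding standard-library choice for a
memory-hard function.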

We have no hard numbers on what the X/Y curve would look like for email
addresses, but based on the distributions of passwords and human names, which I
studied extensively in my thesis [1], it's probably safe to say that for X
< 2^30 you would get at least 50% of the email addresses and for 2^40 or
2^50 you'd hit the 90% range.

It would be a fun project to modify a password cracking library to guess
email addresses and see how well you can actually do.
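
A toy version of such an experiment might look like the sketch below. It is
not a real cracking library: the candidate patterns, the function names, and
the use of plain SHA-256 for log entries are all assumptions chosen to keep
the example short. It measures the fraction Y of hashed entries recovered
within a budget of X guesses.

```python
import hashlib
from itertools import islice, product


def entry_hash(email: str) -> str:
    # Assumed log format: unsalted SHA-256 of the address (hypothetical).
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def candidates(first_names, last_names, domains):
    """Yield guessed addresses built from a few common local-part patterns."""
    for first, last, domain in product(first_names, last_names, domains):
        for local in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}"):
            yield f"{local}@{domain}"


def recovered_fraction(log_hashes, guesses, budget):
    """Return the fraction of log entries matched by the first `budget` guesses."""
    targets = set(log_hashes)
    if not targets:
        return 0.0
    found = set()
    for guess in islice(guesses, budget):
        h = entry_hash(guess)
        if h in targets:
            found.add(h)
    return len(found) / len(targets)
```

Ordering the candidate stream by estimated popularity (most common names and
domains first) is what would make the resulting curve meaningful; the
product order used here is only a placeholder.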

[1] http://www.jbonneau.com/doc/2012-jbonneau-phd_thesis.pdf