[messaging] Hashing entries in a transparency log

Thu Sep 4 10:52:02 PDT 2014

>
> I’m thinking that an interim level of privacy where we have encrypted
> payload, with the disadvantage of clear metadata and the advantage of good
> spam control, might still be an improvement over where we are today.
>

I think so too.

Unfortunately there's not always a clean division between content and
metadata. Spam filters calculate reputations over entities found in
message content, and at least one of those entities (URL domains) is a
critical signal. "Where can I get to by clicking on this email" is very
valuable information to filters because people who buy things from
spammers are lazy. Spammers experimented with link obfuscation for a few
years but eventually gave up because their customers wouldn't bother to
undo the obfuscations. Nowadays they hack websites or use URL shorteners.
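
To make the URL-domain signal concrete, here is a minimal sketch of a
per-domain reputation lookup. It is not how Gmail or any real filter works:
the regular expression, the DomainReputation class and the "fraction of
messages reported as spam" score are all hypothetical, picked only to show
the shape of the idea.

```python
import re
from urllib.parse import urlparse

# Deliberately crude URL matcher; an assumption for this sketch only.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_domains(body):
    """Return the set of lowercase hostnames linked from a message body."""
    domains = set()
    for url in URL_RE.findall(body):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, e.g. an unbalanced IPv6 bracket
        if host:
            domains.add(host.rstrip("."))  # drop sentence-ending dots
    return domains


class DomainReputation:
    """Toy store: per domain, how many messages were seen and how many were spam."""

    def __init__(self):
        self.total = {}
        self.spam = {}

    def record(self, body, is_spam):
        """Count one labelled message against every domain it links to."""
        for domain in extract_domains(body):
            self.total[domain] = self.total.get(domain, 0) + 1
            if is_spam:
                self.spam[domain] = self.spam.get(domain, 0) + 1

    def score(self, body):
        """Highest spam fraction among linked domains; 0.0 if none are known."""
        fractions = [
            self.spam.get(domain, 0) / self.total[domain]
            for domain in extract_domains(body)
            if domain in self.total
        ]
        return max(fractions, default=0.0)
```

The point for this thread is that extract_domains needs the plaintext body.
A server that only ever sees ciphertext can't compute this signal at all.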

There are other entities and signals extracted from message content used
for spam filtering, but I'd be wandering into the Gmail-secret-sauce area
if I discussed those.

I can't easily say how much worse spam filtering would get if all payloads
were routinely encrypted. Let's say that, after URL domains, only a small
percentage of filtering relies on the rest of the content. But I was once
told that the difference between Gmail's and Hotmail's spam filters was
maybe 1% coverage. The volumes are so huge that even a tiny percentage loss
of accuracy can make a very noticeable difference. It's the difference
between "every day I clean my inbox of five spams" and "I don't think
about spam".

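As a rough back-of-the-envelope illustration of why a small coverage gap
matters, here is the arithmetic as a tiny sketch. The figure of 500 spam
messages sent to one user per day is purely an assumption, not a measured
number, and the function name is hypothetical.

```python
def spam_reaching_inbox(spam_sent_per_day, missed_per_thousand):
    """Expected spam per day that gets past a filter missing the given share."""
    return spam_sent_per_day * missed_per_thousand / 1000


# Hypothetical user targeted by 500 spam messages a day.
for missed in (10, 1):  # filter misses 1.0% vs 0.1% of spam
    per_day = spam_reaching_inbox(500, missed)
    print(f"{missed / 10}% missed: {per_day} spam per day in the inbox")
```

Under those assumed numbers, missing 1% means five spams a day, while
missing 0.1% means one every other day.
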
It's just a very hard problem that I've thought about a lot, and not got
very far with. Eventually I concluded:

   - If you encrypt all payloads you may as well also scramble metadata,
   and rely on a totally different anti-spam approach like using Bitcoin
   deposits or micropayments. That's why I'm more interested in Pond than
   in systems that'd try to globally upgrade SMTP. And I actually
   implemented Bitcoin micropayments as a library, though not for spam.

   - Failing that, various kinds of clever PIR (private information
   retrieval) protocols might allow client apps to do spam filtering
   themselves with the support of big databases in the cloud, but the maths
   for this makes my head explode and usually comes with giant caveats
   that render it unworkable.