[messaging] Spam-resistance of different systems

Tue Sep 23 13:15:28 PDT 2014

So Mike Hearn wrote a fantastic post on antispam for email [1].  The
system he describes performs "feature extraction" from both metadata
(e.g. sender) and contents (e.g. links).  It then scores messages based
on the reputation of these features, and uses feedback from recipients
and the scoring engine to adjust reputations.
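
To make that loop concrete, here is a rough Python sketch of
feature-based reputation scoring.  This is my own toy illustration, not
Mike's actual system: the class name, the two features (sender address
and link hostnames), the smoothing rule, and the "worst feature wins"
scoring are all assumptions chosen for brevity.

```python
import re
from collections import defaultdict


class ReputationScorer:
    """Toy feature-reputation scorer (hypothetical, for illustration only)."""

    def __init__(self):
        # feature -> [spam_reports, total_reports]
        self.counts = defaultdict(lambda: [0, 0])

    def extract_features(self, sender, body):
        # Metadata feature: the sender address.
        features = {"sender:" + sender.lower()}
        # Content features: hostnames of any links in the body.
        for host in re.findall(r"https?://([^/\s]+)", body):
            features.add("link:" + host.lower())
        return features

    def reputation(self, feature):
        # Smoothed spam fraction; a never-seen feature scores 0.5.
        spam, total = self.counts.get(feature, (0, 0))
        return (spam + 1) / (total + 2)

    def score(self, sender, body):
        # Assumption: a message is as suspicious as its worst feature.
        return max(self.reputation(f) for f in self.extract_features(sender, body))

    def feedback(self, sender, body, is_spam):
        # Recipient feedback adjusts the reputation of every feature seen.
        for f in self.extract_features(sender, body):
            entry = self.counts[f]
            entry[0] += 1 if is_spam else 0
            entry[1] += 1
```

Note that the "link:" features are exactly the part that requires
reading message contents; with end-to-end encryption the server would
be left with only the "sender:" features.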

Mike argues that feature extraction from message contents is necessary
for antispam in email.  But there was discussion of communications
systems with different properties, e.g.

"Apple iMessage, Wickr and BBM Protected can all be described as
opportunistic encryption messaging systems that have been very
successful deployment-wise." - Joe Bonneau, [2]

"When you have central control everything becomes a million times
easier because you can change anything at any time. You can terminate
accounts and control signups." - Mike, [1]

---

So it looks like some systems are "spam-vulnerable" to the point that
message contents must be scanned, and the potential for widespread
"Opportunistic Encryption" [3] is limited.  But other systems seem
"spam-resistant" enough that content scanning isn't necessary.

I think the main theories about this difference are:

 1) Size of target population:  Email has a huge userbase, and email
addresses are widely shared, so spammers are able to harvest huge
target lists.

 2) Cost per communication:  Sending a single email is very cheap,
compared to (say) postal mail.

 3) Ability to attribute and penalize the sending service provider:
While a receiving mail server can attribute the sending IP and assign
it a reputation, IPs are fairly cheap and only loosely associated with
sending service providers.  Attributing a sending domain is more
difficult [4].

 4) Ability to attribute and penalize the sending user:  Free email
accounts and easy signup make it hard to impose a cost on abusive
users.

 5) Centralized control:  If a single system registers all users and
handles all feedback, then I guess it's easier to calculate user
reputations and penalize users.  But I'm unclear how much this
matters, or what other benefits come from centralization.

One question is: which factors could we change to create email-like
systems with spam-resistance and opportunistic encryption?

I would take 1, 2, and 5 off the table.  We want systems that can be
widely used (1).  Pay-per-message seems complicated, and would add
tremendous overhead and change the UX of email (2).  And we don't want
a single party in control (5).

So that leaves strengthening the reputation system, and basing it on
more costly "identities".  This is imaginable at the provider level.
Instead of letting any IP send email, providers could form a
federation where each sending provider has to sign up to certain
obligations, post a bond, etc.  Of course, this would be more "clubby"
and less open than email is currently - perhaps more like the
relationships
between telephony providers in the PSTN, or network operators in the
Internet?

If the reputation system also needs strengthening at the user level,
then users would have to spend or commit some costly resource to get
an account.  Captchas, anti-automation [5], and proof-of-work probably
only go so far.  If users have to commit something more expensive,
this could in theory be done in a privacy-preserving way (Tor! digital
cash!), but most users would find it easier to register with something
linked to them (phone number, credit card).  So then there are privacy
implications...
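
For reference, the cheap end of that spectrum (proof-of-work at signup)
could look roughly like the hashcash-style sketch below.  The function
names, the SHA-256 choice, and the challenge format are my assumptions,
not any deployed system's protocol.  The server hands out a fresh
challenge and only creates the account if the client returns a nonce
whose hash has enough leading zero bits.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    # Number of zero bits at the front of the digest.
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def pow_digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, bits: int) -> int:
    # Client side: brute-force a nonce; expected work is about 2**bits hashes.
    for nonce in count():
        if leading_zero_bits(pow_digest(challenge, nonce)) >= bits:
            return nonce


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    # Server side: a single hash, so checking is cheap.
    return leading_zero_bits(pow_digest(challenge, nonce)) >= bits


if __name__ == "__main__":
    challenge = b"signup:example-challenge"  # hypothetical server-issued value
    nonce = solve(challenge, 16)
    print(nonce, verify(challenge, nonce, 16))
```

The cost is CPU time rather than anything tied to a real-world
identity, which is why it probably only goes so far: an attacker with
rented or stolen compute pays the same small price as everyone else.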

But anyway, it would be great if we could ground this in facts.  If
anyone has insight into large-scale communications systems where spam
and abuse are controlled *without* content scanning, it would be
interesting to hear how that works, and what the important factors
are.

Trevor

[1] https://moderncrypto.org/mail-archive/messaging/2014/000780.html
[2] https://moderncrypto.org/mail-archive/messaging/2014/000824.html
[3] https://moderncrypto.org/mail-archive/messaging/2014/000767.html
[4] http://ceas.cc/2006/19.pdf
[5] http://webcache.googleusercontent.com/search?q=cache:v6Iza2JzJCwJ:www.hackforums.net/archive/index.php/thread-2198360.html+&cd=8&hl=en&ct=clnk&gl=ch