The Grumpy Troll

Ramblings of a grumpy troll.

Email Cooperation

People blather on. Today, they do it online. There are several different ways of going about it:
  • Entering short messages into a transcript which others read over, such as forums or social networking websites, where later participants can see older messages
  • Entering longer messages onto a website, with indexing and optional commentary; blogs
  • Voice communications online, or video, typically not archived, but can be
  • Short back-and-forth realtime messages, ‟Instant Messaging”, IM, such as XMPP, MSN, ICQ, etc; some clients log these communications, and some server software does too
  • Online memos, sent around, often delivered quickly but left in an inbox to be worked through. This is commonly ‟email”.
I'm going to blather on about email, trust and some related bits.

Background

Email addresses are clearly divided up into two parts, with an ‛@’ in the middle joining them. A little more formally, email is a federated system. There's a domain and a local-part. Loosely speaking, ‟everyone” agrees to work together in a federation on the domain bit and everyone agrees to not mess with the local-part. There are common rules on domains. For instance, they're case-insensitive, and there's a standard look-up system. For your own domains, or domains where there's separate agreement between the operators of the domains, that look-up system might not be used, but it exists for the default case.

Sometimes a mail-system just fobs off responsibility to someone else to handle all mail-routing, by prior arrangement. It's very common for ISPs to provide this service to their customers. The customers expect that mail sent to another domain is sent using the normal look-up rules or whatever else has been agreed to help the mail get through. They don't expect that the mails be diverted to be messed around with or read by others. It's rare for the general public to understand the details of this well enough to put it into words, or for the ISP to lay out exactly what it will and won't do here.

Against that expectation, other mail service providers, including other ISPs, expect that first ISP to not be sending them spam. But many customer systems will typically be compromised and control of them taken away from the lawful owner, as remote control software is used to subvert them into various criminal activities. One such is sending spam. The ISP could claim that it has to just send on all the mail its customers give it, but that ISP would quickly find that other mail providers would stop accepting mail from them because of the spam.

Underneath it all, this system of federation rarely has any legal requirement to cooperate, and any such legal requirement would undoubtedly backfire as the criminals find ways around those laws. It's not anarchy, quite, but common consensus, loose cooperation, and some engineers getting together and publishing some specifications of how things should work, which all the mail-providers are free to adhere to or ignore as they see fit. Sometimes those engineers, working through the IETF, produce gems; sometimes they demonstrate themselves to be, cooperatively, batshit nuts. And nothing says that only the IETF can publish the rules. People who believe in central control have trouble understanding how anything can succeed given all this. But 98% of the population is generally decent, willing to work together towards common goals and not violate trust, and various ideas catch on. Many good, some poor. The approach has worked for the past 30 years, with various names and formalities applied to the actors.

So that ISP has to sometimes not send on its customers' mail, in order that the mail of the other customers make it through. It might be censorship, but the ISP will typically be trying very hard to make sure that only mail for which there's strong evidence that it's not really from the customer is rejected, and the ISP will be letting automated systems make these decisions. Then, the receiving mail-system's operators will have configured their mail-systems to also make a decision about how acceptable the mail is. Commonly, they score the opposite, how unacceptable it is, via the ‟spamminess” of the mail. This is typically going to be based entirely on the belief that the mail was unsolicited and is junk, rather than a moral judgement about the acceptability of any views espoused within the mail. But it doesn't have to be: every operator is a fiefdom unto itself, able to accept or reject the mail at its sole discretion. If a service provider wants to reject mails with an even number of words, on alternate weekends, that's their business, and the business of the people to whom it provides mail accounts. Of course, the confusion created by such an insane policy and the support burden on the sending sites might lead those sending sites to refuse to deliver mail there, because they could just set a clear error message explaining why and reduce their support costs from trying to debug the mess.

The mail receivers have systems trying to do a very hard job, figuring out for each mail received how likely it is that the mail is undesired spam. So part of the cooperation of mail providers is for senders to try to provide as much information as possible for the recipient to use when making that judgement call. This is the important point of this post.

Acting on the address

Oh, and that ‛@’ sign's two parts? Only the domain ‟owner” has any say in what the left-hand side (LHS) means. The senders (informally) agree to leave well enough alone and preserve it. One of the most common decisions of a domain owner is to make the LHS be case-insensitive, so that FRED@ is the same as fred@. This is in part so common because there's so much low-quality software which doesn't understand the split and automatically treats all of an email address as case-insensitive, even when that's not its call to make. But hey, this whole problem exists because of low-quality software being compromised to take over machines. Low-quality software is an unfortunate fact of life, and that's a rant for another blathering blog post.

Other common actions for the LHS@ include supporting sub-addressing, so that fred+anything@ is delivered to fred@, and supporting aliases. A less common but, in my opinion, excellent system which some providers use is to canonicalise away dots in the LHS@, so that fred.bloggs@ is the same as fredbloggs@; humans typically suck at remembering trivia like which accounts have dots, and permitting those two to be separate accounts would just lead to insanity and mis-delivered email.
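For the code-minded, here's a minimal sketch of how a receiving domain might apply those LHS rules. The function name and its three switches are my own invention for illustration, not anybody's real API, and the whole point stands: only the domain owner gets to flip these switches, never the sender.

```python
def canonical_mailbox(address, fold_case=True, plus_tags=True, ignore_dots=False):
    """Map an address to a mailbox under one domain's (hypothetical) local policy."""
    # Split on the LAST '@' so that an '@' inside a quoted local-part survives.
    local, _, domain = address.rpartition("@")
    domain = domain.lower()  # domains are case-insensitive for everyone
    if fold_case:
        local = local.lower()  # FRED@ is the same as fred@
    if plus_tags:
        local = local.split("+", 1)[0]  # fred+anything@ goes to fred@
    if ignore_dots:
        local = local.replace(".", "")  # fred.bloggs@ is the same as fredbloggs@
    return f"{local}@{domain}"


print(canonical_mailbox("FRED+lists@Example.ORG"))
print(canonical_mailbox("fred.bloggs@example.org", ignore_dots=True))
```

A sender running this against someone else's domain would be making exactly the mistake the low-quality software makes.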

The common look-up system for figuring out what to do with mail? The Domain Name System, or DNS, is used. DNS provides a distributed, federated, database for looking up information about domains. For a domain example.org, the DNS federation has delegation from the commonly-agreed ‟root” to some servers which are authoritative for org., which delegates that domain to some servers registered to handle example.org. — those servers are registered when you register a domain with a registrar.
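As a rough sketch of that delegation walk, assuming (simplistically) that every label is its own delegated zone, which real DNS does not require:

```python
def delegation_chain(domain):
    """List the zones walked from the root down to the given domain."""
    labels = [label for label in domain.rstrip(".").split(".") if label]
    chain = ["."]  # the commonly-agreed root
    # Walk from the rightmost label leftwards, growing the name each step.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(delegation_chain("example.org"))
```

The function name is hypothetical; a real resolver discovers the zone cuts by asking each set of servers in turn rather than by splitting strings.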

DNS contains address records, to map names to particular host IP addresses, which can be used to route packets, but that's not all that's in DNS. It also contains Mail eXchanger, or MX records, which state ‟for this domain, the mail is handled by these servers over here”. For instance:
example.org. IN MX 10 mx.example.org.
example.org. IN MX 20 mx-backup.example.org.
So, there are INternet-domain records stating that at priority 10, mail goes to one place, and if that is unreachable in certain ways then instead fall back to another place, listed at priority 20.
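A minimal sketch of how a sending system might turn those records into an order of hosts to try. The helper names are made up for illustration, and a real sender would get the records from a DNS library rather than from text lines.

```python
import random


def parse_mx(line):
    """Parse one 'owner IN MX preference target' line into (preference, target)."""
    owner, rr_class, rr_type, preference, target = line.split()
    if (rr_class, rr_type) != ("IN", "MX"):
        raise ValueError(f"not an IN MX record: {line!r}")
    return int(preference), target


def delivery_order(lines):
    """Return MX targets in the order to try them: lowest number first."""
    records = [parse_mx(line) for line in lines]
    random.shuffle(records)           # spread load across hosts sharing a number
    records.sort(key=lambda r: r[0])  # stable sort keeps the shuffle within ties
    return [target for _, target in records]


zone = [
    "example.org. IN MX 20 mx-backup.example.org.",
    "example.org. IN MX 10 mx.example.org.",
]
print(delivery_order(zone))
```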

With this, the owner of example.org does not need to accept mail directly themselves. For instance, they might contract with another organisation to handle their mail entirely, or they might contract for them to just handle the filtering of the mail, before passing the accepted mails back to the organisation. This is that organisation's internal decision and the senders don't need to know or, ideally, care about the details. Except when things go wrong and the mail-herders (called ‟postmasters”) have to debug what is happening.

One other thing: this has been about routing the mail based on the forward path, not the claimed sender. The recipient can't normally just act on the sender addresses contained in the email. They're just text, which anyone can fake. It's trivial to send an email which appears to come from someone else. Spammers do this frequently, often deciding to set the sender to be someone who has upset them, or customers of an ISP which has upset them. I've talked someone, who was working an ISP Support desk, through their tears after a particularly harrowing Support call from an elderly grandmother who couldn't understand why she'd received so many nasty abusive emails. A spammer had forged her email address in spam and ignorant recipients had written vicious mails back, not understanding that they were attacking an innocent third party.

Conveying hints: SPF

The naïve approach to conveying hints is for a domain-owner to find a way to tell recipients, ‟All legitimate mail for me only comes from these addresses”. And wait, we have a way, the DNS. So, let's just publish some records saying so. This is what SPF does.

By way of example:
example.org. IN TXT "v=spf1 mx a:fred.example.org ptr include:_netblocks.google.com -all"
which says:
  1. This is an SPF record, version 1
  2. Accept mail from any host which is registered as an MX for example.org.
  3. Accept mail from fred.example.org too.
  4. Accept mail from any IP where the owner of the IP space has declared that the IP is within example.org.
  5. Go process the SPF records for these people, who can send mail on my behalf
  6. Reject everything else
The observant will notice that step 4 changes who you're trusting, since anyone can declare that their IP is for example.org. But abusing this in bulk, leaving as few traces as possible, raises the bar just enough that people can mostly get away with using this convenient step.
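The numbered reading above can be sketched as a small parser. This is only the splitting-into-terms part, under my own naming; it does no DNS look-ups and ignores modifiers such as redirect=, so treat it as an illustration rather than an SPF implementation.

```python
# Qualifier characters and what they mean when the mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record):
    """Split an SPF TXT record into (result, mechanism, argument) tuples."""
    version, *terms = record.split()
    if version != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms:
        qualifier = "+"  # no qualifier written means '+', i.e. pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mechanism, _, argument = term.partition(":")
        parsed.append((QUALIFIERS[qualifier], mechanism, argument))
    return parsed


record = "v=spf1 mx a:fred.example.org ptr include:_netblocks.google.com -all"
for step in parse_spf(record):
    print(step)
```

The recipient walks the terms left to right and stops at the first one that matches, which is why the -all sits at the end.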

Unfortunately, the problem with all this is that the sender doesn't know what the recipient is going to do and it's common for the power-users to do things like forward mail on automatically. Suddenly, the forwarded mail is not coming from the right place and the final recipient can decide to reject it, even though it was legitimate, and just passed on unmodified.
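To see the forwarding problem in miniature, here's a sketch which evaluates only the ip4, ip6 and all mechanisms. The record and both IP addresses are hypothetical, picked from the ranges reserved for documentation, and everything needing DNS is deliberately left out.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record, client_ip):
    """Evaluate the ip4/ip6/all mechanisms of an SPF record for one client IP."""
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:  # skip the leading v=spf1
        qualifier = "+"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        mechanism, _, argument = term.partition(":")
        if mechanism == "all":
            return RESULTS[qualifier]
        if mechanism in ("ip4", "ip6"):
            if ip in ipaddress.ip_network(argument, strict=False):
                return RESULTS[qualifier]
    return "neutral"  # nothing matched and no 'all' term was present


record = "v=spf1 ip4:192.0.2.0/24 -all"      # hypothetical sender policy
print(spf_check(record, "192.0.2.25"))       # the sender's own mail server
print(spf_check(record, "198.51.100.7"))     # a forwarder passing the mail on
```

Same mail, same claimed sender, unmodified content; the only thing which changed is the machine handing it over, and that is the only thing SPF looks at.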

Some declared, ‟So what? There's not many power users” and SPF caught on because it did somewhat help. The people publishing the records could say that it was the recipients' decision to filter, and the recipients could say that it was the senders' decision to publish. The people caught in the middle might protest and note that they have some relationship with the recipient, but the recipient would take the easy way out. So, people using forwarding services had to come up with a work-around, which is called the Sender Rewriting Scheme, SRS. With that, SPF kinda-mostly-sometimes doesn't break legitimate mail. Alas, by this time, the spammers were just publishing SPF records which said ‟anyone can send mail for me”. If recipients treat ‟anyone” rules as a sign of spam, then the spammers can divide-and-conquer, publishing ‟anyone in this chunk of address-space can send for me” and instructing their mail pumping software to only use those legions of compromised machines in that address-space.

SPF has left a bad taste in the mouths of many because it externalised the cost of the system away from the people publishing the records or the people running the end-systems and disenfranchised the minority which often built these systems in the first place, the people who understand email and forward mail around, without actually solving the problem more than temporarily. And no, I haven't been caught, I run my own receiving mail-systems and the only SPF records I honour, as a recipient, are those which state that ‟this domain sends no email”. So I write this paragraph as an observer, not someone with reason to be upset.

What has SPF solved in the end? Not much. It helped a bit, for a time, and caused some problems, and pushed some spammers to publish an extra DNS record. The more ignorant spammers can still be caught, but since spammers are in business to make money, the successful spammers aren't ignorant.

Conveying hints: DKIM

The other approach is to not try and decide ‟Which addresses can send mail?”, but instead ‟Did this email really come from this domain?”. This simple shift is remarkably powerful. The recipient no longer needs to care how the mail got to them, as long as it got to them intact enough to not break the proof-of-domain. The people in the middle only need to worry if they're changing something about the mail which breaks that proof. And reputation can be established on a per-domain basis, not just a per-IP basis.

Reputation? I refer the interested reader to Sender Reputation in a Large Webmail Service, by Brad Taylor, presented at the CEAS 2006 Conference. In short: come up with scores by tracking the reputation of attributes of an email, such as the sender IP address, and use that to influence the spamminess decision.

The downside to this domain-based tying? I wrote above, ‟only need to worry if they're changing something about the mail which breaks that proof”. That would be, oh, mailing-list software which changes various headers. So there's still some externalisation of costs. This time, that goes to those who are already doing complicated processing anyway, so stripping away the proof-of-domain kind of works. Replacing it with proof-of-domain for the domain running the mailing-list works better. But it's still breaking a bit if not everybody uses it, so the sender still needs to be cautious before publishing a policy which says ‟If the signature is broken, discard the mail.” This time though, when spammers use this proof-of-domain system, it still just ties the domain in use to the content of the mail. As spammers use the system too, it doesn't break anything except the most naïve of policies, silliness like ‟Trust anything with a proof-of-domain!”

Trusting anything, just because it's tied to an identifier? Dumb, but during the early days it can somewhat work. But, we now have a solid persistent identifier for tracking the record. We can track the reputation over time, and senders who send enough mail which isn't spam can get a free pass through various checks, or the benefit of the doubt elsewhere, while those with no reputation can be looked at carefully. The reputation of sender domains today is established by each recipient domain, for themselves.
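A toy sketch of what tracking reputation per signing domain could look like. The class, the thresholds and the verdict names are all assumptions of mine for illustration; real receivers use far more signals and far more careful statistics.

```python
from collections import defaultdict


class DomainReputation:
    """Track, per signing domain, how much wanted and unwanted mail was seen."""

    def __init__(self, min_messages=20):
        self.min_messages = min_messages           # below this, no opinion
        self.counts = defaultdict(lambda: [0, 0])  # domain -> [ham, spam]

    def record(self, domain, was_spam):
        self.counts[domain.lower()][1 if was_spam else 0] += 1

    def verdict(self, domain):
        ham, spam = self.counts.get(domain.lower(), (0, 0))
        total = ham + spam
        if total < self.min_messages:
            return "unknown"    # no track record: look at the mail carefully
        ratio = spam / total
        if ratio < 0.05:
            return "trusted"    # benefit of the doubt elsewhere
        if ratio > 0.5:
            return "suspect"
        return "neutral"


rep = DomainReputation()
for _ in range(30):
    rep.record("example.org", was_spam=False)
for _ in range(25):
    rep.record("example.net", was_spam=True)
print(rep.verdict("example.org"), rep.verdict("example.net"), rep.verdict("example.com"))
```

Each recipient domain keeps its own table like this, which is why a spammer signing their mail gains nothing but a name to hang the bad record on.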

This system is called DomainKeys Identified Mail, or DKIM. It's the IETF ‟standardised” system based on Yahoo! DomainKeys. It uses the funky math of cryptography and public-key systems, where there's a private, secret, key and a public key. Cryptography (crypto) lets us do stuff like have the private key used to sign something so that anyone with the public key can verify that the private key must have been used to sign it. As long as the private key can't be figured out from the public key, everyone is happy. Alas, the funky math used to keep the private and public keys separate is based on the ‟We don't know how to do this” sort of math and is subject to breakthroughs as people puzzle out how to go the other way. But that's a topic for another post.

The sending mail-system takes a bunch of the headers of an email and the body of the email, and comes up with a fingerprint of the content, then signs that with the private key, to make a Signature. That signature is then prepended to the headers, in another header, and sent as part of the mail. As long as the math stays good, the Signature ties the headers to the content. And suddenly, those headers can no longer be forged by just anyone who can read a spec and type some commands.
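Here's a toy version of that fingerprint-then-sign dance, using textbook RSA on a SHA-256 hash. It is emphatically not DKIM: the key is laughably small, there's no padding, and the way I join headers and body is an assumption for illustration, where real DKIM has its own canonicalisation rules. It needs Python 3.8 or later for the modular inverse.

```python
import hashlib

P, Q = 2**61 - 1, 2**89 - 1         # two known Mersenne primes; toy-sized
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, kept secret


def fingerprint(headers, body):
    """Hash the chosen headers plus the body down to a number below N."""
    data = "\r\n".join(headers).encode() + b"\r\n\r\n" + body.encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(headers, body):
    return pow(fingerprint(headers, body), D, N)   # needs the private key


def verify(headers, body, signature):
    return pow(signature, E, N) == fingerprint(headers, body)  # public key only


headers = ["From: fred@example.org", "Subject: lunch?"]
signature = sign(headers, "Noon at the usual place.")
print(verify(headers, "Noon at the usual place.", signature))
print(verify(headers, "Send me your bank details.", signature))
```

Change a signed header or the body and the fingerprint no longer matches what the signature decodes to, so verification fails.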

One of the bits of information in that new header, DKIM-Signature: gives a selector, used to fetch the public key to verify the signature. In the example below, this selector is d201004 but does not have to be based on the date. The sender domain's administrator can choose fairly freely. Where is the public key fetched from? Why, from DNS again! But, what if the mail comes in with no such header? For that, we have a policy statement, also in DNS. Spot a theme?

$ORIGIN example.org.
_adsp._domainkey IN TXT "dkim=all" ; RFC5617 unknown | all | discardable
d201004._domainkey IN TXT "k=rsa; t=y; p=MIG...lots_more_base64_encoded_data...QAB"

The Author Domain Signing Practices record says one of:
  • unknown: which begs the question, why say anything?
  • all: I try to sign everything, but don't know what happens with stuff in the middle
  • discardable: I sign everything, if it comes in not properly signed, please throw it away
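A small sketch of the look-up side: building the DNS name for a selector, splitting a tag=value record, and turning an ADSP record into a decision about unsigned mail. The function names, the action labels and the PLACEHOLDER key value are all mine, for illustration only.

```python
def key_record_name(selector, domain):
    """DNS name under which the public key for a selector is published."""
    return f"{selector}._domainkey.{domain}"


def parse_tags(txt):
    """Split a 'tag=value; tag=value' TXT record into a dict."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            # Partition on the first '=' only: base64 values may end in '='.
            name, _, value = part.partition("=")
            tags[name.strip()] = value.strip()
    return tags


def unsigned_mail_action(adsp_txt):
    """Map an ADSP record to a (hypothetical) action for mail with no valid signature."""
    practice = parse_tags(adsp_txt).get("dkim", "unknown")
    return {"all": "suspicious", "discardable": "discard"}.get(practice, "no-opinion")


print(key_record_name("d201004", "example.org"))
print(parse_tags("k=rsa; t=y; p=PLACEHOLDER=="))
print(unsigned_mail_action("dkim=all"))
```

What a recipient actually does with "suspicious" is, as ever, the recipient's own business.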

Other flaws?

Besides those mentioned already? Well, we've just put a lot of trust into the DNS and the DNS is not immune to attack. Sometimes the attacks get better (11 seconds to compromise a domain!) and the defenses try to get better, without breaking everything, in a federated database system with multiple software authors and varying degrees of protocol adherence.

As long as DNS is difficult enough to spoof that it requires a focused attempt, then it doesn't matter too much from a spam prevention point of view, because the spammers are about working in bulk, skirting laws and leaving as little firm evidence as possible. Any time that DNS gets trivial to spoof, we have to worry about how conveniently the spammers can do so.

Making DNS trustworthy is the goal of the security extensions, DNSSEC, currently being deployed. The biggest problem with this is that some people are going to find that their DNS usage breaks because their home router box, which fiddles around with addresses to let them run a network at home but with just one public IP, and provides them with WiFi and other conveniences, has really lousy DNS in it. Heck, many of them with no specific DNS support in them undo the security measures added to try to get the time-to-compromise back up from 11 seconds (rewriting those carefully randomised source ports to linear predictable source ports on the way out).

2010 will be an interesting year for DNS security and for ISPs dealing with customers who lose connectivity.

To publish?

With DKIM, and even SPF, the sender provides as much signal to the recipient as possible, to help them make an informed decision about the origin of the message. The more help the sender provides to the recipient, the better a decision the recipient can make. People who work together, providing help like this, can get their mail more readily accepted by others. People who say ‟I don't have to do anything, I should just be able to send my mail” will learn what it means to work in a cooperative system based on trust and federated delegation.

If you run a mail-system where you want to send mail to other people, you should be setting up DKIM. If you don't, it's only going to get harder to get the big players to accept your email. Work with them, rather than ranting against them.

Non-mail domains

It's much easier if you have a domain which never sends email and you want to reduce its utility for sending spam. You can safely publish SPF and DKIM records which state this. After all, the way these things go wrong is to reject mail that should have been accepted, and none of this mail should be accepted.

So, you publish an SPF record saying ‟reject all mail from this domain” and you publish a DKIM ADSP record saying, ‟all mail from this domain is signed, discard if not signed” and do not publish any public keys, so that nothing can be verified as signed. No cryptography required!

$ORIGIN example.org.
@ IN TXT "v=spf1 -all"
_adsp._domainkey IN TXT "dkim=discardable"
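If you have a pile of parked domains, a sketch like this can stamp out the same records for each. The function name is hypothetical, and '@' is zone-file shorthand for the origin itself.

```python
def no_mail_records(domain):
    """Zone-file lines declaring that a domain never sends email."""
    origin = domain.rstrip(".") + "."
    return [
        f"$ORIGIN {origin}",
        '@ IN TXT "v=spf1 -all"',                      # SPF: nothing may send
        '_adsp._domainkey IN TXT "dkim=discardable"',  # ADSP: discard unsigned
    ]


for name in ["example.org", "example.net"]:
    print("\n".join(no_mail_records(name)))
```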

Reading

These RFCs might help. For the time being, you can probably ignore Vouch-By-Reference; if my memory serves, when I read it I concluded that it's an attempt to monetise the trust system.

DKIM
  • 4686 Analysis of Threats Motivating DomainKeys Identified Mail (DKIM)
  • 4871 DomainKeys Identified Mail (DKIM) Signatures
  • 5672 RFC 4871 DomainKeys Identified Mail (DKIM) Signatures -- Update
  • 5016 Requirements for a DomainKeys Identified Mail (DKIM) Signing Practices Protocol
  • 5518 Vouch By Reference
  • 5585 DomainKeys Identified Mail (DKIM) Service Overview
  • 5617 DomainKeys Identified Mail (DKIM) Author Domain Signing Practices (ADSP)
SPF
  • 4408 Sender Policy Framework (SPF) for Authorizing Use of Domains in E-Mail, Version 1
Email
  • 5598 Internet Mail Architecture
  • 5322 Internet Message Format
  • 5321 Simple Mail Transfer Protocol {SMTP}
  • 2505 Anti-Spam Recommendations for SMTP MTAs [BCP 30]
  • 5068 Email Submission Operations: Access and Accountability Requirements [BCP 134]

-The Grumpy Troll
Categories: trust dns crypto email dkim spf federation