One *Billion* Emails
In email marketing there are senders of all shapes and sizes, from small businesses using self-serve ESPs to the largest web properties self-sending to massive user bases. While only a few senders will reach or exceed volumes of one billion messages per month, the tools and practices needed to achieve such a volume level are applicable to all senders who want to succeed in email marketing.
Who Am I?
My name is Mike Hillyer. I manage a team of Sales Engineers for Message Systems, a leading provider of digital messaging solutions for both senders and receivers. In my work over the last several years I have helped a number of clients reach the billion messages per month level and even more clients successfully deploy email marketing solutions ranging in scale from hundreds of thousands to millions of messages per month.
Who Needs To Send This Much Mail?
Contrary to what the image above implies, there’s nothing inherently evil about sending a billion messages a month. Some of the businesses that move a billion messages a month include ESPs, social networks (some move more than a billion a day for that matter), social gaming sites and large online retailers.
Any time you have a fairly large number of users (5-20 million) who receive multiple messages per day, or a really large number of users (40-50 million) who receive one message per day, you are heading into billion-messages-per-month territory.
What Are The Numbers?
So exactly how much mail are we talking about here? That will depend on sending patterns:
- In a lot of high-volume environments the sending will be to a world-wide audience, resulting in round-the-clock sending with no significant bursts of traffic. In such an environment, and assuming 30 days in a month, the hourly volume will be 1,000,000,000 messages divided by 30 days divided by 24 hours, which comes to 1,388,889 messages per hour (386 messages per second).
- In an environment with inconsistent hourly volumes, we have to allow for both an average hourly volume and a maximum hourly volume and then design our solution to address the maximum hourly volume.
- We need to look at seasonal factors: Does your social network move a lot of extra messages around Mother’s Day? Does your dating site move a lot of extra messages around Valentine’s Day? Does your web shopping portal do a lot of extra business around Christmas?
- We need to look at growth: If you are sending a billion messages a month it is very likely due to successful growth of your user base, something which you certainly have no intention of slowing. Look at how you have grown your email volume so far and extrapolate it out for the next year or two (especially if you only get budget for your infrastructure every two years).
Let’s assume for the sake of this article that you have an average volume of one million messages per hour with a peak volume of two and a half million messages per hour during your busiest season. You expect to double your user base each year for the next two years, which quadruples your volume. At the end of two years you expect your peak to reach ten million messages per hour, a rate that works out to 7.2 billion messages per month if sustained around the clock (I’ve seen just this kind of growth several times with customers and prospects).
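To make the arithmetic easy to rerun with your own numbers, here is a small Python sketch of the calculations above. The function names are my own invention for illustration, and it keeps the same simplifying assumptions as the text (a 30-day month and evenly spread sending).

```python
DAYS_PER_MONTH = 30   # same simplification used in the text
HOURS_PER_DAY = 24


def steady_rates(monthly_volume, days=DAYS_PER_MONTH):
    """Return (messages per hour, messages per second) for evenly spread sending."""
    per_hour = monthly_volume / (days * HOURS_PER_DAY)
    return per_hour, per_hour / 3600


def projected_peak(current_peak_per_hour, yearly_growth_factor, years):
    """Scale today's peak hourly volume by compound yearly growth."""
    return current_peak_per_hour * yearly_growth_factor ** years


per_hour, per_second = steady_rates(1_000_000_000)
print(round(per_hour), round(per_second))        # 1388889 386

peak = projected_peak(2_500_000, 2, 2)           # doubling twice = 4x
print(peak)                                      # 10000000
print(peak * HOURS_PER_DAY * DAYS_PER_MONTH)     # 7200000000
```

The last line shows where the 7.2 billion figure comes from: it is the projected peak hourly rate multiplied out over a full month, which is the capacity you would size for rather than the volume you would necessarily send.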
You Will Need to Send In-House
A lot of senders start by using an Email Service Provider (ESP) for their sending and should do so: an ESP provides infrastructure and expertise to handle the details of sending email marketing messages for their clients at a good price, allowing companies to focus on their business. In addition, the costs of installing and maintaining proper sending infrastructure and practices are not justifiable for most low-volume senders.
That said, if you are aiming for a billion email marketing messages a month and are using an ESP, it’s time to plan your move to in-house sending. Assuming a $1.00 CPM (Cost Per Mille, mille being Latin for thousand, so cost per thousand messages) you are looking at paying an ESP a million dollars a month to handle this kind of volume. Naturally you can probably secure a better rate than $1.00 CPM at these volumes, but regardless of the discount you will pay less to buy the infrastructure and hire the people needed to do this yourself, and you gain the control you need when sending at this scale.
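The CPM math is simple enough to check by hand, but a tiny Python sketch makes it easy to plug in whatever rate your ESP quotes. The $0.25 rate in the second call is a made-up discount used purely for illustration, not a real quote.

```python
def monthly_esp_cost(messages_per_month, cpm_dollars):
    """Cost of a month's sending when billed per thousand messages (CPM)."""
    return messages_per_month / 1000 * cpm_dollars


print(monthly_esp_cost(1_000_000_000, 1.00))   # 1000000.0
print(monthly_esp_cost(1_000_000_000, 0.25))   # 250000.0 (hypothetical discounted rate)
```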
Start With a Good Reputation
In order to hit the volume levels we’re talking about it is going to be vital that you have a solid sending reputation. This means you need to follow best practices for list acquisition, list hygiene, segmentation and relevancy. There’s a wealth of information online and an excellent catalog of it at Email Marketing Reports. This article will focus primarily on the technical aspects of sending one billion email messages per month but keep in mind that reaching one billion messages a month without a solid reputation on your domain and sending IPs is very difficult. A number of tools for checking the reputation of your IP addresses can be found at Word to the Wise. At this volume level email is key to your business and a solid reputation is going to be essential.
From a technical perspective there are a number of bases we need to cover regarding authentication, whitelisting, bounce processing and complaint handling.
As a reputable sender you will want to associate your IP addresses with your domain using the authentication standards available to you. These include SPF, SenderID, DomainKeys (DK) and DomainKeys Identified Mail (DKIM). There are indications that SPF (and SenderID by association) is ineffective but given the low effort required to implement it I would recommend doing so anyway. While SPF and SenderID are purely DNS-based, DK and DKIM require an implementation either during message creation or during relay by the MTA and as a result will impact the maximum throughput of your infrastructure (more about this later).
DomainKeys is quickly being superseded by DomainKeys Identified Mail but with most solutions supporting both DK and DKIM it is simple enough to use both when sending to an ISP that supports one standard or the other. Implementation details will vary based on your sending solution. While some recommend selectively signing DK and DKIM for only messages sent to ISPs that are known to check authentication (in order to lower the impact signing has on throughput on a solution that takes a significant performance hit from signing), I recommend signing all messages; you never know who is checking for authentication without announcing it.
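Since SPF is purely DNS-based, implementing it mostly comes down to publishing a TXT record that lists the addresses allowed to send for your domain. As a rough sketch, the Python below composes such a record value. The helper name is my own, the addresses are reserved documentation ranges rather than real sending IPs, and the strict `-all` policy is an assumption; many senders start with the softer `~all` while testing.

```python
import ipaddress


def build_spf_record(sending_networks, policy="-all"):
    """Compose an SPF TXT record value authorizing the given IPs or networks."""
    mechanisms = []
    for net in sending_networks:
        # ip_network validates the input; a bare address becomes a /32 (or /128)
        parsed = ipaddress.ip_network(net, strict=False)
        prefix = "ip4" if parsed.version == 4 else "ip6"
        mechanisms.append(f"{prefix}:{parsed.with_prefixlen}")
    return " ".join(["v=spf1", *mechanisms, policy])


print(build_spf_record(["192.0.2.0/28", "198.51.100.25"]))
# v=spf1 ip4:192.0.2.0/28 ip4:198.51.100.25/32 -all
```

DK and DKIM signing are not shown here because they depend on your sending solution and on key management, as noted above.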
One benefit of getting on the various whitelists provided by ISPs and reputation providers is that in some cases you can send higher volumes on whitelisted IP addresses than would otherwise be possible. Keep in mind that in most situations whitelisting is something that comes after sending has already begun in order to allow the provider of the whitelist to examine your sending patterns as part of the whitelisting process, so put your best foot forward (and follow it up with consistent behavior).
One quick way to lose reputation is to repeatedly send mail to recipients that do not exist. The ISPs will track how many non-existing addresses you send to and throttle you accordingly. Even more seriously, ISPs will occasionally take inactive email addresses and re-activate them as spam traps; any mail sent to the address will immediately get the source classified as a spam source and filtered accordingly.
To prevent this it is necessary to capture and act on the responses sent by the ISPs and unsubscribe those addresses identified as non-existent or inactive, while retaining those with responses that identify users on vacation and other non-fatal errors. Commercial sending solutions will perform this automatically with varying levels of effectiveness, while other platforms will require a third-party solution such as Boogie Tools. Keep in mind that the more you send, the more you receive back in the form of automated responses and bounce notifications. As your reply addresses reach more and more users, the flow of notifications will become contaminated with spam and virus-carrying messages, requiring the implementation of Anti-Virus/Anti-Spam solutions for your incoming mail stream.
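To give a feel for what bounce classification involves, here is a deliberately simplified Python sketch that sorts SMTP responses by reply code and, where present, the enhanced status code. The category names and the rule that only 5.1.x codes mean an invalid address are my own assumptions for illustration; real ISPs word their responses inconsistently, which is exactly why dedicated bounce-processing tools exist.

```python
import re

# A three-digit SMTP reply code, optionally followed by an enhanced
# status code such as 5.1.1 (class.subject.detail).
_RESPONSE_RE = re.compile(
    r"^(?P<code>[245]\d\d)[ -](?:(?P<enhanced>[245]\.\d{1,3}\.\d{1,3})\s)?"
)


def classify_response(smtp_response):
    """Return a coarse category for a single SMTP response line."""
    match = _RESPONSE_RE.match(smtp_response.strip())
    if not match:
        return "unknown"
    code = match.group("code")
    enhanced = match.group("enhanced")
    if code.startswith("2"):
        return "delivered"
    if code.startswith("4"):
        return "soft"                      # transient: keep the subscriber, retry
    if enhanced and enhanced.startswith("5.1."):
        return "hard-invalid-address"      # addressing failure: unsubscribe
    return "hard-other"                    # permanent but not clearly a bad address


print(classify_response("550 5.1.1 User unknown"))           # hard-invalid-address
print(classify_response("452 4.2.2 Mailbox full"))           # soft
print(classify_response("554 5.7.1 Rejected due to policy")) # hard-other
```

Keeping policy rejections separate from invalid addresses matters: a block caused by your reputation says nothing about whether the recipient exists, so unsubscribing those users would shrink your list for the wrong reason.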
In an effort to help senders improve their practices, a number of ISPs have implemented ARF-formatted Feedback Loop (FBL) programs. When a user on a supported ISP clicks the “This is Spam” button, an automated message is sent to an address you define in advance (when signing up with the ISP for the Feedback Loop program). By processing these messages and unsubscribing the relevant users, you prevent the further reputation damage that may result from sending them future messages.
The ARF format used by the ISPs makes it relatively straightforward to process Feedback Loop messages and use them to unsubscribe the users who have complained about your messages. There are tools available to process ARF formatted messages and some sending solutions will handle FBL messages natively.
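As an illustration of why ARF is straightforward to process, the sketch below uses Python’s standard `email` package to pull the complaint type and recipient out of a report. The sample report is invented for this example, and it assumes the ISP includes the optional `Original-Rcpt-To` field; some ISPs redact the recipient, in which case you would need to identify the subscriber some other way, for example from an identifier you embedded in the original message.

```python
import email
from email.utils import parseaddr


def extract_complaint(raw_message):
    """Return (feedback_type, recipient) from an ARF report, or None if not ARF."""
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() != "message/feedback-report":
            continue
        payload = part.get_payload()
        # The stdlib parser exposes message/* parts as a nested message whose
        # headers are the report fields; fall back to parsing a plain string.
        if isinstance(payload, list):
            report = payload[0]
        else:
            report = email.message_from_string(payload)
        feedback_type = (report.get("Feedback-Type") or "").strip().lower()
        recipient = parseaddr(report.get("Original-Rcpt-To", ""))[1]
        return feedback_type, recipient or None
    return None


# A made-up report with the three parts an ARF message carries.
SAMPLE_ARF = "\n".join([
    "From: fbl@isp.example",
    "To: complaints@sender.example",
    "Subject: FW: Weekly offers",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=feedback-report; boundary="b1"',
    "",
    "--b1",
    "Content-Type: text/plain",
    "",
    "This is an abuse report for a message you sent.",
    "--b1",
    "Content-Type: message/feedback-report",
    "",
    "Feedback-Type: abuse",
    "User-Agent: ExampleFBL/1.0",
    "Version: 1",
    "Original-Rcpt-To: <user@example.com>",
    "",
    "--b1",
    "Content-Type: message/rfc822",
    "",
    "From: news@sender.example",
    "To: user@example.com",
    "Subject: Weekly offers",
    "",
    "Hello",
    "--b1--",
    "",
])

print(extract_complaint(SAMPLE_ARF))
```

In production the returned address would feed your unsubscribe process, and anything that fails to parse should be set aside for review rather than silently dropped.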
There are a number of architectural components that come into play to make it possible to send email marketing messages at volume levels of one billion email messages per month (or more) including network connectivity, server hardware and software.
Most professional sending operations are based in rented datacenters, simplifying the provisioning of network connectivity. In our initial example of a maximum throughput of 2.5 million messages per hour we’ll use a sample message size of 50 kilobytes (51,200 bytes), meaning that we need to send at a rate of 2,500,000 * 51,200 = 128,000,000,000 bytes per hour, or roughly 271 megabits per second when a megabit is counted as 1,048,576 bits (about 284 megabits per second in the decimal units in which network links are usually quoted).
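Here is the bandwidth calculation as a short Python sketch so you can substitute your own peak rate and message size. The function name is illustrative. Note that the result depends on how you count a megabit: the 271 figure corresponds to binary megabits (1,048,576 bits), while the decimal convention used for most network links gives about 284. Neither figure includes SMTP, TCP or DNS overhead, so treat them as a floor.

```python
def required_bandwidth(messages_per_hour, message_bytes):
    """Return (bytes per hour, bits per second) for the message payload alone."""
    bytes_per_hour = messages_per_hour * message_bytes
    bits_per_second = bytes_per_hour * 8 / 3600
    return bytes_per_hour, bits_per_second


bytes_per_hour, bps = required_bandwidth(2_500_000, 50 * 1024)
print(bytes_per_hour)               # 128000000000
print(round(bps / 1_000_000, 1))    # 284.4 (decimal megabits per second)
print(round(bps / 2**20, 1))        # 271.3 (binary megabits per second)
```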
With the throughput we’re talking about we certainly need to use gigabit speed networking within the datacenter and, more importantly, need backbone connectivity that can support not only a sustained throughput of 271 megabits per second but that can also handle our future needs of 7.2 billion messages per month. You need to look at a datacenter that will be able to provide sustained gigabit speeds to the backbone.
Keep in mind that when you are sending a billion messages per month it means that email has significant impact on your bottom line and you won’t be able to tolerate extended outages. You need to not only make sure that the datacenter you choose has redundant power and backbone connections, you also need to consider using redundant datacenters.
Moving over a million messages per hour does not require the purchase of custom server hardware but it does require making a proper investment in hardware. Generally speaking you will be using an infrastructure similar to the following:
The Message Injector queries the database and uses the results to assemble one or more messages which it relays to the Outbound Mail Server. The Outbound Mail Server queues the message, performs any necessary manipulations on the message and then sends it via the Internet to the destination server. In the event of a delivery failure message or a feedback loop message, the incoming message arrives via the Internet at the Inbound Mail Server. The Inbound Mail Server performs anti-virus/anti-spam scanning and then, in the case of a legitimate message, processes the message and updates the subscriber information in the database. Not all server solutions can perform this processing in-stream; when using such solutions an intermediate server will be needed to accept the clean message from the Inbound Mail Server and process it using custom code.
In a production deployment there can be several variations on this example, typically with multiple servers used on the outbound and inbound roles, with multiple message injectors pushing to the outbound machines and often specialty servers on the inbound side dedicated to processing incoming feedback loop and bounce messages.
I generally recommend mail servers similar to the following:
- 2x multi-core, 64-bit processors
- 16-32 GB of RAM
- 8x 15K RPM hard-disks
- Battery-backed RAID-10 controller
The specific details of your hardware selection will depend on the ability of your specific software to leverage the resources provided. A large number of fast disks in a RAID-10 array is recommended for the message spool as standards-compliant mail servers must write messages to disk before accepting them for delivery, placing significant demands on storage resources.
Since I work for a leading software provider for high-volume senders, you would rightly expect me to recommend a commercial solution, and specifically my company’s solution. I’d like to take a moment to point out why:
We need to send at a rate in excess of one million messages per hour. I’ve dealt with a number of solutions and my experience has shown that most Open Source MTAs such as Postfix and Sendmail are limited to around 100,000 messages per hour. Commercial sending solutions typically show real-world performance ranging from 500,000 messages per hour to over two million messages per hour.
I have helped several companies that were operating dozens of Open Source servers to consolidate down to one-tenth as many servers running a commercial solution.
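To see how per-server throughput drives server count, here is a back-of-the-envelope Python sketch using the figures from this article: the 2.5 million and 10 million messages per hour peaks, and per-server rates of roughly 100,000, 500,000 and 2,000,000 messages per hour. The function is my own illustration and ignores redundancy beyond an optional spare count, so real deployments would need more machines than this.

```python
import math


def servers_needed(peak_per_hour, per_server_per_hour, spares=0):
    """Minimum servers to carry the peak, plus any spares you choose to add."""
    return math.ceil(peak_per_hour / per_server_per_hour) + spares


print(servers_needed(2_500_000, 100_000))     # 25
print(servers_needed(2_500_000, 500_000))     # 5
print(servers_needed(10_000_000, 100_000))    # 100
print(servers_needed(10_000_000, 2_000_000))  # 5
```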
In addition to limited throughput, Open Source MTAs are usually limited to sending through a single IP address, meaning that to send through ten IP addresses you need ten separate server instances. Commercial solutions support sending through multiple IP addresses simultaneously.
Commercial solutions go beyond basic message queuing and sending, providing the additional functionality required for a high-volume sender. This includes features such as APIs, bounce classification, feedback loop processing, internal scripting, automated throttling, and database integration.
If you’re sending a billion emails a month, you absolutely need a solution that provides high availability out of the box. If a server goes down you can’t afford to be frantically activating a warm spare, just to find out that it too has some issue. You need an active-active solution that reacts automatically to server failures and keeps the mail flowing.
You need a solution that can be easily managed on your terms, whether you prefer editing configuration files or using a web interface. In addition, you need something that grows with you, providing centralized management of an entire cluster of servers. Commercial solutions will provide easier, centralized management.
One key to successful sending at high volumes is keeping tabs on how your server is performing and how your mailings are doing. You need to know what is passing through your server, how quickly messages are moving, whether queues are backed up, how the various ISPs are treating your traffic, all with the ability to drill down on specific source IPs and specific destination ISPs. You need to be able to see all of this in real-time and across your entire infrastructure. A good commercial solution provides all of this out of the box.
Time is Money
On multiple occasions I’ve seen organizations choose a free or low-cost solution and then spend countless hours building workarounds to the weaknesses of their chosen platform, writing scripts to automate administration, reporting tools to fill their needs, failover scripts to provide redundancy, etc.
While a lot of this work was impressive, it required time to implement and time to maintain. Time spent creating tools that are already provided in an alternative solution is time (and money) wasted. You are always better off using your time to create your “special sauce”; that which makes your business unique and gives you a competitive advantage.
The Price of Success is Continued Vigilance
When sending at a rate of one billion messages a month (or more), you can’t just use a ‘fire and forget’ mentality. You are going to have to have people around to keep a constant eye on what is happening in your environment, monitoring multiple key factors to ensure you can continue to successfully send.
Remember earlier when I said you need to start with a good reputation? You also need to keep a good reputation, and the only way to do that is to know what your reputation is. You will need to take advantage of reputation monitoring tools provided by companies like Return Path and Pivotal Veracity as well as keep a close eye on the reporting produced by your sending solution (remember when I said you need good reporting?).
You need to watch things such as bounce rates, FBL hit rates, blacklist hits, transient failures and response rates.
If email is your company’s lifeblood (and if you’re sending a billion messages a month it certainly is) then you need to make sure to keep it flowing no matter what happens, and that means making sure your email infrastructure stays online. I spoke earlier of the need for high availability; active monitoring goes hand-in-hand with this need. You will need to monitor the health of the servers that support your infrastructure, the network components that carry your messages and the software that creates and relays your messages.
There are a number of monitoring solutions available to accommodate any platform and budget, so be sure to implement one that meets your needs and get monitoring. Make sure to test simulated failures to confirm that monitoring is working successfully. Consider setting up a simulated mailing that runs on a regular basis using your full infrastructure stack: a monitoring script can check an inbox and if the test message fails to appear, something is potentially wrong in your sending infrastructure. This approach can help identify issues that may pass by other monitoring systems unnoticed (and can be integrated into some monitoring solutions directly).
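The simulated mailing check can be sketched in a few lines of Python. This is only an outline of the logic, not a finished monitor: `send_probe` and `fetch_subjects` are hypothetical callables you would supply yourself (for instance wrappers around your injector and an IMAP mailbox), and the timeout and polling values are arbitrary defaults.

```python
import time
import uuid


def run_probe(send_probe, fetch_subjects, timeout_s=300, poll_s=15,
              sleep=time.sleep, clock=time.monotonic):
    """Send a uniquely tagged test message and wait for it to reach the inbox.

    send_probe(token) should push a message whose subject contains token
    through the full sending stack; fetch_subjects() should return the
    subjects currently in the monitored inbox. Returns True if the probe
    arrives before the timeout, False otherwise.
    """
    token = f"probe-{uuid.uuid4().hex}"
    send_probe(token)
    deadline = clock() + timeout_s
    while True:
        if any(token in subject for subject in fetch_subjects()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Stand-in for a real mail path: an in-memory list plays the inbox.
inbox = []
arrived = run_probe(
    send_probe=lambda token: inbox.append(f"Delivery probe {token}"),
    fetch_subjects=lambda: list(inbox),
    timeout_s=5,
    poll_s=0,
)
print(arrived)  # True
```

A False result is what you would wire into your alerting. Using a fresh token for every run avoids mistaking an old probe that is still sitting in the inbox for a new one.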
Keep in mind that you get what you monitor for; if you focus too much on one metric it may improve without helping the big picture. In addition to making sure all the underlying pieces are in place, don’t forget to keep an eye on things where the rubber meets the road. You may be sending at phenomenal rates with great metrics but failing to generate customer actions that lead to revenue.
While by no means an exhaustive list, I hope this gives you some idea as to the scope of sending in a high-volume environment of one billion email messages per month. Watch this space over the coming weeks for deeper dives into some of the subjects covered here.
Questions? Did I miss something? Let me know in the comments!
The opinions and information in this post are my own and do not necessarily reflect those of my employer.