SuperIMAP - Version 0.1.2
SuperIMAP helps you build email-driven applications. It takes care of connecting to a customer's IMAP inbox, watching for new email, and triggering a webhook to your application when a new email arrives, typically within seconds.
SuperIMAP is built for scale. FiveStreet.com built SuperIMAP as an alternative to Context.io. It contains a subset of Context.io Lite API functionality. As of July 2015, the FiveStreet team runs a SuperIMAP cluster processing ~400k emails per day for thousands of users.
SuperIMAP is written in Ruby and open sourced under the MIT license. Why Ruby?
Installation
SuperIMAP was built to run on Heroku, but can run in any environment that supports Rails.
Installing on Heroku
- Provision a new Heroku project.
- Add an encrypted database.
- Set the SECRET_KEY_BASE and ENCRYPTION_KEY environment variables to something really long and complicated.
- Add a Heroku remote endpoint, and push the code.
- Ramp up some workers.
- Seed the database with
heroku run rake db:setup db:seed
Then, log in as the default user: [email protected] / password.
Remember: Change the username and password immediately!
The production Procfile assumes that you are installing on Heroku. As a result, it has multiple definitions for the imap_client process, corresponding to different sized Heroku dynos. In order for load balancing to work correctly, you should have all imap_client processes use the same dyno size. Do not mix and match boxes.
Installing Elsewhere
- Get the code:
git clone https://github.com/rustyio/super-imap.git
- Update config/database.yml. (Use config/database.yml.example.)
- Run bundle to install dependencies.
- Set the SECRET_KEY_BASE and ENCRYPTION_KEY environment variables to something really long and complicated.
- Seed the database with
RAILS_ENV=production rake db:setup db:seed
- Start the processes:
foreman start -c "web=1, worker=1, imap_client_1x=1"
Then, log in as the default user: [email protected] / password.
Remember: Change the username and password immediately!
The production Procfile assumes that you are installing on Heroku. As a result, it has multiple definitions for the imap_client process, corresponding to different sized Heroku dynos. In order for load balancing to work correctly, you should ensure that all imap_client processes are the same size and point to the same database.
Usage
Set up a Partner
A single SuperIMAP instance can support multiple applications (and/or multiple environments for a single application). This is done by creating a new "Partner" for each application (and/or environment).
- Open the SuperIMAP dashboard.
- Click on the "Partners" tab. Click "New Partner".
- Set webhooks to notify your application of the following events:
- A new email has arrived.
- A user has connected their email account.
- A user has disconnected their email account.
Set up a Partner Connection
- Still within the dashboard, click the "Partners" tab.
- Click on the "Connections" link next to your Partner.
- Click on "New Partner Connection".
- Choose an authentication type, and fill out any necessary credentials.
For Gmail, you will need to get OAuth 2.0 client credentials from the Google Developer Console.
Add and Connect a User
- Still within the dashboard, click on "Partners".
- Click on the "Connections" link next to your partner.
- Click on the "Users" link next to your connection.
- Click on the "New User" button.
- Click the 'Connect' link to connect the user to an IMAP provider.
- Send yourself email, and watch the logs!
Add and Connect a User Programmatically
The next step is to update your application to create and connect SuperIMAP users. Here is some example code:
require 'rest-client'
require 'json'
# Replace my-host.com with the host of your SuperIMAP instance.
url = "https://my-host.com/api/v1/connections/GMAIL_OAUTH2/users"
users = RestClient::Resource.new(url, :headers => {
:'x-api-key' => "$API_KEY$",
:content_type => :json,
:accept => :json
})
# Create the user.
users.post(:tag => "MY_USER")
# Get the connect url.
response = users["MY_USER"].get
connect_url = JSON.parse(response)['connect_url']
# Set up the success and failure callbacks.
callbacks = {
:success => "http://my-app.com/connect_callback?success=1",
:failure => "http://my-app.com/connect_callback?failure=1"
}
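# Redirect the user to the connect url. (redirect_to and Hash#to_query come
# from Rails / ActiveSupport, so this runs inside a controller action.)
redirect_to connect_url + "?" + callbacks.to_query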
Disconnect a User Programmatically
Below is sample code to disconnect a user:
require 'rest-client'
require 'json'
url = "https://my-host.com/api/v1/connections/GMAIL_OAUTH2/users"
users = RestClient::Resource.new(url, :headers => {
:'x-api-key' => "$API_KEY$",
:content_type => :json,
:accept => :json
})
# Later, if you want to disconnect the user.
response = users["MY_USER"].get
disconnect_url = JSON.parse(response)['disconnect_url']
# Set up the success and failure callbacks.
callbacks = {
:success => "http://my-app.com/disconnect_callback?success=1"
}
# Redirect the user to the disconnect url.
redirect_to disconnect_url + "?" + callbacks.to_query
Security
This is a good time to mention security. It is a big responsibility to hold the keys to someone's email. Treat it with the appropriate amount of caution.
If you use this code:
- PLEASE ensure that you use very strong, safeguarded passwords.
- PLEASE enable two-factor authentication for your Heroku account.
- PLEASE make sure your entire database is encrypted at rest.
Other security measures within SuperIMAP:
- SSL is required in production.
- Secure fields (e.g. passwords and other credentials) are never exposed via the web interface, and are encrypted in the database.
- Passwords are not recoverable by email.
- Accounts are locked for an hour after three invalid password attempts.
Webhooks
SuperIMAP sends new email events (and other events) to your applications through webhooks:
- All webhooks are dispatched through delayed jobs.
- Webhooks will be retried up to 6 times, with exponential backoff.
- Webhooks will fail if the receiving server takes more than 30 seconds to respond.
- Webhooks expect a success response (HTTP code 200 - 206).
- A "Forbidden" response code of 403 will automatically archive the user.
- Any other response codes count as an error, and will trigger a retry.
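The response rules above can be summarized in a few lines of Ruby. This is a minimal sketch of the rules as listed, not SuperIMAP's actual dispatcher; the method name classify_webhook_response and the returned symbols are hypothetical.

```ruby
# Hypothetical helper: maps the HTTP status code returned by the
# receiving application to the action described in the list above.
def classify_webhook_response(status)
  case status
  when 200..206 then :success       # delivered, no retry needed
  when 403      then :archive_user  # "Forbidden" archives the user
  else               :retry         # anything else counts as an error
  end
end

[200, 204, 403, 500].each do |status|
  puts "#{status} -> #{classify_webhook_response(status)}"
end
```

In practice this means your webhook endpoint should return a 2xx code as soon as it has accepted the email, and reserve 403 for users that should no longer be watched.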
Webhook Security
All webhooks are signed. You can validate the signature as follows:
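require 'json'
require 'openssl'
# Parse the incoming JSON body.
json_params = JSON.parse(request.raw_post)
# Calculate expected signature: HMAC-SHA256 over "#{timestamp}#{sha1}",
# keyed with the partner's API key.
digest = OpenSSL::Digest.new('sha256')
api_key = Rails.application.config.super_imap_api_key
sha1 = json_params['sha1']
timestamp = json_params['timestamp']
expected_signature = OpenSSL::HMAC.hexdigest(digest, api_key, "#{timestamp}#{sha1}")
# Get actual signature.
actual_signature = json_params['signature']
# Compare signatures in constant time (Rails 4.2+) so that response timing
# does not leak information about the expected value.
valid = ActiveSupport::SecurityUtils.secure_compare(expected_signature, actual_signature.to_s)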
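For applications that do not run on Rails, the same check can be sketched with only the standard library. The method names below (webhook_signature, same_signature?, valid_webhook?) are hypothetical; the signing scheme is the one described above.

```ruby
require 'json'
require 'openssl'

# HMAC-SHA256 over "#{timestamp}#{sha1}", keyed with the partner's API key.
def webhook_signature(api_key, timestamp, sha1)
  digest = OpenSSL::Digest.new('sha256')
  OpenSSL::HMAC.hexdigest(digest, api_key, "#{timestamp}#{sha1}")
end

# Constant-time string comparison written in plain Ruby.
def same_signature?(expected, actual)
  return false unless actual.is_a?(String) && expected.bytesize == actual.bytesize
  diff = 0
  expected.bytes.zip(actual.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Returns true only if the body is a JSON object carrying a valid signature.
def valid_webhook?(raw_body, api_key)
  params = JSON.parse(raw_body)
  return false unless params.is_a?(Hash)
  expected = webhook_signature(api_key, params['timestamp'], params['sha1'])
  same_signature?(expected, params['signature'])
rescue JSON::ParserError
  false
end

# Usage with a locally built payload (values are made up for illustration).
body = JSON.generate(
  'timestamp' => 1_436_000_000,
  'sha1'      => 'abc123',
  'signature' => webhook_signature('my-api-key', 1_436_000_000, 'abc123')
)
puts valid_webhook?(body, 'my-api-key')
```

A receiver might also reject webhooks whose timestamp is too old, to limit replayed requests. That is an additional precaution, not something SuperIMAP is documented to require.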
New Mail Webhook
Called when a new mail arrives in a user's inbox.
- timestamp - Timestamp the webhook was sent. Seconds since Jan 1, 1970.
- sha1 - The SHA1 hash of the rfc822 parameter.
- imap_provider_code - The IMAP provider code (e.g. "GMAIL_OAUTH2")
- user_tag - The user's tag.
- envelope - The email envelope, including date, subject, from, sender, reply_to, to, cc, bcc, in_reply_to, and message_id.
- rfc822 - The raw body of the email. http://www.w3.org/Protocols/rfc822/
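As an illustration, a receiver could unpack these fields as follows. This is a sketch under two assumptions that the text above does not confirm: that sha1 is the lowercase hex digest of the rfc822 string, and that envelope is a JSON object keyed by the field names listed. NewMail and parse_new_mail are hypothetical names.

```ruby
require 'json'
require 'digest/sha1'

# Hypothetical value object for the fields a receiver usually needs.
NewMail = Struct.new(:user_tag, :sent_at, :subject, :rfc822)

def parse_new_mail(raw_body)
  params = JSON.parse(raw_body)
  rfc822 = params.fetch('rfc822')

  # Assumption: sha1 is the lowercase hex SHA1 digest of the rfc822 string.
  unless Digest::SHA1.hexdigest(rfc822) == params['sha1']
    raise ArgumentError, 'sha1 does not match rfc822'
  end

  # Assumption: envelope is a JSON object with a "subject" key.
  envelope = params['envelope'] || {}
  sent_at = Time.at(Integer(params['timestamp'])).utc
  NewMail.new(params['user_tag'], sent_at, envelope['subject'], rfc822)
end
```

The raw rfc822 string can then be handed to whatever mail parsing library your application already uses.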
User Connected Webhook
Called when a user has successfully authenticated with an IMAP provider. Only applies to OAuth connections at the moment.
- timestamp - Timestamp the webhook was sent. Seconds since Jan 1, 1970.
- sha1 - The SHA1 hash of the user's tag.
- imap_provider_code - The IMAP provider code (e.g. "GMAIL_OAUTH2")
- user_tag - The user's tag.
- email - The email address with which the user authenticated.
User Disconnected Webhook
Called when a user has disconnected from an IMAP provider. Only applies to OAuth connections at the moment.
- timestamp - Timestamp the webhook was sent. Seconds since Jan 1, 1970.
- sha1 - The SHA1 hash of the user's tag.
- imap_provider_code - The IMAP provider code (e.g. "GMAIL_OAUTH2")
- user_tag - The user's tag.
API
All API calls are scoped by partner. To authenticate, send the Partner's API key using a header or a parameter. (A header is preferred because it won't normally appear in HTTP logs.)
# Access the API using curl:
curl -H "Accept: application/json" \
-H "x-api-key:APIKEY" \
https://my-host.com/api/v1/connections
# Access the API using the rest-client gem:
url = "https://my-host.com/api/v1"
resource = RestClient::Resource.new(url, :headers => {
:'x-api-key' => "$API_KEY$",
:content_type => :json,
:accept => :json
})
resource['connections'].get
/api/v1/connections
GET
Get a list of connections for the specified partner.
POST
Create a new connection.
- imap_provider_code is required.
- Other required parameters depend on the IMAP Provider used.
/api/v1/connections/:IMAP_PROVIDER_CODE
GET
Get information about a given connection.
PUT
Update settings for a given connection. The required parameters depend on the IMAP provider used.
DELETE
Delete a connection and all underlying user data.
/api/v1/connections/:IMAP_PROVIDER_CODE/users
GET
Get a list of users for the specified IMAP Provider.
POST
Create a new user.
- tag - Required, a unique tag within the scope of a partner connection, selected by the partner application.
- Other required parameters depend on the IMAP Provider used.
/api/v1/connections/:IMAP_PROVIDER_CODE/users/:TAG
GET
Get information about the given user, including:
- email - The IMAP email address to which the user's account is connected.
- connected_at - The date when the user's account was connected. Present only if connected.
- connect_url - Redirect to this url to connect a user to a provider. For OAuth based IMAP providers, this begins the OAuth dance.
- disconnect_url - Redirect to this url to disconnect a user from a provider.
PUT
Update a user. The required parameters depend on the IMAP provider used.
DELETE
Archive a user. The user can be restored in the web interface, or by updating the user (i.e., a PUT request).
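If you would rather not depend on the rest-client gem, the user endpoints above can also be reached with Ruby's standard Net::HTTP. The sketch below only builds the requests; the helper names (user_url, api_request, send_request) are hypothetical, my-host.com stands in for your SuperIMAP host, and sending parameters as a JSON body is an assumption.

```ruby
require 'net/http'
require 'json'
require 'uri'
require 'erb'

API_ROOT = 'https://my-host.com/api/v1' # placeholder host

# Builds /connections/:IMAP_PROVIDER_CODE/users[/:TAG], escaping each segment.
def user_url(imap_provider_code, tag = nil)
  parts = ['connections', imap_provider_code, 'users', tag].compact
  URI("#{API_ROOT}/#{parts.map { |part| ERB::Util.url_encode(part) }.join('/')}")
end

# Builds a request that authenticates with the x-api-key header.
def api_request(request_class, uri, api_key, params = nil)
  request = request_class.new(uri)
  request['x-api-key'] = api_key
  request['Accept'] = 'application/json'
  if params
    # Assumption: the API accepts parameters as a JSON body.
    request['Content-Type'] = 'application/json'
    request.body = JSON.generate(params)
  end
  request
end

# Sends a request built by api_request and returns the Net::HTTPResponse.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

create  = api_request(Net::HTTP::Post, user_url('GMAIL_OAUTH2'), 'API_KEY', { tag: 'MY_USER' })
archive = api_request(Net::HTTP::Delete, user_url('GMAIL_OAUTH2', 'MY_USER'), 'API_KEY')
puts "#{create.method} #{create.path}"
puts "#{archive.method} #{archive.path}"
```

Calling send_request(create) would perform the actual HTTP call against your instance.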
Operations
Process Types
SuperIMAP consists of 3 different processes, all written in Ruby / Rails:
- web - Serves the admin interface and the API.
- imap_client - Handles the task of connecting to IMAP providers and listening for email.
- worker - Processes background jobs generated by the 'imap_client' process.
Environment Variables
- ENCRYPTION_KEY - If provided, used to encrypt passwords and secret keys. Required in production.
- MAX_USER_THREADS - Change the maximum number of user threads. Default is 500.
- NUM_WORKER_THREADS - Change the number of worker threads. Default is 5.
- MAX_EMAIL_SIZE - Change the maximum email size. Default is 1 MiB (1,048,576 bytes).
- TRACER_INTERVAL - Interval, in seconds, between outgoing tracer emails. Default is 600 seconds (10 minutes).
- NUM_TRACERS - Number of tracers to send at the end of each tracer interval. Default is 3.
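To make the defaults concrete, here is one way such settings could be read in Ruby. This is an illustrative sketch, not SuperIMAP's own configuration code; SUPERIMAP_DEFAULTS and load_settings are hypothetical names, and the default values are the ones listed above.

```ruby
# Numeric settings and their documented defaults.
SUPERIMAP_DEFAULTS = {
  'MAX_USER_THREADS'   => 500,
  'NUM_WORKER_THREADS' => 5,
  'MAX_EMAIL_SIZE'     => 1_048_576, # 1 MiB
  'TRACER_INTERVAL'    => 600,       # seconds
  'NUM_TRACERS'        => 3
}.freeze

# Reads settings from ENV (or any Hash-like object), falling back to the
# defaults. Raises ArgumentError if a value is not a base-10 integer.
def load_settings(env = ENV)
  settings = SUPERIMAP_DEFAULTS.each_with_object({}) do |(name, default), result|
    result[name] = Integer(env.fetch(name, default).to_s, 10)
  end
  settings['ENCRYPTION_KEY'] = env['ENCRYPTION_KEY'] # nil when not provided
  settings
end

puts load_settings('NUM_WORKER_THREADS' => '10')['NUM_WORKER_THREADS']
```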
Scaling
To scale SuperIMAP, you will mainly want to increase the number of IMAP Client processes. The IMAP Client processes automatically publish a heartbeat every 10 seconds. Other instances look for this heartbeat and re-calculate which neighboring processes are alive based on any processes that have published a heartbeat within the last 30 seconds.
The IMAP Client processes re-balance users every 10 seconds. If no new instances have entered or left the pool, then re-balancing will have no effect.
If a new IMAP Client instance is started, then a small number of users will be taken from each running instance and handed to the new instance. If one of the IMAP Client instances is stopped, it is removed from the pool and its users are evenly distributed to the remaining instances (assuming they are still below the MAX_USER_THREADS threshold).
There is no "master" process that decides which IMAP Client process should handle a given user. SuperIMAP uses a Rendezvous Hash to allow IMAP Client instances to agree on how to evenly assign users without any central coordination. The algorithm assumes that all SuperIMAP instances have roughly the same amount of resources.
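The idea behind rendezvous hashing can be sketched briefly. The code below is a generic illustration, not SuperIMAP's implementation: the method names, the choice of SHA-256, and the shape of the heartbeat data are all assumptions. Each instance scores every (instance, user) pair with a hash and the highest score wins, so every instance reaches the same answer independently.

```ruby
require 'digest/sha2'

HEARTBEAT_WINDOW = 30 # seconds, per the description above

# heartbeats maps an instance id to the Time of its last heartbeat.
# Returns the ids that have published a heartbeat within the window.
def live_instances(heartbeats, now = Time.now)
  heartbeats.select { |_id, seen_at| now - seen_at <= HEARTBEAT_WINDOW }.keys.sort
end

# The instance with the highest hash score for this user owns the user.
def owner_for(user_id, instances)
  instances.max_by { |instance| Digest::SHA256.hexdigest("#{instance}:#{user_id}") }
end

# The users a given instance should be watching.
def users_for(instance, user_ids, instances)
  user_ids.select { |user_id| owner_for(user_id, instances) == instance }
end

users = (1..10).map { |i| "user-#{i}" }
%w[imap-1 imap-2 imap-3].each do |instance|
  puts "#{instance}: #{users_for(instance, users, %w[imap-1 imap-2 imap-3]).length} users"
end
```

When an instance leaves the pool, only the users it owned get a new owner; when one joins, it only takes users for which it now has the highest score. That matches the re-balancing behavior described above.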
Monitoring
SuperIMAP publishes some useful monitoring information in the logs. This includes:
- imap_client.user_thread.count - The number of user threads. This indicates how many users are connected on a given IMAP Client instance.
- imap_client.total_emails_processed - The total number of emails processed since the instance was started.
- imap_client.work_queue.length - The size of the IMAP Client work queue. Backups may indicate that your servers are overloaded.
- work_queue.latency - The latency, in seconds, between when an item is added to the work queue and when it is processed.
These metrics are published in a format that can be consumed by the Librato Add-On in Heroku. See https://devcenter.heroku.com/articles/librato#custom-log-based-metrics for more information.
Apart from keeping an eye on these metrics, SuperIMAP should need no other regular attention.
You may also want to keep an eye out for any failing Delayed Job tasks. You can view these from the Admin site.
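For reference, Librato's log-based metrics on Heroku are plain log lines of the form kind#name=value, where kind is count, measure, or sample. The sketch below shows how one of the metrics above could be written in that format. Which kind SuperIMAP uses for each metric is an assumption, and metric_line is a hypothetical helper.

```ruby
# Formats a metric as a Librato-style log line, e.g. "sample#name=3".
def metric_line(kind, name, value)
  format('%s#%s=%s', kind, name, value)
end

# Assumption: gauges such as queue length are reported as samples.
puts metric_line(:sample, 'imap_client.work_queue.length', 3)
```

Anything written to standard output in this format is picked up by the add-on once it is attached to the application.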
Tracer Emails
SuperIMAP has the ability to give you useful monitoring information through "tracer emails". The system will send a specially formatted email to an account, wait for the incoming email, and log the results. The logs can be accessed through the "Tracer Logs" tab.
To enable Tracer Emails, navigate to a user and check the "Enable Tracer" checkbox. It is recommended that you create a few dummy email addresses to use for tracer emails.
By default, a cluster of three tracers is sent every ten minutes from each imap_client instance to a random tracer-enabled user managed by that instance.
Keep in mind that this could generate a lot of email. Three emails every ten minutes works out to ~430 emails per day for each imap_client instance.
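The volume is easy to estimate for other settings. A small sketch, using the NUM_TRACERS and TRACER_INTERVAL defaults and a hypothetical helper name:

```ruby
SECONDS_PER_DAY = 86_400

# Tracer emails sent per day across all imap_client instances.
def tracer_emails_per_day(num_tracers: 3, tracer_interval: 600, instances: 1)
  intervals_per_day = SECONDS_PER_DAY / tracer_interval
  num_tracers * intervals_per_day * instances
end

puts tracer_emails_per_day               # 432
puts tracer_emails_per_day(instances: 4) # 1728
```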
Performance
SuperIMAP's architecture makes judicious use of system resources:
All connections to the IMAP server are managed by separate "user threads", but these threads sit dormant most of the time. When anything interesting happens that requires real work, the operation is queued and handled by a worker in a worker pool. The size of the worker pool is controlled by the NUM_WORKER_THREADS environment variable. Only worker pool threads, and a small number of other system threads, require a database connection.
In terms of tradeoffs, this architecture chooses to slightly degrade an individual user's response time in favor of making sure that the system will not get overloaded when things get rough. When things get busy, the work simply builds up in the queue. The size of the worker queue and the queue latency become a rough measure of system health.
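The queue-plus-worker-pool pattern described above can be sketched with Ruby's built-in Queue and Thread classes. This is a simplified illustration rather than SuperIMAP's code; the WorkPool class and its methods are hypothetical.

```ruby
# A fixed pool of worker threads draining a shared queue. Many dormant
# "user threads" can call #schedule; only the workers do real work.
class WorkPool
  def initialize(num_workers)
    @queue = Queue.new
    @workers = Array.new(num_workers) do
      Thread.new do
        while (job = @queue.pop) != :stop
          enqueued_at, work = job
          # Queue latency: time the job spent waiting for a free worker.
          latency = Process.clock_gettime(Process::CLOCK_MONOTONIC) - enqueued_at
          work.call(latency)
        end
      end
    end
  end

  # Called from user threads when something needs real work.
  def schedule(&work)
    @queue << [Process.clock_gettime(Process::CLOCK_MONOTONIC), work]
  end

  # Number of jobs waiting; a rough measure of system health.
  def length
    @queue.length
  end

  # Lets the workers finish queued jobs, then stops them.
  def shutdown
    @workers.size.times { @queue << :stop }
    @workers.each(&:join)
  end
end

pool = WorkPool.new(Integer(ENV.fetch('NUM_WORKER_THREADS', 5)))
results = Queue.new
10.times { |i| pool.schedule { |latency| results << [i, latency] } }
pool.shutdown
puts "processed #{results.size} jobs"
```

Because work waits in the queue instead of spawning more threads, a burst of email raises the queue length and latency but does not increase the number of database connections.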
Typically, a SuperIMAP box is resource-limited by the number of user processes that can be started. SuperIMAP requires 2 user processes for each user's IMAP connection. On Heroku, the number of user processes is limited to 256 for a 1X box, 512 for a 2X box, and 32,767 for a PX box. You can set this at home using ulimit -u. Divide this limit in half to get the maximum number of users that the SuperIMAP process can manage.
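Applied to the Heroku limits quoted above, the rule of thumb works out as follows (max_users is a hypothetical helper):

```ruby
PROCESSES_PER_USER = 2 # per the description above

# Upper bound on users for a given user-process limit (ulimit -u).
def max_users(process_limit)
  process_limit / PROCESSES_PER_USER
end

{ '1X' => 256, '2X' => 512, 'PX' => 32_767 }.each do |dyno, limit|
  puts "#{dyno}: up to #{max_users(limit)} users"
end
```

This gives 128 users on a 1X dyno, 256 on a 2X dyno, and 16,383 on a PX dyno, before accounting for any other threads the process needs.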
FiveStreet.com uses SuperIMAP to manage thousands of users and process over 1M incoming emails per week (as of January 2015). We currently run this load on a single Heroku PX dyno, with plenty of headroom. Our SuperIMAP instance serving thousands of users requires just 10 database connections, uses about 3GB of RAM, and has a 0.50 load average. The work queue usually sits near 0, with a latency of < 0.5 seconds.
Why Ruby?
At first glance, and from a purely technical point-of-view, Ruby is a poor choice for an application like SuperIMAP. SuperIMAP is highly concurrent, and Ruby is bad at concurrency.
Specifically, the imap_client process spawns what could technically be described as a "boatload" of threads (2 threads per connected user, plus a handful of other threads). Ruby threads are heavyweight, so the interpreter has to burn significant resources just to create and schedule the threads before it can do any real work.
Using Erlang, Go, or Rust (all of which support lightweight threads and actor-style programming) would have made the concurrent bits of SuperIMAP less tricky to write, and would have required fewer computing resources, possibly allowing a single box to handle tens of thousands of active users.
So, why Ruby? A few reasons:
- FiveStreet uses Ruby - The primary goal of SuperIMAP is to power a critical part of FiveStreet's application. FiveStreet.com is written in Ruby. It is built and maintained by a small team of Ruby engineers. Introducing a new language would force the team to spend dozens of hours learning a new stack and maintaining a new development environment.
- Low barrier to entry - A secondary goal of SuperIMAP is to become a healthy open-source project. Based only on language popularity, it is more likely that another team can use, troubleshoot, and contribute to a Ruby-based SuperIMAP than an Erlang/Go/Rust-based SuperIMAP.
- For us, the cost savings are small - FiveStreet's SuperIMAP cluster currently runs on three commodity servers and easily handles thousands of users. It's possible that if SuperIMAP were written in a different language, we could handle the load on a single machine, saving us a few hundred dollars a month. Not worth changing our stack for it.
- The concurrency is not complicated - The concurrency in SuperIMAP is fairly straightforward -- one parent process, many child processes. It's a little painful to solve the problem in Ruby, but not impossible.
- Ruby has mature IMAP and OAuth 2.0 libraries - Ruby has a built-in IMAP library, and a widely-used OAuth 2.0 library. As of 2015, the IMAP and OAuth 2.0 libraries for other languages are far less mature.
Side Note: This was a deeply considered choice. I (Rusty Klophaus, the author of SuperIMAP) spent about 4 years writing Erlang professionally. It's a fascinating language.
Appendix
Understanding OAuth 2.0
SuperIMAP can authenticate to email providers using OAuth 2.0. OAuth 2.0 can be difficult to understand. Here is how it works, in the context of a user authenticating to Gmail through SuperIMAP:
On your application:
- John visits a "Connect Your Email" page on your website.
- On the server side, your application uses the SuperIMAP API to fetch a special url called a connect_url (see the sample code in the Usage section).
- John clicks a link and is redirected to the connect_url.
On your SuperIMAP instance:
- The connect_url links to a SuperIMAP page.
- The SuperIMAP page constructs a special URL and redirects to Google.
On Google:
- Google displays a page asking John to confirm certain permissions for your application.
- John clicks the "Approve" button.
- Google redirects John to SuperIMAP and includes an authorization "code" parameter.
On your SuperIMAP instance:
- SuperIMAP grabs the "code" and issues a server-side request to exchange it for an "access token" and a "refresh token".
- SuperIMAP saves the tokens. The "refresh token" is what allows SuperIMAP to obtain new access tokens and connect to Google on behalf of the user in the future.
- SuperIMAP redirects the user back to the "success" page provided in step #2 above.
On your application:
- Your application tells John that the connection was successful.
To get one step more complicated, OAuth is secured in a few different ways:
- Google requires your application to pre-register callback URLs for your app. Sending the user back to a non-registered URL will fail.
- The access token is tied to a client id and a client secret on our site. Someone needs all three to authenticate as the user.
- There is more to it, but that's the extent of my knowledge.
If you want more detail, here's a video tutorial: https://www.youtube.com/watch?v=tFYrq3d54Dc
The OAuth settings are configured through the Google Developer console: https://console.developers.google.com/
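To make the "special URL" in the flow above more concrete, here is a sketch of how a redirect to Google's OAuth 2.0 authorization endpoint can be built. This illustrates the standard parameters rather than SuperIMAP's exact code; google_authorization_url is a hypothetical helper, and the client id, callback URL, and state value are placeholders.

```ruby
require 'uri'

GOOGLE_AUTH_ENDPOINT = 'https://accounts.google.com/o/oauth2/v2/auth'

# Builds the URL the user is redirected to in order to approve access.
def google_authorization_url(client_id:, redirect_uri:, state:)
  query = URI.encode_www_form(
    client_id: client_id,
    redirect_uri: redirect_uri,          # must be pre-registered with Google
    response_type: 'code',               # authorization code flow
    scope: 'https://mail.google.com/',   # scope for Gmail IMAP access
    access_type: 'offline',              # ask Google to issue a refresh token
    state: state                         # opaque value echoed back on return
  )
  "#{GOOGLE_AUTH_ENDPOINT}?#{query}"
end

puts google_authorization_url(
  client_id: 'CLIENT_ID',
  redirect_uri: 'https://my-host.com/oauth/callback', # hypothetical path
  state: 'MY_USER'
)
```

The state parameter is a common way to remember which user started the flow; whether SuperIMAP uses it this way is an assumption.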
Understanding IMAP
Once you are authenticated to an IMAP server, IMAP itself is a fairly straightforward protocol. It consists of simple plain text commands and responses. The commands and responses are tagged, allowing multiple commands to run in parallel.
Below is a sample IMAP session, taken directly from the Internet Message Access Protocol RFC (3501):
S: * OK IMAP4rev1 Service Ready
C: a001 login mrc secret
S: a001 OK LOGIN completed
C: a002 select inbox
S: * 18 EXISTS
S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
S: * 2 RECENT
S: * OK [UNSEEN 17] Message 17 is the first unseen message
S: * OK [UIDVALIDITY 3857529045] UIDs valid
S: a002 OK [READ-WRITE] SELECT completed
C: a003 fetch 12 full
S: * 12 FETCH (FLAGS (\Seen) INTERNALDATE "17-Jul-1996 02:44:25 -0700"
RFC822.SIZE 4286 ENVELOPE ("Wed, 17 Jul 1996 02:23:25 -0700 (PDT)"
"IMAP4rev1 WG mtg summary and minutes"
(("Terry Gray" NIL "gray" "cac.washington.edu"))
(("Terry Gray" NIL "gray" "cac.washington.edu"))
(("Terry Gray" NIL "gray" "cac.washington.edu"))
((NIL NIL "imap" "cac.washington.edu"))
((NIL NIL "minutes" "CNRI.Reston.VA.US")
("John Klensin" NIL "KLENSIN" "MIT.EDU")) NIL NIL
"<[email protected]>")
BODY ("TEXT" "PLAIN" ("CHARSET" "US-ASCII") NIL NIL "7BIT" 3028 92))
S: a003 OK FETCH completed
C: a004 fetch 12 body[header]
S: * 12 FETCH (BODY[HEADER] {342}
S: Date: Wed, 17 Jul 1996 02:23:25 -0700 (PDT)
S: From: Terry Gray <[email protected]>
S: Subject: IMAP4rev1 WG mtg summary and minutes
S: To: [email protected]
S: cc: [email protected], John Klensin <[email protected]>
S: Message-Id: <[email protected]>
S: MIME-Version: 1.0
S: Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
S:
S: )
S: a004 OK FETCH completed
C: a005 store 12 +flags \deleted
S: * 12 FETCH (FLAGS (\Seen \Deleted))
S: a005 OK +FLAGS completed
C: a006 logout
S: * BYE IMAP4rev1 server terminating connection
S: a006 OK LOGOUT completed
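The tagging scheme in the session above is simple enough to classify with a few regular expressions. The sketch below is illustrative only: a real client such as Ruby's Net::IMAP also tracks literal sizes like {342}, so that message content is never mistaken for a response. ImapLine and parse_server_line are hypothetical names.

```ruby
ImapLine = Struct.new(:kind, :tag, :status, :text)

# Classifies one server line (without the "S: " prefix used in the RFC).
def parse_server_line(line)
  case line.chomp
  when /\A\* (.*)\z/
    # Untagged response: server data not tied to one command.
    ImapLine.new(:untagged, nil, nil, Regexp.last_match(1))
  when /\A\+ ?(.*)\z/
    # Continuation request: the server is waiting for more input.
    ImapLine.new(:continuation, nil, nil, Regexp.last_match(1))
  when /\A(\S+) (OK|NO|BAD) (.*)\z/
    # Tagged response: completes the command that used this tag.
    ImapLine.new(:tagged, Regexp.last_match(1), Regexp.last_match(2), Regexp.last_match(3))
  else
    # Anything else is treated as literal data, e.g. message headers.
    ImapLine.new(:data, nil, nil, line.chomp)
  end
end

['* 18 EXISTS', 'a002 OK [READ-WRITE] SELECT completed'].each do |line|
  parsed = parse_server_line(line)
  puts "#{parsed.kind}: #{parsed.text}"
end
```

Because each completion carries the tag of the command it answers, a client can have several commands in flight and still match every result to its request.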
SuperIMAP uses the IDLE command to wait for incoming email, defined in the IMAP4 Idle Command RFC (2177). The IMAP client sends an IDLE command to the server and awaits a response. When an IMAP connection is in IDLE mode, no other commands are allowed.
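With Ruby's built-in Net::IMAP library, waiting in IDLE mode looks roughly like the sketch below. It is written against any object that responds to idle and idle_done the way Net::IMAP does, and it is not taken from the SuperIMAP source; wait_for_new_mail is a hypothetical name. RFC 2177 advises re-issuing IDLE at least every 29 minutes, hence the timeout.

```ruby
IDLE_TIMEOUT = 29 * 60 # seconds

# Blocks in IDLE until the server announces a new message (an untagged
# EXISTS response) or the timeout expires. Returns true if mail arrived.
def wait_for_new_mail(imap, timeout = IDLE_TIMEOUT)
  new_mail = false
  imap.idle(timeout) do |response|
    # Continuation requests have no name; untagged responses do.
    if response.respond_to?(:name) && response.name == 'EXISTS'
      new_mail = true
      imap.idle_done # leave IDLE so other commands are allowed again
    end
  end
  new_mail
end

# Usage against a real server (host and credentials are placeholders):
#
#   require 'net/imap'
#   imap = Net::IMAP.new('imap.example.com', port: 993, ssl: true)
#   imap.login('USERNAME', 'PASSWORD')
#   imap.select('INBOX')
#   puts 'new mail!' if wait_for_new_mail(imap)
```

Since no other commands are allowed while idling, the client has to leave IDLE before it can fetch the new message, and then re-enter it afterwards.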
Development Tasks
The information below is intended mainly for developers who want to modify the SuperIMAP codebase.
Running Unit Tests
Run this once:
RAILS_ENV=test rake db:setup db:seed
Then run all tests:
rake test:all
Running Stress Tests
The stress test exercises the multi-threaded aspects of SuperIMAP, as well as the error recovery code. To do this, we point the SuperIMAP IMAP client code against a local IMAP server and generate a bunch of fake emails for many users.
Additionally, the IMAP server generates 'chaotic' events; it will intentionally generate incorrect or gibberish responses. The SuperIMAP IMAP client code is expected to recover gracefully while using a minimal amount of system resources.
Run this once:
RAILS_ENV=stress rake db:setup db:seed
Then run the stress test:
script/stress-test
Future Work
- Configure stress test to report code coverage.
- Make a way to "sweep" a user's inbox, generating webhook events for all emails.
Contributions
To contribute to this project, please fork and file a pull request. Small patches will be accepted more quickly than large patches.
Changes
Version 0.1.2
- Re-organize and improve documentation in README.md.
- Detect and handle race condition around changing UIDVALIDITY.
- Properly escape OAuth 2.0 disconnect URL.
- Improve usage of Rails connection pool.
- Clean up heartbeat records during exit.
- Synchronize access to shared objects.
License
The MIT License (MIT)
Copyright (c) 2015 Rusty Klophaus / FiveStreet.com
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.