---
name: inbound-processing
description: Receive, parse, and process incoming email via provider webhooks. Use when setting up inbound email handling, parsing MIME messages, extracting content from replies, detecting threads, filtering spam on inbound, or routing incoming messages.
license: MIT
---

# Inbound Email Processing

Receive incoming email, parse it into structured data, and route it to the right place.

## When to use this skill

- Setting up inbound email processing for the first time
- Choosing between provider inbound features (Postmark, SendGrid, Mailgun, SES)
- Parsing MIME messages (multipart bodies, attachments, inline images)
- Extracting clean content from HTML email or stripping quoted replies
- Building thread detection from email headers (In-Reply-To, References, Message-ID)
- Filtering inbound email for spam, phishing, or injection attacks
- Designing routing logic for incoming messages (support, billing, leads, etc.)
- Handling webhook payloads from email providers

## Related skills

- `domain-authentication` - SPF/DKIM/DMARC setup that affects inbound auth verification
- `reply-classification` - classifying reply intent (interested, OOO, objection, etc.)
- `thread-management` - maintaining full conversation context across messages
- `webhook-processing` - general webhook handling patterns (retries, idempotency)
- `email-security` - injection attacks, content sanitization, phishing prevention
- `bounce-handling` - processing delivery failures from outbound sends

---

## How inbound email works

When someone sends an email to your domain, it hits an MX server. You have two options:

1. **Run your own mail server** - receive raw SMTP, parse MIME yourself. High control, high maintenance. Almost never worth it for application developers.
2. **Use a provider's inbound feature** - the provider receives the email, parses it, and POSTs structured data to your webhook URL. This is what you should do.

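
To make the second option concrete, here is a minimal Python sketch of a webhook receiver built only on the standard library. It assumes the provider POSTs a JSON body. The names `accept_inbound`, `inbound_queue`, and `InboundHandler` are illustrative and not part of any provider's SDK. A production endpoint would also need request authentication and a durable queue.

```python
import json
import queue
from http.server import BaseHTTPRequestHandler

# Hypothetical in-memory queue; a real system would use a durable job queue.
inbound_queue: "queue.Queue[dict]" = queue.Queue()


def accept_inbound(body: bytes) -> int:
    """Validate a webhook body, enqueue it, and return the HTTP status to send."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400
    if not isinstance(payload, dict):
        return 400
    # Defer business logic: a worker drains the queue later.
    inbound_queue.put(payload)
    return 200


class InboundHandler(BaseHTTPRequestHandler):
    """Minimal POST handler that acknowledges quickly and does no heavy work."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status = accept_inbound(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()
```

To try it locally, something like `HTTPServer(("", 8080), InboundHandler).serve_forever()` (with `HTTPServer` imported from `http.server`, and the port chosen arbitrarily) would serve the handler. Keeping the handler this thin means the endpoint can acknowledge before any slow processing starts.
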

The provider handles MX record reception, MIME parsing, and spam pre-filtering, then delivers a parsed payload (JSON or form data, depending on the provider) to your endpoint. You handle business logic.

---

## Provider inbound features

### Postmark

The cleanest developer experience for inbound. Postmark parses emails and POSTs JSON to your webhook URL.

**Setup:**

1. Point your MX record to Postmark's inbound servers
2. Configure the webhook URL in your Postmark server settings
3. Postmark POSTs JSON for every inbound message

**Key payload fields:**

```json
{
  "From": "sender@example.com",
  "FromFull": { "Email": "sender@example.com", "Name": "Jane Smith" },
  "To": "support+ref123@yourdomain.com",
  "ToFull": [{ "Email": "support+ref123@yourdomain.com", "Name": "" }],
  "Subject": "Re: Your proposal",
  "TextBody": "Looks great, let's schedule a call.",
  "HtmlBody": "...",
  "MessageID": "",
  "Headers": [
    { "Name": "In-Reply-To", "Value": "" },
    { "Name": "References", "Value": "" },
    { "Name": "Authentication-Results", "Value": "spf=pass; dkim=pass; dmarc=pass" }
  ],
  "Attachments": [
    {
      "Name": "proposal.pdf",
      "Content": "base64-encoded-content",
      "ContentType": "application/pdf",
      "ContentLength": 54321
    }
  ],
  "MailboxHash": "ref123"
}
```

**MailboxHash trick:** Postmark parses the `+` portion of the To address into `MailboxHash`. Send from `support+userId123@yourdomain.com`, and when the reply comes back, `MailboxHash` is `userId123`. Use this for stateless thread/user association without database lookups.

**Retry behavior:** Postmark retries on non-2xx responses. Return 200 quickly and process asynchronously.

### SendGrid (Inbound Parse)

SendGrid's Inbound Parse posts email data as `multipart/form-data`, not JSON. This catches people off guard.

**Setup:**

1. Add an MX record pointing to `mx.sendgrid.net` (priority 10)
2. Configure the Inbound Parse webhook URL in Settings > Inbound Parse
3. Optionally enable spam checking (for emails under 2.5 MB)

**Key form fields:**

| Field | Content |
|-------|---------|
| `from` | Sender address |
| `to` | Recipient address |
| `subject` | Subject line |
| `text` | Plain text body |
| `html` | HTML body |
| `envelope` | JSON string with actual SMTP envelope sender/recipients |
| `headers` | Full raw headers as a single string |
| `attachments` | Number of attachments |
| `attachment1`, `attachment2`... | File uploads |

**Important:** The `headers` field is a raw string, not parsed JSON. You need to parse it yourself to extract `In-Reply-To`, `References`, and `Authentication-Results`.

**Raw mode:** If you need the full raw MIME message (for your own parsing or archival), enable "Post the raw, full MIME message" in settings. The raw message arrives in the `email` field.

### Mailgun

Mailgun's Routes feature is the most flexible for pattern-based inbound routing.

**Setup:**

1. Point MX records to Mailgun's servers
2. Create Routes with match expressions and actions

**Route matching examples:**

```
# Match a specific address
match_recipient("support@yourdomain.com") -> forward("https://your-api.com/webhooks/support")

# Catch-all for a domain
match_recipient(".*@yourdomain.com") -> forward("https://your-api.com/webhooks/inbound")

# Match by header
match_header("subject", ".*urgent.*") -> forward("https://your-api.com/webhooks/urgent")
```

**Payload:** Mailgun POSTs `multipart/form-data` with fields like `sender`, `recipient`, `subject`, `body-plain`, `body-html`, `stripped-text` (body without quoted parts), `stripped-html`, and `Message-Id`.

**Stripped content:** Mailgun strips quoted reply text for you automatically. The `stripped-text` and `stripped-html` fields contain only the new content, not the quoted thread below. This saves you from implementing your own reply stripping.

### AWS SES

SES is the most powerful option but requires the most assembly.
It does not POST webhooks - it stores raw messages and notifies you.

**Setup:**

1. Verify the domain in SES
2. Create Receipt Rules that define what happens when email arrives
3. Chain actions: store to S3, notify via SNS, invoke Lambda

**Architecture pattern:**

```
Email arrives
  -> SES Receipt Rule matches recipient
  -> Store raw MIME in S3
  -> Publish SNS notification
  -> Lambda triggered by SNS
  -> Parse MIME from S3
  -> Process and route
```

**Key considerations:**

- SES email receiving is not offered in every AWS Region (originally only US East (N. Virginia), US West (Oregon), and EU (Ireland)); confirm that your Region supports it before you commit
- Maximum email size is 40 MB (including headers)
- You get the raw MIME message, not parsed fields - you must parse it yourself
- Lambda can be invoked synchronously (to control mail flow with STOP_RULE/CONTINUE) or asynchronously (fire-and-forget processing)
- Receipt Rules evaluate in order; processing stops at the first match unless you return CONTINUE

**When to use SES:** When you need raw MIME access, want to store every message in S3 for compliance, or are already deep in the AWS ecosystem. Not recommended if you just want parsed JSON.

---

## MIME parsing

If you are processing raw email (from SES, or using raw mode on other providers), you need to understand MIME structure.

### Multipart message structure

A typical email with HTML body and attachments has this MIME tree:

```
multipart/mixed
+-- multipart/alternative
|   +-- text/plain (plain text body)
|   +-- multipart/related
|       +-- text/html (HTML body)
|       +-- image/png (inline image, referenced by Content-ID)
+-- application/pdf (attachment)
```

**Key multipart types:**

| Type | Purpose |
|------|---------|
| `multipart/mixed` | Top-level container when message has attachments |
| `multipart/alternative` | Same content in multiple formats (text + HTML) |
| `multipart/related` | HTML body with inline resources (images referenced by `cid:`) |

### Walking the MIME tree

Parse in this order:

1. Check the top-level `Content-Type`. If it is `multipart/*`, descend into parts.
2. For `multipart/alternative`, prefer `text/html` for rendering, keep `text/plain` as fallback.
3. For `multipart/related`, the first part is the HTML body. Subsequent parts are inline resources. Match them using `Content-ID` headers (the HTML references them as `src="cid:image001"`).
4. For `multipart/mixed`, iterate children. Parts with `Content-Disposition: attachment` are attachments. Parts with `Content-Disposition: inline` are inline content.
5. For each leaf part, decode based on `Content-Transfer-Encoding` (usually `base64` or `quoted-printable`).

### Content-ID and inline images

Inline images use the `Content-ID` header to create a reference that the HTML body can embed:

```
Content-Type: image/png
Content-ID: <image001>
Content-Disposition: inline
Content-Transfer-Encoding: base64
```

The HTML body references this as `<img src="cid:image001">`. When processing inbound HTML, you can either:

- Replace `cid:` references with data URIs (for immediate display)
- Upload inline images to your own storage and rewrite the `src` attributes
- Strip inline images entirely if you only need the text content

### Character encoding

The `Content-Type` header specifies the charset: `Content-Type: text/plain; charset=utf-8`. Common charsets you will encounter:

- `utf-8` - the standard, handles everything
- `iso-8859-1` / `latin1` - Western European, still common in legacy systems
- `windows-1252` - Microsoft's extension of ISO-8859-1
- `iso-2022-jp` - Japanese email, especially from older systems

Always normalize to UTF-8 after decoding. Libraries like `iconv-lite` (Node.js) or Python's built-in `codecs` handle this.

### Parsing libraries

Don't write your own MIME parser.
Use battle-tested libraries:

| Language | Library | Notes |
|----------|---------|-------|
| Node.js | `mailparser` (from Nodemailer) | Full-featured, handles edge cases well |
| Node.js | `postal-mime` | Lightweight, works in workers/edge |
| Python | `email` (stdlib) | Built-in, handles most cases |
| Go | `net/mail` + `mime/multipart` | Standard library, lower-level |
| Ruby | `mail` gem | Mature, widely used |
| C#/.NET | `MimeKit` | The gold standard for .NET MIME parsing |

---

## Email header parsing

### Threading headers

Three headers control email threading. All are defined in RFC 5322.

**Message-ID:** A globally unique identifier for each message, enclosed in angle brackets.

```
Message-ID: 
```

Generate a unique Message-ID for every outbound email. Format: `<id-left@id-right>` (the RFC 5322 `msg-id` syntax). Without this, replies cannot reference your message.

**In-Reply-To:** Contains the Message-ID of the message being replied to.

```
In-Reply-To: 
```

This is your primary thread-linking mechanism. When an inbound message has `In-Reply-To`, look up the original send by matching against your outbound Message-IDs.

**References:** Contains the Message-IDs of all messages in the thread chain, oldest first.

```
References: 
```

When building a reply, set `References` to the parent's `References` (if any) followed by the parent's `Message-ID`. This creates a full thread chain that any email client can reconstruct.

### Thread detection in practice

The reliable path for thread linking:

```
1. Inbound message arrives with In-Reply-To header
2. Look up In-Reply-To value against your stored outbound Message-IDs
3. If found: exact match, high confidence (1.0)
4. If not found: fall back to References header, check each ID
5. If still not found: fall back to heuristic matching
```

**Fallback heuristics** (lower confidence, use with caution):

- Match sender email against recent outbound recipients (within 7 days)
- Match subject line after stripping Re:/Fwd: prefixes
- Match the `+tag` portion of the recipient address (Postmark's MailboxHash pattern)

Assign a confidence score to each linking method. Exact `In-Reply-To` match gets 1.0. Heuristic matches should get 0.5 or lower. Let downstream logic (routing, auto-responses) use the confidence to decide how aggressively to act.

### Authentication headers

The `Authentication-Results` header is added by the receiving mail server and contains SPF, DKIM, and DMARC verification results.

```
Authentication-Results: mx.yourdomain.com;
  spf=pass (sender IP is 198.51.100.1) smtp.mailfrom=sender@example.com;
  dkim=pass header.d=example.com header.s=selector1;
  dmarc=pass (policy=reject) header.from=example.com
```

Parse this to extract three values:

| Mechanism | Values | What it means |
|-----------|--------|---------------|
| SPF | pass, fail, softfail, neutral, none | Whether the sending IP is authorized |
| DKIM | pass, fail, none | Whether the cryptographic signature is valid |
| DMARC | pass, fail, none | Whether SPF/DKIM align with the From domain |

**How to use auth results for inbound filtering:**

- All three pass: sender is authenticated, lower spam score
- DMARC fail: the From domain does not authorize this sender - increase phishing/spam score
- SPF softfail + DKIM fail: suspicious but not definitive - flag for review
- All three fail: very likely spoofed or unauthorized - quarantine or reject

Also check the `Received-SPF` header as a fallback for SPF results if `Authentication-Results` does not contain SPF.

---

## Content extraction

### HTML to text conversion

When you receive HTML email but need plain text (for classification, search indexing, or display), do not just strip tags. That turns `<p>Hello</p><p>World</p>` into `HelloWorld`.

Proper conversion:

- Insert newlines for block elements (`<p>`, `<div>`, `<br>`, `<li>`, `<tr>`)
- Convert `<a href="url">text</a>` to `text (url)` or just `text`
- Convert lists to indented lines with bullets/numbers
- Preserve table structure as aligned text where possible
- Strip scripts, styles, and hidden elements before conversion

Libraries: `html-to-text` (Node.js), `html2text` (Python), `Jsoup` (Java).

### Quoted reply stripping

When someone replies to an email, their client includes the original message below a marker line. You want the new content, not the entire quoted history.

**Common quote markers:**

```
On Mon, Mar 30, 2026, Jane Smith wrote:
```

```
From: Jane Smith
Sent: Monday, March 30, 2026
```

```
> This is quoted text
> from the original message
```

```
-----Original Message-----
```

```
________________________________
```

**Stripping approaches:**

1. **Line-prefix detection:** Lines starting with `>` are quoted. Simple but misses HTML-formatted quotes.
2. **Marker line detection:** Scan for patterns like `On .* wrote:`, `-----Original Message-----`, or `From:.*Sent:.*` blocks. Everything after the marker is quoted.
3. **Provider features:** Mailgun gives you `stripped-text` automatically. Postmark includes a `StrippedTextReply` field for plain-text replies. SendGrid does not strip quoted text.
4. **Libraries:** GitHub's `email_reply_parser` (Ruby, with ports to Python, JavaScript, Go) handles the common patterns. Mailgun's `talon` library (Python) uses machine learning for signature and reply detection.

**Practical advice:** Start with marker-line detection for the most common patterns. Fall back to `>` prefix detection. Accept that you will never catch 100% of cases - email client formatting is inconsistent. Log raw content alongside stripped content so you can debug false positives.

### Content sanitization

Inbound email content is untrusted input. Sanitize before storing or displaying.

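
As one illustration of treating inbound text as untrusted, the Python sketch below strips a small set of invisible Unicode characters and truncates to a byte limit without splitting a multi-byte character. The function name `sanitize_text`, the exact character set, and the reading of "100 KB" as 100 * 1024 bytes are assumptions made for this example; adjust them to your own policy.

```python
import re

# Assumed set: zero-width space, byte order mark, and bidirectional
# embedding/override/isolate controls. Zero-width joiners are left alone
# here because some scripts and emoji sequences depend on them.
_INVISIBLE = re.compile("[\u200b\ufeff\u202a-\u202e\u2066-\u2069]")


def sanitize_text(text: str, max_bytes: int = 100 * 1024) -> str:
    """Remove invisible control characters, then cap the UTF-8 size."""
    cleaned = _INVISIBLE.sub("", text)
    encoded = cleaned.encode("utf-8")
    if len(encoded) <= max_bytes:
        return cleaned
    # Slicing bytes may cut a character in half; errors="ignore" drops the
    # incomplete trailing sequence so the result is still valid UTF-8.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Truncating on the encoded bytes rather than on characters keeps the stored size predictable, which matters when the limit exists to protect a database column or a downstream API.
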

**Plain text sanitization:**

- Strip invisible Unicode characters (zero-width spaces, byte order marks, directional overrides)
- Remove data URIs (`data:text/html;base64,...`) that could embed executable content
- Truncate to reasonable limits (100 KB for text, 500 KB for HTML, 1 KB for subject lines)
- Preserve UTF-8 character boundaries when truncating - do not cut in the middle of a multi-byte character

**HTML sanitization:**

- Strip `