---
name: thread-management
description: Maintain email conversation context across messages using threading headers. Use when building thread reconstruction, linking replies to conversations, detecting thread hijacking, stripping quoted content, or providing thread context to AI agents.
license: MIT
---

# Thread Management

Keep email conversations connected - link replies to their threads, reconstruct conversation history, and detect when threads go wrong.

## When to use this skill

- Building or debugging email thread reconstruction from headers
- Replies are not grouping into conversations correctly
- Implementing In-Reply-To and References headers on outbound replies
- Stripping quoted reply content to extract new message text
- Providing conversation context to AI agents or LLMs
- Detecting thread hijacking or forged thread injection attacks
- Threads display differently in Gmail vs Outlook and you need cross-client consistency
- Managing long-lived threads that span days or weeks
- Splitting or merging conversations programmatically

## Related skills

- `inbound-processing` - receiving and parsing incoming email (prerequisite for threading)
- `reply-classification` - classifying reply intent once you have thread context
- `email-security` - injection and phishing prevention, including thread hijacking
- `transactional-email` - sending emails that need proper threading headers

---

## How email threading works

Email threading is built on three RFC 5322 headers. Every email client uses some combination of these to decide which messages belong to the same conversation.

### The three threading headers

**Message-ID** - a globally unique identifier for each email message. Generated by the sending mail server or client.

```
Message-ID: 
```

Format: ``. The local part is typically a UUID, timestamp, or random string. The domain should be the sending server's hostname. Always enclosed in angle brackets.
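
A minimal sketch of generating a Message-ID in the `<{uuid}@{domain}>` shape. It assumes Node.js 14.17 or later, which provides `randomUUID` in `node:crypto`. The function name `generateMessageId` and the domain check are illustrative only. They are not part of any provider API.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: returns a Message-ID shaped like <{uuid}@{domain}>.
// `sendingDomain` should be the domain you actually send mail from.
function generateMessageId(sendingDomain: string): string {
  // Reject anything outside a conservative character set
  // (letters, digits, dots, hyphens) so the result stays well-formed.
  if (!/^[a-z0-9.-]+$/i.test(sendingDomain)) {
    throw new Error(`invalid sending domain: ${sendingDomain}`);
  }
  // randomUUID() returns a version 4 UUID (122 random bits), so collisions
  // are negligible in practice, even across separate sending processes.
  return `<${randomUUID()}@${sendingDomain}>`;
}

// Usage: one fresh ID per outbound message.
const id = generateMessageId('mail.yourapp.com');
console.log(`Message-ID: ${id}`);
```

A random UUID avoids the coordination that a shared counter would need when several workers send mail in parallel.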

**In-Reply-To** - contains the Message-ID of the message being replied to. Only present on replies, not on original messages.

```
In-Reply-To: 
```

**References** - contains the Message-IDs of all ancestors in the thread, oldest first. Grows with each reply.

```
References: 
```

### How threading builds up

Original message:

```
Message-ID: 
Subject: Q3 proposal
```

First reply:

```
Message-ID: 
In-Reply-To: 
References: 
Subject: Re: Q3 proposal
```

Reply to the reply:

```
Message-ID: 
In-Reply-To: 
References: 
Subject: Re: Q3 proposal
```

The References header creates a chain. Each reply copies its parent's References and appends the parent's Message-ID. Because every message carries its full ancestor list, a client can reconstruct the thread tree even if some intermediate messages are missing.

### Generating correct Message-IDs

A Message-ID must be globally unique. Bad Message-IDs break threading because clients cannot distinguish messages.

Good patterns:

```
<{uuid}@{your-sending-domain}>
<{timestamp}.{random}@{your-sending-domain}>
```

Bad patterns:

```
<1@localhost>   # Not unique
# Reused across messages
# Missing required angle brackets
```

Use your actual sending domain as the right-hand side. Some providers generate Message-IDs for you. If you need to reference them later (for threading inbound replies), store the provider-assigned Message-ID at send time.

---

## Provider threading behavior

Gmail, Outlook, and other clients use different algorithms to group messages into conversations. If your emails thread correctly in one client but not another, this section explains why.

### Gmail

Gmail uses a **multi-factor algorithm** combining headers and subject:

1. **References/In-Reply-To headers** - primary signal. Gmail traverses the chain of Message-IDs.
2. **Subject line matching** - secondary signal. Subjects must match after stripping `Re:`, `Fwd:`, and similar prefixes. If the base subject changes, Gmail starts a new conversation even when the headers still link the messages.
3. **Participants** - minor signal. Influences grouping in ambiguous cases.
4. **Time proximity** - prevents old unrelated emails with identical subjects from threading together.
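
Gmail weighs both the header chain and the normalized subject, so a reply builder has to get both right. The sketch below assumes a hypothetical `ParentMessage` record that holds the parent's Message-ID, its References list, and its subject. The prefix list in `PREFIX` is an assumption that covers common English, German, and Scandinavian forms. It is not Gmail's or Outlook's exact normalization rule.

```typescript
// Hypothetical record for the message being replied to.
interface ParentMessage {
  messageId: string;    // parent's Message-ID, with angle brackets
  references: string[]; // parent's References, oldest first (empty for an original)
  subject: string;
}

interface ReplyHeaders {
  'In-Reply-To': string;
  References: string;
  Subject: string;
}

// One reply/forward prefix such as "Re:", "RE :", "Fwd:", "FW:", "AW:", "SV:".
// This list is an assumption; extend it for the locales you receive.
const PREFIX = /^\s*(?:re|fwd?|aw|sv)\s*:\s*/i;

// Strip any run of prefixes, e.g. "RE: Fwd: Q3 proposal" -> "Q3 proposal".
function normalizeSubject(subject: string): string {
  let s = subject.trim();
  while (PREFIX.test(s)) {
    s = s.replace(PREFIX, '');
  }
  return s;
}

function buildReplyHeaders(parent: ParentMessage): ReplyHeaders {
  // References = parent's References + parent's Message-ID, without duplicates.
  const chain = parent.references.filter((ref) => ref !== parent.messageId);
  chain.push(parent.messageId);
  return {
    'In-Reply-To': parent.messageId,
    References: chain.join(' '),
    // Prefix the unchanged base subject with "Re: " exactly once.
    Subject: `Re: ${normalizeSubject(parent.subject)}`,
  };
}

// Usage: replying to a first reply (illustrative IDs).
const headers = buildReplyHeaders({
  messageId: '<reply-1@mail.yourapp.com>',
  references: ['<original@mail.yourapp.com>'],
  subject: 'RE: Q3 proposal',
});
console.log(headers);
```

Normalizing before re-prefixing keeps the subject at a single `Re: ` instead of letting `Re: RE: Re:` accumulate. Accumulated prefixes can also trip subject-based grouping in other clients.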

Gmail-specific behaviors:

- Conversations max out at **100 messages**. After that, a new thread starts automatically.
- Gmail assigns a `threadId` internally. You can query it via the Gmail API (`threads.list`, `threads.get`), but it is user-specific - the same conversation has different thread IDs for different participants.
- To add a message to an existing thread via the Gmail API, you must set the `threadId` on the message resource AND include correct `References` and `In-Reply-To` headers AND keep the subject matching.

### Outlook / Microsoft 365

Outlook's conversation view groups primarily by **subject line**:

1. **Subject matching** - the dominant factor. Outlook normalizes subjects aggressively (strips `RE:`, `FW:`, `AW:`, `SV:`, etc. across languages).
2. **References/In-Reply-To** - used but weighted less heavily than in Gmail. Outlook may group messages with matching subjects even if References headers do not connect them.
3. **Time window** - Outlook is more liberal with time gaps than Gmail.

Outlook-specific behaviors:

- Outlook may split a single thread into multiple conversations if replies arrive from different paths (e.g., forwarded chains).
- The conversation ID (`conversationId` on the Microsoft Graph message resource) is assigned based on the subject hash and is shared across participants, unlike Gmail's per-user thread IDs.
- `ConversationIndex` is a binary value, carried in MIME as the base64-encoded `Thread-Index` header, that Outlook uses internally for tree ordering. Do not try to generate it yourself unless you are specifically building Outlook integrations.

### Cross-client threading checklist

To thread correctly across both Gmail and Outlook:

1. Always set `In-Reply-To` to the parent message's `Message-ID`
2. Always set `References` to the parent's `References` + parent's `Message-ID`
3. Keep the subject line consistent (prefix with `Re: ` only, do not modify the base subject)
4. Generate globally unique Message-IDs with your sending domain
5. Store outbound Message-IDs so you can reference them when processing inbound replies

---

## Thread reconstruction

When you receive inbound email and need to rebuild the conversation, you have two linking strategies.

### Strategy 1: Header-based linking (high confidence)

Match the inbound message's `In-Reply-To` header against stored outbound Message-IDs.

```
Inbound arrives with:
In-Reply-To: 

Look up in your database:
SELECT request_id FROM send_attempts
WHERE provider_message_id = 'send-abc123@mail.yourapp.com'
```

The header value carries angle brackets, while this example stores the ID without them. Strip the brackets before the lookup, or store IDs consistently in one form.

This gives you a direct, high-confidence link (confidence = 1.0) back to the specific outbound message being replied to. From there, you can reconstruct the full thread.

If `In-Reply-To` does not match, fall back to parsing the `References` header. It contains all ancestor Message-IDs, so any match connects you to the thread. Check the last (most recent) IDs first, since they sit closest to the reply's parent.

### Strategy 2: Heuristic linking (lower confidence)

When headers do not match (common when emails are forwarded, or when the recipient's client strips headers), fall back to heuristics:

- **Sender + recipient + time window**: match the inbound sender against recent outbound recipients within the last 7 days
- **Subject similarity**: strip `Re:`, `Fwd:` prefixes and compare normalized subjects
- **Domain matching**: ensure the inbound sender's domain matches the outbound recipient's domain

Flag heuristic links as lower confidence (e.g., 0.5) so downstream systems can decide whether to trust, confirm, or discard them.
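
The two strategies can be combined into one lookup that tries `In-Reply-To` first, then `References`, then a recipient-and-time-window heuristic. This is a sketch over an in-memory array. `OutboundRecord`, `InboundMessage`, and `linkInbound` are hypothetical names, and a real system would run equivalent queries against its `send_attempts` table. The sketch stores Message-IDs without angle brackets, matching the SQL example above, and uses the 1.0 and 0.5 confidence values suggested in this section.

```typescript
type LinkResult = {
  requestId: string;
  method: 'in_reply_to' | 'references' | 'recipient_recent' | 'subject_match';
  confidence: number;
};

// Hypothetical stored outbound row; provider Message-IDs kept without angle brackets.
interface OutboundRecord {
  requestId: string;
  providerMessageId: string;
  toEmail: string;
  sentAt: Date;
}

// Hypothetical parsed inbound message with raw header values.
interface InboundMessage {
  fromEmail: string;
  inReplyTo?: string;
  references?: string;
  receivedAt: Date;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Extract every <...> token from a header value and drop the brackets.
function extractIds(header?: string): string[] {
  const tokens = header ? header.match(/<[^<>\s]+>/g) : null;
  return tokens ? tokens.map((t) => t.slice(1, -1)) : [];
}

function linkInbound(msg: InboundMessage, sent: OutboundRecord[]): LinkResult | null {
  const byId = new Map(sent.map((r) => [r.providerMessageId, r] as const));

  // Strategy 1a: the direct parent named in In-Reply-To.
  for (const id of extractIds(msg.inReplyTo)) {
    const hit = byId.get(id);
    if (hit) return { requestId: hit.requestId, method: 'in_reply_to', confidence: 1.0 };
  }
  // Strategy 1b: any ancestor in References, checking the newest (last) first.
  for (const id of extractIds(msg.references).reverse()) {
    const hit = byId.get(id);
    if (hit) return { requestId: hit.requestId, method: 'references', confidence: 1.0 };
  }
  // Strategy 2: most recent outbound to this sender within the last 7 days.
  const sender = msg.fromEmail.toLowerCase();
  const recent = sent
    .filter((r) => r.toEmail.toLowerCase() === sender)
    .filter((r) => {
      const age = msg.receivedAt.getTime() - r.sentAt.getTime();
      return age >= 0 && age <= SEVEN_DAYS_MS;
    })
    .sort((a, b) => b.sentAt.getTime() - a.sentAt.getTime());
  if (recent.length > 0) {
    return { requestId: recent[0].requestId, method: 'recipient_recent', confidence: 0.5 };
  }
  return null; // No link found: treat the message as a new conversation.
}
```

Subject similarity and domain matching are left out to keep the sketch short. They would slot in after the recipient check as further fallbacks, each with its own confidence score.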

```typescript
// Shape of a link result, covering both header-based and heuristic methods
type LinkResult = {
  requestId: string;
  method: 'in_reply_to' | 'references' | 'recipient_recent' | 'subject_match';
  confidence: number; // 1.0 for header match, 0.3-0.7 for heuristics
};
```

### What to store per message

For reliable thread reconstruction, persist these fields for every inbound and outbound message:

| Field | Why |
|-------|-----|
| `message_id` | Your internal ID |
| `provider_message_id` | The Message-ID assigned by the provider/MTA |
| `in_reply_to` | The In-Reply-To header value |
| `references_header` | The full References header (space-separated Message-IDs) |
| `thread_id` | Your internal thread/conversation ID |
| `from_email` | Sender address |
| `to_email` | Recipient address |
| `subject` | For fallback matching |
| `created_at` | For time-window heuristics |

---

## Quoted reply detection and stripping

When a reply arrives, the message body contains both the new content and quoted previous messages. Extracting only the new content is essential for AI agents, search indexing, and clean display.

### Why this is hard

There is no standard for how email clients format quoted replies. Every client does it differently:

| Client | Plain text format | HTML format |
|--------|------------------|-------------|
| Gmail | `> ` prefix per line | `
` wrapper |
| Outlook | `> ` or full block | `
` or `