---
name: spam-filter-avoidance
description: Avoid triggering spam filters with your email content. Use when emails land in spam, auditing content for filter triggers, checking link patterns, or optimizing HTML structure.
license: MIT
---

# Spam Filter Avoidance

Understand how spam filters evaluate your email and write content that passes cleanly - without tricks.

## When to use this skill

- Writing email content and want to avoid common spam filter triggers
- Emails are landing in spam despite good authentication and reputation
- Building HTML email templates and need to know what patterns get flagged
- Generating email content programmatically (AI agents, templates with dynamic variables)
- Diagnosing why a specific email was filtered when everything else looks correct
- Reviewing outbound email for content-level deliverability risks

## Related skills

- `inbox-placement` - the full picture of what determines inbox vs spam (reputation, engagement, authentication, and content)
- `domain-authentication` - SPF, DKIM, DMARC setup (filters check authentication before content)
- `sender-reputation` - reputation signals that outweigh content in filter decisions
- `template-design` - building HTML emails that render correctly and avoid structural triggers
- `email-compliance` - legal requirements like unsubscribe links that also affect filtering

---

## How spam filters actually work

Modern spam filters are not keyword blocklists. They are multi-signal classifiers that evaluate messages across several dimensions simultaneously. Understanding the architecture matters because it tells you what you can and cannot control at the content level.

### The filtering pipeline

When your email arrives at a mailbox provider, it passes through these stages in order:

1. **Connection-level checks** - IP reputation, DNS blocklists, TLS, rate limiting. Bad senders get rejected here before the message is even read.
2. **Authentication checks** - SPF, DKIM, DMARC.
Failures add negative weight or cause outright rejection (Gmail rejects unauthenticated mail from bulk senders as of November 2025).
3. **Reputation scoring** - Domain history, complaint rates, bounce rates, past engagement. This is the heaviest signal.
4. **Content analysis** - The message itself: headers, subject, body text, HTML structure, links, attachments. This is where the patterns in this skill apply.
5. **Engagement prediction** - ML models predict whether this specific recipient will engage with this specific message, based on past behavior with this sender.
6. **Final disposition** - Inbox, spam, or category tab (Promotions, Other, etc.).

Content analysis is stage 4 of 6. By the time a filter evaluates your content, it has already formed an opinion based on your reputation and authentication. This is why the same content can land in inbox from one sender and spam from another.

### The ML reality

Gmail processes over 15 billion unwanted messages daily using ML models trained on billions of user interactions. These models evaluate:

- **Semantic meaning** - NLP models interpret context and tone, not just keywords
- **Structural patterns** - HTML complexity, text-to-image ratio, link density
- **Behavioral correlation** - how recipients with similar profiles reacted to similar messages
- **Temporal patterns** - unusual send times, volume spikes, sudden content changes

SpamAssassin (still widely used by corporate mail servers, ISPs, and hosting providers) takes a different approach: rule-based scoring where each matched rule adds points toward a threshold (default 5.0). This means specific patterns have specific, predictable scores.

The practical consequence: you need to satisfy both ML classifiers (Gmail, Outlook) and rule-based systems (SpamAssassin, enterprise filters). ML classifiers are harder to game but more forgiving of individual signals. Rule-based systems are predictable but unforgiving when you trip multiple rules.
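
The threshold model described above can be sketched in a few lines of Python. This is a minimal illustration, not SpamAssassin's implementation: the rule names, the per-rule scores, and the `score_text` and `is_flagged` helpers are all hypothetical. Only the 5.0 default threshold comes from the text above.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    score: float
    matches: Callable[[str], bool]


# Hypothetical rules and scores, chosen only to illustrate additive scoring.
RULES = [
    Rule("ALL_CAPS", 1.5, lambda t: t.isupper()),
    Rule("REPEATED_PUNCTUATION", 1.0, lambda t: re.search(r"[!?$]{3,}", t) is not None),
    Rule("URGENCY_PHRASE", 2.0, lambda t: "act now" in t.lower()),
    Rule("FREE_MONEY_PHRASE", 2.0, lambda t: "free money" in t.lower()),
]

THRESHOLD = 5.0  # default threshold mentioned in the text


def score_text(text: str) -> tuple[float, list[str]]:
    """Return the summed score and the names of the rules that matched."""
    hits = [rule for rule in RULES if rule.matches(text)]
    return sum((rule.score for rule in hits), 0.0), [rule.name for rule in hits]


def is_flagged(text: str) -> bool:
    """Treat a message as spam once its total reaches the threshold."""
    total, _ = score_text(text)
    return total >= THRESHOLD
```

With these made-up scores, `"Act now for free money"` totals 4.0 and stays under the threshold, while `"ACT NOW!!! FREE MONEY"` adds the capitalization and punctuation rules and reaches 6.5. That is the sense in which rule-based systems are unforgiving when several rules fire together.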

---

## Subject line patterns

The subject line gets disproportionate attention from filters because spammers rely heavily on urgency and deception in subjects.

### What triggers filters

**ALL CAPS subjects.** SpamAssassin rule `SUBJ_ALL_CAPS` adds 1.5+ points. Gmail's classifier also treats all-caps as a negative signal. A subject like `LIMITED TIME OFFER` trips both systems.

**Excessive punctuation.** Multiple exclamation marks (`!!!`), question marks (`???`), or dollar signs (`$$$`) are classic spam signals. SpamAssassin has specific rules for these. One exclamation mark is fine. Three is a flag.

**Spam trigger phrases in subjects.** These phrases in subject lines carry more weight than the same phrases in the body:

| Category | Examples | Why they trigger |
|----------|----------|-----------------|
| Urgency | "Act now", "Limited time", "Urgent", "Expires today" | Pressure tactics are the most common spam pattern |
| Financial | "Free money", "No obligation", "Guaranteed", "Double your income" | Financial fraud is the #1 spam category |
| Deceptive | "Re:", "Fwd:" (on non-replies), "You've been selected" | Fake threading and fake personalization |
| Medical | "Lose weight", "Miracle cure", "No prescription" | Pharmaceutical spam is heavily targeted |

**Misleading Re:/Fwd: prefixes.** Adding `Re:` to a subject that isn't a reply trips the `FAKE_REPLY` rule in SpamAssassin and is actively penalized by Gmail. Same for `Fwd:` on messages that aren't forwards. Filters check Message-ID, In-Reply-To, and References headers to verify threading.
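
The threading check described above can be approximated before sending. The sketch below uses Python's standard `email` package and flags a message whose subject starts with `Re:` but carries neither an `In-Reply-To` nor a `References` header. The function name `looks_like_fake_reply` is hypothetical, the sketch covers only `Re:` (not `Fwd:`), and real filters are likely to apply more checks than this.

```python
import re
from email import message_from_string
from email.message import Message

# Matches "Re:" at the start of a subject, case-insensitively.
REPLY_PREFIX = re.compile(r"^\s*re\s*:", re.IGNORECASE)


def looks_like_fake_reply(msg: Message) -> bool:
    """True if the subject claims to be a reply but no threading headers exist."""
    subject = msg.get("Subject", "")
    if not REPLY_PREFIX.match(subject):
        return False
    has_thread_headers = bool(msg.get("In-Reply-To") or msg.get("References"))
    return not has_thread_headers


# Example messages (addresses and IDs are placeholders).
fake = message_from_string("Subject: Re: Quick question\nFrom: a@example.com\n\nHi")
real = message_from_string(
    "Subject: Re: Quick question\n"
    "In-Reply-To: <abc123@example.com>\n"
    "From: a@example.com\n\nHi"
)
plain = message_from_string("Subject: Your invoice for March\n\nHi")
```

Here `looks_like_fake_reply(fake)` is `True`, while the genuine reply and the plain subject both return `False`.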

### What's actually safe

- Normal capitalization and punctuation
- Specific, descriptive subjects ("Your invoice for March" not "IMPORTANT DOCUMENT ENCLOSED")
- Personalization with real data (recipient name, company name) - but not fake personalization like "Hi {{first_name}}" with an unfilled variable
- Emojis in moderation - one emoji is fine, five is a flag

---

## Body content patterns

### Spam phrases in context

The phrases listed below are not absolute blocklist words. A sender with strong reputation can use "free" or "guaranteed" without consequence. But these phrases add negative weight, and when combined with other signals (new domain, low engagement, poor HTML), they push the score over the threshold.

**High-risk phrases** (carry the most weight across both ML and rule-based systems):

- "Act now" / "Buy now" / "Order now"
- "Click here" (as the sole anchor text for a link)
- "Free money" / "No cost" / "Risk-free"
- "Winner" / "Congratulations" / "You've won"
- "No obligation" / "No strings attached"
- "Guaranteed" / "100% satisfied"
- "Double your income" / "Earn extra cash"

**Medium-risk phrases** (contribute to score but rarely trigger alone):

- "Limited time offer"
- "Exclusive deal"
- "Don't miss out"
- "Special promotion"
- "While supplies last"

**The real rule:** Density matters more than individual words. One instance of "free" in a 500-word email is noise. Five instances of pressure phrases in a 100-word email is spam. Filters evaluate the ratio of promotional language to total content.

### Invisible text and encoding tricks

Filters specifically detect attempts to hide content or fool classifiers:

**Zero-width characters.** Inserting Unicode zero-width spaces (U+200B), zero-width joiners (U+200D), byte order marks (U+FEFF), or soft hyphens (U+00AD) between letters to break up spam words (like "V\u200Biagra") is an old trick that every modern filter detects. These characters are actively flagged and their presence alone is a spam signal.
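
Because these characters are invisible in most editors, it helps to scan generated content for them before sending. The sketch below checks for the four code points named above. The `find_invisible_chars` helper is hypothetical and the character set is limited to what this section lists.

```python
# Code points named in the text above, mapped to readable labels.
INVISIBLE_CHARS = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "BYTE ORDER MARK",
    "\u00ad": "SOFT HYPHEN",
}


def find_invisible_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, label) for every listed invisible character in text."""
    return [
        (index, INVISIBLE_CHARS[char])
        for index, char in enumerate(text)
        if char in INVISIBLE_CHARS
    ]
```

For example, `find_invisible_chars("V\u200biagra")` returns `[(1, "ZERO WIDTH SPACE")]`, and clean text returns an empty list. A production check would probably need to allow zero-width joiners that sit inside emoji sequences, which this sketch does not attempt.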

**Invisible text.** White text on white background, font-size:0, display:none, or visibility:hidden content is detected by both SpamAssassin (`HIDDEN_TEXT` rules) and Gmail. Spammers use this to inject "good" text (like news articles) that the recipient can't see but the classifier reads, trying to dilute the spam score. Filters now treat hidden text as a strong negative signal.

**HTML comment stuffing.** Adding legitimate-looking text inside HTML comments (`<!-- ... -->`) to influence classifiers. Detected and penalized.

**Character substitution.** Using Cyrillic characters that look like Latin (e.g., Cyrillic "а", U+0430, instead of Latin "a") or HTML entities (`&#86;iagra`) to bypass text matching. Modern filters normalize text before evaluation.

### Text-to-code ratio

The ratio of visible text to HTML markup matters. An email that is mostly HTML tags with very little readable text looks like it's trying to hide something. Aim for substantial readable text in every email.

---

## Link patterns

Links are the most scrutinized element in email content because they are the primary mechanism for phishing and malware delivery.

### URL shorteners

Do not use URL shorteners (bit.ly, tinyurl.com, t.co, etc.) in email. They are heavily penalized because:

1. They obscure the destination URL, which is the primary phishing vector
2. Spammers use them to evade URL blocklist checks
3. If another sender using the same shortener service gets blocked, your emails using that service may be blocked too - guilt by shared domain
4. SpamAssassin has specific rules for known shortener domains (scored 2-4 points)

Use your own domain for all links. If you need click tracking, use a subdomain you own (e.g., `track.example.com/click/...`) with proper HTTPS.
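
The link guidance above lends itself to an automated pre-send check. The sketch below uses `urllib.parse` from the standard library to flag shortener hosts, links outside a domain you own, and non-HTTPS schemes. The `link_issues` helper and its issue labels are hypothetical, and the shortener set contains only the three services named above, so it is far from exhaustive.

```python
from urllib.parse import urlsplit

# Only the shorteners named in the text; a real list would be much longer.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}


def link_issues(url: str, own_domain: str) -> list[str]:
    """Return a list of issue labels for one link (empty list means no issues)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    issues = []
    if host in SHORTENER_HOSTS:
        issues.append("shortener")
    # Accept the domain itself or any subdomain of it (e.g. track.example.com).
    if host != own_domain and not host.endswith("." + own_domain):
        issues.append("foreign-domain")
    if parts.scheme != "https":
        issues.append("not-https")
    return issues
```

Under these assumptions, `link_issues("https://track.example.com/click/abc", "example.com")` returns an empty list, while a `bit.ly` link is reported as both a shortener and a foreign domain.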

### Link density

Too many links signal promotional or phishing email:

- **0-3 links** - normal for transactional and personal email
- **4-7 links** - acceptable for newsletters with good text-to-link ratio
- **8+ links** - starts triggering density rules, especially if links point to different domains
- **20+ links** - almost certainly flagged

SpamAssassin scores increase progressively with link count. The `URI_COUNT` family of rules fires at various thresholds.

### Mismatched anchor text

When the visible text of a link is a URL that doesn't match the actual href, filters treat this as phishing:

```html
<!-- Flagged: visible URL does not match the href (href is a placeholder) -->
<a href="https://unrelated.example/login">https://www.yourbank.com/login</a>
<!-- Weak: generic anchor text -->
<a href="https://example.com/pricing">Click here</a>
<!-- Better: descriptive anchor text -->
<a href="https://example.com/pricing">View pricing details</a>
<!-- Fine: visible URL matches the href -->
<a href="https://example.com">https://example.com</a>
```

Gmail specifically checks for URL-as-anchor-text mismatches and flags them as potential phishing.

### URL blocklists

Every link in your email is checked against real-time URL blocklists (URIBL, SURBL, Google Safe Browsing). SpamAssassin's URIBL rules carry high scores (1.5-3.6 points each). If any domain in your email appears on these lists, the entire message is penalized.

This means:

- Don't link to domains you don't control unless you trust them
- Don't use third-party redirect services
- Monitor your own domains on blocklists (MXToolbox, multirbl.valli.org)
- If you link to user-generated content, validate URLs before including them

### HTTP vs HTTPS

All links should use HTTPS. SpamAssassin has rules for HTTP links in email (`HTTP_IN_EMAIL`), and Gmail treats HTTP links as a minor negative signal. More importantly, some enterprise filters block HTTP links outright as a security policy.

---

## HTML structure

The way your HTML email is constructed tells filters a lot about whether you're a legitimate sender.

### Text-to-image ratio

The widely cited guideline is 60:40 text-to-image ratio (by area). The practical rules:

- **Minimum 400-500 characters of visible text.** Below this, filters suspect your content is hidden in images.
- **Never send image-only emails.** An email that is one large image with no text is a strong spam signal. Filters can't read text in images, so they treat image-only messages as potentially hiding content.
- **Alt text on every image.** Besides accessibility, alt text provides text content that helps your text-to-image ratio when images are blocked (which is the default in many email clients on first view).

SpamAssassin's `HTML_IMAGE_RATIO_02` rule fires when text-to-image ratio is below 20%. The rule itself has a low score, but it compounds with other signals.

### HTML quality

Broken, malformed, or unnecessarily complex HTML is a spam signal:

- **Missing closing tags** - sloppy HTML suggests auto-generated spam
- **Excessive nested tables** - some depth is needed for email layout, but extreme nesting (10+ levels) is a flag
- **Non-standard tags** - ``, ``, ``, ``, `
` tags are stripped by email clients and flagged by filters - **JavaScript** - `