---
name: programmatic-seo
description: >
  Programmatic page generation at scale using template-based SEO, data
  pipelines, and automated content production. Covers keyword pattern mining,
  template architecture, data sourcing, quality control, and indexation
  strategy for 50 to 100K+ page deployments.
license: MIT + Commons Clause
metadata:
  version: 1.0.0
  author: borghei
  category: marketing-growth
  updated: 2026-03-31
  tags: [seo, programmatic, templates, content-at-scale, data-driven-seo]
---

# Programmatic SEO

Production-grade framework for building SEO page sets at scale. Covers the full lifecycle from keyword pattern discovery through template design, data pipeline construction, quality assurance, and post-launch optimization. Designed for deployments ranging from 50 to 100,000+ pages.

---

## Table of Contents

- [When to Use vs When Not To](#when-to-use-vs-when-not-to)
- [Initial Assessment](#initial-assessment)
- [The 14 Playbooks](#the-14-playbooks)
- [Playbook Selection Matrix](#playbook-selection-matrix)
- [Keyword Pattern Mining](#keyword-pattern-mining)
- [Data Pipeline Architecture](#data-pipeline-architecture)
- [Template Design System](#template-design-system)
- [Quality Control Framework](#quality-control-framework)
- [Internal Linking Architecture](#internal-linking-architecture)
- [Indexation Strategy](#indexation-strategy)
- [Launch Sequence](#launch-sequence)
- [Post-Launch Optimization](#post-launch-optimization)
- [Anti-Patterns and Penalty Avoidance](#anti-patterns-and-penalty-avoidance)
- [Decision Matrix: Build vs Skip](#decision-matrix-build-vs-skip)
- [Output Artifacts](#output-artifacts)
- [Related Skills](#related-skills)
- [Troubleshooting](#troubleshooting)
- [Success Criteria](#success-criteria)
- [Scope & Limitations](#scope--limitations)
- [Scripts](#scripts)

---

## When to Use vs When Not To

**Use this skill when:**

- You have a repeating keyword pattern with 50+ variations
- You have (or can acquire) structured data to populate pages
- The search intent is consistent across variations
- Your domain has sufficient authority to compete

**Do NOT use when:**

- Each page requires unique editorial content (use content-creator instead)
- Total addressable pages < 30 (manual content is more effective)
- You lack a data source and would be generating thin placeholder content
- Your Domain Rating (DR) is below 20 and competitors are DR 60+

---

## Initial Assessment

Before designing any pSEO strategy, answer these questions. Skip nothing.

### 1. Opportunity Validation

| Question | Why It Matters | Red Flag |
|----------|----------------|----------|
| What is the repeating keyword pattern? | Defines the template structure | Pattern is vague or inconsistent |
| What is the aggregate monthly search volume? | Determines the ROI ceiling | < 5,000 aggregate monthly searches |
| How many unique pages can you generate? | Scopes the project | < 50 pages (too few) or > 50K without data infrastructure |
| What does the SERP look like for sample queries? | Competitive feasibility | Page 1 dominated by DR 80+ editorial content |
| Is intent informational, navigational, or transactional? | Template design | Mixed intent across the same pattern |

### 2. Data Source Evaluation

Rate your data source on this scale:

| Tier | Source Type | Defensibility | Example |
|------|-------------|---------------|---------|
| S | Proprietary first-party | Unbeatable | Your product usage data, internal benchmarks |
| A | Product-derived | Strong | Aggregated user analytics, customer outcomes |
| B | User-generated | Moderate | Community reviews, submitted content |
| C | Licensed exclusive | Moderate | Paid data feed no competitor has |
| D | Public aggregated | Weak | Government data, public APIs |
| F | Scraped commodity | None | Wikipedia rewrites, copied listings |

**Rule: Do not build pSEO on Tier F data.** Google penalizes commodity rewrites. If your only data source is public and easily replicable, invest in acquiring Tier A-C data first.

### 3. Competitive Moat Assessment

For 5 sample queries in your pattern, analyze the page 1 results:

- What is the average Domain Rating of ranking pages?
- Are existing results programmatic or editorial?
- What unique data do ranking pages provide?
- What is the content depth (word count, data richness, UX quality)?

**Go/No-Go threshold:** If the average DR gap between you and page 1 is > 30 AND existing results have proprietary data, the opportunity requires either a differentiated approach or domain authority building first.

---

## The 14 Playbooks

| # | Playbook | Pattern | Example | Data Requirement |
|---|----------|---------|---------|------------------|
| 1 | Templates | "[Type] template" | "resume template", "invoice template" | Template files + metadata |
| 2 | Curation | "best [category]" | "best CRM for startups" | Product/service reviews + ratings |
| 3 | Conversions | "[X] to [Y]" | "100 USD to EUR" | Conversion logic/API |
| 4 | Comparisons | "[X] vs [Y]" | "Notion vs Confluence" | Feature data for both products |
| 5 | Examples | "[type] examples" | "landing page examples" | Curated example collection |
| 6 | Locations | "[service] in [city]" | "coworking in Austin" | Location-specific data |
| 7 | Personas | "[product] for [audience]" | "CRM for real estate" | Audience-specific use cases |
| 8 | Integrations | "[A] + [B] integration" | "Slack Asana integration" | Integration documentation |
| 9 | Glossary | "what is [term]" | "what is churn rate" | Domain expertise |
| 10 | Translations | Content in N languages | Localized guides | Translation + localization data |
| 11 | Directory | "[category] tools" | "AI writing tools" | Tool listings + evaluations |
| 12 | Profiles | "[entity name]" | "Stripe company profile" | Entity-level data |
| 13 | Statistics | "[topic] statistics" | "SaaS churn statistics 2026" | Verified statistical data |
| 14 | Calculators | "[topic] calculator" | "LTV calculator" | Calculation logic + inputs |

---

## Playbook Selection Matrix

| If you have... | Primary Playbook | Secondary Layer |
|----------------|------------------|-----------------|
| A product with many integrations | Integrations | Comparisons |
| A design/creative tool | Templates + Examples | Personas |
| A multi-segment audience | Personas | Comparisons |
| Local/regional presence | Locations | Directory |
| A tool/utility product | Calculators + Conversions | Glossary |
| Deep domain expertise | Glossary + Statistics | Curation |
| A competitor landscape to exploit | Comparisons + Curation | Directory |
| User-generated content | Examples + Directory | Profiles |

**Layering rule:** Combine up to 2 playbooks per page set. Example: "Best coworking spaces in [city]" = Curation + Locations.

---

## Keyword Pattern Mining

### Step 1: Pattern Identification

Extract the repeating structure from seed keywords:

```
Seed:      "react developer salary san francisco"
Pattern:   [role] salary [city]
Variables: role (200+ options), city (500+ options)
Max pages: 200 x 500 = 100,000
```

### Step 2: Volume Distribution Analysis

Not all variable combinations have search volume.
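
Steps 1 and 2 can be combined in a short script: expand the pattern into every variable combination, then bucket each keyword by the volume tiers in the table below. This is a minimal Python sketch, assuming you already have monthly volumes exported from a keyword tool as a keyword-to-volume mapping; `expand_pattern`, `volume_tier`, and `group_by_tier` are hypothetical helper names, not the interface of `scripts/keyword_pattern_miner.py`, and the sample volumes are made up.

```python
from itertools import product


def volume_tier(monthly_volume: int) -> str:
    """Map a monthly search volume to the Head/Torso/Long-tail/Zero-volume tiers."""
    if monthly_volume >= 1000:
        return "head"
    if monthly_volume >= 100:
        return "torso"
    if monthly_volume >= 10:
        return "long-tail"
    return "zero-volume"


def expand_pattern(pattern, variables):
    """Fill each [name] slot in the pattern with every combination of variable values."""
    names = list(variables)
    keywords = []
    for combo in product(*(variables[name] for name in names)):
        keyword = pattern
        for name, value in zip(names, combo):
            keyword = keyword.replace(f"[{name}]", value)
        keywords.append(keyword)
    return keywords


def group_by_tier(keywords, volumes):
    """Bucket keywords by tier; keywords absent from the volume export count as 0."""
    tiers = {"head": [], "torso": [], "long-tail": [], "zero-volume": []}
    for keyword in keywords:
        tiers[volume_tier(volumes.get(keyword, 0))].append(keyword)
    return tiers


# Hypothetical inputs: two roles x two cities, plus volumes from a keyword-tool export.
keywords = expand_pattern(
    "[role] salary [city]",
    {"role": ["react developer", "data engineer"], "city": ["san francisco", "austin"]},
)
volumes = {"react developer salary san francisco": 1300, "data engineer salary austin": 40}
tiers = group_by_tier(keywords, volumes)
print({tier: len(kws) for tier, kws in tiers.items()})
# {'head': 1, 'torso': 0, 'long-tail': 1, 'zero-volume': 2}
```

Keywords missing from the export fall into the zero-volume bucket, which matches the conservative default of noindexing or skipping them until demand is confirmed.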

Map the distribution:

| Tier | Volume Range | Typical % of Total Pages | Strategy |
|------|--------------|--------------------------|----------|
| Head | 1,000+ monthly | 2-5% | Priority indexation, highest content quality |
| Torso | 100-999 monthly | 15-25% | Standard template, full deployment |
| Long-tail | 10-99 monthly | 40-50% | Template with conditional content blocks |
| Zero-volume | < 10 monthly | 20-40% | Noindex OR skip unless data is uniquely valuable |

### Step 3: Intent Classification

For each pattern, verify intent consistency:

| Intent Type | Template Implications | CTA Strategy |
|-------------|-----------------------|--------------|
| Informational | Data-heavy, educational content | Newsletter, related content |
| Commercial investigation | Comparison tables, pros/cons | Free trial, demo |
| Transactional | Pricing, availability, features | Buy now, sign up |
| Navigational | Brand-specific, direct answer | Product page link |

---

## Data Pipeline Architecture

### Pipeline Design

```
[Data Source] → [Extraction] → [Transformation] → [Enrichment] → [Validation] → [Template Population] → [Quality Check] → [Publish]
```

### Data Quality Gates

Every record must pass these gates before page generation:

| Gate | Check | Failure Action |
|------|-------|----------------|
| Completeness | All required fields populated | Skip page, log for manual review |
| Accuracy | Data matches source, no staleness > 90 days | Flag for refresh |
| Uniqueness | No duplicate records | Merge or deduplicate |
| Minimum richness | Page will have > 300 words of unique content | Skip or enrich |
| Legal compliance | Data usage rights verified | Block publication |

### Update Cadence

| Data Type | Recommended Update Frequency | Staleness Penalty |
|-----------|------------------------------|-------------------|
| Pricing data | Weekly | High (users notice immediately) |
| Company/product data | Monthly | Medium |
| Statistical data | Quarterly | Low if year-tagged |
| Glossary/educational | Semi-annually | Very low |
| Location data | Monthly | Medium (closures, address changes) |

---

## Template Design System

### Page Architecture

Every programmatic page must have these zones:

```
┌─────────────────────────────────────┐
│ Zone 1: Unique Header               │  H1 with target keyword, unique intro paragraph
├─────────────────────────────────────┤
│ Zone 2: Primary Data Section        │  The core data/content for this specific page
├─────────────────────────────────────┤
│ Zone 3: Contextual Analysis         │  Insights, comparisons, trends specific to this entity
├─────────────────────────────────────┤
│ Zone 4: Related Data                │  Adjacent data points that add depth
├─────────────────────────────────────┤
│ Zone 5: Internal Navigation         │  Related pages, breadcrumbs, category links
├─────────────────────────────────────┤
│ Zone 6: CTA                         │  Conversion element matched to intent
└─────────────────────────────────────┘
```

### Uniqueness Requirements

Each page MUST have at least 3 of these 5 uniqueness sources:

1. **Unique data points** -- Numbers, facts, or attributes specific to this entity
2. **Conditional content blocks** -- Sections that appear/disappear based on data attributes
3. **Calculated insights** -- Derived metrics (percentages, comparisons, rankings)
4. **Contextual recommendations** -- "If X, then Y" advice blocks based on the data
5. **User-generated content** -- Reviews, comments, or community contributions

### URL Structure

**Always use subfolders.** Never use subdomains for pSEO.
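
Keeping every pattern under one host means each URL template reduces to slug functions joined into a subfolder path. A minimal Python sketch follows; the slug rules (accents stripped, lowercase ASCII, hyphen-separated) and the helper names are assumptions for illustration, chosen so the outputs match the examples in the table that follows.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase ASCII slug: drop accents, collapse every other character run to one hyphen."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def location_url(service: str, city: str) -> str:
    return f"/{slugify(service)}/{slugify(city)}/"


def comparison_url(a: str, b: str) -> str:
    return f"/compare/{slugify(a)}-vs-{slugify(b)}/"


def glossary_url(term: str) -> str:
    return f"/glossary/{slugify(term)}/"


def persona_url(product: str, audience: str) -> str:
    return f"/{slugify(product)}-for-{slugify(audience)}/"


print(location_url("Coworking", "São Paulo"))   # /coworking/sao-paulo/
print(comparison_url("Notion", "Confluence"))   # /compare/notion-vs-confluence/
print(persona_url("CRM", "Real Estate"))        # /crm-for-real-estate/
```

For comparisons, pick one canonical order per pair and redirect or canonicalize the reverse, so `notion-vs-confluence` and `confluence-vs-notion` do not end up as two pages targeting the same query.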

| Pattern | URL Template | Example |
|---------|--------------|---------|
| Location | `/[service]/[city]/` | `/coworking/austin/` |
| Comparison | `/compare/[a]-vs-[b]/` | `/compare/notion-vs-confluence/` |
| Integration | `/integrations/[partner]/` | `/integrations/slack/` |
| Glossary | `/glossary/[term]/` | `/glossary/churn-rate/` |
| Persona | `/[product]-for-[audience]/` | `/crm-for-real-estate/` |

---

## Quality Control Framework

### Pre-Publication QA Checklist

**Content Quality:**

- [ ] Each page has > 300 words of unique content (not counting shared template elements)
- [ ] H1 is unique and contains the target keyword
- [ ] Meta title is unique (< 60 chars) and meta description is unique (< 155 chars)
- [ ] No broken data references (no empty fields rendered as "N/A" or left blank)
- [ ] At least 2 conditional content blocks triggered per page
- [ ] No duplicate pages targeting the same keyword

**Technical SEO:**

- [ ] Canonical tag points to self
- [ ] Hreflang tags present if multilingual
- [ ] Schema markup renders without errors
- [ ] Page loads in < 3 seconds
- [ ] Mobile responsive

**Internal Linking:**

- [ ] Breadcrumb trail is complete
- [ ] 3-5 related pages linked contextually
- [ ] Hub page links to this page
- [ ] No orphan pages in the set

### Thin Content Detection

Run this check against every generated page:

| Signal | Threshold | Action |
|--------|-----------|--------|
| Unique word count | < 200 unique words | Block publication |
| Content similarity to another page in set | > 80% Jaccard similarity | Merge or differentiate |
| Data fields populated | < 60% of template fields | Skip or enrich |
| User time-on-page (post-launch) | < 15 seconds average | Review and improve |
| Bounce rate (post-launch) | > 85% | Review intent match |

---

## Internal Linking Architecture

### Hub-and-Spoke Model

```
                 ┌─────────┐
                 │   HUB   │  /coworking/
                 │  PAGE   │  (ranks for "coworking spaces")
                 └────┬────┘
       ┌──────────────┼──────────────┐
  ┌────┴────┐    ┌────┴────┐    ┌────┴────┐
  │ SPOKE 1 │    │ SPOKE 2 │    │ SPOKE 3 │
  │/austin/ │    │/denver/ │    │/seattle/│
  └────┬────┘    └────┬────┘    └────┬────┘
       │              │              │
       Cross-links between related spokes
```

**Linking rules:**

- Hub links DOWN to every spoke (or the top 50 spokes if > 200 pages)
- Every spoke links UP to the hub
- Spokes link ACROSS to 3-5 related spokes (geographic proximity, thematic similarity)
- Deep pages link UP to their spoke AND the hub
- Cross-silo links only when contextually genuine

### Pagination for Large Sets

If a hub page has > 50 spokes, implement paginated sub-hubs:

```
/coworking/                     → Top cities + browse by state
/coworking/california/          → All California cities
/coworking/california/page/2/   → Paginated if > 25 cities
```

---

## Indexation Strategy

### Crawl Budget Management

| Page Set Size | Strategy |
|---------------|----------|
| < 500 pages | Single XML sitemap, submit all |
| 500-5,000 | Segmented sitemaps by category |
| 5,000-50,000 | Segmented sitemaps + priority scoring + IndexNow |
| 50,000+ | Programmatic sitemap generation + crawl budget monitoring + strategic noindex |

### Indexation Priority

| Priority | Pages | Action |
|----------|-------|--------|
| P0 | Hub pages | Submit immediately, internal link from homepage |
| P1 | Head-volume spokes (top 10%) | Submit in first sitemap batch |
| P2 | Torso-volume spokes | Submit in second batch, 1-2 weeks later |
| P3 | Long-tail spokes | Submit gradually over 4-6 weeks |
| P4 | Zero-volume pages | Noindex unless data is uniquely valuable |

### IndexNow Integration

For large-scale updates, use IndexNow to notify participating search engines (such as Bing and Yandex) immediately. Google does not participate in IndexNow, so XML sitemaps remain the primary discovery channel for Google.

```
POST https://api.indexnow.org/indexnow
Content-Type: application/json; charset=utf-8

{
  "host": "yoursite.com",
  "key": "your-api-key",
  "urlList": ["https://yoursite.com/page1", "https://yoursite.com/page2"]
}
```

The `key` is a value you generate yourself and host as a text file at `https://yoursite.com/<key>.txt` (or at a URL given in an optional `keyLocation` field), which is how search engines verify that you control the host.

---

## Launch Sequence

### Phase 1: Pilot (Week 1-2)

- Deploy 20-50 pages from the head-volume tier
- Submit a sitemap with pilot pages only
- Monitor indexation rate daily
- Check for crawl errors in Search Console

### Phase 2: Scale (Week 3-6)

- Deploy remaining torso-volume pages in batches of 100-500
- Add cross-links between deployed pages
- Monitor thin content warnings
- Track impressions in Search Console

### Phase 3: Long-Tail (Week 7-12)

- Deploy long-tail pages
- Noindex zero-volume pages (keep them crawlable but not indexed)
- Begin link acquisition outreach for hub pages

### Phase 4: Optimization (Ongoing)

- A/B test template variations on head-volume pages
- Refresh stale data quarterly
- Add conditional content blocks based on engagement data
- Monitor for keyword cannibalization across the set

---

## Post-Launch Optimization

### Metrics Dashboard

| Metric | Frequency | Target |
|--------|-----------|--------|
| Indexation rate | Weekly | > 90% of submitted pages indexed within 60 days |
| Organic impressions | Weekly | Trending up month-over-month |
| Average position (by tier) | Bi-weekly | Head pages: top 10; Torso: top 30 |
| Click-through rate | Monthly | > 3% for head pages |
| Bounce rate | Monthly | < 70% |
| Conversion rate | Monthly | > 1% for transactional intent |
| Pages per session | Monthly | > 1.5 |

### Optimization Playbook

| Signal | Diagnosis | Action |
|--------|-----------|--------|
| Indexed but not ranking | Content quality or authority gap | Enrich content, build links to hub |
| Ranking but low CTR | Title/description not compelling | A/B test meta titles |
| Ranking but high bounce | Intent mismatch or thin content | Audit against search intent, add data |
| Deindexed after initial indexing | Thin content penalty | Improve uniqueness, reduce similarity |
| Crawled but not indexed | Quality threshold not met | Add more unique content per page |

---

## Anti-Patterns and Penalty Avoidance

| Anti-Pattern | Why It Fails | Prevention |
|--------------|--------------|------------|
| City-name swapping | Same content + different city = doorway page penalty | Each location page needs unique local data |
| Keyword stuffing in templates | Unnatural density triggers spam filters | Keep keyword density 1-2%, write naturally |
| Generating pages for zero-demand queries | Wastes crawl budget, signals low quality | Validate demand before generating |
| No internal links to pSEO pages | Orphan pages get deprioritized | Connect every page to the hub-spoke structure |
| Stale data never refreshed | Users lose trust, Google notices | Set update cadence per data type |
| All pages identical structure | Lack of variation signals automation to Google | Use 3-5 template variants |

---

## Decision Matrix: Build vs Skip

Score each dimension 1-5, multiply each score by its weight, and sum the results to get the weighted average. Then apply the scoring guide.

| Dimension | Weight | 1 (Skip) | 5 (Build) |
|-----------|--------|----------|-----------|
| Search demand | 30% | < 1K aggregate monthly | > 50K aggregate monthly |
| Data quality | 25% | Public/scraped, easily replicated | Proprietary, defensible |
| Competitive gap | 20% | DR gap > 40, strong incumbents | DR gap < 15, weak/no incumbents |
| Template feasibility | 15% | Each page needs unique editorial | Clean template fits all variations |
| Business alignment | 10% | No conversion path from these pages | Direct path to core product |

**Scoring guide:**

- 4.0+ weighted average: Build immediately
- 3.0-3.9: Build if resources allow, validate with pilot first
- 2.0-2.9: Invest in data quality or authority first
- < 2.0: Do not build

---

## Output Artifacts

| Artifact | Format | Description |
|----------|--------|-------------|
| Opportunity Analysis | Markdown table | Keyword patterns x volume x data source x difficulty x business alignment |
| Playbook Recommendation | Decision matrix | If/then mapping with rationale and real-world examples |
| Page Template Specification | Annotated wireframe (markdown) | URL pattern, zone structure, uniqueness sources, conditional logic |
| Data Pipeline Spec | Flow diagram (text) | Source > extraction > transformation > validation > publication |
| Quality Scorecard | Checklist + thresholds | Pre-publication QA gates with pass/fail criteria |
| Indexation Plan | Phased timeline | Priority tiers, sitemap structure, crawl budget allocation |
| Post-Launch Dashboard | Metric table | KPIs, targets, review cadence, optimization triggers |

---

## Related Skills

- **seo-audit** -- Run after pSEO pages are live to diagnose indexation issues, thin content warnings, or ranking problems across the page set.
- **schema-markup** -- Add structured data to pSEO templates (Product, FAQ, LocalBusiness) for rich snippet eligibility at scale.
- **site-architecture** -- Plan hub-and-spoke structure and crawl budget management for large pSEO deployments (500+ pages).
- **competitor-alternatives** -- Use alongside the Comparisons playbook when building "[X] vs [Y]" pages; it has dedicated comparison page frameworks.
- **content-creator** -- Use when individual pages in the set need editorial-quality unique content beyond template generation.

---

## Troubleshooting

| Problem | Likely Cause | Fix |
|---------|--------------|-----|
| Google deindexed 90%+ of pSEO pages | Thin content: pages have insufficient unique content (< 300 words) or > 80% similarity | Increase unique content per page to 500+ words; ensure 30-40% differentiation between pages |
| Pages indexed but getting zero traffic | Pages target zero-volume keywords or content does not match search intent | Validate demand before generating; noindex zero-volume pages; verify intent alignment |
| "Doorway pages" manual action in GSC | Template pages with only variable substitution (city name swap) and no unique value | Add genuinely unique data per page: local stats, specific recommendations, conditional content blocks |
| Hub page ranks but spokes do not | Spokes missing inbound internal links or hub not linking down to spokes | Verify bidirectional hub-spoke linking; add contextual cross-links between related spokes |
| Crawl budget exhausted before all pages indexed | Too many pages submitted at once or low-value pages consuming crawl resources | Phase deployment in batches of 100-500; use tiered indexation with strategic noindex |
| Content similarity too high across page set | Template lacks conditional content blocks; only variable substitution used | Add 3-5 conditional content sections per template that change based on data attributes |
| AI content detection flagging pSEO pages | Over-reliance on AI generation without human editorial review | Use AI for data enrichment only, not full content generation; sample 5-10% for quality review |

---

## Success Criteria

- **Indexation rate**: 90%+ of submitted pages indexed within 60 days of deployment
- **Content uniqueness**: Every page has 500+ unique words with < 40% similarity to any other page in the set
- **Head keyword rankings**: Top 10% of pages (by volume) ranking in top 30 within 90 days
- **Organic traffic growth**: Page set generating measurable organic traffic within 60 days of full deployment
- **Thin content rate**: Zero pages flagged as thin content in Google Search Console
- **Bounce rate**: Below 70% average across the page set (indicating intent match)
- **Conversion rate**: 1%+ for transactional intent pages, measurable lead capture for informational pages

---

## Scope & Limitations

**In scope:**

- Keyword pattern mining and volume distribution analysis
- Data pipeline design (source > extraction > transformation > validation > publication)
- Template architecture with uniqueness requirements
- Quality control frameworks including thin content detection
- Hub-and-spoke internal linking for pSEO page sets
- Phased indexation strategy and crawl budget management
- Post-launch optimization and monitoring dashboards

**Out of scope:**

- Individual editorial content creation (use content-creator)
- Data collection or web scraping implementation
- CMS or static site generator setup and configuration
- Server infrastructure for large-scale deployments
- Paid acquisition for pSEO pages
- Legal compliance for data usage rights

**Known limitations:**

- Google's core ranking systems (which absorbed the helpful content system in 2024) can deindex large page sets retroactively if quality drops below threshold
- Programmatic SEO built on Tier F data (scraped commodity content) carries high penalty risk regardless of template quality
- Engagement metrics (bounce rate, time on page) now influence indexation decisions for pSEO pages
- AI content detection is improving; fully automated content generation without human oversight is increasingly risky
- Travel site case study: 50,000 city-swap pages had 98% deindexed within 3 months (per 2025 industry data)

---

## Scripts

```bash
# Analyze keyword patterns for pSEO opportunities
python scripts/keyword_pattern_miner.py --keywords keywords.csv --json

# Score page templates for content quality and uniqueness
python scripts/template_scorer.py --template template.html --data sample_data.json

# Validate data quality for pSEO data pipeline
python scripts/data_validator.py --file data.csv --rules rules.json --json
```
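
The scripts above are not reproduced in this document. As an independent illustration of the `> 80% Jaccard similarity` signal from Thin Content Detection, here is a minimal Python sketch that compares every pair of generated pages on three-word shingles. The shingle size, tokenization, helper names, and sample page bodies are assumptions; this is not the logic of `template_scorer.py`.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Set of overlapping word n-grams built from lowercased alphanumeric tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical (conservative for empty pages)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.8):
    """Return (url, url, score) for every page pair above the similarity threshold."""
    sets = {url: shingles(body) for url, body in pages.items()}
    flagged = []
    for (url_a, set_a), (url_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score > threshold:
            flagged.append((url_a, url_b, round(score, 3)))
    return flagged


# Hypothetical page bodies: two city-swap pages and one page with its own local content.
template = (
    "Find flexible desks, private offices, meeting rooms and day passes at coworking "
    "spaces in {city}. Compare monthly prices, amenities, opening hours, parking and "
    "transit access for every space, then book a tour or a free trial day directly "
    "with the operator."
)
pages = {
    "/coworking/austin/": template.format(city="Austin"),
    "/coworking/denver/": template.format(city="Denver"),
    "/coworking/seattle/": "Seattle guide with neighborhood breakdowns, median desk "
                           "prices by area and recent member reviews.",
}
print(near_duplicates(pages))  # [('/coworking/austin/', '/coworking/denver/', 0.857)]
```

Only the city-swap pair is flagged: swapping one word changes just the three shingles that contain it. Pairwise comparison grows quadratically with page count, which is fine for a pilot batch; for very large sets, an approximate method such as MinHash with locality-sensitive hashing is the usual way to find candidate pairs before computing exact scores.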