---
name: seo-programmatic
description: "Plan and audit programmatic SEO pages generated at scale from structured data. Use when designing templates, URL systems, internal linking, quality gates, and index-bloat safeguards for pages at scale."
risk: unknown
source: "https://github.com/AgriciDaniel/claude-seo"
date_added: "2026-03-21"
user-invokable: true
argument-hint: "[url or plan]"
allowed-tools:
  - Read
  - Grep
  - Glob
  - Bash
  - WebFetch
  - Write
---

# Programmatic SEO Analysis & Planning

Build and audit SEO pages generated at scale from structured data sources. Enforces quality gates to prevent thin content penalties and index bloat.

## When to Use

- Use when the user wants programmatic SEO planning or review.
- Use when designing templates, data-driven pages, or scalable URL systems.
- Use when preventing thin content and index bloat across large page sets.

## Data Source Assessment

Evaluate the data powering programmatic pages:

- **CSV/JSON files**: Row count, column uniqueness, missing values
- **API endpoints**: Response structure, data freshness, rate limits
- **Database queries**: Record count, field completeness, update frequency
- Data quality checks:
  - Each record must have enough unique attributes to generate distinct content
  - Flag duplicate or near-duplicate records (>80% field overlap)
  - Verify data freshness; stale data produces stale pages

## Template Engine Planning

Design templates that produce unique, valuable pages:

- **Variable injection points**: Title, H1, body sections, meta description, schema
- **Content blocks**: Static (shared across pages) vs dynamic (unique per page)
- **Conditional logic**: Show/hide sections based on data availability
- **Supplementary content**: Related items, contextual tips, user-generated content
- Template review checklist:
  - Each page must read as a standalone, valuable resource
  - No "mad-libs" patterns (just swapping city/product names in identical text)
  - Dynamic sections must add genuine information, not just keyword variations

## URL Pattern Strategy

### Common Patterns

- `/tools/[tool-name]`: Tool/product directory pages
- `/[city]/[service]`: Location + service pages
- `/integrations/[platform]`: Integration landing pages
- `/glossary/[term]`: Definition/reference pages
- `/templates/[template-name]`: Downloadable template pages

### URL Rules

- Lowercase, hyphenated slugs derived from data
- Logical hierarchy reflecting site architecture
- No duplicate slugs; enforce uniqueness at generation time
- Keep URLs under 100 characters
- No query parameters for primary content URLs
- Consistent trailing slash usage (match existing site pattern)

## Internal Linking Automation

- **Hub/spoke model**: Category hub pages linking to individual programmatic pages
- **Related items**: Auto-link to 3-5 related pages based on data attributes
- **Breadcrumbs**: Generate BreadcrumbList schema from URL hierarchy
- **Cross-linking**: Link between programmatic pages sharing attributes (same category, same city, same feature)
- **Anchor text**: Use descriptive, varied anchor text. Avoid exact-match keyword repetition
- Link density: 3-5 internal links per 1000 words (match seo-content guidelines)

## Thin Content Safeguards

### Quality Gates

| Metric | Threshold | Action |
|--------|-----------|--------|
| Pages without content review | 100+ | ⚠️ WARNING: require content audit before publishing |
| Pages without justification | 500+ | 🛑 HARD STOP: require explicit user approval and thin content audit |
| Unique content per page | <40% | ❌ Flag as thin content (likely penalty risk) |
| Word count per page | <300 | ⚠️ Flag for review (may lack sufficient value) |

### Scaled Content Abuse: Enforcement Context (2025-2026)

Google's Scaled Content Abuse policy (introduced March 2024) saw major enforcement escalation in 2025:

- **June 2025:** Wave of manual actions targeting websites with AI-generated content at scale
- **August 2025:** SpamBrain spam update enhanced pattern detection for AI-generated link schemes and content farms
- **Result:** Google reported 45% reduction in low-quality, unoriginal content in search results post-March 2024 enforcement

**Enhanced quality gates for programmatic pages:**

- **Content differentiation:** ≥30-40% of content must be genuinely unique between any two programmatic pages (not just city/keyword string replacement)
- **Human review:** Minimum 5-10% sample review of generated pages before publishing
- **Progressive rollout:** Publish in batches of 50-100 pages. Monitor indexing and rankings for 2-4 weeks before expanding. Never publish 500+ programmatic pages simultaneously without explicit quality review.
- **Standalone value test:** Each page should pass: "Would this page be worth publishing even if no other similar pages existed?"
- **Site reputation abuse:** If publishing programmatic content under a high-authority domain (not your own), this may trigger site reputation abuse penalties. Google began enforcing this aggressively in November 2024.
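
The thresholds in the Quality Gates table and the progressive rollout rule can be sketched as a few small checks. This is a minimal illustration, not part of the skill itself: the names `PageStats`, `batch_gate`, `page_flags`, and `rollout_batches` are hypothetical, and the default batch size of 50 is simply the lower bound of the 50-100 range given above.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page measurements fed into the gates."""
    url: str
    unique_pct: float  # unique content percentage for this page
    word_count: int


def batch_gate(page_count: int, reviewed: bool, justified: bool) -> str:
    """Apply the set-level gates: 500+ unjustified pages, 100+ unreviewed pages."""
    if page_count >= 500 and not justified:
        return "HARD_STOP"
    if page_count >= 100 and not reviewed:
        return "WARNING"
    return "OK"


def page_flags(page: PageStats) -> list[str]:
    """Apply the per-page gates: <40% unique content, <300 words."""
    flags = []
    if page.unique_pct < 40:
        flags.append("thin_content")
    if page.word_count < 300:
        flags.append("review_word_count")
    return flags


def rollout_batches(urls: list[str], batch_size: int = 50) -> list[list[str]]:
    """Split URLs into publish batches; batch_size of 50 is an assumed default."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

A caller would typically run `batch_gate` once per planned page set, `page_flags` on each generated page, and publish one entry of `rollout_batches` at a time, pausing for the 2-4 week monitoring window between batches.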

> **Recommendation:** The WARNING gate at `<40% unique content` remains appropriate. Consider a HARD STOP at `<30%` unique content to prevent scaled content abuse risk.

### Safe Programmatic Pages (OK at scale)

- ✅ Integration pages (with real setup docs, API details, screenshots)
- ✅ Template/tool pages (with downloadable content, usage instructions)
- ✅ Glossary pages (200+ word definitions with examples, related terms)
- ✅ Product pages (unique specs, reviews, comparison data)
- ✅ Data-driven pages (unique statistics, charts, analysis per record)

### Penalty Risk (avoid at scale)

- ❌ Location pages with only city name swapped in identical text
- ❌ "Best [tool] for [industry]" without industry-specific value
- ❌ "[Competitor] alternative" without real comparison data
- ❌ AI-generated pages without human review and unique value-add
- ❌ Pages where >60% of content is shared template boilerplate

### Uniqueness Calculation

`Unique content % = (words unique to this page) / (total words on page) × 100`

Measure against all other pages in the programmatic set. Shared headers, footers, and navigation are excluded from the calculation. Template boilerplate text IS included.
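
The formula above can be sketched in a few lines of Python. This is one possible reading, stated as an assumption: a word counts as "unique to this page" when it appears on no other page in the set, and the input text is assumed to already have headers, footers, and navigation stripped while still containing template boilerplate. The function names `tokenize` and `unique_content_pct` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unique_content_pct(pages: dict[str, str]) -> dict[str, float]:
    """Return unique content % per URL, measured against all other pages.

    Assumption: `pages` maps URL -> main body text (boilerplate included,
    shared header/footer/navigation already removed).
    """
    tokens = {url: tokenize(body) for url, body in pages.items()}
    # Count how many pages each distinct word appears on.
    doc_freq = Counter(word for words in tokens.values() for word in set(words))
    result = {}
    for url, words in tokens.items():
        if not words:
            result[url] = 0.0
            continue
        # A token is unique when its word appears on this page only.
        unique = sum(1 for word in words if doc_freq[word] == 1)
        result[url] = unique / len(words) * 100
    return result


pages = {
    "/austin/plumber": "plumber in austin texas",
    "/boston/plumber": "plumber in boston massachusetts",
}
print(unique_content_pct(pages))
```

Word-level uniqueness is a coarse measure: a page can score well by swapping in rare words without adding real information, so the score is best treated as a screening signal ahead of the human review step rather than a replacement for it.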

## Canonical Strategy

- Every programmatic page must have a self-referencing canonical tag
- Parameter variations (sort, filter, pagination) canonical to the base URL
- Paginated series: canonical to page 1 or use rel=next/prev (Google announced in 2019 that it no longer uses rel=next/prev as an indexing signal, so do not rely on it alone)
- If programmatic pages overlap with manual pages, the manual page is canonical
- No canonical to a different domain unless intentional cross-domain setup

## Sitemap Integration

- Auto-generate sitemap entries for all programmatic pages
- Split at 50,000 URLs per sitemap file (protocol limit)
- Use sitemap index if multiple sitemap files needed
- `<lastmod>` reflects actual data update timestamp (not generation time)
- Exclude noindexed programmatic pages from sitemap
- Register sitemap in robots.txt
- Update sitemap dynamically as new records are added to data source

## Index Bloat Prevention

- **Noindex low-value pages**: Pages that don't meet quality gates
- **Pagination**: Noindex paginated results beyond page 1 (or use rel=next/prev)
- **Faceted navigation**: Noindex filtered views, canonical to base category
- **Crawl budget**: For sites with >10k programmatic pages, monitor crawl stats in Search Console
- **Thin page consolidation**: Merge records with insufficient data into aggregated pages
- **Regular audits**: Monthly review of indexed page count vs intended count

## Output

### Programmatic SEO Score: XX/100

### Assessment Summary

| Category | Status | Score |
|----------|--------|-------|
| Data Quality | ✅/⚠️/❌ | XX/100 |
| Template Uniqueness | ✅/⚠️/❌ | XX/100 |
| URL Structure | ✅/⚠️/❌ | XX/100 |
| Internal Linking | ✅/⚠️/❌ | XX/100 |
| Thin Content Risk | ✅/⚠️/❌ | XX/100 |
| Index Management | ✅/⚠️/❌ | XX/100 |

### Critical Issues (fix immediately)

### High Priority (fix within 1 week)

### Medium Priority (fix within 1 month)

### Low Priority (backlog)

### Recommendations

- Data source improvements
- Template modifications
- URL pattern adjustments
- Quality gate compliance actions

## Error Handling

| Scenario | Action |
|----------|--------|
| URL unreachable | Report connection error with status code. Suggest verifying URL accessibility and checking for authentication requirements. |
| No programmatic pages detected | Inform user that no template-generated or data-driven page patterns were found. Suggest checking if pages use client-side rendering or if the URL points to the correct section. |
| Thin content threshold exceeded | Trigger quality gate warning. Report the unique content percentage and flag pages below 40% uniqueness. Require user acknowledgment before proceeding. |
| Quality gate violation | Halt analysis at the HARD STOP threshold (500+ pages without justification or <30% unique content). Present findings and require explicit user approval to continue. |

## Limitations

- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.