# Agent-Friendly Documentation Spec | | | |--------------|--------------------------------------------------------------| | **Status** | Draft | | **Version** | 0.3.0 | | **Date** | 2026-03-31 | | **Author** | Dachary Carey + community contributors | | **URL** | https://agentdocsspec.com | | **Repository** | https://github.com/agent-ecosystem/agent-docs-spec | | **Reference Implementation** | [`afdocs`](https://afdocs.dev) · [npm](https://www.npmjs.com/package/afdocs) · [GitHub](https://github.com/agent-ecosystem/afdocs) | ## Abstract Documentation sites are increasingly consumed by coding agents rather than human readers, but most sites are not built for this access pattern. Agents hit truncation limits, get walls of CSS instead of content, can't follow cross-host redirects, and don't know about emerging discovery mechanisms like `llms.txt`. This spec defines 22 checks across 7 categories that evaluate how well a documentation site serves agent consumers. It is grounded in empirical observation of real agent workflows and is intended as a shared standard for documentation teams, tool builders, and platform providers. ## Scope This spec targets **coding agents that fetch documentation during real-time development workflows.** These are tools like Claude Code, Cursor, GitHub Copilot, and similar IDE-integrated or CLI-based agents that a developer uses while writing code. The agent fetches a docs page, extracts information, and uses it to complete a task, all in a single session. This spec does **not** target: - **Training crawlers** (GPTBot, ClaudeBot, etc.) that scrape content for model training. These have different access patterns, different user-agents, and different concerns. See [Appendix B](#appendix-b-notable-exclusions). - **Answer engines** (Perplexity, Google AI Overviews, ChatGPT search) that retrieve content to generate responses to user queries. 
These systems have their own retrieval pipelines that may or may not resemble the web fetch pipelines described here. - **RAG pipelines** that pre-index documentation into vector stores. These ingest content at build time, not at query time, so truncation limits and real-time fetch behavior are less relevant. The findings and checks in this spec are grounded in empirical observation of coding agents. Some recommendations (like providing `llms.txt` and serving markdown) will benefit other consumers too, but the pass/warn/fail criteria are calibrated for the coding agent use case. ## Background Agents don't use docs like humans. They retrieve URLs from training data rather than navigating table-of-contents structures. They struggle with HTML-heavy pages, silently lose content to truncation, and don't know about emerging standards like `llms.txt` unless explicitly told. These checks codify the patterns that empirically help or hinder agent access to documentation content. ## Terminology - **Agent**: An LLM operating in an agentic coding workflow (e.g., Claude Code, Cursor, Copilot) that fetches and consumes documentation as part of a development task. See [Scope](#scope) for what this spec does and does not cover. - **Web fetch pipeline**: The chain of processing between "agent requests a URL" and "model sees content." Typically involves HTTP fetch, HTML-to-markdown conversion, truncation, and sometimes a summarization model. - **Trusted site**: A domain hardcoded into an agent platform's web fetch implementation that receives more favorable processing (e.g., bypassing summarization). - **Truncation**: The silent removal of content that exceeds a platform's size limit. The agent receives partial content with no indication that anything was cut. See [Appendix A](#appendix-a-known-platform-truncation-limits) for known limits by platform. 
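The web fetch pipeline defined above can be sketched in a few lines. This is a simplified illustration, not any platform's actual implementation: the 100,000-character limit matches Claude Code's observed truncation threshold (other platforms differ; see [Appendix A](#appendix-a-known-platform-truncation-limits)), and the converter is a placeholder for real tools such as Turndown.

```python
TRUNCATION_LIMIT = 100_000  # Claude Code's observed limit; varies by platform


def html_to_markdown(html: str) -> str:
    """Placeholder for a real converter such as Turndown.

    Real converters strip tags, scripts, and styles; the conversion is
    lossy and unpredictable, which is why serving markdown directly helps.
    """
    return html


def web_fetch_pipeline(body: str, content_type: str) -> str:
    """Approximate what the model sees for a given HTTP response."""
    if "text/markdown" in content_type:
        content = body  # best case: no conversion step at all
    else:
        content = html_to_markdown(body)  # the lossy path most agents hit
    # Truncation is silent: the agent receives partial content with no
    # indication that anything was cut. (Some platforms additionally run
    # a summarization model on the result; omitted here.)
    return content[:TRUNCATION_LIMIT]
```

The point of the sketch is the ordering: conversion happens before truncation, so HTML boilerplate consumes the size budget before any content does.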
## Conventions This spec uses the following language to distinguish between requirements and recommendations: - **Must** / **Required**: The item is an absolute requirement of the spec. Used sparingly; most checks in this spec are recommendations rather than hard requirements, because agent-friendliness is a spectrum. - **Should** / **Recommended**: The item is a strong recommendation. There may be valid reasons to deviate, but the implications should be understood. - **May** / **Optional**: The item is genuinely optional. Implementing it provides additional benefit but omitting it is not a deficiency. Sections of this spec are either **normative** (defining checks and their pass/warn/fail criteria) or **informational** (providing context, evidence, and recommendations). The distinction is noted where it matters: - **Normative sections**: Category 1-7 check definitions, Checks Summary table. - **Informational sections**: Background, Scope, Start Here, "How Agents Get Content", "Who Actually Uses llms.txt?", Progressive Disclosure recommendation, "Making Private Docs Agent-Accessible", Appendices. The progressive disclosure pattern for `llms.txt` is a recommendation from this spec, not a normative requirement. Sites that keep their `llms.txt` under 50,000 characters don't need it. ## Start Here: Top Recommendations If you're a documentarian and can only do a few things, start with these. They are ordered by impact based on observed agent behavior: 1. **Create an `llms.txt` that fits in a single agent fetch** (under 50K characters). This is the single highest-impact action. Agents that find an `llms.txt` navigate documentation dramatically better. If your docs set is large, use the [nested pattern](#progressive-disclosure-for-large-documentation-sets) to keep each file under the limit. Checks: `llms-txt-exists`, `llms-txt-size` 2. **Serve markdown versions of your pages.** Either via `.md` URL variants or content negotiation. 
Markdown is what agents actually want; HTML conversion is lossy and unpredictable. Checks: `markdown-url-support`, `content-negotiation` 3. **Keep pages under 50,000 characters of content.** If a page has tabbed or dropdown content that serializes into a massive blob, break it into separate pages or ensure the markdown version stays under the limit. Checks: `page-size-markdown`, `page-size-html`, `tabbed-content-serialization` 4. **Put a pointer to your `llms.txt` at the top of every docs page.** A simple blockquote directive that tells agents where to find the documentation index. Anthropic does this; it works. Check: `llms-txt-directive` 5. **Don't break your URLs.** If you must move content, use same-host HTTP redirects. Avoid cross-host redirects, JavaScript redirects, and soft 404s. Checks: `http-status-codes`, `redirect-behavior` 6. **Monitor your agent-facing resources.** Treat `llms.txt` and markdown endpoints like any other production surface: check freshness, verify content parity with HTML, and ensure cache headers allow timely updates. Checks: `llms-txt-freshness`, `markdown-content-parity`, `cache-header-hygiene` ## Spec Structure Each check has: - **ID**: A short identifier (e.g., `llms-txt-exists`). - **Category**: The area of agent-friendliness it evaluates. - **What it checks**: A description of what the check evaluates. - **Why it matters**: The observed agent behavior that motivates the check. - **Result levels**: What constitutes a pass, warn, or fail. - **Recommended action**: What to do to resolve a warn or failure state. - **Automation**: Whether the check can be fully automated, partially automated (heuristic), or is advisory only. ### Check Dependencies Some checks depend on the results of others: - `llms-txt-valid`, `llms-txt-size`, `llms-txt-links-resolve`, and `llms-txt-links-markdown` only run if `llms-txt-exists` passes. 
- `page-size-markdown` only runs if `markdown-url-support` or `content-negotiation` passes (the site must serve markdown for this check to apply). - `page-size-html` and `content-start-position` results should be flagged as unreliable if `rendering-strategy` fails (the measurements reflect a shell, not actual content). - `section-header-quality` is most relevant when `tabbed-content-serialization` detects tabbed content. - `markdown-code-fence-validity` only runs if `markdown-url-support` or `content-negotiation` passes (the site must serve markdown for this check to apply). It also runs against any discovered `llms.txt` files. - `llms-txt-freshness` only runs if `llms-txt-exists` passes. - `auth-alternative-access` only runs if `auth-gate-detection` returns warn or fail (the site must have auth-gated content for alternative access paths to be relevant). - `markdown-content-parity` only runs if `markdown-url-support` or `content-negotiation` passes (the site must serve markdown for this check to apply). Implementations should run checks in category order (1 through 7) and skip dependent checks when their prerequisites fail. ### A Note on Responsible Use This spec describes checks that involve making HTTP requests to documentation sites. Implementations should be respectful of the sites being evaluated: introduce delays between requests, cap concurrent connections, honor `Retry-After` headers, and avoid overwhelming sites with traffic. The goal is to help documentation teams improve agent accessibility, not to load-test their infrastructure. --- ## Category 1: Content Discoverability These checks evaluate whether agents can find and navigate the site's documentation content. This includes whether the site provides an `llms.txt` file, whether that file is useful to agents, and whether documentation pages include signals that direct agents to discovery resources. 
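The prerequisite rules in "Check Dependencies" above can be expressed as a small lookup table plus a skip test. This is an illustrative sketch, not a reference implementation: the runner structure is hypothetical, and only hard prerequisites are modeled (soft relationships, such as `section-header-quality` being most relevant alongside tabbed content, are left out).

```python
# Prerequisites per check ID, taken from the "Check Dependencies" section.
# A dependent check runs only if at least one prerequisite passed.
PREREQS = {
    "llms-txt-valid": ["llms-txt-exists"],
    "llms-txt-size": ["llms-txt-exists"],
    "llms-txt-links-resolve": ["llms-txt-exists"],
    "llms-txt-links-markdown": ["llms-txt-exists"],
    "llms-txt-freshness": ["llms-txt-exists"],
    "page-size-markdown": ["markdown-url-support", "content-negotiation"],
    "markdown-code-fence-validity": ["markdown-url-support", "content-negotiation"],
    "markdown-content-parity": ["markdown-url-support", "content-negotiation"],
}


def should_run(check_id: str, results: dict) -> bool:
    """Decide whether a check's prerequisites are satisfied.

    `results` maps already-run check IDs to "pass" / "warn" / "fail";
    running checks in category order (1 through 7) guarantees that a
    check's prerequisites have results before it is considered.
    """
    # auth-alternative-access is the one inverted dependency: it only
    # applies when auth gating was actually detected (warn or fail).
    if check_id == "auth-alternative-access":
        return results.get("auth-gate-detection") in ("warn", "fail")
    prereqs = PREREQS.get(check_id)
    if prereqs is None:
        return True  # no dependencies: always runs
    return any(results.get(p) == "pass" for p in prereqs)
```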
### Location Discovery The [llmstxt.org proposal](https://llmstxt.org) specifies that `llms.txt` should be at the root path (`/llms.txt`), mirroring `robots.txt` and `sitemap.xml`. In practice, the location varies significantly across sites: | Site | Root `/llms.txt` | `/docs/llms.txt` | Notes | |------|:-:|:-:|-------| | MongoDB | 200 | 200 | Both locations, different content | | Neon | 200 | 200 | Both locations | | Stripe | 200 | 301 -> docs.stripe.com | Root + docs subdomain | | Vercel | 200 | 308 -> root | Root only, /docs redirects | | React | 200 | -- | Root only | | GitHub Docs | 200 | -- | Root only | | Claude Code | 302 -> product page | 200 | /docs only; root is not docs | | Anthropic (old) | 301 -> 404 | -- | Moved domain, redirect breaks | The proposal does not address whether sites should serve `llms.txt` at subpaths, or whether a site with docs at `/docs/` should place it at `/docs/llms.txt` vs `/llms.txt`. In practice, both patterns exist. Implementations should check multiple candidate locations. **Discovery algorithm**: Given a base URL, check for `llms.txt` at: 1. `{base_url}/llms.txt` (the exact URL the user provided, plus llms.txt) 2. `{origin}/llms.txt` (site root, per the proposal) 3. `{origin}/docs/llms.txt` (common docs subpath) Where `{origin}` is the scheme + host of the base URL, and `{base_url}` is the full URL the user provided (which might be `https://example.com/docs` or `https://example.com` or `https://docs.example.com`). Duplicate URLs are deduplicated before checking. For each location, record whether `llms.txt` exists and whether the response involved a redirect (and if so, what kind). All subsequent llms.txt checks run against every discovered `llms.txt` file. ### `llms-txt-exists` - **What it checks**: Whether `llms.txt` is discoverable at any of the candidate locations described above. - **Why it matters**: `llms.txt` was the single most effective discovery mechanism observed. 
When agents found one, it fundamentally changed their ability to navigate a documentation site. Agents don't know to look for `llms.txt` by default, but when pointed at one, they treat it as a primary navigation resource. - **Result levels**: - **Pass**: `llms.txt` exists at one or more candidate locations, returning 200 with text content (direct or after same-host redirect). - **Warn**: `llms.txt` exists but is only reachable via cross-host redirect (agents may not follow it). - **Fail**: `llms.txt` not found at any candidate location. - **Recommended action**: - **Warn**: Serve `llms.txt` directly from the same host as your documentation, or use a same-host redirect. Cross-host redirects are not followed by some agents. - **Fail**: Create an `llms.txt` file at your site root containing an H1 title, a blockquote summary, and markdown links to your key documentation pages. This is the single highest-impact improvement for agent access. - **Automation**: Full. - **Report details**: List all candidate URLs checked and their status (200, 404, redirect chain). When multiple locations return `llms.txt`, note whether they serve the same or different content. ### `llms-txt-valid` - **What it checks**: Whether the `llms.txt` follows the structure described in the [llmstxt.org proposal](https://llmstxt.org). The proposal specifies: - An H1 with the project/site name. - A blockquote with a short summary. - H2-delimited sections containing markdown link lists. - Each link entry: `[name](url)` optionally followed by `: description`. - An optional H2 "Optional" section for secondary content. - Optional companion file `llms-full.txt` with complete content. - **Why it matters**: A well-structured `llms.txt` gives agents a reliable map of the documentation. Inconsistent implementations reduce its value. That said, even a non-standard `llms.txt` that contains useful links is better than nothing. 
- **Result levels**: - **Pass**: Follows the proposed structure with H1, summary blockquote, and heading-delimited link sections. - **Warn**: Contains parseable markdown links but doesn't follow the proposed structure (still useful, just non-standard). - **Fail**: Exists but contains no parseable links, or is empty. - **Recommended action**: - **Warn**: Add an H1 title as the first line and a blockquote summary (lines starting with `>`) to improve agent parsing. - **Fail**: Add links in `[name](url): description` format under heading-delimited sections. - **Automation**: Full. - **Checks in detail**: - H1 present (first line starts with `# `). - Blockquote summary present (line starting with `> `). - At least one heading-delimited section with markdown links. - Links follow `[name](url)` format. - Optional: check for `llms-full.txt` companion file. - **Notes on heading levels**: The llmstxt.org proposal specifies H2 (`##`) for section delimiters. In practice, some implementations (notably MongoDB) use H1 (`#`) for sections instead. Implementations should accept any heading level for section delimiters when evaluating structure. The important thing is that sections exist and contain parseable links, not that they use a specific heading level. ### `llms-txt-links-resolve` - **What it checks**: Whether the URLs listed in `llms.txt` actually resolve (return 200). - **Why it matters**: A stale `llms.txt` with broken links is worse than no `llms.txt` at all. It sends agents down dead ends with high confidence. - **Result levels**: - **Pass**: All links resolve (200, following same-host redirects). - **Warn**: More than 90% of links resolve, but not all. - **Fail**: 90% or fewer of links resolve. - **Recommended action**: Audit the file regularly and fix or remove broken URLs. - **Automation**: Full. - **Notes**: Requires making HTTP requests to each URL.
For large files, implementations may choose to test a random subset rather than every link. ### `llms-txt-size` - **What it checks**: The character count of the `llms.txt` file, and whether it exceeds the truncation limits of known agent web fetch pipelines. - **Why it matters**: An `llms.txt` that exceeds an agent's truncation limit defeats its own purpose. The agent sees only a fraction of the index and may miss the section it needs entirely. This is the same truncation problem that affects documentation pages, but arguably worse because `llms.txt` is supposed to be the *solution* to discovery. Real-world sizes vary enormously: | Site | Size | Links | Notes | |------|------|-------|-------| | MongoDB `/docs/llms.txt` | 4.56 MB | 21,891 | Every version of every product | | Vercel | 287 KB | ~3,000 | Single file | | Stripe | 89 KB | ~1,000 | Single file | | Neon | 75 KB | ~600 | Points to .md URLs | | React | 14 KB | ~150 | Single file | | Claude Code | 11 KB | ~60 | Small, focused | | GitHub Docs | 2 KB | ~30 | Small index | | MongoDB `/llms.txt` (root) | 1.5 KB | 6 | Top-level index only | Claude Code's web fetch pipeline truncates at ~100KB. A 4.56MB file means the agent sees roughly 2% of it. Even Vercel's 287KB file would be heavily truncated. Only the files under ~100KB are reliably consumable in their entirety by current agent implementations. - **Result levels**: - **Pass**: Under 50,000 characters (fits comfortably within all known truncation limits, even accounting for overhead). - **Warn**: Between 50,000 and 100,000 characters (fits within Claude Code's limit but may not fit others; consider splitting). - **Fail**: Over 100,000 characters (will be truncated by Claude Code and likely all other agent platforms). - **Recommended action**: - **Warn**: If the file grows further, split into nested `llms.txt` files with a root index under 50,000 characters. 
- **Fail**: Split into a root index linking to section-level `llms.txt` files, each under 50,000 characters. See [Progressive Disclosure for Large Documentation Sets](#progressive-disclosure-for-large-documentation-sets) below. - **Automation**: Full. ### `llms-txt-links-markdown` - **What it checks**: Whether the URLs in `llms.txt` point to markdown content (`.md` extension in the URL, or response with `Content-Type: text/markdown`). - **Why it matters**: Markdown content is dramatically more useful to agents than HTML. An `llms.txt` that points agents to HTML pages misses an opportunity to deliver content in the most agent-friendly format. The best implementations (like Neon's) point to `.md` URLs that serve clean markdown directly. - **Result levels**: - **Pass**: All or most links point to markdown content. - **Warn**: Links point to HTML, but markdown versions are available (detected by trying `.md` variants of the URLs). - **Fail**: Links point to HTML and no markdown alternatives are detected. - **Recommended action**: Update `llms.txt` links to use `.md` URL variants so agents receive markdown instead of converted HTML. - **Automation**: Full. ### Progressive Disclosure for Large Documentation Sets The llmstxt.org proposal does not address what to do when a documentation site is too large for a single `llms.txt` file to fit within agent truncation limits. In practice, large documentation sets (like MongoDB's, with 185 products/versions and 21,891 links) produce `llms.txt` files that are orders of magnitude beyond what any current agent can consume in a single fetch. #### Who Actually Uses llms.txt? The original framing of `llms.txt` drew analogies to `robots.txt` and `sitemap.xml`, suggesting it would serve AI crawlers gathering training data. 
The evidence shows this hasn't happened: - An audit of 1,000 domains over 30 days found zero visits to `llms.txt` from GPTBot, ClaudeBot, or PerplexityBot ([Longato, August 2025](https://www.longato.ch/llms-recommendation-2025-august/)). - A 90-day study tracking 62,100+ AI bot visits found only 84 requests (0.1%) to `/llms.txt`, roughly 3x fewer visits than an average content page ([OtterlyAI GEO Study](https://otterly.ai/blog/the-llms-txt-experiment/)). - John Mueller from Google stated directly: "no AI system currently uses llms.txt." Training crawlers don't use `llms.txt` because they have their own discovery mechanisms (sitemaps, link following, pre-built datasets) and probing `/llms.txt` on every domain would waste crawl budget for an unestablished standard. The real consumers of `llms.txt` are **agents in real-time workflows**: a developer's coding assistant fetching documentation to verify an API pattern, an agent following a directive on a docs page that points it to `llms.txt`, or a user explicitly handing their agent an `llms.txt` URL as a discovery starting point. These are fetch-once, use-now interactions subject to the truncation limits of web fetch pipelines. This distinction matters for our recommendation. A progressive disclosure pattern that splits `llms.txt` into nested files has no practical impact on crawler consumption (since crawlers aren't consuming it). It directly benefits the agent use case, which is where `llms.txt` actually provides value today. #### Recommendation We recommend a **nested `llms.txt` pattern** for progressive disclosure: #### Structure A **root `llms.txt`** serves as a table of contents, listing the major sections of the documentation with links to **section-level `llms.txt` files**. Each section-level file contains the actual page links for that section. ``` # MongoDB Documentation > MongoDB is the leading document database. This index covers all MongoDB > products, drivers, and tools documentation. 
## Products - [Atlas](https://www.mongodb.com/docs/atlas/llms.txt): MongoDB Atlas cloud database - [Atlas CLI](https://www.mongodb.com/docs/atlas-cli/llms.txt): Command-line interface for Atlas - [Compass](https://www.mongodb.com/docs/compass/llms.txt): GUI for MongoDB - [MongoDB Server](https://www.mongodb.com/docs/manual/llms.txt): Server documentation ## Drivers - [Python Driver](https://www.mongodb.com/docs/drivers/pymongo/llms.txt): PyMongo driver - [Node.js Driver](https://www.mongodb.com/docs/drivers/node/llms.txt): Node.js driver - [Java Driver](https://www.mongodb.com/docs/drivers/java/llms.txt): Java sync and reactive drivers ``` Each linked `llms.txt` then contains the actual page listings for that product or driver, scoped to the current version (or with a small number of version variants). #### Design Principles 1. **The root `llms.txt` should fit in a single agent fetch.** Target under 50,000 characters. This is the entry point that agents will discover first, and it must be fully consumable. It should contain enough descriptive context for an agent to identify which section-level file to fetch next. 2. **Section-level files should also fit in a single agent fetch.** If a section is still too large (e.g., a product with hundreds of pages across many versions), consider further nesting or limiting the index to the current version only. 3. **Version sprawl is the primary size driver.** The MongoDB `/docs/llms.txt` lists every version of every product. Linking to every historical version in the index provides diminishing returns for agents, who almost always want the current version. Historical versions could be listed in a separate `llms-versions.txt` or under the "Optional" H2 section that the proposal already defines for secondary content. 4. **Links between levels should use absolute URLs.** An agent following a link from root `llms.txt` to a section `llms.txt` needs to resolve it without ambiguity. 5. 
**Each `llms.txt` should be self-describing.** Include the H1 and blockquote summary at every level so an agent landing on a section-level file (via direct URL from training data, for example) has enough context to understand what it's looking at. #### Compatibility Note This nested pattern is a recommendation from this spec, not part of the llmstxt.org proposal as of February 2026. It is fully compatible with the existing proposal (which doesn't prohibit linking to other `llms.txt` files) but would benefit from formal standardization. The proposal's existing "Optional" H2 section could be leveraged for secondary/versioned content, but the nesting pattern goes further by distributing content across multiple files. ### `llms-txt-directive` - **What it checks**: Whether documentation pages include a directive, visible to agents but not necessarily to human readers, pointing to `llms.txt` or another discovery resource. - **Why it matters**: Anthropic's Claude Code documentation (`code.claude.com/docs`, hosted on Mintlify) includes a directive as a blockquote at the top of every markdown page telling agents to fetch the documentation index at `llms.txt`. In practice, agents see this directive, follow it, and use the index to find what they need. It's simple, low-effort, and observed to work in real agent workflows. This is the agent equivalent of a "You Are Here" marker. The directive can be visually hidden (e.g., using a CSS clip-rect technique) as long as it remains in the DOM and survives HTML-to-markdown conversion. Avoid `display: none`, which some converters strip. The directive should be present in server-rendered HTML or in the markdown source; avoid relying solely on client-side JavaScript injection, since most agents fetch pages without executing JS. - **Result levels**: - **Pass**: A directive pointing to `llms.txt` (or equivalent index) is present in all (or nearly all) documentation pages, ideally near the top of the content. 
- **Warn**: A directive exists in some pages but is missing from others, or is present but buried deep in the page (past 50% of content, where it may be past truncation). - **Fail**: No agent-facing directive detected in any tested page. - **Recommended action**: - **Warn**: Ensure the directive appears near the top of every documentation page, not just some. - **Fail**: Add a blockquote near the top of each page (e.g., "> For the complete documentation index, see [llms.txt](/llms.txt)"). This can be visually hidden with CSS while remaining accessible to agents. - **Automation**: Heuristic. Search the page HTML for patterns like links to `llms.txt`, phrases like "documentation index", or directives near the top of the content area. Check both visible text and visually-hidden elements. --- ## Category 2: Markdown Availability These checks evaluate whether the site serves documentation in markdown format, which agents consume far more effectively than HTML. ### `markdown-url-support` - **What it checks**: Whether appending `.md` to documentation page URLs returns valid markdown content. - **Why it matters**: Agents work dramatically better with markdown than HTML. The HTML-to-markdown conversion in web fetch pipelines is lossy and unpredictable. Sites that serve markdown directly bypass conversion issues entirely. However, agents don't discover this pattern on their own; it needs to be signaled. - **Result levels**: - **Pass**: `.md` URLs return valid markdown with 200 status. - **Warn**: Some pages support `.md` but not consistently. - **Fail**: `.md` URLs return errors or HTML. - **Recommended action**: - **Warn**: Ensure all documentation pages serve markdown when `.md` is appended to the URL, not just some. - **Fail**: Configure your docs platform to serve `.md` variants for all documentation pages. - **Automation**: Full. Test against a sample of page URLs (from `llms.txt`, sitemap, or user-provided list). 
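The `markdown-url-support` result levels can be sketched as a small sampling heuristic. This is a simplified illustration rather than a reference implementation: `fetch` is an injected callable (so the sketch stays network-free), and `looks_like_markdown` is a deliberately cheap assumption about what separates a markdown response from an HTML one. A real checker would use an HTTP client and throttle requests as described in "A Note on Responsible Use".

```python
def looks_like_markdown(body: str) -> bool:
    """Cheap heuristic: a markdown response should not open an HTML document."""
    head = body.lstrip()[:200].lower()
    return not (head.startswith("<!doctype") or head.startswith("<html"))


def check_markdown_url_support(page_urls, fetch):
    """Return "pass", "warn", or "fail" per the result levels above.

    `page_urls` is a sample of documentation pages (e.g. from llms.txt or
    a sitemap); `fetch(url)` returns a (status_code, body) tuple.
    """
    ok = 0
    for url in page_urls:
        status, body = fetch(url.rstrip("/") + ".md")  # try the .md variant
        if status == 200 and looks_like_markdown(body):
            ok += 1
    if page_urls and ok == len(page_urls):
        return "pass"   # every sampled page serves a markdown variant
    if ok > 0:
        return "warn"   # inconsistent support across pages
    return "fail"       # no markdown variants detected
```

Note that a 200 status alone is not enough: some sites answer `.md` URLs with an HTML error page or the original HTML, which is why the body check matters.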
### `content-negotiation` - **What it checks**: Whether the server responds to `Accept: text/markdown` with markdown content and an appropriate `Content-Type` header. - **Why it matters**: Some agents (Claude Code, Cursor, OpenCode) send `Accept: text/markdown` as their preferred content type. If the server honors this, the agent gets clean markdown without needing to know about `.md` URL patterns. Most agents don't request markdown, but the ones that do should get it. - **Result levels**: - **Pass**: Server returns markdown content with `Content-Type: text/markdown` when requested. - **Warn**: Server returns markdown content but with incorrect `Content-Type`. - **Fail**: Server ignores the `Accept` header and returns HTML regardless. - **Recommended action**: - **Warn**: Set the response `Content-Type` to `text/markdown` when serving markdown content. The correct header enables optimizations in some agent pipelines. - **Fail**: Configure your server to honor `Accept: text/markdown` requests and return markdown content. Some agents (Claude Code, Cursor, OpenCode) request markdown this way. - **Automation**: Full. --- ## Category 3: Page Size and Truncation Risk These checks evaluate whether page content fits within the processing limits of agent web fetch pipelines. Truncation is silent: the agent doesn't know it's working with partial data. ### How Agents Get Content Not all agents see the same thing. The format an agent receives depends on the request it makes and the server's response: 1. **Agents that request markdown** (Claude Code, Cursor, OpenCode send `Accept: text/markdown`). If the server honors this and returns markdown, the agent gets clean content. If the server also returns `Content-Type: text/markdown` and the content is under 100K characters, Claude Code bypasses its summarization model entirely, delivering the content directly to the agent. This is the best-case path. 2. 
**Agents that request HTML** (most agents, including Gemini, Copilot, and others, send `Accept: text/html` or `*/*`). These agents receive the full HTML response. Some pipelines convert HTML to markdown before truncation (Claude Code uses Turndown); others may truncate raw HTML or use their own processing. The HTML path is where boilerplate CSS/JS causes the most damage. 3. **Agents that use `.md` URL variants.** If an agent knows to append `.md` to a URL (because `llms.txt` told it, or a directive on the page, or persistent context), it gets markdown directly regardless of Accept headers. Because different agents hit different paths, this spec defines size checks for **both** the markdown response (if available) and the HTML response. A site that's only optimized for the markdown path is leaving most agents behind. ### `rendering-strategy` - **What it checks**: Whether the HTTP response contains the page's actual content, or whether content requires JavaScript execution to render (client-side rendering / SPA). - **Why it matters**: Most coding agents fetch pages using HTTP libraries that do not execute JavaScript. GitHub Copilot is the only major agent observed to use headless browser rendering. When a site relies on client-side rendering, agents see an empty shell containing framework boilerplate, inline CSS, and navigation chrome, but none of the documentation content. This is not a truncation problem. It is a zero-content problem. The page returns HTTP 200, so the agent doesn't know anything is wrong. It attempts to extract information from whatever text is in the shell (typically nav links and footer text) and produces nonsensical results, or falls back on training data that may be outdated. The rendering strategy is a property of the framework configuration, not the framework itself. The same framework can produce either server-rendered or client-rendered output. 
Sites built with Next.js, Gatsby, and Nuxt appear on both sides: react.dev (Next.js) and docs.github.com (Next.js) are fully agent-accessible, while other sites using the same frameworks deliver empty shells. Text-to-HTML ratio alone is not a reliable signal; GitHub docs and Stripe docs have low ratios due to heavy bundled assets but contain real page content. The distinction is whether page-specific content is present in the response. A subtler variant exists where a page is statically generated but a specific component defers content rendering to JavaScript based on user selections (e.g., query parameters choosing a language or deployment type). The static HTML contains the page structure (title, navigation, selector UI) but none of the substantive content. From an agent's perspective, the effect is the same as a full SPA shell. - **Result levels**: - **Pass**: HTTP response contains substantive page content. Detected by the presence of multiple page-specific headings, paragraphs with prose content, or other content elements beyond navigation chrome. - **Warn**: HTTP response contains some content but appears sparse relative to the page's apparent scope. This covers client-side content population (statically generated pages where a component defers content to JavaScript), partial hydration or lazy loading, and legitimately minimal pages. - **Fail**: HTTP response is an SPA shell. Detected by the combination of known framework markers (e.g., `id="___gatsby"`, `id="__next"`, `id="__nuxt"`, `id="root"`), minimal visible text content, and absence of page-specific content elements. - **Recommended action**: - **Warn**: Verify that key content is present in the server-rendered HTML response. Pages with sparse content may rely on client-side JavaScript to populate. - **Fail**: Enable server-side rendering or pre-rendering for documentation pages. 
If only specific page templates use client-side content loading, target those templates rather than rebuilding the entire site. - **Automation**: Heuristic. Combine framework marker detection with content signal analysis (headings, paragraphs, code blocks after stripping `