# DocPulse

> Docs that stay in sync with the heartbeat of the codebase.

[DocPulse on the GitHub Marketplace](https://github.com/marketplace/actions/docpulse-self-healing-docs)

DocPulse detects documentation sections invalidated by a pull request's code
changes and commits surgical fixes directly onto the PR branch. A deterministic layer
(tree-sitter chunking → code↔doc link graph → diff-driven suspect selection)
decides *what to check*; a bounded agentic layer (an LLM verifier with read/grep
tools, then a style-preserving repairer with a validation pass) decides *stale
or not, and how to fix it*.
## Quickstart (GitHub Action)
```yaml
# .github/workflows/docpulse.yml
name: DocPulse
on: pull_request
jobs:
  docpulse:
    runs-on: ubuntu-latest
    permissions:
      contents: write # repair mode commits doc fixes onto the PR branch
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.ref }}
          fetch-depth: 0
      - uses: YoniRaviv/DocPulse@v1
        with:
          mode: repair # or "check" to comment-only
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          DOCPULSE_PR_NUMBER: ${{ github.event.pull_request.number }}
```
Embeddings improve link recall and need an embedding key (`OPENAI_API_KEY`). To run without one, set the action input `heuristics-only: true`; linking then uses name mentions only.

In `repair` mode DocPulse commits the doc fix straight onto your PR's branch, so the fix lands in the same PR. This works for same-repo PRs only: DocPulse can't push to a fork, so use `mode: check` for fork-based contributions. It skips itself when the most recent commit on the branch is already a doc-sync commit, so the auto-fix push cannot trigger an endless loop.

Add a `docpulse.yml` at your repo root (see [Configuration](#configuration)).
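
The loop guard described above can be pictured with a small Python sketch. It is an illustration only, not DocPulse's actual code: it assumes the guard recognizes a doc-sync commit by its author name, and `ASSUMED_BOT_NAME`, `is_doc_sync_commit`, `head_author`, and `should_skip_run` are hypothetical names.

```python
import subprocess

# Hypothetical bot identity; the real default is not stated in this README.
ASSUMED_BOT_NAME = "docpulse-bot"


def is_doc_sync_commit(author_name: str, bot_name: str = ASSUMED_BOT_NAME) -> bool:
    """Return True when the commit author matches the bot identity."""
    return author_name.strip().lower() == bot_name.strip().lower()


def head_author(repo_dir: str = ".") -> str:
    """Read the author name of the most recent commit using plain git."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%an"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def should_skip_run(author_name: str) -> bool:
    """Skip the run when the last commit is already a doc-sync commit."""
    return is_doc_sync_commit(author_name)
```

Inside a checkout, `should_skip_run(head_author())` would return `True` right after the bot's own push, which is what keeps the auto-fix commit from starting another run.
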
## Quickstart (CLI)
```bash
uv tool install docpulse # or: pipx install docpulse
docpulse index --root . # build the code<->docs link index
docpulse check --base origin/main # verify docs vs the PR diff (exit 1 on drift)
docpulse check --base origin/main --suspects-only # keyless: list suspect sections only
docpulse repair --base origin/main # print proposed fixes + the dry-run commit plan
docpulse repair --base origin/main --write # apply fixes to doc files locally (no push)
docpulse repair --base origin/main --push # commit+push doc fixes onto the current branch (needs GH_TOKEN)
```
`check` exits 0 (clean), 1 (a doc section is stale at/above `flag_threshold`),
or 2 (setup/tool error). `unverified` never fails the build. `repair` uses the same
codes: it exits 1 if drift was found, even after committing a fix, so the auto-fixed
PR is still flagged for human review.
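
A wrapper script can branch on these exit codes. The Python sketch below shows one way to do that. The three codes come from the description above; the names `DocPulseExit`, `interpret_exit`, and `run_check`, and the choice to treat any other code as an error, are assumptions made for illustration.

```python
import subprocess
from enum import IntEnum


class DocPulseExit(IntEnum):
    CLEAN = 0   # no stale sections found
    DRIFT = 1   # at least one section is stale at/above flag_threshold
    ERROR = 2   # setup or tool error


def interpret_exit(code: int) -> str:
    """Map an exit code to a CI decision; unknown codes count as errors."""
    if code == DocPulseExit.CLEAN:
        return "pass"
    if code == DocPulseExit.DRIFT:
        return "needs-review"
    return "error"


def run_check(base: str = "origin/main") -> int:
    """Run `docpulse check` and return its exit code (needs docpulse on PATH)."""
    return subprocess.run(["docpulse", "check", "--base", base]).returncode
```

For example, `interpret_exit(run_check())` yields `"needs-review"` when drift is found, including the case where `repair` has already committed a fix.
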
## Eval numbers
Measured on 12 hand-labeled seed cases (4 per language; 6 stale / 6 accurate)
with `anthropic/claude-haiku-4-5`:

| Metric    | Value |
|-----------|-------|
| Precision | 1.00  |
| Recall    | 1.00  |

The seed cases are deliberately clear-cut, so treat these numbers as a sanity
check rather than a robustness measure. The suite gets harder as real-world
false positives are added back as cases. Run `docpulse eval --cases evals/cases`
to reproduce.
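
For reference, here is how the two metrics follow from per-case verdicts. This is a generic sketch, not the code behind `docpulse eval`, and it assumes "stale" is the positive class. With 6 stale and 6 accurate cases all judged correctly, there are 6 true positives, no false positives, and no false negatives.

```python
def precision_recall(labels, predictions):
    """Compute precision and recall, treating True ("stale") as positive."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for label, pred in pairs if label and pred)
    fp = sum(1 for label, pred in pairs if not label and pred)
    fn = sum(1 for label, pred in pairs if label and not pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# 6 stale (True) and 6 accurate (False) cases, every verdict correct.
labels = [True] * 6 + [False] * 6
predictions = list(labels)
print(precision_recall(labels, predictions))  # (1.0, 1.0)
```

A single accurate section wrongly flagged as stale would drop precision to 6/7 while leaving recall at 1.0, which is why adding false positives back as cases makes the suite harder.
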
## Architecture
```mermaid
flowchart LR
  subgraph Deterministic
    A[tree-sitter chunker] --> B[code↔doc link graph]
    B --> C[git diff → suspects]
  end
  subgraph Agentic
    C --> D["verifier<br/>read/grep tools"]
    D --> E[repairer]
    E --> F[validator]
  end
  F --> G["destination:<br/>PR comment + doc-sync commit"]
```
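
To make the deterministic half of the diagram concrete, the Python sketch below links code files to doc sections by name mentions and then picks suspects from a list of changed files. It is a simplified stand-in: the real pipeline uses tree-sitter chunking and, optionally, embeddings. `build_link_graph`, `select_suspects`, and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def build_link_graph(symbols_by_file, doc_sections):
    """Link each code file to the doc sections that mention one of its symbols."""
    graph = defaultdict(set)
    for path, symbols in symbols_by_file.items():
        for section_id, text in doc_sections.items():
            if any(re.search(rf"\b{re.escape(name)}\b", text) for name in symbols):
                graph[path].add(section_id)
    return graph


def select_suspects(graph, changed_files, max_suspects=20):
    """Return doc sections linked to changed files, capped at max_suspects."""
    suspects = []
    for path in changed_files:
        for section_id in sorted(graph.get(path, ())):
            if section_id not in suspects:
                suspects.append(section_id)
    return suspects[:max_suspects]


# Illustrative data only.
symbols_by_file = {
    "src/auth.py": ["login", "logout"],
    "src/billing.py": ["charge"],
}
doc_sections = {
    "docs/auth.md#login": "Call `login(user)` to start a session.",
    "docs/billing.md#charges": "Use `charge(amount)` to bill a card.",
    "README.md#intro": "A small demo app.",
}
graph = build_link_graph(symbols_by_file, doc_sections)
print(select_suspects(graph, ["src/auth.py"]))  # ['docs/auth.md#login']
```

Only the suspects selected this way would reach the agentic layer, which is what keeps the LLM work bounded by the size of the diff rather than the size of the docs.
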
## Configuration
```yaml
model: anthropic/claude-sonnet-4-6 # any LiteLLM model string
embedding_model: openai/text-embedding-3-small
docs:
  - path: "docs/**/*.md"
  - path: "README.md"
code:
  include: ["src/**"]
  exclude: ["**/*.test.*", "tests/**"]
confidence:
  auto_fix_threshold: 0.85
  flag_threshold: 0.5
budget:
  max_suspects_per_run: 20
  max_tool_calls_per_suspect: 10
context: [git]
```
Secrets come only from env vars (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, …).
`docpulse index --heuristics-only` builds the link graph without embeddings (no key).
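
One plausible reading of the two `confidence` thresholds is sketched below in Python. This is an assumption, not documented behavior: it supposes that a stale verdict at or above `auto_fix_threshold` is repaired automatically, one at or above `flag_threshold` is only flagged, and everything else (including `unverified`) is ignored. `Thresholds` and `route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults mirror the sample docpulse.yml above.
    auto_fix_threshold: float = 0.85
    flag_threshold: float = 0.5


def route(verdict: str, confidence: float, t: Thresholds = Thresholds()) -> str:
    """Decide what to do with one verified section (assumed semantics)."""
    if verdict != "stale":
        return "ignore"      # accurate or unverified sections are left alone
    if confidence >= t.auto_fix_threshold:
        return "auto_fix"    # confident enough to commit a repair
    if confidence >= t.flag_threshold:
        return "flag"        # report it, but leave the fix to a human
    return "ignore"
```

Under these assumptions, `route("stale", 0.6)` returns `"flag"` and `route("stale", 0.9)` returns `"auto_fix"`.
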
## Jenkins / any CI
```bash
# The image entrypoint is env-driven (it always runs `index` then the chosen
# mode with --push). Configure it with -e vars, not CLI args:
docker run --rm \
  -e ANTHROPIC_API_KEY -e GH_TOKEN \
  -e DOCPULSE_MODE=check \
  -e DOCPULSE_BASE_REF=origin/main \
  -e DOCPULSE_PR_NUMBER="$PR_NUMBER" \
  -e DOCPULSE_HEURISTICS_ONLY=true \
  -v "$PWD:/work" -w /work \
  ghcr.io/yoniraviv/docpulse:latest
```
Set `DOCPULSE_PR_NUMBER` to your CI's PR/MR number so the flag comment is posted to
the PR (without it, the comment is only printed to the log). Drop
`DOCPULSE_HEURISTICS_ONLY` and add `-e OPENAI_API_KEY` to use embedding-based linking.
### Kubernetes (no Docker daemon)
A Jenkins agent that is itself a K8s pod has no Docker daemon, so `docker run`
won't work. Run DocPulse as a **pod sidecar container** (or `uv tool install
docpulse` / `pipx install docpulse` into the agent image) and call the CLI
directly — it self-preps (git `safe.directory`, base-ref fetch on shallow
checkouts, and the push loop guard), so no entrypoint wrapper is needed:
```groovy
container('docpulse') {
    sh '''
        docpulse index --root .
        docpulse check --base "origin/$CHANGE_TARGET" --comment-via none --comment-out docpulse-flag.md
    '''
}
// then post docpulse-flag.md with your SCM's Jenkins plugin (GitHub/GitLab/Bitbucket)
```
`--comment-out` writes the flag comment as portable markdown so **any** host's CI
posts it natively — DocPulse stays out of the per-vendor comment-API business.
GitHub users can instead let DocPulse post directly (the default `--comment-via
gh`, given `gh` + `GH_TOKEN` + `DOCPULSE_PR_NUMBER`); other hosts pass
`--comment-via none`. `repair --push` works on any host — it commits the doc-sync
fix with `git -c user.*` (bot-authored, so the loop guard catches it) and pushes
with the credentials your checkout already configured.

Map your CI's variables: Jenkins exposes the base branch as `$CHANGE_TARGET` and
the PR/MR number as `$CHANGE_ID` (set `DOCPULSE_PR_NUMBER="$CHANGE_ID"` for `gh`
posting). The bot identity is overridable via `DOCPULSE_BOT_NAME` /
`DOCPULSE_BOT_EMAIL`.
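
The variable mapping above can be written as a small helper. The Python sketch below is illustrative: `docpulse_env_from_jenkins` is a hypothetical function, and it assumes the base ref is passed through `DOCPULSE_BASE_REF` as in the container example, with the `origin/` prefix used elsewhere in this README.

```python
def docpulse_env_from_jenkins(jenkins_env):
    """Translate Jenkins multibranch variables into DocPulse env vars."""
    env = {}
    target = jenkins_env.get("CHANGE_TARGET")
    if target:
        env["DOCPULSE_BASE_REF"] = f"origin/{target}"
    change_id = jenkins_env.get("CHANGE_ID")
    if change_id:
        env["DOCPULSE_PR_NUMBER"] = change_id
    # Pass the optional bot identity overrides through unchanged.
    for key in ("DOCPULSE_BOT_NAME", "DOCPULSE_BOT_EMAIL"):
        if key in jenkins_env:
            env[key] = jenkins_env[key]
    return env
```

Calling it with `os.environ` inside a Jenkins PR build would produce the values to export before running DocPulse; on a non-PR build, where `CHANGE_ID` is unset, it simply returns no PR number.
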