# Architecture

graphify is now an assistant skill backed by a TypeScript runtime rooted in `src/`. The skill orchestrates the runtime; the runtime can also be used standalone through the packaged CLI and public helper exports.

## Pipeline

```text
detect()  ->  extract()  ->  build()  ->  cluster()  ->  analyze()  ->  report()  ->  export()
```

Each stage is a single module with a narrow contract. Stages communicate through plain TypeScript objects and Graphology graphs, with no shared mutable state and no side effects outside `graphify-out/`.
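
As a rough illustration of that contract, the sketch below chains three stand-in stages that pass read-only plain objects to one another. Everything in it is hypothetical (`detectSketch`, `extractSketch`, `buildSketch`, and the `Corpus`, `ExtractionLike`, and `BuiltGraph` shapes); the real stages in `src/` have richer signatures and produce Graphology graphs rather than a plain adjacency map.

```typescript
// Hypothetical stage shapes, used only to illustrate the narrow-contract idea.
interface Corpus {
  readonly files: readonly string[];
}
interface ExtractionLike {
  readonly nodes: readonly { id: string; label: string }[];
  readonly edges: readonly { source: string; target: string }[];
}
interface BuiltGraph {
  readonly nodeCount: number;
  readonly adjacency: ReadonlyMap<string, readonly string[]>;
}

// Stand-in for detect(): keep only files with a supported suffix.
function detectSketch(paths: readonly string[]): Corpus {
  return { files: paths.filter((p) => p.endsWith(".ts")) };
}

// Stand-in for extract(): one node per file, each linked to the next file.
function extractSketch(corpus: Corpus): ExtractionLike {
  const nodes = corpus.files.map((f) => ({ id: f, label: f }));
  const edges: { source: string; target: string }[] = [];
  let previous: string | undefined;
  for (const file of corpus.files) {
    if (previous !== undefined) edges.push({ source: previous, target: file });
    previous = file;
  }
  return { nodes, edges };
}

// Stand-in for build(): turn the extraction into an adjacency map.
function buildSketch(extraction: ExtractionLike): BuiltGraph {
  const adjacency = new Map<string, string[]>();
  for (const node of extraction.nodes) adjacency.set(node.id, []);
  for (const edge of extraction.edges) adjacency.get(edge.source)?.push(edge.target);
  return { nodeCount: adjacency.size, adjacency };
}

// Each stage consumes only the previous stage's return value.
const graph = buildSketch(extractSketch(detectSketch(["a.ts", "README.md", "b.ts"])));
console.log(graph.nodeCount, graph.adjacency.get("a.ts"));
```

Because each function depends only on its argument, any stage can be tested or swapped in isolation.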

## Module responsibilities

| Module | Function | Input -> Output |
|--------|----------|-----------------|
| `src/detect.ts` | `detect(root)` / `detectIncremental(root, manifest)` | directory -> filtered corpus summary |
| `src/extract.ts` | `extract(...)` / `extractWithDiagnostics(...)` | code files -> extraction `{nodes, edges, hyperedges}` |
| `src/build.ts` | `buildFromJson(extraction)` | extraction object -> `Graph` |
| `src/cluster.ts` | `cluster(G)` / `scoreAll(G, communities)` | graph -> communities + cohesion |
| `src/analyze.ts` | `godNodes(G)` / `surprisingConnections(G, communities)` / `suggestQuestions(...)` | graph -> analysis slices |
| `src/report.ts` | `generate(...)` | graph + analysis -> `GRAPH_REPORT.md` string |
| `src/export.ts` | `toJson`, `toHtml`, `toSvg`, `toGraphml`, `toCypher`, `pushToNeo4j` | graph -> exported artifacts |
| `src/ingest.ts` | `ingest(url, ...)` / `saveQueryResult(...)` | URL or Q&A -> saved corpus/memory file |
| `src/cache.ts` | `checkSemanticCache` / `saveSemanticCache` | files -> cached/uncached split |
| `src/security.ts` | validation helpers | URL / path / label -> validated value or thrown error |
| `src/validate.ts` | `validateExtraction(data)` | extraction object -> validation errors |
| `src/serve.ts` | `serve(graphPath)` | graph file path -> MCP stdio server |
| `src/watch.ts` | `watch(root, debounce)` / `rebuildCode(root)` | directory -> rebuild / update flag |
| `src/benchmark.ts` | `runBenchmark(graphPath)` | graph file -> corpus vs subgraph token comparison |
| `src/skill-runtime.ts` | `detect`, `extract-ast`, `finalize-build`, `finalize-update`, etc. | deterministic helper entrypoint for the Codex skill |
| `src/agent-stats/` | `discover`/`normalize`/`correlate`/`buildReport`; `buildProjectGraph` (`project-graph.ts`) | agentic-CLI transcripts (Claude/Codex/agy) -> branch/commit/work-package attribution per agent session (re-derivable `.graphify/agents/facts.jsonl`; `graphify.agent-stats/v1` report); `project-graph` builds a rename-aware project/conversation `graph.json` (reconciles repo renames into one project node) |
| `src/cli.ts` | packaged `graphify` CLI | user-facing commands |

## Extraction output schema

Every extractor returns:

```json
{
  "nodes": [
    {
      "id": "unique_string",
      "label": "human name",
      "file_type": "code|document|paper|image|rationale",
      "source_file": "path",
      "source_location": "L42"
    }
  ],
  "edges": [
    {
      "source": "id_a",
      "target": "id_b",
      "relation": "calls|imports|uses|references|...",
      "confidence": "EXTRACTED|INFERRED|AMBIGUOUS",
      "source_file": "path"
    }
  ],
  "hyperedges": []
}
```

`src/validate.ts` enforces this schema before `buildFromJson()` consumes it.
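
For orientation, the schema can be written down as TypeScript types, with a small structural check in the same spirit. This is a minimal sketch, not the code in `src/validate.ts`: the type names, `validateExtractionSketch`, `isExtraction`, and the error messages are all assumptions, and `source_location` is left unchecked because the schema above does not say whether it is required.

```typescript
// Types mirror the JSON schema above. The validator is a hypothetical sketch.
type FileType = "code" | "document" | "paper" | "image" | "rationale";
type Confidence = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

interface ExtractionNode {
  id: string;
  label: string;
  file_type: FileType;
  source_file: string;
  source_location: string;
}
interface ExtractionEdge {
  source: string;
  target: string;
  relation: string; // open-ended in the schema, so only its type is checked
  confidence: Confidence;
  source_file: string;
}
interface Extraction {
  nodes: ExtractionNode[];
  edges: ExtractionEdge[];
  hyperedges: unknown[];
}

const FILE_TYPES = new Set<string>(["code", "document", "paper", "image", "rationale"]);
const CONFIDENCES = new Set<string>(["EXTRACTED", "INFERRED", "AMBIGUOUS"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Pushes one error per listed key whose value is not a string.
function requireStrings(
  item: Record<string, unknown>,
  keys: readonly string[],
  where: string,
  errors: string[],
): void {
  for (const key of keys) {
    if (typeof item[key] !== "string") errors.push(`${where}.${key} must be a string`);
  }
}

// Returns human-readable problems; an empty array means the data passed.
function validateExtractionSketch(data: unknown): string[] {
  if (!isRecord(data)) return ["extraction must be an object"];
  const errors: string[] = [];
  for (const key of ["nodes", "edges", "hyperedges"]) {
    if (!Array.isArray(data[key])) errors.push(`${key} must be an array`);
  }
  const nodes: unknown = data["nodes"];
  if (Array.isArray(nodes)) {
    nodes.forEach((node: unknown, i: number) => {
      const where = `nodes[${i}]`;
      if (!isRecord(node)) return void errors.push(`${where} must be an object`);
      requireStrings(node, ["id", "label", "source_file"], where, errors);
      const fileType = node["file_type"];
      if (typeof fileType !== "string" || !FILE_TYPES.has(fileType)) {
        errors.push(`${where}.file_type is not a known type`);
      }
    });
  }
  const edges: unknown = data["edges"];
  if (Array.isArray(edges)) {
    edges.forEach((edge: unknown, i: number) => {
      const where = `edges[${i}]`;
      if (!isRecord(edge)) return void errors.push(`${where} must be an object`);
      requireStrings(edge, ["source", "target", "relation", "source_file"], where, errors);
      const confidence = edge["confidence"];
      if (typeof confidence !== "string" || !CONFIDENCES.has(confidence)) {
        errors.push(`${where}.confidence is not a known label`);
      }
    });
  }
  return errors;
}

// Type guard built on the validator, so callers get a typed Extraction.
function isExtraction(data: unknown): data is Extraction {
  return validateExtractionSketch(data).length === 0;
}

const problems = validateExtractionSketch({
  nodes: [{ id: "a", label: "A", file_type: "code", source_file: "a.ts", source_location: "L1" }],
  edges: [{ source: "a", target: "a", relation: "calls", confidence: "GUESSED", source_file: "a.ts" }],
  hyperedges: [],
});
console.log(problems); // one entry, for the unknown confidence label
```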

## Confidence labels

| Label | Meaning |
|-------|---------|
| `EXTRACTED` | Relationship is explicitly stated in the source |
| `INFERRED` | Relationship is a reasonable deduction from the source |
| `AMBIGUOUS` | Relationship is uncertain and should be surfaced for review |
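
One way a consumer might act on these labels is to bucket edges by confidence and list the `AMBIGUOUS` ones for review. The helper below is an illustrative sketch; `partitionByConfidence` and `LabeledEdge` are hypothetical names and not part of the runtime's exports.

```typescript
type ConfidenceLabel = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

interface LabeledEdge {
  source: string;
  target: string;
  relation: string;
  confidence: ConfidenceLabel;
}

// Groups edges by their confidence label without mutating the input.
function partitionByConfidence(
  edges: readonly LabeledEdge[],
): Record<ConfidenceLabel, LabeledEdge[]> {
  const buckets: Record<ConfidenceLabel, LabeledEdge[]> = {
    EXTRACTED: [],
    INFERRED: [],
    AMBIGUOUS: [],
  };
  for (const edge of edges) buckets[edge.confidence].push(edge);
  return buckets;
}

const buckets = partitionByConfidence([
  { source: "cli", target: "detect", relation: "calls", confidence: "EXTRACTED" },
  { source: "report", target: "analyze", relation: "uses", confidence: "INFERRED" },
  { source: "notes", target: "cache", relation: "references", confidence: "AMBIGUOUS" },
]);

// Render the uncertain relationships as short review lines.
const needsReview = buckets.AMBIGUOUS.map(
  (e) => `${e.source} -[${e.relation}]-> ${e.target}`,
);
console.log(needsReview);
```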

## Skills vs runtime

The runtime itself does not dispatch Claude/Codex subagents. The skill markdown files under `src/skills/` instruct the assistant platform how to orchestrate:
- deterministic local runtime steps
- semantic extraction over docs, papers, and images
- optional parallel subagent fan-out on platforms that support it

So the package provides the graph pipeline, while the assistant client remains the orchestrator.

## Adding a new language extractor

1. Add the language support to `src/extract.ts` following the existing pattern:
   tree-sitter parse -> walk nodes -> collect `nodes` and `edges` -> add any second-pass inferred edges.
2. Register the suffix in `src/detect.ts` and any watcher handling in `src/watch.ts`.
3. Add the tree-sitter dependency to `package.json`.
4. Add a fixture file to `tests/fixtures/`.
5. Add or extend tests under `tests/` to cover the new extraction path.
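
The pattern in step 1 can be sketched without a real grammar. The example below walks a hand-built tree whose node shape is only loosely modeled on a tree-sitter syntax node, collects function nodes, then resolves calls in a second pass. All names here (`AstNode`, `extractToyLanguage`, the `file::name` id format) are assumptions, `text` holds just the identifier as a simplification, and tagging resolved calls as `INFERRED` is an illustrative choice rather than the project's rule.

```typescript
// Minimal node shape, loosely modeled on a tree-sitter syntax node (assumption).
interface AstNode {
  type: string;
  text: string; // simplified: holds only the identifier
  row: number; // zero-based line number
  children: AstNode[];
}

interface GraphNode {
  id: string;
  label: string;
  file_type: "code";
  source_file: string;
  source_location: string;
}
interface GraphEdge {
  source: string;
  target: string;
  relation: string;
  confidence: "EXTRACTED" | "INFERRED";
  source_file: string;
}

function extractToyLanguage(
  root: AstNode,
  sourceFile: string,
): { nodes: GraphNode[]; edges: GraphEdge[]; hyperedges: never[] } {
  const nodes: GraphNode[] = [];
  const pendingCalls: { caller: string; calleeName: string }[] = [];

  // First pass: walk the tree, recording functions and unresolved calls.
  const walk = (node: AstNode, enclosing: string | undefined): void => {
    let current = enclosing;
    if (node.type === "function_definition") {
      current = `${sourceFile}::${node.text}`; // hypothetical id format
      nodes.push({
        id: current,
        label: node.text,
        file_type: "code",
        source_file: sourceFile,
        source_location: `L${node.row + 1}`,
      });
    } else if (node.type === "call" && enclosing !== undefined) {
      pendingCalls.push({ caller: enclosing, calleeName: node.text });
    }
    for (const child of node.children) walk(child, current);
  };
  walk(root, undefined);

  // Second pass: resolve call names against functions defined in this file.
  const idByLabel = new Map(nodes.map((n) => [n.label, n.id] as const));
  const edges: GraphEdge[] = [];
  for (const call of pendingCalls) {
    const target = idByLabel.get(call.calleeName);
    if (target === undefined) continue; // unknown callee: skipped in this sketch
    edges.push({
      source: call.caller,
      target,
      relation: "calls",
      confidence: "INFERRED",
      source_file: sourceFile,
    });
  }
  return { nodes, edges, hyperedges: [] };
}

const leaf = (type: string, text: string, row: number): AstNode => ({
  type,
  text,
  row,
  children: [],
});
const tree: AstNode = {
  type: "module",
  text: "",
  row: 0,
  children: [
    {
      type: "function_definition",
      text: "main",
      row: 0,
      children: [leaf("call", "helper", 1), leaf("call", "print", 2)],
    },
    { type: "function_definition", text: "helper", row: 4, children: [] },
  ],
};
const result = extractToyLanguage(tree, "demo.toy");
console.log(result.nodes.length, result.edges.length);
```

A real extractor would obtain the tree from the tree-sitter parser for the new language instead of building it by hand.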

## Security

All external input passes through `src/security.ts` before use:

- URLs -> `validateUrl()` and safe-fetch guards
- graph file paths -> `validateGraphPath()` so `serve` stays inside allowed graph outputs
- labels -> `sanitizeLabel()` before UI / text output

See `SECURITY.md` for the threat model.
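
To make the first and third checks concrete, here is a minimal sketch of what URL and label validation of this kind could look like. These are hypothetical stand-ins (`validateUrlSketch`, `sanitizeLabelSketch`), not the helpers in `src/security.ts`. The specific rules (http/https only, HTML escaping, a 120-character cap) are assumptions, and path validation is not covered.

```typescript
// Hypothetical stand-ins; the real helpers may apply different rules.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// Parses the URL and rejects anything that is not http(s).
function validateUrlSketch(raw: string): URL {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    throw new Error(`invalid URL: ${raw}`);
  }
  if (!ALLOWED_PROTOCOLS.has(parsed.protocol)) {
    throw new Error(`unsupported scheme: ${parsed.protocol}`);
  }
  return parsed;
}

// Strips control characters, truncates, then escapes HTML-significant characters.
// Truncating before escaping avoids cutting an entity such as "&amp;" in half.
function sanitizeLabelSketch(label: string, maxLength = 120): string {
  const trimmed = label
    .replace(/[\u0000-\u001f\u007f]/g, " ")
    .trim()
    .slice(0, maxLength);
  return trimmed
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const safeLabel = sanitizeLabelSketch("<b>x</b>\n");
const paperUrl = validateUrlSketch("https://example.com/paper.pdf");
console.log(safeLabel, paperUrl.hostname);
```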

## Testing

Each module or integration slice has one test file under `tests/`. Run the suite with:

```bash
npm test
```

Notable integration coverage:
- `tests/pipeline.test.ts` for the end-to-end build pipeline
- `tests/serve.test.ts` for the MCP stdio server handshake and representative tool calls
- `scripts/smoke-test.sh` for package/tarball installation checks

## License

This repository is MIT licensed. The canonical license file is `LICENSE` at the repository root.