---
title: Graph Research
description: Experimental agent and skill that build a knowledge graph over a codebase and let GitHub Copilot Chat answer structural questions through MCP queries
sidebar_position: 1
sidebar_label: Overview
keywords:
  - graphify
  - knowledge graph
  - codebase analysis
  - copilot chat
  - mcp
  - dependency analysis
tags:
  - agents
  - experimental
  - knowledge-graph
author: Microsoft
ms.date: 2026-04-29
ms.topic: concept
estimated_reading_time: 7
---

:::warning Experimental

Graph Research ships in the [`experimental`](../../getting-started/collections.md) collection. It depends on the upstream [`graphifyy`](https://pypi.org/project/graphifyy/) Python package, which iterates rapidly. Pin a specific version and expect occasional breaking changes.

:::

The Graph Research workflow turns a folder of source code, documentation, PDFs, and images into a navigable knowledge graph, then lets you query that graph from GitHub Copilot Chat. It surfaces structural relationships that grep cannot answer cleanly: "what depends on this module," "what cluster does this file belong to," "what is the shortest path between feature X and config Y."

> The goal is not to replace grep or your IDE's "Find All References." It is to answer the *structural* questions that those tools cannot: god nodes, communities, multi-hop relationships, and surprising connections across file types.

## Why Use Graph Research?

| Benefit               | Description                                                                                                                                                |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Structural awareness  | Get answers like "what is implicitly affected if I change `auth_middleware.py`?" without reading every importer by hand                                    |
| Audit-tagged evidence | Every edge in the graph is tagged `EXTRACTED` (deterministic AST), `INFERRED` (LLM-derived, with confidence score), or `AMBIGUOUS` (multiple candidates)   |
| Mixed-input ingestion | The graph spans code, markdown, PDFs, and images, so a docs/code drift question can be answered as a single query                                          |
| MCP-native            | Tools surface in Copilot Chat as `mcp_graphify_*`, so the agent uses typed queries (shortest path, neighbors, community) rather than free-form retrieval   |

## How It Fits Together

```mermaid
flowchart LR
    subgraph Build
        A["graphify CLI<br/>(graphifyy package)"] --> B["graphify-out/<br/>graph.json + report"]
    end
    subgraph Query
        B --> C["Graphify MCP server<br/>(stdio)"]
        C --> D["Copilot Chat<br/>mcp_graphify_* tools"]
        D --> E["@graph-researcher<br/>agent"]
    end
    F["You: ask a structural question"] --> E
    E --> G["Evidence-tagged answer<br/>+ next-read suggestion"]
```

The build step runs on demand from the terminal. The query step runs entirely inside Copilot Chat, mediated by the agent and the MCP server.

## Three Steps to Get Started

### 1. Install and build the graph

```bash
# Pin the version: the upstream package changes quickly
pip install graphifyy==0.5.4
# Build the graph for the current directory, reusing cached results where possible
graphify . --mode standard --update
```

The first build can take several minutes and (in `deep` mode) issues parallel Claude API calls against your `ANTHROPIC_API_KEY`. For a quick first pass, use `--mode fast` (deterministic AST only, no LLM calls, no API key required).

Outputs land in `graphify-out/`. Add the directory to `.gitignore` before the first build.

### 2. Register the MCP server with Copilot Chat

Add an entry to your workspace's `.vscode/mcp.json`:

```json
{
  "servers": {
    "graphify": {
      "command": "python3",
      "args": ["-m", "graphify.serve", "graphify-out/graph.json"],
      "type": "stdio"
    }
  }
}
```

Reload the VS Code window. The seven `mcp_graphify_*` tools appear in the Copilot Chat tool list.

### 3. Ask the Graph Researcher agent

In Copilot Chat, address `@graph-researcher` with a structural question. The agent picks the smallest sufficient MCP tool, reports findings with audit tags, and ends with one suggested file to read next.

## Questions It Answers Well

| Question                                                           | Tool used                    |
|--------------------------------------------------------------------|------------------------------|
| "What other modules are implicitly affected if I change file X?"   | `mcp_graphify_get_neighbors` |
| "Show me the shortest path between feature A and legacy module B." | `mcp_graphify_shortest_path` |
| "Which files are the most-connected hubs in this repo?"            | `mcp_graphify_god_nodes`     |
| "What community does this auth code belong to?"                    | `mcp_graphify_get_community` |
| "What clusters or themes exist in this codebase?"                  | `mcp_graphify_graph_stats`   |

## Questions It Does Not Answer Well

* "Where is the literal string `'TODO'`?" (use grep, not the graph)
* "Is this code correct?" (graph centrality is not a code-quality signal)
* "What did this commit change?" (use git, not the graph)

The agent declines these gracefully and points you to the right tool.

## Reading the Audit Tags

Every conclusion the agent reports identifies the evidence behind it:

| Tag         | How the agent reports it                                           |
|-------------|--------------------------------------------------------------------|
| `EXTRACTED` | Stated as fact: "X depends on Y."                                  |
| `INFERRED`  | Hedged with confidence: "X likely depends on Y (confidence 0.74)." |
| `AMBIGUOUS` | Surfaced as a question: "It is unclear whether X depends on Y."    |

A path through the graph is only as strong as its weakest edge. A two-hop path that combines an `EXTRACTED` edge and an `INFERRED` edge is reported as inferred overall.

## Cost and Privacy

Consider two things before running `--mode deep`:

1. **Cost.** Deep mode dispatches many parallel Claude API calls. A first build over a 10k-file repo can cost several USD. Subsequent builds with `--update` reuse a SHA256 cache and only re-process changed files.
2. **Upload scope.** Deep mode uploads file *contents* to the Claude API for semantic extraction. Do not run deep mode against trees containing secrets, credentials, or content under upload restrictions. Use `--mode fast` for sensitive trees (AST-only, no LLM).

The agent will warn before recommending a deep rebuild and will recommend `--mode fast` when it detects sensitive files.
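
To make the incremental-build idea concrete, here is a minimal sketch of content-hash change detection in Python. It is not graphify's implementation: the function names (`file_digest`, `changed_files`) and the JSON cache layout are hypothetical, and the only detail taken from this page is that `--update` relies on a SHA256 cache to skip unchanged files.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, cache_path: Path) -> list[Path]:
    """Return files under root whose content hash differs from the cached hash.

    Hypothetical cache format: a JSON object mapping relative POSIX paths to
    SHA256 digests. The cache file is assumed to live outside root, and it is
    rewritten with the current hashes on every call.
    """
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    current: dict[str, str] = {}
    changed: list[Path] = []

    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = path.relative_to(root).as_posix()
        current[key] = file_digest(path)
        # A file is "changed" if it is new or its digest no longer matches.
        if cache.get(key) != current[key]:
            changed.append(path)

    cache_path.write_text(json.dumps(current, indent=2))
    return changed
```

On a first call every file is reported as changed. On later calls only new or edited files are returned, which is the property that keeps repeat builds cheaper than the first one.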

## What Ships With This Workflow

| Artifact                                                                                                                                    | Role                                                                             |
|---------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| [`graphify` skill](https://github.com/microsoft/hve-core/blob/main/.github/skills/experimental/graphify/SKILL.md)                           | Technical reference: install, build modes, MCP registration, troubleshooting     |
| [`@graph-researcher` agent](https://github.com/microsoft/hve-core/blob/main/.github/agents/experimental/graph-researcher.agent.md)          | Orchestrates queries, picks the right MCP tool, reports findings with audit tags |
| [`graphify-out/**` instruction](https://github.com/microsoft/hve-core/blob/main/.github/instructions/experimental/graphify.instructions.md) | Auto-applies whenever Copilot reads files under `graphify-out/`                  |

## Promotion Path

Graph Research lives in `experimental` because the upstream `graphifyy` project is young and iterates rapidly. Promotion to a stable collection requires:

* Several minor-version bumps without breaking changes to the MCP tool surface
* A vetted version-pinning policy with a CHANGELOG-diff review process (already in place; see the skill)
* Independent confirmation of the deep-mode cost envelope on representative repos

## Where Next

* Skill technical reference: [`SKILL.md`](https://github.com/microsoft/hve-core/blob/main/.github/skills/experimental/graphify/SKILL.md)
* Upstream project: [github.com/safishamsi/graphify](https://github.com/safishamsi/graphify)
* Collections overview: [Available Collections](../../getting-started/collections.md)
* How agents are structured: [Agent Systems Catalog](../README.md)

*🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.*