# Contributing to m1nd Thanks for your interest in contributing to m1nd. This document covers the basics. If you are contributing through real agent usage, also keep [docs/AGENT-TASKNOTES.md](docs/AGENT-TASKNOTES.md) current. It is the running capture surface for moments where an agent used `m1nd`, did not get the exact answer it needed, and had to compensate outside the graph. For large capability waves, follow [docs/internal/M1ND-MAJOR-UPDATE-WORKFLOW.md](docs/internal/M1ND-MAJOR-UPDATE-WORKFLOW.md) so code, docs, built docs, and release surfaces move together. ## Getting Started ```bash git clone https://github.com/maxkle1nz/m1nd.git cd m1nd cargo build cargo test --all ``` ## Project Structure ``` m1nd-core/ Graph engine, plasticity, spreading activation, hypothesis engine m1nd-ingest/ Language extractors (28 languages), memory adapter, JSON adapter m1nd-mcp/ MCP server, live MCP tool surface, JSON-RPC over stdio ``` --- ## Crate Architecture m1nd is a three-crate Rust workspace. Understanding what lives where saves you from editing the wrong crate. ### m1nd-core The graph engine. No I/O, no file system, no LLM calls. Pure computation. Key modules: | Module | Purpose | |--------|---------| | `graph.rs` | CSR adjacency, `NodeProvenance`, `Graph::finalize()` (required before any query) | | `activation.rs` | Spreading activation, `HybridEngine` auto-selection, XLR noise cancellation | | `plasticity.rs` | Hebbian LTP/LTD, `QueryMemory` ring buffer, homeostatic normalization | | `temporal.rs` | `CoChangeMatrix`, `TemporalDecayScorer`, per-NodeType half-lives | | `semantic.rs` | Trigram `CharNgramIndex`, `CoOccurrenceIndex` with PPMI, `SynonymExpander` | | `resonance.rs` | Standing wave analysis, `HarmonicAnalyzer`, `SympatheticResonanceDetector` | | `counterfactual.rs` | Cascade simulation, synergy analysis for multi-node removal | | `topology.rs` | Community detection, bridge detection, `ActivationFingerprinter` LSH | | `antibody.rs` | Bug immune memory — subgraph pattern matching with DFS + timeout budget | | `flow.rs` | Particle-based concurrent execution simulation, race condition detection | | `epidemic.rs` | SIR bug propagation model, `R0` estimation, burnout detection | | `tremor.rs` | Second-derivative acceleration detection on edge weight time series | | `trust.rs` | Actuarial per-module defect density, Bayesian prior adjustment | | `layer.rs` | Tarjan SCC + BFS depth → architectural layer detection and violation reporting | | `domain.rs` | `DomainConfig` — multi-domain presets: `code`, `music`, `memory`, `generic` | | `builder.rs` | Fluent `GraphBuilder` API for constructing graphs programmatically | | `snapshot.rs` | `save_graph()` / `load_graph()`, atomic write via temp + rename | | `seed.rs` | 5-level `SeedFinder`: exact → prefix → substring → tag → fuzzy trigram | | `types.rs` | `NodeType`, `EdgeType`, `PropagationConfig`, `DIMENSION_WEIGHTS`, newtypes | | `error.rs` | `M1ndError` variants — all map to MCP error responses | | `query.rs` | `QueryConfig` — `xlr_enabled`, `include_ghost_edges`, `GhostEdge` struct | | `xlr.rs` | XLR differential processing math — `sigmoid_gate()`, `spectral_overlap()` | **Design rule**: m1nd-core must compile with `no_std` ambitions. Keep stdlib use minimal and confined to `snapshot.rs` / persistence paths. ### m1nd-ingest File system walker, language extractors, graph construction pipeline. Depends on m1nd-core. Key modules: | Module | Purpose | |--------|---------| | `lib.rs` | `Ingestor` pipeline, `IngestConfig`, `IngestStats` | | `walker.rs` | `DirectoryWalker` — binary detection, git history enrichment | | `cross_file.rs` | Post-ingest `CrossFileResolver` — imports/tests/registers edges | | `resolve.rs` | `ReferenceResolver` — multi-value index, import hint disambiguation | | `diff.rs` | `GraphDiff` — incremental ingest engine (`DiffAction` enum) | | `merge.rs` | `merge_graphs()` — tag union, max weight, provenance merge (powers `federate`) | | `memory_adapter.rs` | `MemoryIngestAdapter` — markdown/text → memory graph | | `json_adapter.rs` | `JsonIngestAdapter` — JSON descriptor → any-domain graph | | `extract/tree_sitter_ext.rs` | `TreeSitterExtractor` — universal tree-sitter extractor, 22 languages | | `extract/generic.rs` | Regex fallback for unsupported file types | ### m1nd-mcp JSON-RPC stdio server. Tool dispatch, session state, protocol types. Depends on both crates. Key modules: | Module | Purpose | |--------|---------| | `main.rs` | Entry point, env/config loading, `./m1nd-mcp [config.json]` | | `server.rs` | `tool_schemas()` — 77 tool registrations, tool dispatch (normalize → match) | | `tools.rs` | Core tool handlers (ingest, activate, impact, learn, drift, ...) | | `layer_handlers.rs` | Antibody, flow, epidemic, tremor, trust, layers handlers | | `engine_ops.rs` | Shared engine helpers | | `session.rs` | Multi-agent session state, `SharedGraph`, generation counters | | `protocol/core.rs` | JSON-RPC types, request/response shapes | | `protocol/layers.rs` | Protocol types for the 9 Superpowers Extended tools | | `perspective_handlers.rs` | 12 perspective navigation handlers | | `lock_handlers.rs` | 5 lock system handlers | | `perspective/state.rs` | In-process perspective state machine | | `perspective/peek_security.rs` | Allowlist enforcement — only files within ingest roots | | `perspective/confidence.rs` | Suggestion confidence scoring | --- ## Adding New MCP Tools Every tool follows the same dispatch pattern in `server.rs` + handler in the appropriate `.rs` file. ### Step 1: Add the tool schema In `m1nd-mcp/src/server.rs`, find `tool_schemas()`. Add a new entry: ```rust ToolSchema { name: "your_tool".to_string(), description: "One sentence. What it does and when to use it.".to_string(), input_schema: json!({ "type": "object", "properties": { "agent_id": { "type": "string", "description": "Caller agent identifier" }, // your parameters here }, "required": ["agent_id"] }), }, ``` ### Step 2: Add the dispatch arm In `server.rs`, find the tool dispatch `match` block. The tool name is normalized (dots → underscores, leading `m1nd_` stripped) before matching: ```rust "your_tool" => handle_your_tool(&state, params).await, ``` ### Step 3: Write the handler Add your handler to the appropriate file. Use `engine_ops.rs` helpers for graph access. The standard signature: ```rust pub async fn handle_your_tool( state: &ServerState, params: serde_json::Value, ) -> Result { let agent_id = params["agent_id"].as_str().ok_or(M1ndError::InvalidInput(...))?; let graph = state.graph.read(); // ... Ok(json!({ "result": ... })) } ``` ### Step 4: Add a protocol type (optional) If your tool returns a complex struct, add request/response types in `m1nd-mcp/src/protocol/`. Mirror the naming convention of existing protocol files. ### Step 5: Tests Add unit tests in the handler file and, if the tool touches core logic, integration tests in `m1nd-core/src/` next to the module it exercises. ### Step 6: Update the agent-facing docs (CI-enforced) Adding or changing a tool changes what agents can do, so CI's **agent-docs gate** (`scripts/agent_docs_gate.py`, the `agent-docs-gate` job) requires your PR to ALSO touch at least one agent-facing doc surface: `skills/` (the packs installed into Claude/Codex/Gemini/Antigravity hosts), `docs/` (including the wiki under `docs/wiki/`), `README.md`, or `CONTRIBUTING.md`. The gate arms only when the diff touches an agent-workflow surface (the MCP instructions string / schemas / verb dispatch, `protocol/`, `help_guidance.rs`, `universal_docs.rs`, `skills/`, or the npm host installer under `npm/`), so unrelated internal changes are never blocked. An edit that only touches the `M1ND_INSTRUCTIONS` string self-satisfies. If a surface change genuinely has **no** agent-visible behavior (a pure refactor), add the `agent-docs-exempt` label to the PR to skip the gate. This exists because a surface change without a doc change once taught hosts a stale contract for two weeks (PR #216). --- ## Adding Language Extractors m1nd has two tiers of tree-sitter language support plus a manual extractor path. ### Tier system | Tier | Feature flag | Languages | |------|-------------|-----------| | Tier 1 | `--features tier1` | C/H, C++, C#, Ruby, PHP, Swift, Kotlin, Scala, Bash, Lua, R, HTML, CSS, JSON (14) | | Tier 2 | `--features tier2` (default) | Tier 1 + Elixir, Dart, Zig, Haskell, OCaml, TOML, YAML, SQL (22 total) | Tier 2 is the default build (`default = ["tier2"]` in `m1nd-ingest/Cargo.toml`). ### Adding a tree-sitter language (recommended path) 1. Find a `tree-sitter-` crate that depends on `tree-sitter-language` (new API), NOT the old `tree-sitter 0.19/0.20`. Crates that depend on the old API cause symbol collisions at link time and will silently return `None` from `parse()`. 2. Add the crate to `m1nd-ingest/Cargo.toml` as an optional dependency under the appropriate tier feature: ```toml [features] tier1 = [..., "dep:tree-sitter-yourlang"] [dependencies] tree-sitter-yourlang = { version = "x.y", optional = true } ``` 3. In `m1nd-ingest/src/extract/tree_sitter_ext.rs`, add a `LanguageConfig` entry: ```rust LanguageConfig { lang_tag: "yourlang", extensions: &["ext"], function_kinds: &["function_definition"], class_kinds: &["class_declaration"], struct_kinds: &[], enum_kinds: &[], type_kinds: &[], module_kinds: &["module"], import_kinds: &["import_statement"], name_field: "name", alt_name_fields: &[], name_from_first_child: false, } ``` The `name_field` is the tree-sitter field used to extract a definition's name. Use `alt_name_fields` for languages with complex name positions (e.g., C declarators). Set `name_from_first_child: true` for languages like OCaml or TOML where the name is the first named child. 4. Gate the config behind `#[cfg(feature = "tier1")]` or `#[cfg(feature = "tier2")]` matching the tier you added it to. ### Adding a manual extractor For languages where tree-sitter support is incomplete or you need deeper semantic understanding, add a manual extractor in `m1nd-ingest/src/extract/`: 1. Create `your_lang.rs` implementing the extractor logic. Return `Vec` and `Vec`. 2. Register the file extension in `m1nd-ingest/src/lib.rs` pipeline dispatch. 3. Existing examples: `m1nd-ingest/src/extract/` (Python, Rust, TypeScript, Go, Java). --- ## Memory Adapter `m1nd-ingest/src/memory_adapter.rs` turns markdown and plain text files into a graph. This is the path for AI agent memory, project wikis, and knowledge bases. ### How it works The adapter parses `.md`, `.markdown`, and `.txt` files and creates nodes for: - `file::` — the document itself - `section::` — H1–H6 headings - `entry::` — bullet points, checkboxes, table rows, plain text lines - `reference::` — file paths cross-referenced in text Entries are classified by keyword: `todo`/`task` → `Process` with tag `memory:task`; `decision`/`decided` → `Concept` with tag `memory:decision`; etc. Canonical source detection marks `YYYY-MM-DD.md`, `memory.md`, `*-active.md`, `*-history.md`, and files containing `briefing` as `canonical=true` in provenance. Node ID scheme: ``` memory::::file:: memory::::section::::- memory::::entry:::::: memory::::reference:: ``` ### Using the adapter via MCP Pass `adapter: "memory"` to `ingest`: ```json { "name": "ingest", "arguments": { "path": "/path/to/notes/", "adapter": "memory", "namespace": "project-x", "agent_id": "your-agent" } } ``` The `namespace` parameter scopes all node IDs (default: `"memory"`). Ingest multiple note directories with different namespaces and they coexist in the same graph. ### Extending the adapter To add a new content classification rule, edit the entry classification block in `memory_adapter.rs`. Each rule matches keywords in entry text and maps to a `(NodeType, tag, relation)` triple. The adapter uses the first matching rule, with a default catch-all of `(Concept, "memory:note", "contains")`. To add a new canonical source pattern, add to the `is_canonical()` function. --- ## Domain Configuration `M1ND_DOMAIN` (env var) or the `domain` field in the config JSON controls which `DomainConfig` preset is active. This affects temporal decay half-lives and which edge types are considered meaningful for co-change analysis. | Domain | Use case | git_co_change | |--------|---------|---------------| | `code` | Software codebases | true | | `music` | Audio/DAW graphs | false | | `memory` | Agent memory, wikis | false | | `generic` | Any other graph | false | New domain presets go in `m1nd-core/src/domain.rs`. Implement `DomainConfig::your_domain()` and add it to the `from_str()` dispatch. --- ## Testing ### Unit tests ```bash # All crates cargo test --all # Single crate cargo test -p m1nd-core cargo test -p m1nd-ingest cargo test -p m1nd-mcp ``` Each module has inline tests at the bottom of the file (`#[cfg(test)] mod tests { ... }`). ### E2E tests The `mcp/m1nd/` directory contains end-to-end test scripts that drive the server via its JSON-RPC interface: ```bash # Shell-based E2E ./tests/e2e/test_e2e.sh ./tests/e2e/test_mcp.sh ./tests/e2e/test_perspective_e2e.sh # Python-based scenarios python3 tests/e2e/test_layers_e2e.py python3 tests/e2e/test_advanced_usecases.py python3 tests/e2e/test_perspective_usecases.py ``` These scripts start the binary, send JSON-RPC calls over stdin, and assert on stdout. They are the ground truth for behavioral correctness. ### Integration test guidelines - New tools: add a test in the E2E shell script that exercises the happy path + one error case. - New extractors: add a fixture file in the test corpus and assert on node/edge counts. - Core algorithm changes: add both a unit test at the function level and an E2E test that exercises the full stack. ### Testing `apply` and `apply_batch` with `verify=true` `apply` and `apply_batch` both accept an optional `verify` flag (v0.5.0+). When `verify=true`, the server performs a post-write graph consistency check: it re-reads the written file, confirms the content round-trips through ingest cleanly, and returns a `verify` block in the response with `passed`, `node_delta`, and `edge_delta`. When adding tests for tools that call `apply` or `apply_batch`, include a case that sets `verify=true` and asserts on the `verify.passed` field: ```bash # E2E: apply with verify echo '{"method":"tools/call","params":{"name":"apply","arguments":{ "agent_id":"test","file_path":"/tmp/test_apply.py", "new_content":"def hello(): pass\n","verify":true }}}' | ./m1nd-mcp | jq '.result.verify' # Expected: {"passed": true, "node_delta": 1, "edge_delta": 0} # E2E: apply_batch with verify echo '{"method":"tools/call","params":{"name":"apply_batch","arguments":{ "agent_id":"test", "edits":[ {"file_path":"/tmp/a.py","new_content":"x = 1\n"}, {"file_path":"/tmp/b.py","new_content":"y = 2\n"} ], "verify":true }}}' | ./m1nd-mcp | jq '.result.verify' # Expected: {"passed": true, "files_verified": 2} ``` The `verify` flag is designed for CI and agent harnesses where silent write failures are unacceptable. It adds ~1–3ms per file written (one ingest round-trip). --- ## Feature Flags `m1nd-ingest/Cargo.toml` defines the tier system: ```toml [features] default = ["tier2"] tier1 = [...] # 14 tree-sitter languages tier2 = ["tier1", ...] # 8 more languages (default build) ``` `m1nd-mcp` has no additional feature flags — it inherits from m1nd-ingest via the workspace dependency chain. To build with only native extractors (smaller binary, faster compile): ```bash cargo build --release --no-default-features ``` To build with only Tier 1: ```bash cargo build --release --no-default-features --features tier1 ``` The full Tier 2 build (default) produces the release binary shipped in `target/release/m1nd-mcp`. --- ## What to Work On ### Language Extractors (high impact) m1nd currently supports 22 languages via tree-sitter (Tier 1+2) plus Python, Rust, TypeScript/JavaScript, Go, and Java via manual extractors. Adding more tree-sitter grammars is the fastest path to expanding language coverage. Before adding a grammar crate: verify it depends on `tree-sitter-language` (new API), not `tree-sitter 0.19/0.20`. Old-API crates cause silent parse failures at runtime. ### Graph Algorithms The core engine in `m1nd-core/` has room for improvement: - Community detection algorithms - Better spreading activation decay functions - Smarter ghost edge inference - Embedding-based semantic scoring (V1 is trigram-only) ### MCP Tools New tools that leverage the graph are welcome. Each tool is a handler in `m1nd-mcp/src/`. The pattern is consistent -- look at existing tools for the structure. ### Benchmarks Run m1nd on your codebase and report performance. We track real-world numbers, not synthetic benchmarks. --- ## Code Standards - `cargo fmt` before committing - `cargo clippy -- -D warnings` must pass - All new code needs tests - No `unsafe` without a comment explaining why ## Pull Requests 1. Fork the repo and create a branch from `main` 2. Make your changes with tests 3. Ensure `cargo test --all` passes 4. Ensure `cargo clippy --all -- -D warnings` passes 5. Ensure `cargo fmt --all -- --check` passes 6. Open a PR with a clear description of what and why ## Issues Use GitHub issues for bugs, feature requests, and questions. Label your issue: - `bug` -- something doesn't work - `enhancement` -- new feature or improvement - `good first issue` -- suitable for new contributors - `language-extractor` -- new language support - `algorithm` -- graph algorithm work ## License By contributing, you agree that your contributions will be licensed under the MIT License.