---
name: ontology-engineering
description: Build, validate, and govern RDF/OWL ontologies using the Open Ontologies MCP server. Use when the user asks to create, modify, query, or manage ontologies, knowledge graphs, or RDF data.
---

# Ontology Engineering Workflow

You have access to the Open Ontologies MCP server, which provides 50+ tools for AI-native ontology engineering backed by an in-memory Oxigraph triple store.

## Core Workflow

When building or modifying ontologies, follow this workflow. Decide which tools to call and in what order based on results -- this is not a fixed pipeline.

### 1. Generate

- Understand the domain requirements (natural language, competency questions, methodology constraints)
- Generate Turtle/OWL directly -- you know OWL, RDF, BORO, 4D modeling natively

### 2. Validate and Load

- Call `onto_validate` on the generated Turtle -- if it fails, fix syntax errors and re-validate
- Call `onto_load` to load into the Oxigraph triple store. For repos that mount a folder of `.ttl` files via `[general] ontology_dirs`, prefer `onto_repo_load` so the same compile-cache / TTL-eviction path is exercised. Use `onto_repo_list` to discover candidate files.
- Call `onto_stats` to verify class count, property count, triple count match expectations

### 3. Verify

- Call `onto_lint` to check for missing labels, comments, domains, ranges -- fix any issues found
- Call `onto_query` with SPARQL to verify structure (expected classes, subclass hierarchies, competency questions)
- If a reference ontology exists, call `onto_diff` to compare

### 4. Iterate

- If any step reveals problems, fix the Turtle and restart from step 2
- Continue until validation passes, stats match, lint is clean, and SPARQL queries return expected results

### 5. Persist

- Call `onto_save` to write the final ontology to a .ttl file
- Call `onto_version` to save a named snapshot for rollback

## Cache and Multi-Ontology Loading

The server keeps a single *active* ontology in memory plus an on-disk N-Triples compile cache for everything it has parsed. Switch between several ontologies without paying re-parse costs:

- `onto_repo_list` -- enumerate `.ttl` / `.owl` / `.nt` / `.rdf` / `.nq` / `.trig` / `.jsonld` files configured under `[general] ontology_dirs`. Container-friendly: mount a host folder of TTL files and discover them at runtime without hardcoded paths.
- `onto_repo_load` -- load by bare file stem, relative path, or absolute path inside a configured repo dir. Reuses the same compile-cache / TTL-eviction path as `onto_load`.
- `onto_cache_status` / `onto_cache_list` -- inspect what is cached, what is currently active, and the effective `[cache]` configuration (TTL, auto_refresh, dir).
- `onto_cache_remove` -- drop a cached entry by name (pass `delete_file=false` to keep the on-disk N-Triples for a later reload).
- `onto_unload` -- drop the active ontology (or a specific cached entry by name) from memory; the on-disk cache is preserved unless `delete_cache=true`.
- `onto_recompile` -- force a re-parse from source, ignoring the cache. Without `name`, recompiles the active ontology and reloads it; with `name`, rebuilds a non-active entry without disturbing the active in-memory store.

## Ontology Lifecycle (Terraform-style)

For evolving ontologies in production:

1. **Plan** -- `onto_plan` shows added/removed classes, blast radius, risk score. Check `onto_lock` for protected IRIs.
2. **Enforce** -- `onto_enforce` with a rule pack (`generic`, `boro`, `value_partition`, `hierarchy`) checks design pattern compliance.
3. **Apply** -- `onto_apply` with mode `safe` (clear + reload) or `migrate` (add equivalentClass bridges).
4. **Monitor** -- `onto_monitor` runs SPARQL watchers with threshold alerts. Use `onto_monitor_clear` if blocked.
5. **Drift** -- `onto_drift` compares versions with rename detection and self-calibrating confidence.
6. **Lineage** -- `onto_lineage` shows the full plan → enforce → apply → monitor → drift trail for the current session.

## Data Extension Workflow

When applying an ontology to external data:

1. `onto_map` -- generate mapping config from data schema + loaded ontology
2. `onto_ingest` -- parse a structured *file* (CSV, JSON, NDJSON, XML, YAML, XLSX, Parquet) into RDF
3. `onto_sql_ingest` -- run a SQL query against PostgreSQL or DuckDB (via `postgres://`, `duckdb:///path.duckdb`, `:memory:`, or a `*.duckdb` file path) and ingest result rows. Use this when the source data lives in a database, or when you want to use DuckDB as a federation layer over remote Parquet/CSV/JSON via the `httpfs`, `postgres_scanner`, `iceberg`, etc. extensions.
4. `onto_import_schema` -- introspect a PostgreSQL or DuckDB schema and generate OWL classes/properties/cardinality from tables/columns/PKs/FKs.
5. `onto_shacl` -- validate against SHACL shapes
6. `onto_reason` -- run RDFS or OWL-RL inference
7. Or use `onto_extend` to run the full file-based pipeline (ingest + SHACL + reason) in one call

## Reasoning and DL Explanation

- `onto_reason` -- RDFS / OWL-RL forward-chaining materialisation
- `onto_dl_check` -- check `subClass ⊑ superClass` using DL tableaux
- `onto_dl_explain` -- return the clash trace explaining why a class is unsatisfiable

## Semantic Search and Embeddings

After loading, generate embeddings to enable natural-language search:

- `onto_embed` -- generate text + Poincaré structural embeddings for every class. Honours `[embeddings] provider = "local" | "openai"` and the `OPEN_ONTOLOGIES_EMBEDDINGS_*` env vars.
- `onto_search` -- natural-language query → most-similar classes (`mode: "text" | "structure" | "product"`).
- `onto_similarity` -- compute cosine + Poincaré distance between two specific IRIs.

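
The two measures named for `onto_similarity` can be illustrated with a minimal Python sketch. It assumes embeddings are plain lists of floats and uses the standard textbook formulas for cosine similarity and Poincaré-ball distance. The function names are hypothetical, and nothing here claims to mirror how the server stores or compares embeddings internally.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))
```

In the Poincaré ball, distances grow rapidly as points approach the boundary, which is the usual reason hyperbolic embeddings are chosen for tree-like structures such as class hierarchies.
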

When embeddings exist, `onto_align` automatically uses them as a 7th alignment signal, catching semantically equivalent classes whose labels differ.

## Tool Reference

| Tool | When to use |
| ---- | ----------- |
| `onto_status` | Check that the server is running and how many triples are loaded |
| `onto_validate` | After generating or modifying Turtle -- always validate first |
| `onto_load` | Load Turtle/N-Triples/RDF-XML into the triple store |
| `onto_stats` | Sanity-check class / property / triple counts |
| `onto_lint` | Catch missing labels, comments, domains, ranges |
| `onto_query` | Verify structure, answer competency questions |
| `onto_diff` | Compare against a reference or previous version |
| `onto_save` | Persist the active ontology to a file |
| `onto_convert` | Convert between Turtle, N-Triples, RDF/XML, N-Quads, TriG |
| `onto_clear` | Reset the in-memory store |
| `onto_pull` | Fetch ontology from a remote URL or SPARQL endpoint |
| `onto_push` | Push triples to a SPARQL endpoint |
| `onto_import` | Resolve and load `owl:imports` chains |
| `onto_marketplace` | Browse / install standard ontologies from the curated catalogue |
| `onto_version` | Save a named snapshot before making changes |
| `onto_history` | List saved snapshots |
| `onto_rollback` | Restore a previous snapshot |
| `onto_unload` | Drop the active (or named) ontology from memory |
| `onto_recompile` | Re-parse the source, ignoring the on-disk compile cache |
| `onto_cache_status` | Inspect compile cache: active slot, all entries, `[cache]` config |
| `onto_cache_list` | Lighter version of cache status: list cached ontologies with metadata |
| `onto_cache_remove` | Remove a cached ontology by name |
| `onto_repo_list` | Enumerate RDF/OWL files in configured `[general] ontology_dirs` |
| `onto_repo_load` | Load by name / relative path / absolute path inside a configured repo dir |
| `onto_ingest` | Parse a file (CSV, JSON, NDJSON, XML, YAML, XLSX, Parquet) into RDF |
| `onto_sql_ingest` | **NEW** -- SQL `SELECT` against PostgreSQL or DuckDB → RDF (uses the same mapping format as `onto_ingest`). DuckDB acts as a federation layer over CSV/Parquet/JSON/HTTPFS/postgres scanner via its extensions. |
| `onto_import_schema` | Introspect PostgreSQL or DuckDB schema → generate OWL |
| `onto_map` | Auto-generate mapping config from data schema + loaded ontology |
| `onto_shacl` | Validate against SHACL shapes |
| `onto_reason` | Run RDFS or OWL-RL inference |
| `onto_extend` | File-based convenience: ingest + SHACL + reason |
| `onto_dl_check` | Check `subClass ⊑ superClass` via DL tableaux |
| `onto_dl_explain` | Explain why a class is unsatisfiable (DL clash trace) |
| `onto_plan` | Show added/removed classes, blast radius, risk score |
| `onto_apply` | Apply changes in `safe` or `migrate` mode |
| `onto_lock` | Protect production IRIs from removal |
| `onto_drift` | Compare versions with rename detection |
| `onto_enforce` | Design pattern checks (`generic`, `boro`, `value_partition`, `hierarchy`) |
| `onto_monitor` | Run SPARQL watchers with threshold alerts |
| `onto_monitor_clear` | Clear blocked state after resolving alerts |
| `onto_lineage` | View session lineage trail |
| `onto_crosswalk` | Look up clinical terminology mappings (ICD-10, SNOMED, MeSH) |
| `onto_enrich` | Add `skos:exactMatch` triples linking classes to clinical codes |
| `onto_validate_clinical` | Check class labels against clinical crosswalk terminology |
| `onto_align` | Detect alignment candidates between two ontologies (uses embeddings if loaded) |
| `onto_align_feedback` | Accept/reject alignment candidates for self-calibrating weights |
| `onto_lint_feedback` | Accept/dismiss a lint issue (teaches lint to suppress repeated false positives) |
| `onto_enforce_feedback` | Accept/dismiss an enforce violation (same self-calibration mechanism) |
| `onto_embed` | Generate text + Poincaré structural embeddings for all classes |
| `onto_search` | Natural-language query → most-similar classes (text / structure / product) |
| `onto_similarity` | Cosine + Poincaré distance between two IRIs |

## Key Principle

Dynamically decide the next tool call based on what the previous tool returned. If `onto_validate` fails, fix and retry. If `onto_stats` shows wrong counts, regenerate. If `onto_lint` finds missing labels, add them. The MCP tools are individual operations -- you are the orchestrator.
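
The orchestration idea can be sketched as a small Python loop. This is only an illustration under stated assumptions: `call_tool` and `generate_turtle` are hypothetical callables supplied by the caller, and the argument name `turtle` and the result keys `ok`, `errors`, and `issues` are invented for the example. The real tool inputs and outputs may look different.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical shape: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_ontology(
    call_tool: ToolCaller,
    generate_turtle: Callable[[Optional[Any]], str],
    max_rounds: int = 5,
) -> str:
    """Regenerate Turtle until validation passes and lint reports no issues."""
    feedback = None
    for _ in range(max_rounds):
        turtle = generate_turtle(feedback)

        # Assumed result key "ok": validation failure feeds the next attempt.
        validation = call_tool("onto_validate", {"turtle": turtle})
        if not validation.get("ok"):
            feedback = validation.get("errors")
            continue

        call_tool("onto_load", {"turtle": turtle})

        # Assumed result key "issues": lint findings also trigger a retry.
        lint = call_tool("onto_lint", {})
        if lint.get("issues"):
            feedback = lint["issues"]
            continue

        return turtle
    raise RuntimeError("ontology did not converge within max_rounds")
```

Each tool result decides the next step, so the loop has no fixed pipeline beyond "fix and retry"; checks such as `onto_stats` or `onto_query` could be added as further branches in the same style.
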