--- name: knowledge-graph-builder description: > Implements knowledge graphs for AI-enhanced relational knowledge. Covers ontology design, graph database selection (Neo4j, Neptune, ArangoDB, TigerGraph), entity extraction, hybrid graph-vector architecture, query patterns, and AI integration. Use when implementing knowledge graphs, designing ontologies, extracting entities and relationships, selecting a graph database, or building hybrid graph-vector search. Use for knowledge graph, ontology design, entity resolution, graph RAG, hallucination detection. For architecture selection and governance, use the knowledge-base-manager skill. For document retrieval pipelines, use the rag-implementer skill. license: MIT metadata: author: oakoss version: '1.0' --- # Knowledge Graph Builder ## Overview Knowledge graphs make implicit relationships explicit, enabling AI systems to reason about connections, verify facts, and reduce hallucinations. They combine structured entity-relationship modeling with semantic search for powerful knowledge retrieval. **When to use:** Complex entity relationships central to the domain, verifying AI-generated facts against structured knowledge, semantic search combined with relationship traversal, recommendation systems, fraud detection, or pattern recognition. **When NOT to use:** Simple tabular data (use a relational database), purely document-based search with no relationships (use the `rag-implementer` skill), read-heavy workloads with no traversal needs, or when the team lacks graph modeling expertise. For KB architecture selection and governance, use the `knowledge-base-manager` skill. ## Quick Reference | Pattern | Approach | Key Points | | ------------------- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------- | | Ontology first | Define entity types, relationships, properties before ingesting data | Changing schema later is expensive; validate with domain experts | | Entity resolution | Deduplicate aggressively during extraction | "Apple Inc" = "Apple" = "Apple Computer" must resolve to one entity | | Confidence scoring | Attach 0.0-1.0 score + source to every relationship | Enables filtering by reliability, critical for AI grounding | | Hybrid architecture | Graph traversal (structured) + vector search (semantic) | Vector finds candidates, graph expands context via relationships | | Incremental build | Core entities first, validate against target queries, then expand | Avoid building the full graph before testing with real queries | | Database selection | Neo4j (general), Neptune (AWS managed), ArangoDB (multi-model), TigerGraph (massive scale) | Match database to scale, infrastructure, and query complexity | ## Common Mistakes | Mistake | Correct Pattern | | ----------------------------------------------------------- | -------------------------------------------------------------------------------------------- | | Ingesting entities before designing the ontology | Define and validate the ontology with domain experts first; changing later is expensive | | Skipping entity resolution and deduplication | Deduplicate aggressively so "Apple Inc", "Apple", and "Apple Computer" resolve to one entity | | Omitting confidence scores on relationships | Attach a 0.0-1.0 confidence score and source to every relationship | | Using only graph traversal without vector search | Implement hybrid architecture combining graph traversal with semantic vector search | | Building the full graph before validating with real queries | Start with core entities, test against target queries, then expand incrementally | | Choosing a database before understanding scale requirements | Evaluate query patterns, data volume, and infrastructure constraints before selecting | ## Delegation - **Extract entities and relationships from unstructured text**: Use `Task` agent to run NER pipelines and build relationship triples - **Evaluate graph database options for project requirements**: Use `Explore` agent to compare Neo4j, Neptune, ArangoDB, and TigerGraph against scale and query needs - **Design ontology and hybrid architecture for a new domain**: Use `Plan` agent to define entity types, relationship schemas, and graph-vector integration strategy - For hybrid KG+RAG systems, delegate to the `rag-implementer` skill - For knowledge-graph-powered agent workflows, delegate to the `agent-patterns` skill ## References - [Ontology Design](references/ontology-design.md) — Entity types, relationships, properties, RDF schema, validation - [Database Selection](references/database-selection.md) — Neo4j, Neptune, ArangoDB, TigerGraph comparison and setup - [Entity Extraction](references/entity-extraction.md) — NER pipeline, relationship extraction, LLM-based extraction - [Hybrid Architecture](references/hybrid-architecture.md) — Graph + vector integration, hybrid search implementation - [Query Patterns](references/query-patterns.md) — Cypher queries, API design, common traversal patterns - [AI Integration](references/ai-integration.md) — KG-RAG, hallucination detection, grounded response generation