--- name: "rag-implementation" description: "RAG Implementation Workflow workflow skill. Use this skill when the user needs RAG (Retrieval-Augmented Generation) implementation workflow covering embedding selection, vector database setup, chunking strategies, retrieval optimization, and practical evaluation of retrieval quality before prompt tuning or handoff." version: "0.0.1" category: "ai-agents" tags: - "rag-implementation" - "rag" - "retrieval-augmented" - "generation" - "implementation" - "embedding" - "chunking" - "retrieval" - "omni-enhanced" complexity: "advanced" risk: "safe" tools: - "codex-cli" - "claude-code" - "cursor" - "gemini-cli" - "opencode" source: "omni-team" author: "Omni Skills Team" date_added: "2026-04-15" date_updated: "2026-04-19" source_type: "omni-curated" maintainer: "Omni Skills Team" family_id: "rag-implementation" family_name: "RAG Implementation Workflow" variant_id: "omni" variant_label: "Omni Curated" is_default_variant: true derived_from: "skills/rag-implementation" upstream_skill: "skills/rag-implementation" upstream_author: "sickn33" upstream_source: "community" upstream_pr: "79" upstream_head_repo: "diegosouzapw/awesome-omni-skills" upstream_head_sha: "6bf093920a93e68fa8263cf6ee767d7407989d56" curation_surface: "skills_omni" enhanced_origin: "omni-skills-private" source_repo: "diegosouzapw/awesome-omni-skills" replaces: - "rag-implementation" --- # RAG Implementation Workflow ## Overview This skill curates the upstream `rag-implementation` workflow into an execution-focused guide for designing, reviewing, and troubleshooting Retrieval-Augmented Generation systems. Use it when the task is not just “add a vector database,” but to make concrete decisions about: - whether RAG is the right solution at all - whether managed file search is sufficient or custom indexing is required - how to chunk, enrich, and index documents safely - how to choose dense, lexical, hybrid, and filtered retrieval patterns - how to evaluate retrieval quality separately from generation quality - how to debug failures such as missed passages, stale citations, duplicate chunks, and weak grounding Preserve the upstream intent: this remains a practical implementation workflow covering embedding selection, vector storage, chunking, and retrieval optimization. The enhancement adds stronger activation boundaries, clearer quality gates, and more operational troubleshooting. ## When to Use This Skill Use this skill when one or more of these are true: - The system must answer from external documents rather than model memory. - The source corpus changes often enough that prompt-only approaches become stale. - The user requires citations, provenance, or document-grounded answers. - The corpus is large enough that manual prompt stuffing is not realistic. - Retrieval quality, filtering, freshness, or tenant isolation are part of the implementation scope. - The team needs to compare managed retrieval with a custom vector pipeline. Do **not** default to this skill when: - The knowledge is small, static, and can fit directly in a prompt. - The task is mainly style adaptation or behavior shaping rather than external knowledge access. - There is no searchable corpus yet. - Latency or complexity budgets do not allow retrieval, indexing, and evaluation overhead. - A deterministic search/index system already exists and only answer formatting is needed. ### Fast routing | If the situation is... | Prefer... 
### Fast routing

| If the situation is... | Prefer... |
| --- | --- |
| Small static instructions, no corpus, no citations needed | Prompt-only solution |
| File-grounded Q&A with limited customization needs | Managed file search |
| Tenant-aware retrieval, custom ranking, custom ingestion, or compliance constraints | Custom RAG pipeline |
| Knowledge is stable but behavior needs adaptation | Fine-tuning or task-specific prompting |

## Operating Table

| Decision area | What to inspect | Practical guidance |
| --- | --- | --- |
| RAG vs alternatives | Corpus size, freshness, citation need, latency budget | If freshness and provenance matter, RAG is usually justified. If not, first test prompt-only or managed retrieval. |
| Managed retrieval vs custom pipeline | ACL needs, custom chunking, reranking, observability, compliance | Use managed retrieval for speed. Choose custom indexing when you need tenant isolation, custom metadata filters, custom ranking, or strict ingestion control. |
| Chunking strategy | Document type, section boundaries, tables, code blocks, FAQs, policy text | Preserve semantic units first. Avoid one fixed chunk size for every corpus. Store structural metadata with every chunk. |
| Retrieval mode | Query type, corpus language, identifiers, jargon, versioned content | Dense retrieval is not always enough. Prefer hybrid or filtered retrieval for SKU-like terms, versions, legal text, or keyword-heavy corpora. |
| Embedding/index choice | Corpus scale, latency, operational maturity, filter complexity | Choose based on workload tradeoffs, not fashion. Index choice affects recall, cost, reindex behavior, and debugging. |
| Evaluation | Gold queries, expected passages, citation correctness, abstention behavior | Evaluate retrieval first, then generation. A fluent answer does not prove the right chunks were retrieved. |
| Safety | Provenance, ACL metadata, prompt injection exposure, stale content | Treat retrieval as a trust boundary. Restrict scope, keep citations, and abstain when evidence is weak or conflicting. |

For compact decision support, use:

- `references/rag-decision-matrix.md`
- `examples/rag-evaluation-example.md`

## Workflow

### Phase 1: Requirements and activation check

**Inputs**

- user task
- target corpus or planned corpus
- answer quality expectations
- latency, cost, and compliance constraints

**Actions**

- Confirm whether RAG is actually needed.
- Identify whether freshness, provenance, or grounded answers are required.
- Define what the system must return: answer only, answer plus citations, or structured diagnostics.
- Decide whether managed file search can satisfy the use case before proposing a custom stack.

**Outputs**

- a clear activation decision
- initial architecture direction: prompt-only, managed retrieval, or custom RAG
- explicit success criteria

**Exit criteria**

- The team can state why RAG is required.
- The team knows what counts as a correct answer and what evidence must accompany it.
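One lightweight way to pin those exit criteria down is to write a handful of gold queries now and reuse them in Phase 7. A minimal sketch follows; the field names, example queries, and document ids are illustrative, not a required schema.

```python
# Minimal sketch: capture Phase 1 success criteria as a small gold set that
# Phase 7 can reuse. Field names, queries, and document ids are illustrative.
from dataclasses import dataclass


@dataclass
class GoldQuery:
    query: str                   # representative user question
    expected_sources: list[str]  # documents or sections that must be retrievable
    must_cite: bool = True       # does a correct answer require a citation?
    may_abstain: bool = False    # is "not enough evidence" an acceptable outcome?


GOLD_SET = [
    GoldQuery(
        query="What is the refund window for annual plans?",
        expected_sources=["policy-refunds-v3"],
    ),
    GoldQuery(
        query="Which TLS versions does the on-prem gateway support?",
        expected_sources=["gateway-admin-guide#security"],
        may_abstain=True,
    ),
]
```

Even a small set of such records gives the later evaluation phase something concrete to score against.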
### Phase 2: Corpus preparation

**Inputs**

- source documents
- update cadence
- ownership and access rules

**Actions**

- inventory source types: policies, manuals, tickets, code, FAQs, tables, transcripts, product docs
- remove duplicate or superseded content where possible
- normalize encoding and extraction quality
- assign metadata needed later for filtering and auditability

**Recommended metadata per chunk**

- source URI or document id
- title or section label
- version or effective date
- ingestion timestamp
- language
- tenant, team, or ACL scope where applicable
- document type
- parent section lineage

**Outputs**

- clean corpus ready for chunking and indexing
- metadata schema for retrieval and security controls

**Exit criteria**

- Operators can trace any chunk back to its source.
- Access-control and freshness metadata exist before indexing.

### Phase 3: Chunking and enrichment

**Inputs**

- normalized documents
- document-type inventory

**Actions**

- chunk by semantic boundaries first: headings, sections, paragraphs, FAQ items, code units, table neighborhoods
- use overlap only where it preserves context across boundaries
- keep surrounding structural cues that help retrieval and citation
- test different chunking strategies on real questions instead of adopting universal token defaults

**Document-specific heuristics**

- **Policies / legal / compliance text:** preserve clause and section boundaries; attach effective date and policy id.
- **Technical docs:** keep headings, version tags, API names, and nearby examples together.
- **Code or config docs:** chunk by function, class, command, or config block; avoid splitting syntax from explanation.
- **FAQs / support articles:** one question-answer pair per chunk is often clearer than broad paragraph chunks.
- **Tables:** keep the caption, header row meaning, and nearby explanatory text with extracted values where possible.

**Outputs**

- chunked corpus with structural metadata

**Exit criteria**

- A reviewer can inspect a chunk and still understand what document section it came from.
- Chunk boundaries do not destroy the meaning needed for retrieval.

### Phase 4: Embedding and index design

**Inputs**

- chunked corpus
- query patterns
- scale and latency targets

**Actions**

- choose embedding approach appropriate to corpus and query language
- choose storage/index approach based on workload, not vendor preference
- decide whether metadata filtering, hybrid retrieval, reranking, or database-native indexing are required
- document re-embedding and reindex triggers before launch

**Common decision factors**

- corpus size and growth rate
- latency target
- lexical search importance for identifiers and exact terms
- need for metadata filters and ACL enforcement
- operational tolerance for running a separate search service
- observability and debugging needs

**Reindex or re-embed when**

- the chunk schema changes
- key metadata fields are added or corrected
- a major document refresh lands
- the embedding model changes
- retrieval quality regresses on a stable test set

**Outputs**

- documented embedding and index plan

**Exit criteria**

- The team can explain why this storage/index path fits the workload.
- Reindex triggers are known in advance, not discovered during incidents.
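To make Phases 2-4 concrete, here is a minimal end-to-end sketch: heading-aware chunking that carries per-chunk metadata, feeding a tiny in-memory index with a metadata filter. The `embed()` function is a hash-based placeholder rather than a real embedding model, and the document ids, versions, and sample text are made up; the sketch shows the shape of the pipeline, not a production implementation.

```python
# Minimal sketch of Phases 2-4: heading-aware chunking with per-chunk metadata,
# plus a tiny in-memory index. embed() is a hash-based placeholder, NOT a real
# embedding model; ids, versions, and sample text are illustrative.
import hashlib
import math
import re


def chunk_by_headings(doc_id: str, text: str, version: str) -> list[dict]:
    """Split on markdown headings so each chunk keeps its section context."""
    chunks = []
    for i, section in enumerate(re.split(r"\n(?=#{1,6}\s)", text)):
        section = section.strip()
        if not section:
            continue
        heading = section.splitlines()[0].lstrip("# ").strip()
        chunks.append({
            "id": f"{doc_id}:{i}",
            "text": section,
            "metadata": {  # Phase 2 metadata travels with every chunk
                "source": doc_id,
                "section": heading,
                "version": version,
            },
        })
    return chunks


def embed(text: str, dim: int = 64) -> list[float]:
    """Placeholder embedding (token hashing); swap in a real model in practice."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(index: list[dict], query: str, top_k: int = 3, version: str | None = None):
    """Cosine similarity over normalized vectors, with an optional metadata filter."""
    q = embed(query)
    candidates = [c for c in index if version is None or c["metadata"]["version"] == version]
    scored = [(sum(a * b for a, b in zip(q, c["vector"])), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


doc = "# Refund policy\nAnnual plans can be refunded within 30 days.\n\n# Contact\nEmail support with your order id."
index = [{**c, "vector": embed(c["text"])} for c in chunk_by_headings("policy-refunds-v3", doc, "v3")]
for score, chunk in search(index, "How long is the refund window?", version="v3"):
    print(f"{score:.2f}  {chunk['id']}  ({chunk['metadata']['section']})")
```

In practice the placeholder embedding is replaced by whatever model Phase 4 selects, and the list comprehension over `index` becomes the ingestion job that writes to the chosen store.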
### Phase 5: Retrieval design

**Inputs**

- indexed corpus
- representative user queries

**Actions**

- start with the simplest retrieval path that matches the corpus
- test dense retrieval, lexical retrieval, or hybrid retrieval against real queries
- apply metadata filters for tenant, freshness, product/version, language, or document scope
- tune top-k only after inspecting what is being returned
- consider reranking when initial recall is acceptable but final ordering is weak

**Rules of thumb**

- Use **dense retrieval** for semantic similarity and paraphrased questions.
- Use **lexical signals** when exact identifiers, product names, versions, or policy labels matter.
- Use **hybrid retrieval** when either semantic-only or keyword-only search misses relevant evidence.
- Use **metadata filters** as both quality controls and security controls.

**Outputs**

- retrieval policy for query handling
- logging requirements for top results, scores, filters, and citations

**Exit criteria**

- The team can inspect retrieved chunks and explain why they were selected.
- The system can narrow retrieval scope safely using metadata.

### Phase 6: Grounded answer generation

**Inputs**

- retrieved chunks
- answer policy

**Actions**

- instruct the model to answer from retrieved evidence when grounding is required
- require citations or source references when the use case depends on provenance
- define abstention behavior when retrieval is weak, missing, or contradictory
- prefer structured outputs for diagnostics, eval runs, or review workflows

**Minimum answer policy**

- answer only from retrieved evidence when the task requires grounding
- cite the supporting source or section when feasible
- state uncertainty or ask a follow-up when the evidence is insufficient
- do not silently fill missing facts from general model knowledge in a supposedly grounded workflow

**Outputs**

- grounded answer contract
- operator-visible diagnostic format if needed

**Exit criteria**

- The answer behavior makes grounding failures visible rather than hiding them behind fluent prose.

### Phase 7: Offline evaluation

**Inputs**

- gold query set
- expected documents or passages
- generated answers with citations

**Actions**

- evaluate retrieval quality separately from answer quality
- record whether the right document or passage appeared in the retrieved set
- verify citation correctness and flag unsupported claims
- group failures into buckets before changing prompts or models

**Useful evaluation dimensions**

- retrieval hit rate or recall proxy on expected documents/passages
- citation correctness
- groundedness or unsupported-claim rate
- answer usefulness to the user task
- abstention quality when evidence is weak

**Outputs**

- failure buckets tied to retrieval, chunking, metadata, ranking, or generation behavior

**Exit criteria**

- The team knows whether the main problem is ingestion/retrieval or answer generation.
- Prompt tuning is not used to hide indexing defects.

See `examples/rag-evaluation-example.md` for a worked mini-evaluation.
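A code-level version of the same retrieval-versus-generation split can be as small as two checks, sketched below. The record layout (retrieved chunk ids plus cited ids per run) is an assumption about what your pipeline logs, and the example ids are made up.

```python
# Minimal sketch of the Phase 7 split: score retrieval separately from answers.
# Assumes each evaluation run logs the retrieved chunk ids and the ids the
# answer actually cited; the record layout and example ids are illustrative.

def retrieval_hit(expected_sources: list[str], retrieved_ids: list[str]) -> bool:
    """Did at least one expected document or passage show up in the retrieved set?"""
    return any(r.startswith(exp) for exp in expected_sources for r in retrieved_ids)

def citation_supported(cited_ids: list[str], retrieved_ids: list[str]) -> bool:
    """Every citation must point at something that was actually retrieved."""
    return all(c in retrieved_ids for c in cited_ids)

def evaluate(runs: list[dict]) -> dict:
    hits = sum(retrieval_hit(r["expected_sources"], r["retrieved_ids"]) for r in runs)
    cited_ok = sum(citation_supported(r["cited_ids"], r["retrieved_ids"]) for r in runs)
    return {
        "retrieval_hit_rate": hits / len(runs),
        "citation_correctness": cited_ok / len(runs),
    }

runs = [
    {"expected_sources": ["policy-refunds-v3"],
     "retrieved_ids": ["policy-refunds-v3:0", "faq:12"],
     "cited_ids": ["policy-refunds-v3:0"]},
    {"expected_sources": ["gateway-admin-guide#security"],
     "retrieved_ids": ["gateway-admin-guide#install:2"],
     "cited_ids": []},  # retrieval miss: bucket as an ingestion/retrieval defect
]
print(evaluate(runs))
```

A low `retrieval_hit_rate` with a high `citation_correctness` points at ingestion or retrieval defects; the reverse points at the generation side.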
### Phase 8: Production monitoring and maintenance

**Inputs**

- live queries
- retrieval logs
- corpus refresh events

**Actions**

- monitor retrieval misses, stale citations, empty-result rates, and filter behavior
- audit tenant or ACL scoping regularly
- track corpus drift and reindex triggers
- review examples where users report “hallucination” to confirm whether the real issue was retrieval failure

**Outputs**

- maintenance plan for refresh, reindex, and incident review

**Exit criteria**

- The team can detect degradation caused by corpus changes, not just model changes.

## Troubleshooting

### 1. Relevant document exists but is not retrieved

**Likely causes**

- chunk boundaries split the answer from its heading or context
- semantic-only retrieval misses exact identifiers
- metadata filters are too narrow
- stale index or incomplete ingestion

**Checks**

- inspect top-k retrieved chunks and their metadata
- test the same query with and without filters
- test dense-only versus hybrid retrieval
- verify the document was actually indexed in the expected version

**Fixes**

- rework chunking to preserve semantic units
- add lexical or hybrid retrieval
- correct filters or metadata
- reindex the missing or updated content

### 2. Answer cites the wrong section or wrong version

**Likely causes**

- overlapping chunks produce near-duplicate candidates
- superseded content remains searchable
- ranking favors semantically similar but outdated text

**Checks**

- compare cited chunk metadata with effective date and version
- inspect for duplicate or superseded documents in the index
- review whether freshness metadata exists and is used

**Fixes**

- deduplicate or retire old content
- filter by version/effective date where appropriate
- store stronger provenance metadata and expose it in the answer

### 3. Answers are repetitive or contain duplicated evidence

**Likely causes**

- overlap is too high
- many near-identical chunks from the same source dominate retrieval
- top-k is too large for the query

**Checks**

- inspect neighboring retrieved chunks for near-duplicates
- compare answer quality at lower top-k values

**Fixes**

- reduce unnecessary overlap
- deduplicate chunk candidates before generation
- tune top-k based on query class, not a universal default

### 4. Dense retrieval fails on SKUs, codes, policy IDs, or version numbers

**Likely causes**

- lexical specificity matters more than semantic similarity
- identifiers were normalized or lost during ingestion

**Checks**

- run exact-term tests against representative identifier queries
- confirm identifiers remain present in chunk text and metadata

**Fixes**

- add lexical or hybrid retrieval (a fusion sketch follows item 5 below)
- preserve identifiers in chunk text and metadata
- add filters for product, version, or document type

### 5. Users report hallucinations, but the real issue is bad retrieval

**Likely causes**

- answer generation is blamed before retrieval logs are inspected
- weak evidence is still passed to the model as if it were sufficient

**Checks**

- review retrieved chunks before reviewing prompt wording
- verify whether cited evidence truly supports the answer
- test abstention behavior on weak-retrieval cases

**Fixes**

- enforce grounded-answer policy
- require citations in evaluation runs
- adjust retrieval, chunking, or ranking before changing prompts
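For the "add lexical or hybrid retrieval" fix in items 1 and 4, one common way to combine rankings is reciprocal rank fusion. The sketch below assumes you already have two ranked lists of chunk ids from a dense retriever and a lexical retriever; the retrievers themselves are out of scope and the chunk ids are made up.

```python
# Minimal hybrid-retrieval sketch: reciprocal rank fusion over two ranked lists.
# The dense and lexical retrievers are assumed to exist elsewhere; only their
# ranked chunk ids are fused here, and the example ids are illustrative.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Combine ranked lists; items near the top of any list float upward."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["faq:12", "policy-refunds-v3:0", "manual:7"]   # semantic neighbors
lexical_hits = ["sku-4411:2", "policy-refunds-v3:0"]         # exact-identifier matches
print(reciprocal_rank_fusion([dense_hits, lexical_hits])[:3])
```

The `k` constant (60 here) is a commonly used default that damps the influence of any single list; like top-k, tune it against real queries rather than adopting a universal value.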
### 6. Cross-tenant or unauthorized content appears in results

**Likely causes**

- missing ACL metadata
- retrieval performed without required filters
- shared index without proper scope enforcement

**Checks**

- inspect metadata fields on returned chunks
- confirm filters are applied server-side where required
- review ingestion pipeline for missing tenant or ACL attributes

**Fixes**

- add mandatory scope metadata to every chunk
- require retrieval filters for tenant/access boundaries
- reindex content after metadata correction

### 7. Retrieved content is stale or superseded

**Likely causes**

- corpus refresh does not trigger reindexing
- old and new versions coexist without ranking or filtering rules

**Checks**

- compare source freshness with index freshness
- inspect effective dates and ingestion timestamps

**Fixes**

- define explicit reindex triggers
- filter or rank by freshness where the use case requires it
- retire superseded content from searchable scope

### 8. Retrieved passages contain prompt injection or untrusted instructions

**Likely causes**

- the system treats retrieved text as trusted instructions instead of untrusted evidence
- unreviewed external content is indexed without policy boundaries

**Checks**

- inspect whether the prompt clearly separates system policy from retrieved content
- review the provenance and trust class of indexed sources

**Fixes**

- treat retrieved text as data, not instructions
- limit indexing of untrusted sources or isolate them by policy
- require the model to ground answers in evidence rather than obey document-embedded commands

## Additional Resources

- `references/rag-decision-matrix.md` — compact architecture and retrieval decision matrix for implementation planning
- `examples/rag-evaluation-example.md` — worked example that separates retrieval defects from generation defects

Primary guidance to verify before final implementation decisions:

- OpenAI Embeddings guide: `https://platform.openai.com/docs/guides/embeddings`
- OpenAI Retrieval guide: `https://platform.openai.com/docs/guides/retrieval`
- OpenAI File Search guide: `https://platform.openai.com/docs/guides/tools-file-search`
- OpenAI structured output guidance: `https://platform.openai.com/docs/guides/text?api-mode=responses`

If exact documentation URLs move, re-check current official docs before merge or handoff.

## Related Skills

Use a different or adjacent skill when the task shifts to:

- prompt engineering without external retrieval
- search relevance tuning for a non-LLM application
- evaluation framework design beyond a lightweight gold-set review
- agent orchestration where retrieval is one tool among many
- data governance or redaction workflows before indexing sensitive corpora