---
name: citations-retrieval
description: Document citations and RAG (Retrieval-Augmented Generation) patterns for Claude. Activate for source attribution, document grounding, citation extraction, and contextual retrieval.
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - Task
  - WebFetch
triggers:
  - citation
  - citations
  - rag
  - retrieval
  - source
  - document
  - grounding
  - reference
  - attribution
dependencies:
  - llm-integration
related-skills:
  - vision-multimodal
  - prompt-caching
---

# Citations & Retrieval Skill

Implement document-based citations and RAG patterns for grounded, verifiable AI responses.

## When to Use This Skill

- Document Q&A with source attribution
- RAG (Retrieval-Augmented Generation) systems
- Grounding responses in provided documents
- Building trustworthy AI applications
- Research and analysis with citations

## Core Concepts

### Citation Types

| Type | Use Case | Format |
|------|----------|--------|
| `char_location` | Text documents | Character ranges |
| `page_location` | PDFs | Page numbers |
| `content_block_location` | Custom content | Block indexes |

## Basic Citations

### Enable Citations

```python
import anthropic

client = anthropic.Anthropic()

# Documents are passed as content blocks inside the user message,
# followed by the question as a text block
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": "The company was founded in 2020. Revenue reached $10M in 2023."
                },
                "title": "Company Overview",
                "citations": {"enabled": True}  # Enable citations!
            },
            {"type": "text", "text": "When was the company founded and what was the revenue?"}
        ]
    }]
)

# Extract citations from response
for block in response.content:
    if block.type == "text":
        # citations is None for text blocks that cite nothing
        for citation in block.citations or []:
            print(f"Cited: {citation.document_title}")
            print(f"Location: chars {citation.start_char_index}-{citation.end_char_index}")
```

### Custom Content Blocks

```python
# Fine-grained control over citation granularity
documents = [{
    "type": "document",
    "source": {
        "type": "content",
        "content": [
            {"type": "text", "text": "Section 1: Introduction..."},
            {"type": "text", "text": "Section 2: Methods..."},
            {"type": "text", "text": "Section 3: Results..."}
        ]
    },
    "title": "Research Paper",
    "citations": {"enabled": True}
}]
```

## RAG Implementation

### Basic RAG Pipeline

```python
import anthropic
import numpy as np
from sentence_transformers import SentenceTransformer

client = anthropic.Anthropic()

# 1. Embed documents
embedder = SentenceTransformer('all-MiniLM-L6-v2')

def chunk_document(doc, chunk_size=512):
    """Placeholder chunker: fixed-size character windows"""
    return [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]

def embed_documents(documents):
    chunks = []
    embeddings = []
    for doc in documents:
        # Chunk the document
        doc_chunks = chunk_document(doc, chunk_size=512)
        chunks.extend(doc_chunks)
        embeddings.extend(embedder.encode(doc_chunks))
    return chunks, np.array(embeddings)

# 2. Retrieve relevant chunks
def retrieve(query, chunks, embeddings, top_k=5):
    query_embedding = embedder.encode([query])[0]
    similarities = np.dot(embeddings, query_embedding)
    top_indices = np.argsort(similarities)[-top_k:][::-1]
    return [chunks[i] for i in top_indices]

# 3. Generate with retrieved context
def rag_query(query, chunks, embeddings):
    relevant_chunks = retrieve(query, chunks, embeddings)

    # One document block per retrieved chunk, then the question
    doc_blocks = [{
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": chunk},
        "title": f"Source {i+1}",
        "citations": {"enabled": True}
    } for i, chunk in enumerate(relevant_chunks)]

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": doc_blocks + [{"type": "text", "text": query}]
        }]
    )
    return response
```

### Contextual Retrieval (49-67% Fewer Retrieval Failures)

Anthropic reports that contextual retrieval reduces failed retrievals by 49% when contextual embeddings are combined with contextual BM25, and by 67% when reranking is added.

```python
# Add context to each chunk before embedding
def add_chunk_context(chunk, full_document):
    """Prepend generated context to a chunk to reduce retrieval failures"""
    context_prompt = f"""<document>
{full_document}
</document>

Please provide a short, succinct context for this chunk that will help with retrieval:

<chunk>
{chunk}
</chunk>

Context:"""

    response = client.messages.create(
        model="claude-3-5-haiku-20241022",  # Fast, cheap
        max_tokens=100,
        messages=[{"role": "user", "content": context_prompt}]
    )
    context = response.content[0].text
    return f"{context}\n\n{chunk}"

# Apply to all chunks
contextual_chunks = [add_chunk_context(chunk, full_doc) for chunk in chunks]
```

## Citation Formatting

### Format as Numbered References

```python
def format_with_citations(response):
    """Format response with numbered inline citations"""
    text = ""
    citations = []
    citation_map = {}

    for block in response.content:
        if block.type == "text":
            current_text = block.text
            for citation in block.citations or []:
                # cited_text is present on every citation type
                key = (citation.document_title, citation.cited_text)
                if key not in citation_map:
                    citation_map[key] = len(citations) + 1
                    citations.append(citation)
                # Insert citation number
                ref_num = citation_map[key]
                current_text += f" [{ref_num}]"
            text += current_text

    # Add references section
    text += "\n\n## References\n"
    for i, citation in enumerate(citations, 1):
        text += f"[{i}] {citation.document_title}\n"
    return text
```

### Academic Citation Formats

```python
def format_apa(author, year, title, source):
    """APA format: Author (Year). Title. Source."""
    return f"{author} ({year}). {title}. {source}."

def format_mla(author, title, source, year):
    """MLA format: Author. "Title." Source, Year."""
    return f'{author}. "{title}." {source}, {year}.'

def format_chicago(author, title, source, year):
    """Chicago format: Author. Title. Source, Year."""
    return f"{author}. {title}. {source}, {year}."
```

## Multi-Document Q&A

```python
def multi_doc_qa(question, documents):
    """Answer questions across multiple documents with citations"""
    doc_inputs = []
    for i, doc in enumerate(documents):
        doc_inputs.append({
            "type": "document",
            "source": {
                "type": "text",
                "media_type": "text/plain",
                "data": doc["content"]
            },
            "title": doc.get("title", f"Document {i+1}"),
            "citations": {"enabled": True}
        })

    # Document blocks come first, the instruction and question last
    prompt = (
        "Answer this question based on the provided documents. "
        f"Cite your sources.\n\nQuestion: {question}"
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": doc_inputs + [{"type": "text", "text": prompt}]
        }]
    )
    return response
```

## Prompt Caching for RAG

```python
# Cache static documents for repeated queries
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain", "data": large_document},
                "title": "Knowledge Base",
                "citations": {"enabled": True},
                "cache_control": {"type": "ephemeral"}  # Cache this document!
            },
            {"type": "text", "text": query}
        ]
    }]
)
```

## Error Handling and Validation

### Validate Citation Integrity

```python
def validate_citations(response, documents):
    """Ensure all citations reference provided documents"""
    cited_titles = set()
    for block in response.content:
        if block.type == "text":
            for citation in block.citations or []:
                cited_titles.add(citation.document_title)

    provided_titles = {doc.get("title") for doc in documents}

    # Check for invalid citations
    invalid = cited_titles - provided_titles
    if invalid:
        raise ValueError(f"Citations reference unknown documents: {invalid}")
    return True

def extract_citation_spans(response):
    """Extract the cited source text for each citation"""
    citation_data = []
    for block in response.content:
        if block.type == "text":
            for citation in block.citations or []:
                citation_data.append({
                    # Indexes point into the source document, not block.text,
                    # so take the quoted span from cited_text
                    "text": citation.cited_text,
                    "document": citation.document_title,
                    "start": getattr(citation, "start_char_index", None),
                    "end": getattr(citation, "end_char_index", None)
                })
    return citation_data
```

## Best Practices

### DO:

- Enable citations for all document-based queries
- Use contextual retrieval to reduce retrieval failures (49-67% reported with BM25 and reranking)
- Cache static documents with `cache_control`
- Provide clear document titles for attribution
- Chunk documents appropriately (512-1024 tokens)
- Validate citation integrity before using responses
- Format citations consistently (APA, MLA, Chicago)
- Test citation extraction before deploying to production

### DON'T:

- Rely on citations without enabling them
- Use very small chunks (<100 tokens)
- Ignore citation verification in production
- Skip document preprocessing
- Mix citation formats in the same document
- Assume all LLM responses are cited by default
- Deploy without citation validation tests

## Troubleshooting

### No Citations Returned

```python
# Ensure citations are enabled
documents = [{
    "type": "document",
    "source": {"type": "text", "media_type": "text/plain", "data": content},
    "citations": {"enabled": True}  # Must be explicit!
}]
```

### Citations Point to Wrong Text

```python
# Character indexes refer to the source document, not to block.text.
# document_text is the exact string that was sent as the document's data.
document_span = document_text[citation.start_char_index:citation.end_char_index]
print(f"Cited text: {citation.cited_text}")
print(f"Document span: {document_span}")
```

### Large Document Performance

```python
# Use chunking for large documents (sizes here are in characters)
def chunk_with_overlap(text, chunk_size=1024, overlap=256):
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

# Pass chunks individually for better retrieval
large_chunks = chunk_with_overlap(large_text)
```

## Integration Example

```python
#!/usr/bin/env python3
"""Complete RAG + Citations example"""
import anthropic
from sentence_transformers import SentenceTransformer
import numpy as np

def create_rag_system():
    """Initialize RAG system with citations"""
    client = anthropic.Anthropic()
    embedder = SentenceTransformer('all-MiniLM-L6-v2')

    # Sample documents
    documents = [
        {
            "title": "Python Guide",
            "content": "Python 3.11 introduced exception groups..."
        },
        {
            "title": "Web Standards",
            "content": "HTTP/2 introduced multiplexing capabilities..."
        }
    ]

    # Embed documents
    chunks = []
    embeddings = []
    for doc in documents:
        # Add document title as context
        chunk = f"[{doc['title']}]\n{doc['content']}"
        chunks.append(chunk)
        embeddings.append(embedder.encode(chunk))
    embeddings = np.array(embeddings)

    # Query function
    def query(question):
        # Retrieve the single most relevant chunk
        query_emb = embedder.encode(question)
        similarities = np.dot(embeddings, query_emb)
        top_idx = int(np.argmax(similarities))
        relevant_chunk = chunks[top_idx]

        # Get cited answer: document block first, then the question
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "text",
                            "media_type": "text/plain",
                            "data": relevant_chunk
                        },
                        "title": documents[top_idx]["title"],
                        "citations": {"enabled": True}
                    },
                    {"type": "text", "text": question}
                ]
            }]
        )
        return response

    return query

if __name__ == "__main__":
    query_fn = create_rag_system()
    response = query_fn("What is Python 3.11?")

    # Display with citations
    for block in response.content:
        if block.type == "text":
            print(f"Answer: {block.text}")
            for citation in block.citations or []:
                print(f"  - Cited from: {citation.document_title}")
```

## Performance Tips

- **Batch queries** for throughput (10-20 concurrent requests)
- **Cache frequent documents** with prompt caching
- **Use Haiku for context generation** (faster, cheaper)
- **Chunk strategically** (sentence/paragraph boundaries)
- **Monitor token usage** for citation overhead (~5-10%)

## Limitations

- Citations only from provided documents
- Character index citations require exact text matching
- PDF citations require extractable text; scanned pages without a text layer cannot be cited
- Citation extraction costs tokens (~5-10% overhead)
- Check batch-processing support for cited responses against the current API documentation before relying on it

## See Also

- [[llm-integration]] - API basics and authentication
- [[prompt-caching]] - Cache documents for cost savings
- [[vision-multimodal]] - PDF and image processing
- [[complex-reasoning]] - Extended thinking with citations
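
The three citation types listed under Core Concepts carry different location fields, so code that prints `start_char_index` only works for plain-text documents. The sketch below shows one way to describe a location for any of the three types. It assumes the field names `start_page_number`/`end_page_number` and `start_block_index`/`end_block_index` on the citation objects; `describe_location` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins so the example runs without an API call.

```python
from types import SimpleNamespace


def describe_location(citation):
    """Return a human-readable location string for a citation object.

    Field names for page and block citations are assumptions; check them
    against the SDK version in use.
    """
    kind = getattr(citation, "type", None)
    if kind == "char_location":
        return f"chars {citation.start_char_index}-{citation.end_char_index}"
    if kind == "page_location":
        return f"pages {citation.start_page_number}-{citation.end_page_number}"
    if kind == "content_block_location":
        return f"blocks {citation.start_block_index}-{citation.end_block_index}"
    # Unknown or missing type: avoid raising inside display code
    return "unknown location"


# Stand-in citation objects (not real API responses)
examples = [
    SimpleNamespace(type="char_location", start_char_index=0, end_char_index=32),
    SimpleNamespace(type="page_location", start_page_number=2, end_page_number=3),
    SimpleNamespace(type="content_block_location", start_block_index=1, end_block_index=2),
]

for example in examples:
    print(describe_location(example))
```

Dispatching on `citation.type` keeps display code from raising `AttributeError` when a PDF or custom-content document is mixed in with plain-text sources.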