# Troubleshooting Guide

Quick reference for common issues and solutions.

## Recommended Workflow: Better Search Results

For the best search quality, use manual metadata keys. This gives you control over what metadata is extracted and ensures consistent filtering across your documents.

**Steps:**

1. **Initial upload**: Process a representative sample of documents with default settings (Haiku extraction, auto mode)
2. **Review extracted keys**: Go to Settings → Metadata Key Statistics to see what keys were discovered
3. **Select manual keys**: Switch to "Manual" extraction mode and select only the keys relevant to your use case (e.g., `surnames`, `topic`, `year`, `location`)
4. **Run full reindex**: Click "Reindex Knowledge Base" - this re-extracts metadata using only your selected keys and rebuilds the KB with consistent metadata
5. **Create filter examples**: After reindex, run "Analyze Metadata" to generate few-shot examples for query-time filter generation

**Why this works:** Auto mode creates inconsistent metadata across documents. Manual mode enforces a consistent schema for better filtering.

---

## Best Practices

Tips from production usage to help you get the most out of RAGStack.

### Model Selection

| Use Case | Recommended | Also Good | Avoid | Why |
|----------|-------------|-----------|-------|-----|
| Chat primary | Claude Haiku 4.5 / Sonnet 4.6 | Nova 2 Lite, Nova Pro | Nova Micro | Quality and multimodal support matter for chat |
| Chat fallback | Nova Lite | Nova 2 Lite | Nova Micro | Good balance of cost and quality for fallback |
| Metadata extraction | Claude Haiku 4.5 | Nova 2 Lite (with manual keys) | Nova Lite | Nova Lite hallucinates fields; Nova 2 Lite has better reasoning |
| Filter generation | Claude Haiku 4.5 | Nova 2 Lite | Nova Lite | Needs accurate query intent understanding |
| OCR (Bedrock backend) | Claude Haiku 4.5 | Nova 2 Lite, Nova Pro | - | Vision-capable models required |

### Document Processing

- **Large PDFs (100+ pages)**: Processing is automatic via batch queue. Monitor status in dashboard.
- **Image-heavy documents**: Consider switching `ocr_backend` to `bedrock` for better accuracy.
- **Mixed content**: RAGStack handles text, images, and media differently - each optimized for its type.

### Query Quality

- **Multi-slice retrieval**: Keep enabled (default) - runs filtered and unfiltered queries in parallel for better recall.
- **Metadata filtering**: Works best when documents have consistent metadata. Use manual extraction mode if you need specific keys.

### Filter Generation

- **Filters require initialization**: Filter examples aren't created until you run "Analyze Metadata" in the Settings tab at least once. Without this, query-time filter generation won't have few-shot examples.
- **Refreshing filter examples**: Filter examples are few-shot prompts that guide the model. If your queries aren't generating good filters, disable the problematic examples and run "Analyze Metadata" again - disabled filters will be replaced with new ones based on current active keys.
- **Active keys**: A metadata key is "active" if it has an occurrence count > 0 (i.e., at least one document has that key).
Only active keys are used for filter generation. After a full reindex, check that your expected keys are active.

### Cost Optimization

Once you've set up manual keys (see "Recommended Workflow" above), you can reduce costs:

1. **Downgrade extraction model**: Switch to Nova Lite for metadata extraction - manual mode constrains its output so quality remains good
2. **Keep Haiku for filters**: Leave filter generation on Haiku - it needs better reasoning to translate queries into filters accurately

This uses Haiku's quality for discovery and filter generation, while using Nova Lite's lower cost for bulk extraction.

### Filtered Results Ranking

- **Increase boost ceiling:** Raise `multislice_filtered_boost` (1.3-1.5) if filtered results are still buried by visual similarity
- **Disable boost:** Set to 1.0 if filtered results dominate too aggressively
- **Default is balanced:** 1.25 ceiling works well for most use cases

---

## Bedrock Model Access Issues

Anthropic and some other third-party models on Bedrock are offered through AWS Marketplace and require a **one-time agreement acceptance** per model. This is a **runtime** issue: stack creation itself does not invoke third-party chat, OCR, or filter models, so missing agreements won't cause deployment failures. The errors surface when Lambdas first try to call a model that hasn't been accepted yet.

| Problem | Cause | Solution |
|---------|-------|----------|
| `Model access is denied due to IAM user or service role is not authorized to perform the required AWS Marketplace actions` | The Marketplace agreement for this model hasn't been accepted yet. New models (e.g., Claude Sonnet 4.6) need a one-time acceptance even though the old model catalog is retired. | An admin with `aws-marketplace:Subscribe` permissions must accept the agreement via CLI or Bedrock console playground. After that, all Lambda roles can invoke the model with just `ViewSubscriptions`. |
| Error after upgrading models in config | New model added but Marketplace agreement not accepted | Check Lambda logs to identify which model (`aws logs filter-log-events --log-group-name /aws/lambda/<stack-name>-query --filter-pattern "ERROR"`). Then accept the agreement - see diagnostic below. |
| Error only on chat, not other features | `chat_primary_model` points to a model without accepted agreement | Either accept the agreement for the model, or change `chat_primary_model` in Settings to a model that already works. |

**Quick diagnostic - accept agreement and test access:**

```bash
# Test from CLI (your IAM user likely has marketplace permissions).
# If this succeeds, the agreement is accepted for the account and Lambda roles will work too.
echo '{"anthropic_version":"bedrock-2023-05-31","max_tokens":10,"messages":[{"role":"user","content":"hi"}]}' \
  | aws bedrock-runtime invoke-model \
    --model-id "us.anthropic.claude-sonnet-4-6" \
    --region us-east-1 \
    --body fileb:///dev/stdin \
    --content-type "application/json" /dev/null

# If this also fails with AccessDeniedException, your IAM user needs:
#   aws-marketplace:ViewSubscriptions
#   aws-marketplace:Subscribe
# Or use the Bedrock console playground to invoke the model (triggers agreement acceptance).
```

**Note:** Runtime Lambda roles have `ViewSubscriptions` only. The `Subscribe` action is restricted to processing/admin roles (document pipeline Lambdas). For public-facing endpoints (chat, search), accepting agreements is an admin pre-deployment step, not a runtime auto-accept.

---

## Deployment Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Stack creation fails with `ROLLBACK_COMPLETE` | IAM, S3, or CloudFormation permission issue | Check CloudFormation events for the specific failing resource. Stack creation does not invoke Bedrock models, so this is not a model agreement issue. |
| `Invalid email address` error | Bad email format | Use valid email: `python publish.py --admin-email valid@example.com` |
| `User is not authorized` | Insufficient IAM permissions | Need: `iam:*`, `cloudformation:*`, `lambda:*`, `s3:*` |
| S3 bucket name exists | Bucket name collision | Change stack name: `python publish.py --stack-name <new-stack-name>` |
| `sam build` fails | Python version mismatch | Check: `python3.13 --version`. Install Python 3.13+ if needed. |
| Docker connection error | Docker not running | Start Docker: macOS (open Docker Desktop), Linux (`sudo systemctl start docker`) |
| SAM build timeout | Network or resource issue | `sam build --use-container` |

## Document Processing Issues

| Problem | Symptoms | Solution |
|---------|----------|----------|
| Documents stuck in UPLOADED | Not processing | Verify EventBridge rule: `aws events list-rules --name-prefix RAGStack-`. Check Lambda logs: `aws logs tail /aws/lambda/RAGStack-<stack-name>-ProcessDocument --follow` |
| Documents stuck in PROCESSING | Still processing after hours | Lambda timeout (15 min limit). Split document or increase memory. Check Textract concurrency quota. |
| Documents fail with ERROR | Error in dashboard | Check Lambda logs: `aws logs tail /aws/lambda/RAGStack-<stack-name>-ProcessDocument --follow`. Image-heavy PDFs may need Bedrock OCR (set `ocr_backend` to `bedrock`). |
| Slow processing | Takes >30 minutes | Text-native PDFs should be faster (~2-5 min). Image-heavy docs slower. Check CloudWatch for bottlenecks. |

## Media Processing Issues (Video/Audio)

| Problem | Cause | Solution |
|---------|-------|----------|
| Media stuck in PROCESSING | Transcribe job still running | Check Transcribe console for job status. Large files (>1hr) take longer. Wait up to 30 minutes for long media. |
| No transcript generated | Unsupported format or no audio | Verify file has audio track. MOV files not supported - convert to MP4. Check file isn't corrupted. |
| Wrong language detected | Incorrect language setting | Set `transcribe_language_code` in Settings to match audio language. Default is `en-US`. |
| Missing speaker labels | Diarization disabled | Enable `speaker_diarization_enabled` in Settings. Only works with supported languages. |
| Timestamp links not working | Browser doesn't support media fragments | Try Chrome/Firefox (best support). Safari has limited `#t=` fragment support. Check presigned URL hasn't expired. |
| Media player won't load | CORS or expired URL | Presigned URLs expire after 1 hour. Refresh chat to get new URLs. Check browser console for CORS errors. |
| Transcribe access denied | Missing IAM permissions | Verify Lambda has `transcribe:StartTranscriptionJob` and `transcribe:GetTranscriptionJob` permissions. |

**Debugging media processing:**

```bash
# Check Transcribe job status
aws transcribe list-transcription-jobs --status IN_PROGRESS

# View ProcessMedia Lambda logs
aws logs tail /aws/lambda/RAGStack-<stack-name>-ProcessMedia --follow

# Check document status in DynamoDB
aws dynamodb get-item --table-name RAGStack-<stack-name>-Documents \
  --key '{"document_id": {"S": "<document-id>"}}'
```

## Knowledge Base Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Chat returns no results | KB not created/synced | Verify KB: `aws bedrock-agent list-knowledge-bases --query "knowledgeBaseSummaries[?contains(name,'<stack-name>')]"`. Check sync: `aws bedrock-agent list-ingestion-jobs --knowledge-base-id <kb-id> --data-source-id <data-source-id>`. Verify documents show INDEXED in DynamoDB. |
| Chat results irrelevant | Query too vague | Try rephrasing query (be more specific). Ensure documents are fully processed. |
| "Knowledge Base not found" error | KB ID incorrect or missing | Check SAM outputs for Knowledge Base ID. Set in environment variables. Verify KB in Bedrock console. |

## UI Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| UI not loading | CloudFront cache stale | Invalidate: `aws cloudfront create-invalidation --distribution-id <distribution-id> --paths "/*"` |
| Blank page after login | Cognito not configured | Check `.env.local` has correct Cognito IDs from SAM outputs. |
| Upload fails | S3 permissions or bucket missing | Verify input bucket exists. Check Cognito user has S3 put permissions. |
| API errors in console | GraphQL endpoint wrong | Check `VITE_GRAPHQL_URL` in `.env.local` matches SAM outputs. |
| Dark mode not working | System preference not detected | Set dark mode in OS settings. Test in browser DevTools. |

## Authentication Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Login fails | Wrong credentials | Check temporary password email. Use correct format (email not username). |
| "User does not exist" | Account not created | Sign up first, verify email. Check correct user pool selected. |
| MFA errors | MFA configured but not set up | Admin sets up MFA in Cognito console or disable if not needed. |
| Session expires quickly | Token refresh issue | Clear browser cache. Check system clock (must be synchronized). Ensure HTTPS. |

## Performance Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Lambda timeout (15 min limit) | Document too large or slow OCR | Use Textract (faster than Bedrock). Split large documents. Increase Lambda memory. |
| High costs | Bedrock tokens expensive | Use Textract OCR instead of Bedrock. Text-native PDFs skip OCR entirely. |
| Slow embeddings generation | Rate limiting or large batch | Reduce batch size. Add delay between batches. Check Bedrock quota in Service Quotas. |
| DynamoDB throttling | High write rate | Change to on-demand billing mode. Increase provisioned capacity. |

## Chat Performance

| Problem | Cause | Solution |
|---------|-------|----------|
| First chat response slow (500ms-2s) | Lambda cold start | **Expected behavior** for serverless. Subsequent requests ~200-500ms. |
| Quota limits not enforced immediately | Race condition on high concurrency | Atomic quota checking prevents most races. Some overflow (<1%) possible under extreme load. |
| Chat responses timeout | Bedrock query taking too long | Check Knowledge Base has indexed documents. Verify network connectivity to Bedrock. |
| Config changes not applied | Config cached | Wait for cache refresh or redeploy to force. |

## Runtime Configuration Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Config values not updating | Cached in Lambda | Config cached 60s (Amplify chat). Wait or force cold start. Check DynamoDB table has correct entries. |
| "Configuration table not found" | Table name wrong | Verify `CONFIGURATION_TABLE_NAME` environment variable. Check table exists in DynamoDB. |
| Invalid config value | Schema validation failed | Check format matches schema (docs/CONFIGURATION.md). Validate regex patterns. |

## Testing Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Unit tests fail with imports | Library not installed | `npm install` in project root. `npm run test:backend` to verify. |
| Integration tests fail | Stack not deployed or missing env vars | Export `STACK_NAME`, `DATA_BUCKET`, `TRACKING_TABLE`. Verify stack exists. |
| Sample documents missing | Not generated | `cd tests/sample-documents && python3 generate_samples.py` |

## Debugging Tips

**View Lambda Logs**

```bash
# Stream live logs
aws logs tail /aws/lambda/RAGStack-<stack-name>-<function-name> --follow

# View specific execution
aws logs get-log-events --log-group-name /aws/lambda/RAGStack-<stack-name>-<function-name> \
  --log-stream-name <log-stream-name>
```

**Check Step Functions Execution**

```bash
# List executions
aws stepfunctions list-executions --state-machine-arn <state-machine-arn>

# View execution details
aws stepfunctions describe-execution --execution-arn <execution-arn>

# Get full history
aws stepfunctions get-execution-history --execution-arn <execution-arn>
```

**Check DynamoDB Data**

```bash
# View document status
aws dynamodb scan --table-name RAGStack-<stack-name>-Documents

# Check configuration
aws dynamodb get-item --table-name RAGStack-<stack-name>-Configuration \
  --key '{"Configuration": {"S": "Schema"}}'
```

**Check Bedrock Knowledge Base**

```bash
# List knowledge bases
aws bedrock-agent list-knowledge-bases

# Get KB details
aws bedrock-agent get-knowledge-base --knowledge-base-id <kb-id>

# Check data source sync status
aws bedrock-agent list-data-sources --knowledge-base-id <kb-id>
```
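
**Optional: Shell Helpers for Resource Names**

The commands in this guide repeat the same log group and table names, so a few shell functions can cut down on typos. This is a minimal sketch, not part of RAGStack: the function names are hypothetical, and it assumes resources are named `RAGStack-<name>-<suffix>`, where the middle segment is the name you deployed with. Check the actual names in your account before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with RAGStack).
# Assumption: resources follow the RAGStack-<name>-<suffix> pattern.

# Print the CloudWatch log group for one RAGStack Lambda function.
ragstack_log_group() {
  local stack="${1:-}" fn="${2:-}"
  if [ -z "$stack" ] || [ -z "$fn" ]; then
    echo "usage: ragstack_log_group <name> <function-name>" >&2
    return 1
  fi
  printf '/aws/lambda/RAGStack-%s-%s\n' "$stack" "$fn"
}

# Print the DynamoDB table name for a RAGStack table (e.g. Documents).
ragstack_table() {
  local stack="${1:-}" table="${2:-}"
  if [ -z "$stack" ] || [ -z "$table" ]; then
    echo "usage: ragstack_table <name> <table>" >&2
    return 1
  fi
  printf 'RAGStack-%s-%s\n' "$stack" "$table"
}

# Stream live logs for one function (needs AWS CLI v2 and credentials).
ragstack_tail() {
  local group
  group="$(ragstack_log_group "${1:-}" "${2:-}")" || return 1
  aws logs tail "$group" --follow
}
```

After sourcing the file, `ragstack_tail my-stack ProcessDocument` would stream the ProcessDocument logs, with `my-stack` standing in for your own deployment name.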