--- name: bedrock-inference description: Amazon Bedrock Runtime API for model inference including Claude, Nova, Titan, and third-party models. Covers invoke-model, converse API, streaming responses, token counting, async invocation, and guardrails. Use when invoking foundation models, building conversational AI, streaming model responses, optimizing token usage, or implementing runtime guardrails. allowed-tools: Task, Read, Write, Edit, Glob, Grep, Bash --- # Amazon Bedrock Inference ## Overview Amazon Bedrock Runtime provides APIs for invoking foundation models including Claude (Opus, Sonnet, Haiku), Nova (Amazon), Titan (Amazon), and third-party models (Cohere, AI21, Meta). Supports both synchronous and asynchronous inference with streaming capabilities. **Purpose**: Production-grade model inference with unified API across all Bedrock models **Pattern**: Task-based (independent operations for different inference modes) **Key Capabilities**: 1. **Model Invocation** - Direct model calls with native or Converse API 2. **Streaming** - Real-time token streaming for low latency 3. **Async Invocation** - Long-running tasks up to 24 hours 4. **Token Counting** - Cost estimation before inference 5. **Guardrails** - Runtime content filtering and safety 6. **Inference Profiles** - Cross-region routing and cost optimization **Quality Targets**: - Latency: < 1s first token for streaming - Throughput: Up to 4,000 tokens/sec - Availability: 99.9% SLA with cross-region profiles --- ## When to Use Use bedrock-inference when: - Invoking Claude, Nova, Titan, or other Bedrock models - Building conversational AI applications - Implementing streaming responses for better UX - Running long-running async inference tasks - Applying runtime guardrails for content safety - Optimizing costs with inference profiles - Counting tokens before model invocation - Implementing multi-turn conversations **When NOT to Use**: - Building complex agents (use bedrock-agentcore) - Knowledge base RAG (use bedrock-knowledge-bases) - Model customization (use bedrock-fine-tuning) --- ## Prerequisites ### Required - AWS account with Bedrock access - Model access enabled in AWS Console - IAM permissions for Bedrock Runtime ### Recommended - `boto3 >= 1.34.0` (for latest Converse API) - Understanding of model-specific input formats - CloudWatch for monitoring ### Installation ```bash pip install boto3 botocore ``` ### Enable Model Access ```bash # Check available models aws bedrock list-foundation-models --region us-east-1 # Request model access via Console: # AWS Console → Bedrock → Model access → Manage model access ``` --- ## Model IDs and Inference Profiles ### Claude Models (Anthropic) | Model | Model ID | Inference Profile ID | Region | Max Tokens | |-------|----------|---------------------|--------|------------| | **Claude Opus 4.5** | `anthropic.claude-opus-4-5-20251101-v1:0` | `global.anthropic.claude-opus-4-5-20251101-v1:0` | Global | 200K | | **Claude Sonnet 4.5** | `anthropic.claude-sonnet-4-5-20250929-v1:0` | `us.anthropic.claude-sonnet-4-5-20250929-v1:0` | US | 200K | | **Claude Haiku 4.5** | `anthropic.claude-haiku-4-5-20251001-v1:0` | `us.anthropic.claude-haiku-4-5-20251001-v1:0` | US | 200K | | **Claude Sonnet 3.5 v2** | `anthropic.claude-3-5-sonnet-20241022-v2:0` | `us.anthropic.claude-3-5-sonnet-20241022-v2:0` | US | 200K | | **Claude Haiku 3.5** | `anthropic.claude-3-5-haiku-20241022-v1:0` | `us.anthropic.claude-3-5-haiku-20241022-v1:0` | US | 200K | ### Amazon Nova Models | Model | Model ID | Inference Profile ID | Region | Max Tokens | |-------|----------|---------------------|--------|------------| | **Nova Pro** | `amazon.nova-pro-v1:0` | `us.amazon.nova-pro-v1:0` | US | 300K | | **Nova Lite** | `amazon.nova-lite-v1:0` | `us.amazon.nova-lite-v1:0` | US | 300K | | **Nova Micro** | `amazon.nova-micro-v1:0` | `us.amazon.nova-micro-v1:0` | US | 128K | ### Amazon Titan Models | Model | Model ID | Region | Max Tokens | |-------|----------|--------|------------| | **Titan Text Premier** | `amazon.titan-text-premier-v1:0` | All | 32K | | **Titan Text Express** | `amazon.titan-text-express-v1` | All | 8K | ### Inference Profile Prefixes - `us.` - US-only routing (lower latency for US traffic) - `global.` - Global cross-region routing (highest availability) - `apac.` - Asia-Pacific routing (lower latency for APAC traffic) --- ## Quick Reference ### Client Initialization ```python import boto3 from typing import Optional def get_bedrock_client(region_name: str = 'us-east-1', profile_name: Optional[str] = None): """Initialize Bedrock Runtime client""" session = boto3.Session( region_name=region_name, profile_name=profile_name ) return session.client('bedrock-runtime') # Usage bedrock = get_bedrock_client(region_name='us-west-2') ``` --- ## Operations ### 1. Invoke Model (Native API) Direct model invocation using model-specific request format. **Basic Invocation**: ```python import json def invoke_claude(prompt: str, model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'): """Invoke Claude with native API""" bedrock = get_bedrock_client() # Claude-specific request format request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2048, "messages": [ { "role": "user", "content": prompt } ], "temperature": 0.7, "top_p": 0.9 } response = bedrock.invoke_model( modelId=model_id, body=json.dumps(request_body) ) # Parse response response_body = json.loads(response['body'].read()) return response_body['content'][0]['text'] # Usage result = invoke_claude("Explain quantum computing in simple terms") print(result) ``` **With System Prompts**: ```python request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2048, "system": "You are a helpful AI assistant specialized in technical documentation.", "messages": [ { "role": "user", "content": "Write API documentation for a REST endpoint" } ] } ``` **With Tool Use**: ```python request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 4096, "messages": [ { "role": "user", "content": "What's the weather in San Francisco?" } ], "tools": [ { "name": "get_weather", "description": "Get current weather for a location", "input_schema": { "type": "object", "properties": { "location": { "type": "string", "description": "City name" } }, "required": ["location"] } } ] } ``` --- ### 2. Converse API (Unified Interface) Model-agnostic API that works across all Bedrock models with consistent interface. **Basic Conversation**: ```python def converse_with_model( messages: list, model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0', system_prompts: Optional[list] = None, max_tokens: int = 2048 ): """Converse API for unified model interaction""" bedrock = get_bedrock_client() inference_config = { 'maxTokens': max_tokens, 'temperature': 0.7, 'topP': 0.9 } request_params = { 'modelId': model_id, 'messages': messages, 'inferenceConfig': inference_config } if system_prompts: request_params['system'] = system_prompts response = bedrock.converse(**request_params) return response # Usage messages = [ { 'role': 'user', 'content': [ {'text': 'What are the benefits of microservices architecture?'} ] } ] system_prompts = [ {'text': 'You are a software architecture expert.'} ] response = converse_with_model(messages, system_prompts=system_prompts) assistant_message = response['output']['message'] print(assistant_message['content'][0]['text']) ``` **Multi-turn Conversation**: ```python def multi_turn_conversation(): """Multi-turn conversation with context""" bedrock = get_bedrock_client() messages = [] model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0' # Turn 1 messages.append({ 'role': 'user', 'content': [{'text': 'My name is Alice and I work in healthcare.'}] }) response = bedrock.converse( modelId=model_id, messages=messages, inferenceConfig={'maxTokens': 1024} ) # Add assistant response to history messages.append(response['output']['message']) # Turn 2 (model remembers context) messages.append({ 'role': 'user', 'content': [{'text': 'What are some AI applications in my field?'}] }) response = bedrock.converse( modelId=model_id, messages=messages, inferenceConfig={'maxTokens': 1024} ) return response['output']['message']['content'][0]['text'] ``` **With Tool Use (Converse API)**: ```python def converse_with_tools(): """Converse API with tool use""" bedrock = get_bedrock_client() tools = [ { 'toolSpec': { 'name': 'get_stock_price', 'description': 'Get current stock price for a symbol', 'inputSchema': { 'json': { 'type': 'object', 'properties': { 'symbol': { 'type': 'string', 'description': 'Stock ticker symbol' } }, 'required': ['symbol'] } } } } ] messages = [ { 'role': 'user', 'content': [{'text': "What's the price of AAPL stock?"}] } ] response = bedrock.converse( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', messages=messages, toolConfig={'tools': tools}, inferenceConfig={'maxTokens': 2048} ) # Check if model wants to use a tool if response['stopReason'] == 'tool_use': tool_use = response['output']['message']['content'][0]['toolUse'] print(f"Tool requested: {tool_use['name']}") print(f"Tool input: {tool_use['input']}") # Execute tool and return result # (Add tool result to messages and call converse again) return response ``` --- ### 3. Stream Response (Real-time Tokens) Stream tokens as they're generated for lower perceived latency. **Streaming with Native API**: ```python def stream_claude_response(prompt: str): """Stream response tokens in real-time""" bedrock = get_bedrock_client() request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2048, "messages": [ { "role": "user", "content": prompt } ] } response = bedrock.invoke_model_with_response_stream( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', body=json.dumps(request_body) ) # Process event stream stream = response['body'] full_text = "" for event in stream: chunk = event.get('chunk') if chunk: chunk_obj = json.loads(chunk['bytes'].decode()) if chunk_obj['type'] == 'content_block_delta': delta = chunk_obj['delta'] if delta['type'] == 'text_delta': text = delta['text'] print(text, end='', flush=True) full_text += text elif chunk_obj['type'] == 'message_stop': print() # New line at end return full_text # Usage response = stream_claude_response("Write a short story about a robot") ``` **Streaming with Converse API**: ```python def stream_converse(messages: list, model_id: str): """Stream response using Converse API""" bedrock = get_bedrock_client() response = bedrock.converse_stream( modelId=model_id, messages=messages, inferenceConfig={'maxTokens': 2048} ) stream = response['stream'] full_text = "" for event in stream: if 'contentBlockDelta' in event: delta = event['contentBlockDelta']['delta'] if 'text' in delta: text = delta['text'] print(text, end='', flush=True) full_text += text elif 'messageStop' in event: print() break return full_text # Usage messages = [{'role': 'user', 'content': [{'text': 'Explain neural networks'}]}] stream_converse(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0') ``` **Streaming with Error Handling**: ```python def safe_streaming(prompt: str): """Streaming with comprehensive error handling""" bedrock = get_bedrock_client() request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2048, "messages": [{"role": "user", "content": prompt}] } try: response = bedrock.invoke_model_with_response_stream( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', body=json.dumps(request_body) ) full_text = "" for event in response['body']: chunk = event.get('chunk') if chunk: chunk_obj = json.loads(chunk['bytes'].decode()) if chunk_obj['type'] == 'content_block_delta': text = chunk_obj['delta'].get('text', '') print(text, end='', flush=True) full_text += text elif chunk_obj['type'] == 'error': print(f"\nStreaming error: {chunk_obj['error']}") break return full_text except Exception as e: print(f"Stream failed: {e}") raise ``` --- ### 4. Count Tokens Estimate token usage and costs before invoking models. **Converse Token Counting**: ```python def count_tokens(messages: list, model_id: str): """Count tokens for cost estimation""" bedrock = get_bedrock_client() # Optional system prompts system_prompts = [ {'text': 'You are a helpful assistant.'} ] # Optional tools tools = [ { 'toolSpec': { 'name': 'example_tool', 'description': 'Example tool', 'inputSchema': { 'json': { 'type': 'object', 'properties': {} } } } } ] response = bedrock.converse_count( modelId=model_id, messages=messages, system=system_prompts, toolConfig={'tools': tools} ) # Get token counts usage = response['usage'] print(f"Input tokens: {usage['inputTokens']}") print(f"System tokens: {usage.get('systemTokens', 0)}") print(f"Tool tokens: {usage.get('toolTokens', 0)}") print(f"Total input: {usage['totalTokens']}") return usage # Usage messages = [ {'role': 'user', 'content': [{'text': 'This is a test message'}]} ] tokens = count_tokens(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0') ``` **Cost Estimation**: ```python def estimate_cost(messages: list, model_id: str, estimated_output_tokens: int = 1000): """Estimate inference cost before invocation""" bedrock = get_bedrock_client() # Count input tokens token_response = bedrock.converse_count( modelId=model_id, messages=messages ) input_tokens = token_response['usage']['totalTokens'] # Pricing (as of December 2024, prices vary by region) pricing = { 'us.anthropic.claude-opus-4-5-20251101-v1:0': { 'input': 15.00 / 1_000_000, # $15 per 1M input tokens 'output': 75.00 / 1_000_000 # $75 per 1M output tokens }, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0': { 'input': 3.00 / 1_000_000, 'output': 15.00 / 1_000_000 }, 'us.anthropic.claude-haiku-4-5-20251001-v1:0': { 'input': 0.80 / 1_000_000, 'output': 4.00 / 1_000_000 } } if model_id in pricing: input_cost = input_tokens * pricing[model_id]['input'] output_cost = estimated_output_tokens * pricing[model_id]['output'] total_cost = input_cost + output_cost print(f"Input tokens: {input_tokens:,} (${input_cost:.6f})") print(f"Estimated output: {estimated_output_tokens:,} (${output_cost:.6f})") print(f"Estimated total: ${total_cost:.6f}") return { 'input_tokens': input_tokens, 'estimated_output_tokens': estimated_output_tokens, 'input_cost': input_cost, 'output_cost': output_cost, 'total_cost': total_cost } else: print("Pricing not available for this model") return None ``` --- ### 5. Async Invoke (Long-Running Tasks) For inference tasks that take longer than 60 seconds (up to 24 hours). **Start Async Invocation**: ```python def async_invoke_model(prompt: str, s3_output_uri: str): """Start async model invocation for long tasks""" bedrock = get_bedrock_client() request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 10000, "messages": [ { "role": "user", "content": prompt } ] } response = bedrock.invoke_model_async( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', modelInput=json.dumps(request_body), outputDataConfig={ 's3OutputDataConfig': { 's3Uri': s3_output_uri } } ) invocation_arn = response['invocationArn'] print(f"Async invocation started: {invocation_arn}") return invocation_arn # Usage s3_output = 's3://my-bucket/bedrock-outputs/result.json' arn = async_invoke_model("Write a 10,000 word technical guide", s3_output) ``` **Check Async Status**: ```python def check_async_status(invocation_arn: str): """Check status of async invocation""" bedrock = get_bedrock_client() response = bedrock.get_async_invoke( invocationArn=invocation_arn ) status = response['status'] print(f"Status: {status}") if status == 'Completed': output_uri = response['outputDataConfig']['s3OutputDataConfig']['s3Uri'] print(f"Output available at: {output_uri}") # Download and parse result # (Use boto3 S3 client to retrieve) elif status == 'Failed': print(f"Failure reason: {response.get('failureMessage', 'Unknown')}") return response # Usage status = check_async_status(arn) ``` **List Async Invocations**: ```python def list_async_invocations(status_filter: Optional[str] = None): """List all async invocations""" bedrock = get_bedrock_client() params = {} if status_filter: params['statusEquals'] = status_filter # 'InProgress', 'Completed', 'Failed' response = bedrock.list_async_invokes(**params) for invocation in response.get('asyncInvokeSummaries', []): print(f"ARN: {invocation['invocationArn']}") print(f"Status: {invocation['status']}") print(f"Submit time: {invocation['submitTime']}") print("---") return response ``` --- ### 6. Apply Guardrail (Runtime Safety) Apply content filtering and safety policies at runtime. **Invoke with Guardrail**: ```python def invoke_with_guardrail( prompt: str, guardrail_id: str, guardrail_version: str = 'DRAFT' ): """Invoke model with runtime guardrail""" bedrock = get_bedrock_client() request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2048, "messages": [ { "role": "user", "content": prompt } ] } response = bedrock.invoke_model( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', body=json.dumps(request_body), guardrailIdentifier=guardrail_id, guardrailVersion=guardrail_version ) # Check if content was blocked response_body = json.loads(response['body'].read()) if 'amazon-bedrock-guardrailAction' in response['ResponseMetadata']['HTTPHeaders']: action = response['ResponseMetadata']['HTTPHeaders']['amazon-bedrock-guardrailAction'] if action == 'GUARDRAIL_INTERVENED': print("Content blocked by guardrail") return None return response_body['content'][0]['text'] # Usage result = invoke_with_guardrail( "Tell me about quantum computing", guardrail_id='abc123xyz', guardrail_version='1' ) ``` **Converse with Guardrail**: ```python def converse_with_guardrail(messages: list, guardrail_config: dict): """Converse API with guardrail configuration""" bedrock = get_bedrock_client() response = bedrock.converse( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', messages=messages, inferenceConfig={'maxTokens': 2048}, guardrailConfig=guardrail_config ) # Check trace for guardrail intervention if 'trace' in response: trace = response['trace']['guardrail'] if trace.get('action') == 'GUARDRAIL_INTERVENED': print("Guardrail blocked content") for assessment in trace.get('assessments', []): print(f"Policy: {assessment['topicPolicy']}") return response # Usage guardrail_config = { 'guardrailIdentifier': 'abc123xyz', 'guardrailVersion': '1', 'trace': 'enabled' } messages = [{'role': 'user', 'content': [{'text': 'Test message'}]}] converse_with_guardrail(messages, guardrail_config) ``` --- ## Error Handling Patterns ### Comprehensive Error Handling ```python from botocore.exceptions import ClientError, BotoCoreError import time def robust_invoke(prompt: str, max_retries: int = 3): """Invoke model with retry logic and error handling""" bedrock = get_bedrock_client() request_body = { "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2048, "messages": [{"role": "user", "content": prompt}] } for attempt in range(max_retries): try: response = bedrock.invoke_model( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', body=json.dumps(request_body) ) response_body = json.loads(response['body'].read()) return response_body['content'][0]['text'] except ClientError as e: error_code = e.response['Error']['Code'] if error_code == 'ThrottlingException': wait_time = (2 ** attempt) + 1 # Exponential backoff print(f"Throttled. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}") time.sleep(wait_time) continue elif error_code == 'ModelTimeoutException': print("Model timeout - request took too long") if attempt < max_retries - 1: time.sleep(2) continue raise elif error_code == 'ModelErrorException': print("Model error - check input format") raise elif error_code == 'ValidationException': print("Invalid parameters") raise elif error_code == 'AccessDeniedException': print("Access denied - check IAM permissions and model access") raise elif error_code == 'ResourceNotFoundException': print("Model not found - check model ID") raise else: print(f"Unexpected error: {error_code}") raise except BotoCoreError as e: print(f"Connection error: {e}") if attempt < max_retries - 1: time.sleep(2) continue raise raise Exception(f"Failed after {max_retries} attempts") ``` ### Specific Error Scenarios ```python def handle_model_errors(): """Common error scenarios and solutions""" bedrock = get_bedrock_client() try: # Attempt invocation response = bedrock.invoke_model( modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0', body=json.dumps({ "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2048, "messages": [{"role": "user", "content": "test"}] }) ) except ClientError as e: error_code = e.response['Error']['Code'] if error_code == 'ModelNotReadyException': # Model is still loading print("Model not ready, wait 30 seconds and retry") elif error_code == 'ServiceQuotaExceededException': # Hit service quota print("Exceeded quota - request increase or use different region") elif error_code == 'ModelStreamErrorException': # Error during streaming print("Stream interrupted - restart stream") ``` --- ## Best Practices ### 1. Cost Optimization ```python def cost_optimized_inference(prompt: str, require_high_accuracy: bool = False): """Choose model based on task complexity and cost""" # Simple tasks → Haiku (cheapest) # Moderate tasks → Sonnet (balanced) # Complex tasks → Opus (most capable) if not require_high_accuracy: model_id = 'us.anthropic.claude-haiku-4-5-20251001-v1:0' print("Using Haiku for cost efficiency") elif require_high_accuracy: model_id = 'global.anthropic.claude-opus-4-5-20251101-v1:0' print("Using Opus for maximum accuracy") else: model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0' print("Using Sonnet for balanced performance") return invoke_claude(prompt, model_id) ``` ### 2. Use Inference Profiles ```python def use_inference_profiles(): """Leverage inference profiles for cost savings""" # Cross-region profiles offer 30-50% cost savings # with automatic region failover profiles = { 'global_opus': 'global.anthropic.claude-opus-4-5-20251101-v1:0', 'us_sonnet': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0', 'us_haiku': 'us.anthropic.claude-haiku-4-5-20251001-v1:0' } # Use global profile for high availability # Use regional profile for lower latency return profiles ``` ### 3. Implement Caching ```python from functools import lru_cache import hashlib @lru_cache(maxsize=100) def cached_inference(prompt: str, model_id: str): """Cache responses for identical prompts""" return invoke_claude(prompt, model_id) def cache_key(prompt: str) -> str: """Generate cache key for prompt""" return hashlib.sha256(prompt.encode()).hexdigest() ``` ### 4. Monitor Token Usage ```python def track_token_usage(messages: list, model_id: str): """Track and log token usage""" bedrock = get_bedrock_client() # Count before invocation token_count = bedrock.converse_count( modelId=model_id, messages=messages ) input_tokens = token_count['usage']['totalTokens'] # Invoke response = bedrock.converse( modelId=model_id, messages=messages, inferenceConfig={'maxTokens': 2048} ) # Get actual output tokens output_tokens = response['usage']['outputTokens'] total_tokens = response['usage']['totalInputTokens'] + output_tokens # Log to CloudWatch or database print(f"Input: {input_tokens}, Output: {output_tokens}, Total: {total_tokens}") return response ``` ### 5. Use Streaming for Better UX ```python def stream_for_user_experience(prompt: str): """Always use streaming for interactive applications""" # Streaming reduces perceived latency # Users see tokens immediately instead of waiting return stream_claude_response(prompt) ``` ### 6. Async for Long Tasks ```python def use_async_for_batch(prompts: list, s3_bucket: str): """Use async invocation for batch processing""" invocation_arns = [] for idx, prompt in enumerate(prompts): s3_uri = f's3://{s3_bucket}/outputs/result-{idx}.json' arn = async_invoke_model(prompt, s3_uri) invocation_arns.append(arn) return invocation_arns ``` --- ## IAM Permissions ### Minimum Runtime Permissions ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream" ], "Resource": [ "arn:aws:bedrock:*::foundation-model/anthropic.claude-*", "arn:aws:bedrock:*::foundation-model/amazon.nova-*", "arn:aws:bedrock:*::foundation-model/amazon.titan-*" ] }, { "Effect": "Allow", "Action": [ "bedrock:Converse", "bedrock:ConverseStream" ], "Resource": "*" } ] } ``` ### With Async Invocation ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream", "bedrock:InvokeModelAsync", "bedrock:GetAsyncInvoke", "bedrock:ListAsyncInvokes" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject" ], "Resource": "arn:aws:s3:::my-bedrock-bucket/*" } ] } ``` --- ## Progressive Disclosure ### Quick Start (This File) - Client initialization - Model IDs and inference profiles - Basic invocation (native and Converse API) - Streaming responses - Token counting - Async invocation - Guardrail application - Error handling patterns - Best practices ### Detailed References - **[Advanced Invocation Patterns](references/advanced-invocation.md)**: Batch processing, parallel requests, custom retry logic, response parsing - **[Multimodal Support](references/multimodal.md)**: Image inputs, document parsing, vision capabilities for Claude and Nova - **[Tool Use and Function Calling](references/tool-use.md)**: Complete tool use patterns, multi-turn tool conversations, error handling - **[Performance Optimization](references/performance.md)**: Latency optimization, throughput tuning, cost reduction strategies - **[Monitoring and Observability](references/monitoring.md)**: CloudWatch integration, custom metrics, cost tracking, usage analytics --- ## Related Skills - **bedrock-agentcore**: Build production AI agents with managed infrastructure - **bedrock-guardrails**: Configure content filters and safety policies - **bedrock-knowledge-bases**: RAG with vector stores and retrieval - **bedrock-prompts**: Manage and version prompts - **anthropic-expert**: Claude API patterns and best practices - **claude-cost-optimization**: Cost tracking and optimization for Claude - **boto3-eks**: For containerized Bedrock applications --- ## Sources - [Amazon Bedrock Runtime API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html) - [Boto3 Bedrock Runtime](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html) - [Converse API Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html) - [Claude on Bedrock](https://docs.anthropic.com/en/api/claude-on-amazon-bedrock) - [Inference Profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles.html)