---
name: AWS AI Services Expert
description: Build AI applications on AWS using Bedrock, SageMaker, and AI/ML services with best practices for enterprise deployment
version: 1.1.0
last_updated: 2026-01-06
external_version: "AWS Bedrock Claude/Llama"
triggers:
  - AWS Bedrock
  - SageMaker
  - AWS AI
  - Amazon AI
---

# AWS AI Services Expert

You are an expert in Amazon Web Services AI and ML services, specializing in Amazon Bedrock for foundation models, SageMaker for custom ML, and the broader AWS AI ecosystem.

## AWS Bedrock

### Overview

Amazon Bedrock is a fully managed service that provides foundation models from leading AI companies through a single API.

### Available Models

| Provider | Models | Best For |
|----------|--------|----------|
| **Anthropic** | Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku | Complex reasoning, coding, analysis |
| **Meta** | Llama 3.1 (8B/70B/405B) | Open weights, customization |
| **Mistral** | Mistral Large, Mixtral 8x7B | European provider, efficiency |
| **Cohere** | Command R+, Command R, Embed | Enterprise RAG, search |
| **Amazon** | Titan Text, Titan Embeddings | AWS-native integration |
| **AI21** | Jamba 1.5 | Long context, efficiency |
| **Stability AI** | Stable Diffusion | Image generation |

### Basic Usage

```python
import boto3
import json

# Initialize the Bedrock runtime client (the client used for model invocation)
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

# Claude 3.5 Sonnet (Anthropic Messages API format)
def invoke_claude(prompt: str) -> str:
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        })
    )
    result = json.loads(response['body'].read())
    return result['content'][0]['text']

# Llama 3.1 (raw prompt using the Llama 3 chat template)
def invoke_llama(prompt: str) -> str:
    response = bedrock.invoke_model(
        modelId='meta.llama3-1-70b-instruct-v1:0',
        body=json.dumps({
            # The template expects a blank line after each header
            "prompt": (
                "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
                f"{prompt}<|eot_id|>"
                "<|start_header_id|>assistant<|end_header_id|>\n\n"
            ),
            "max_gen_len": 1024,
            "temperature": 0.7
        })
    )
    result = json.loads(response['body'].read())
    return result['generation']
```

### Streaming Responses

```python
# Reuses the `bedrock` client and `json` import from Basic Usage
def stream_claude(prompt: str):
    response = bedrock.invoke_model_with_response_stream(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}]
        })
    )

    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        # Only content_block_delta events carry generated text
        if chunk.get('type') == 'content_block_delta':
            yield chunk['delta'].get('text', '')
```

### Embeddings with Titan

```python
# Reuses the `bedrock` client and `json` import from Basic Usage
def get_embeddings(text: str) -> list[float]:
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v2:0',
        body=json.dumps({
            "inputText": text,
            "dimensions": 1024,
            "normalize": True
        })
    )
    result = json.loads(response['body'].read())
    return result['embedding']
```

### Bedrock Knowledge Bases (RAG)

```python
import boto3

# Knowledge base queries go through the agent *runtime* client
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

def query_knowledge_base(query: str, kb_id: str) -> str:
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5
                    }
                }
            }
        }
    )
    return response['output']['text']
```

### Bedrock Agents

```python
# Creating an agent programmatically
import boto3

# Agent management uses the build-time 'bedrock-agent' client
bedrock_agent_client = boto3.client('bedrock-agent')

# Create agent
# (an IAM service role, passed as agentResourceRoleArn, is typically
# also needed before the agent can be prepared and invoked)
response = bedrock_agent_client.create_agent(
    agentName='customer-support-agent',
    foundationModel='anthropic.claude-3-5-sonnet-20241022-v2:0',
    instruction='''You are a customer support agent.
    Use the available tools to help customers with their orders.''',
    idleSessionTTLInSeconds=600
)

agent_id = response['agent']['agentId']

# Add action group (tools); the account ID below is a 12-digit placeholder
bedrock_agent_client.create_agent_action_group(
    agentId=agent_id,
    agentVersion='DRAFT',
    actionGroupName='order-management',
    actionGroupExecutor={
        'lambda': 'arn:aws:lambda:us-east-1:123456789012:function:order-handler'
    },
    apiSchema={
        's3': {
            's3BucketName': 'my-schemas-bucket',
            's3ObjectKey': 'order-api-schema.json'
        }
    }
)

# Prepare the DRAFT version so the changes take effect
bedrock_agent_client.prepare_agent(agentId=agent_id)
```

## Amazon SageMaker

### SageMaker for Custom Models

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

# Training job
# Note: Llama 3.1 checkpoints need transformers >= 4.43, so a newer
# container version or a pinned requirements.txt is likely required.
huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p4d.24xlarge',
    instance_count=1,
    role=sagemaker.get_execution_role(),
    transformers_version='4.36',
    pytorch_version='2.1',
    py_version='py310',
    hyperparameters={
        'model_name': 'meta-llama/Llama-3.1-8B',
        'epochs': 3,
        'learning_rate': 2e-5,
    }
)

huggingface_estimator.fit({'train': 's3://bucket/train-data/'})
```

### SageMaker Endpoints

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# Deploy model
hub_model = HuggingFaceModel(
    model_data='s3://bucket/model.tar.gz',
    role=sagemaker.get_execution_role(),
    transformers_version='4.36',
    pytorch_version='2.1',
    py_version='py310',
)

predictor = hub_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.2xlarge',
    endpoint_name='my-llm-endpoint'
)

# Invoke
response = predictor.predict({
    "inputs": "What is machine learning?"
})
```

### SageMaker JumpStart

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy foundation model from JumpStart
model = JumpStartModel(model_id="meta-textgeneration-llama-3-1-70b-instruct")
# Meta models require explicitly accepting the end-user license agreement
predictor = model.deploy(accept_eula=True)

# Use the model
response = predictor.predict({
    "inputs": "Explain quantum computing",
    "parameters": {"max_new_tokens": 256}
})
```

## AWS AI Service Integration

### Amazon Kendra (Enterprise Search)

```python
import boto3

kendra = boto3.client('kendra')

# Query, restricted to documents tagged with a given department
response = kendra.query(
    IndexId='your-index-id',
    QueryText='What is our refund policy?',
    AttributeFilter={
        'EqualsTo': {
            'Key': 'Department',
            'Value': {'StringValue': 'Customer Service'}
        }
    }
)

for result in response['ResultItems']:
    print(f"Score: {result['ScoreAttributes']['ScoreConfidence']}")
    print(f"Answer: {result['DocumentExcerpt']['Text']}")
```

### Amazon Comprehend (NLP)

```python
import boto3

comprehend = boto3.client('comprehend')

# Sentiment analysis
response = comprehend.detect_sentiment(
    Text="I love this product! It's amazing.",
    LanguageCode='en'
)
# {'Sentiment': 'POSITIVE', 'SentimentScore': {...}}

# Entity extraction
entities = comprehend.detect_entities(
    Text="Amazon was founded by Jeff Bezos in Seattle",
    LanguageCode='en'
)
```

### Amazon Textract (Document AI)

```python
import boto3

textract = boto3.client('textract')

# Analyze document
response = textract.analyze_document(
    Document={'S3Object': {'Bucket': 'bucket', 'Name': 'invoice.pdf'}},
    FeatureTypes=['FORMS', 'TABLES']
)

# Extract key-value pairs
for block in response['Blocks']:
    if block['BlockType'] == 'KEY_VALUE_SET':
        # Process form fields (each block is either a KEY or a VALUE)
        pass
```

## Architecture Patterns

### Serverless AI Pipeline

```
┌─────────────────────────────────────────────────────────────────┐
│                   SERVERLESS AI ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  API Gateway ──▶ Lambda ──▶ Bedrock                             │
│       │            │                                            │
│       │            ▼                                            │
│       │       DynamoDB (conversation history)                   │
│       │            │                                            │
│       │            ▼                                            │
│       │       S3 (documents)                                    │
│       │            │                                            │
│       │            ▼                                            │
│       │       OpenSearch (vector store)                         │
│       │                                                         │
│       └────────▶ CloudWatch (monitoring)                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Enterprise RAG on AWS

```hcl
# terraform/main.tf for RAG infrastructure
module "bedrock_kb" {
  source = "./modules/bedrock-knowledge-base"

  name            = "enterprise-kb"
  embedding_model = "amazon.titan-embed-text-v2:0"

  data_sources = [
    {
      type   = "S3"
      bucket = aws_s3_bucket.documents.id
      prefix = "knowledge/"
    }
  ]

  vector_store = {
    type           = "OPENSEARCH_SERVERLESS"
    collection_arn = aws_opensearchserverless_collection.vectors.arn
  }
}
```

## Pricing Optimization

### Bedrock Pricing (per 1K tokens)

Prices are in USD for on-demand inference and change over time; confirm current rates on the Bedrock pricing page.

| Model | Input | Output |
|-------|-------|--------|
| Claude 3.5 Sonnet | $0.003 | $0.015 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
| Llama 3 70B | $0.00265 | $0.0035 |
| Titan Text Express | $0.0002 | $0.0006 |

### Cost Optimization Strategies

```python
from typing import Optional


class BedrockCostOptimizer:
    # USD per 1K tokens, matching the pricing table
    MODEL_COSTS = {
        "claude-3-5-sonnet": {"input": 0.003, "output": 0.015},
        "claude-3-haiku": {"input": 0.00025, "output": 0.00125},
        "llama-3-70b": {"input": 0.00265, "output": 0.0035},
    }

    def select_model(self, task_complexity: str, max_cost: Optional[float] = None) -> str:
        """Select the most cost-effective model for a task.

        `max_cost` is currently unused.
        """
        if task_complexity == "simple":
            return "claude-3-haiku"  # Cheapest
        elif task_complexity == "moderate":
            return "llama-3-70b"  # Good balance
        else:
            return "claude-3-5-sonnet"  # Best capability

    def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Return the estimated cost in USD of a single request."""
        costs = self.MODEL_COSTS[model]
        return (input_tokens * costs["input"] + output_tokens * costs["output"]) / 1000
```

## Security Best Practices

### IAM Policies

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-sonnet*",
        "arn:aws:bedrock:*::foundation-model/meta.llama3*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2"]
        }
      }
    }
  ]
}
```

### VPC Endpoints

```hcl
resource "aws_vpc_endpoint" "bedrock" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.bedrock-runtime"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.bedrock.id]
  private_dns_enabled = true
}
```

## Resources

- [Amazon Bedrock Docs](https://docs.aws.amazon.com/bedrock/)
- [SageMaker Docs](https://docs.aws.amazon.com/sagemaker/)
- [AWS AI/ML Blog](https://aws.amazon.com/blogs/machine-learning/)
- [Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)
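
## Worked Cost Example

As a worked example of the per-1K-token arithmetic in the Pricing Optimization section, the sketch below projects a monthly bill for two of the listed models. The prices are copied from the pricing table in this document; the traffic figures (10,000 requests per day, 1,500 input and 300 output tokens per request) and the helper names `request_cost` and `monthly_cost` are hypothetical and exist only for illustration.

```python
# USD per 1K tokens, copied from the pricing table in this document.
PRICES_PER_1K = {
    "claude-3-5-sonnet": {"input": 0.003, "output": 0.015},
    "claude-3-haiku": {"input": 0.00025, "output": 0.00125},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at per-1K-token prices."""
    price = PRICES_PER_1K[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def monthly_cost(model: str, requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Projected cost in USD over `days` days of steady traffic."""
    return request_cost(model, input_tokens, output_tokens) * requests_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests/day, 1,500 tokens in, 300 tokens out.
    for model in PRICES_PER_1K:
        total = monthly_cost(model, 10_000, 1_500, 300)
        print(f"{model}: ${total:,.2f} per month")
```

Under these assumed numbers a single request costs about $0.009 on Claude 3.5 Sonnet and $0.00075 on Claude 3 Haiku, a 12x difference, which is why routing simple tasks to the cheaper model can dominate the bill.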