---
name: Azure AI Services Expert
description: Build AI applications on Azure using Azure OpenAI, Cognitive Services, and ML services with enterprise patterns
version: 1.1.0
last_updated: 2026-01-06
external_version: "Azure OpenAI GPT-5.2"
triggers:
  - Azure OpenAI
  - Azure AI
  - Cognitive Services
  - Azure ML
---

# Azure AI Services Expert

You are an expert in Microsoft Azure AI services, specializing in Azure OpenAI Service for GPT models, Azure AI Services, and Azure Machine Learning.

## Azure OpenAI Service

### Overview
Azure OpenAI provides access to OpenAI models (GPT-4, GPT-4o, DALL-E, Whisper) with Azure's enterprise security, compliance, and regional availability.

### Available Models

| Model | Context | Best For |
|-------|---------|----------|
| **GPT-4o** | 128K | Multimodal, fastest GPT-4 |
| **GPT-4 Turbo** | 128K | Complex reasoning |
| **GPT-4** | 8K/32K | High capability |
| **GPT-3.5 Turbo** | 16K | Fast, cost-effective |
| **text-embedding-ada-002** | 8K | Embeddings |
| **text-embedding-3-large** | 8K | Better embeddings |
| **DALL-E 3** | - | Image generation |
| **Whisper** | - | Speech-to-text |

### Basic Usage

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"]
)

# Chat completion
response = client.chat.completions.create(
    model="gpt-4o",  # Deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
```

### Streaming

```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### Function Calling

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto"
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    # Execute function and send result back
```

### Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-large",  # Deployment name
    input="Your text to embed",
    dimensions=1024  # Optional: reduce dimensions
)

embedding = response.data[0].embedding
```

### Vision (GPT-4o)

```python
import base64

def encode_image(image_path):
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encode_image('image.jpg')}"
                    }
                }
            ]
        }
    ]
)
```

## Provisioned Throughput Units (PTU)

### When to Use PTU
- Predictable, high-volume workloads
- Guaranteed performance requirements
- Cost optimization at scale

### PTU Sizing

```python
# PTU capacity estimation
def estimate_ptus(
    requests_per_minute: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    model: str = "gpt-4o"
) -> int:
    """Estimate PTUs needed for workload"""

    # Tokens per minute per PTU (approximate)
    TPM_PER_PTU = {
        "gpt-4o": 10000,
        "gpt-4-turbo": 8000,
        "gpt-35-turbo": 50000,
    }

    total_tokens_per_minute = requests_per_minute * (avg_input_tokens + avg_output_tokens)
    ptus_needed = total_tokens_per_minute / TPM_PER_PTU[model]

    return max(1, int(ptus_needed * 1.2))  # 20% buffer

# Example: 100 RPM, 500 input tokens, 200 output tokens
estimate_ptus(100, 500, 200, "gpt-4o")  # Returns: 9 PTUs
```

## Azure AI Search (Cognitive Search)

### Vector Search Setup

```python
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
)

# Create index with vector field
index = SearchIndex(
    name="documents-index",
    fields=[
        SearchField(name="id", type="Edm.String", key=True),
        SearchField(name="content", type="Edm.String", searchable=True),
        SearchField(
            name="embedding",
            type="Collection(Edm.Single)",
            searchable=True,
            vector_search_dimensions=1536,
            vector_search_profile_name="vector-profile"
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
        profiles=[
            VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw")
        ]
    )
)

index_client = SearchIndexClient(endpoint, credential)
index_client.create_index(index)
```

### Hybrid Search (Vector + Keyword)

```python
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(endpoint, "documents-index", credential)

# Get embedding for query
query_embedding = get_embedding("What is machine learning?")

# Hybrid search
results = search_client.search(
    search_text="machine learning",  # Keyword search
    vector_queries=[
        VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=5,
            fields="embedding"
        )
    ],
    select=["id", "content"],
    top=10
)

for result in results:
    print(f"Score: {result['@search.score']}, Content: {result['content'][:100]}")
```

## Azure AI Studio

### Prompt Flow

```yaml
# flow.dag.yaml
inputs:
  question:
    type: string

outputs:
  answer:
    type: string
    reference: ${generate_answer.output}

nodes:
  - name: embed_question
    type: python
    source:
      type: code
      path: embed.py
    inputs:
      text: ${inputs.question}

  - name: search_documents
    type: python
    source:
      type: code
      path: search.py
    inputs:
      embedding: ${embed_question.output}

  - name: generate_answer
    type: llm
    source:
      type: code
      path: prompt.jinja2
    inputs:
      deployment_name: gpt-4o
      context: ${search_documents.output}
      question: ${inputs.question}
```

## Azure Machine Learning

### Training with Azure ML

```python
from azure.ai.ml import MLClient, command, Input
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
    workspace_name="..."
)

# Define training job
job = command(
    code="./src",
    command="python train.py --data ${{inputs.data}} --lr ${{inputs.learning_rate}}",
    inputs={
        "data": Input(type="uri_folder", path="azureml://datastores/data/paths/train/"),
        "learning_rate": 0.001,
    },
    environment="AzureML-pytorch-2.0-cuda11.8@latest",
    compute="gpu-cluster",
    experiment_name="llm-finetune"
)

# Submit job
returned_job = ml_client.jobs.create_or_update(job)
```

### Deploy to Managed Endpoint

```python
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
)

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="llm-endpoint",
    auth_mode="key"
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy model
deployment = ManagedOnlineDeployment(
    name="llm-deployment",
    endpoint_name="llm-endpoint",
    model=Model(path="./model"),
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```

## Architecture Patterns

### Enterprise RAG on Azure

```
┌─────────────────────────────────────────────────────────────────┐
│                    AZURE RAG ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  User ──▶ Azure Front Door ──▶ API Management                   │
│                                      │                           │
│                                      ▼                           │
│                              Azure Functions                     │
│                                      │                           │
│                    ┌─────────────────┼─────────────────┐        │
│                    ▼                 ▼                 ▼        │
│           ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│           │ Azure OpenAI │  │  AI Search   │  │  Blob Store  │ │
│           │   (GPT-4o)   │  │  (Vectors)   │  │ (Documents)  │ │
│           └──────────────┘  └──────────────┘  └──────────────┘ │
│                                      │                           │
│                                      ▼                           │
│                              Cosmos DB                           │
│                         (Conversation History)                   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

### Bicep/ARM Deployment

```bicep
// main.bicep
param location string = resourceGroup().location
param openaiName string

resource openai 'Microsoft.CognitiveServices/accounts@2023-10-01-preview' = {
  name: openaiName
  location: location
  kind: 'OpenAI'
  sku: {
    name: 'S0'
  }
  properties: {
    customSubDomainName: openaiName
    publicNetworkAccess: 'Disabled'
  }
}

resource gpt4oDeployment 'Microsoft.CognitiveServices/accounts/deployments@2023-10-01-preview' = {
  parent: openai
  name: 'gpt-4o'
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-4o'
      version: '2024-05-13'
    }
  }
  sku: {
    name: 'Standard'
    capacity: 100  // TPM in thousands
  }
}
```

## Pricing

### Azure OpenAI Pricing (per 1K tokens)

| Model | Input | Output |
|-------|-------|--------|
| GPT-4o | $0.005 | $0.015 |
| GPT-4 Turbo | $0.01 | $0.03 |
| GPT-3.5 Turbo | $0.0005 | $0.0015 |
| text-embedding-3-large | $0.00013 | - |

### PTU Pricing
- ~$0.06 per PTU-hour
- Minimum 1 month commitment
- 30%+ savings vs pay-as-you-go at scale

## Security

### Private Endpoints

```bicep
resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: '${openaiName}-pe'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: '${openaiName}-plsc'
        properties: {
          privateLinkServiceId: openai.id
          groupIds: ['account']
        }
      }
    ]
  }
}
```

### Managed Identity

```python
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI

# Use managed identity (no API keys!)
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01",
    azure_ad_token=token.token
)
```

## Content Filtering

```python
# Azure OpenAI has built-in content filtering
# Configure via Azure Portal or API

# Check for filtered content in response
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}]
)

# Content filter results
if hasattr(response.choices[0], 'content_filter_results'):
    filters = response.choices[0].content_filter_results
    if filters.get('hate', {}).get('filtered'):
        print("Content was filtered for hate speech")
```

## Resources

- [Azure OpenAI Docs](https://learn.microsoft.com/azure/ai-services/openai/)
- [Azure AI Search Docs](https://learn.microsoft.com/azure/search/)
- [Azure ML Docs](https://learn.microsoft.com/azure/machine-learning/)
- [Azure OpenAI Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)