---
name: sf-ai-agentforce-observability
description: >
  Extract and analyze Agentforce session tracing data from Salesforce Data Cloud.
  Supports high-volume extraction (1-10M records/day), Polars-based analysis,
  and debugging workflows for agent sessions.
license: MIT
compatibility: "Requires Data Cloud enabled org with Agentforce Session Tracing"
metadata:
  version: "1.0.0"
  author: "Jag Valaiyapathy"
  data_model: "Session Tracing Data Model (STDM)"
  storage_format: "Parquet (via PyArrow)"
  analysis_library: "Polars"
hooks:
  PreToolUse:
    - matcher: Bash
      hooks:
        - type: command
          command: "python3 ${SHARED_HOOKS}/scripts/guardrails.py"
          timeout: 5000
  PostToolUse:
    - matcher: "Write|Edit"
      hooks:
        - type: command
          command: "python3 ${SKILL_HOOKS}/validate-extraction.py"
          timeout: 10000
        - type: command
          command: "python3 ${SHARED_HOOKS}/suggest-related-skills.py sf-ai-agentforce-observability"
          timeout: 5000
  SubagentStop:
    - type: command
      command: "python3 ${SHARED_HOOKS}/scripts/chain-validator.py sf-ai-agentforce-observability"
      timeout: 5000
---

# sf-ai-agentforce-observability: Agentforce Session Tracing Extraction & Analysis

Expert in extracting and analyzing Agentforce session tracing data from Salesforce Data Cloud. Supports high-volume data extraction (1-10M records/day), Parquet storage, and Polars-based analysis for debugging agent behavior.

## Core Responsibilities

1. **Session Extraction**: Extract STDM (Session Tracing Data Model) data via Data Cloud Query API
2. **Data Storage**: Write to Parquet format with PyArrow for efficient storage
3. **Analysis**: Polars-based lazy evaluation for memory-efficient analysis
4. **Debugging**: Session timeline reconstruction for troubleshooting agent issues
5.
   **Cross-Skill Integration**: Works with sf-connected-apps for auth, sf-ai-agentscript for fixes

## Document Map

| Need | Document | Description |
|------|----------|-------------|
| **Quick start** | [README.md](README.md) | Installation & basic usage |
| **Data model** | [resources/data-model-reference.md](resources/data-model-reference.md) | Full STDM schema documentation |
| **Query patterns** | [resources/query-patterns.md](resources/query-patterns.md) | Data Cloud SQL examples |
| **Analysis recipes** | [resources/analysis-cookbook.md](resources/analysis-cookbook.md) | Common Polars patterns |
| **CLI reference** | [docs/cli-reference.md](docs/cli-reference.md) | Complete command documentation |
| **Auth setup** | [docs/auth-setup.md](docs/auth-setup.md) | JWT Bearer configuration |
| **Troubleshooting** | [resources/troubleshooting.md](resources/troubleshooting.md) | Common issues & fixes |

**Quick Links:**
- [Data Model Overview](#session-tracing-data-model-stdm)
- [CLI Quick Reference](#cli-quick-reference)
- [Analysis Examples](#analysis-examples)
- [Cross-Skill Integration](#cross-skill-integration)

---

## CRITICAL: Prerequisites Checklist

Before extracting session data, verify:

| Check | How to Verify | Why |
|-------|---------------|-----|
| **Data Cloud enabled** | Setup → Data Cloud | Required for Query API |
| **Agentforce activated** | Setup → Agentforce | Generates session data |
| **Session Tracing enabled** | Agent Settings | Must be ON to collect data |
| **JWT Auth configured** | Use `sf-connected-apps` | Required for Data Cloud API |

### Auth Setup (via sf-connected-apps)

```bash
# 1. Generate certificate
openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 \
  -keyout ~/.sf/jwt/myorg.key \
  -out ~/.sf/jwt/myorg.crt \
  -subj "/CN=DataCloudAuth"

# 2.
# Create External Client App (use sf-connected-apps skill)
Skill(skill="sf-connected-apps", args="Create ECA with JWT Bearer for Data Cloud")

# Required scopes: cdp_query_api, cdp_profile_api
```

See [docs/auth-setup.md](docs/auth-setup.md) for detailed instructions.

---

## Session Tracing Data Model (STDM)

The STDM consists of 4 Data Model Objects (DMOs) in a hierarchical structure:

```
ssot__AIAgentSession__dlm (SESSION)
├── ssot__Id__c                          # Session ID
├── ssot__AIAgentApiName__c              # Agent API name
├── ssot__StartTimestamp__c              # Session start
├── ssot__EndTimestamp__c                # Session end
├── ssot__AIAgentSessionEndType__c       # End type (Completed, Abandoned, etc.)
├── ssot__RelatedMessagingSessionId__c   # Linked messaging session
└── ssot__OrganizationId__c              # Org ID
    └── ssot__AIAgentInteraction__dlm (TURN/SESSION_END) [1:N]
        ├── ssot__Id__c                  # Interaction ID
        ├── ssot__aiAgentSessionId__c    # FK to Session
        ├── ssot__InteractionType__c     # TURN or SESSION_END
        ├── ssot__TopicApiName__c        # Topic that handled this turn
        ├── ssot__StartTimestamp__c      # Turn start
        └── ssot__EndTimestamp__c        # Turn end
            ├── ssot__AIAgentInteractionStep__dlm (STEP) [1:N]
            │   ├── ssot__Id__c                          # Step ID
            │   ├── ssot__AIAgentInteractionId__c        # FK to Interaction
            │   ├── ssot__AIAgentInteractionStepType__c  # LLM_STEP or ACTION_STEP
            │   ├── ssot__Name__c                        # Action/step name
            │   ├── ssot__InputValueText__c              # Input to step
            │   ├── ssot__OutputValueText__c             # Output from step
            │   ├── ssot__PreStepVariableText__c         # Variables before
            │   ├── ssot__PostStepVariableText__c        # Variables after
            │   └── ssot__GenerationId__c                # LLM generation ID
            └── ssot__AIAgentMoment__dlm (MESSAGE) [1:N]
                ├── ssot__Id__c                             # Message ID
                ├── ssot__AIAgentInteractionId__c           # FK to Interaction
                ├── ssot__ContentText__c                    # Message content
                ├── ssot__AIAgentInteractionMessageType__c  # INPUT or OUTPUT
                └── ssot__MessageSentTimestamp__c           # Timestamp
```

See [resources/data-model-reference.md](resources/data-model-reference.md) for full field documentation.
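
The foreign keys in the diagram above are what turn four flat tables back into one session tree. As a minimal sketch (not the skill's actual implementation), the grouping could look like the following. The `build_session_tree` helper and all sample rows are hypothetical; only the field names are taken from the diagram.

```python
from collections import defaultdict

# Field names come from the STDM diagram; helper name and sample rows are invented.
SESSION_FK = "ssot__aiAgentSessionId__c"
INTERACTION_FK = "ssot__AIAgentInteractionId__c"


def build_session_tree(session, interactions, steps, messages):
    """Nest flat STDM rows (plain dicts) under a single session."""
    # Index child rows by the interaction they point to.
    steps_by_turn = defaultdict(list)
    for step in steps:
        steps_by_turn[step[INTERACTION_FK]].append(step)

    messages_by_turn = defaultdict(list)
    for message in messages:
        messages_by_turn[message[INTERACTION_FK]].append(message)

    # Keep only this session's interactions and attach their children.
    turns = [
        {
            **turn,
            "steps": steps_by_turn.get(turn["ssot__Id__c"], []),
            "messages": messages_by_turn.get(turn["ssot__Id__c"], []),
        }
        for turn in interactions
        if turn[SESSION_FK] == session["ssot__Id__c"]
    ]
    # Assumes ISO-8601 timestamps in one timezone, which sort correctly as strings.
    turns.sort(key=lambda t: t["ssot__StartTimestamp__c"])
    return {**session, "interactions": turns}


session = {"ssot__Id__c": "S1", "ssot__AIAgentApiName__c": "Customer_Support_Agent"}
interactions = [
    {"ssot__Id__c": "T2", SESSION_FK: "S1", "ssot__StartTimestamp__c": "2026-01-28T10:16:01Z"},
    {"ssot__Id__c": "T1", SESSION_FK: "S1", "ssot__StartTimestamp__c": "2026-01-28T10:15:23Z"},
    {"ssot__Id__c": "T9", SESSION_FK: "S2", "ssot__StartTimestamp__c": "2026-01-28T09:00:00Z"},
]
steps = [
    {"ssot__Id__c": "ST1", INTERACTION_FK: "T1", "ssot__AIAgentInteractionStepType__c": "LLM_STEP"},
    {"ssot__Id__c": "ST2", INTERACTION_FK: "T1", "ssot__AIAgentInteractionStepType__c": "ACTION_STEP"},
    {"ssot__Id__c": "ST3", INTERACTION_FK: "T9", "ssot__AIAgentInteractionStepType__c": "LLM_STEP"},
]
messages = [
    {"ssot__Id__c": "M1", INTERACTION_FK: "T1", "ssot__AIAgentInteractionMessageType__c": "INPUT"},
    {"ssot__Id__c": "M2", INTERACTION_FK: "T2", "ssot__AIAgentInteractionMessageType__c": "INPUT"},
]

tree = build_session_tree(session, interactions, steps, messages)
for turn in tree["interactions"]:
    print(turn["ssot__Id__c"], len(turn["steps"]), len(turn["messages"]))
```

Indexing the child rows once keeps the grouping linear in the number of rows. At extraction volumes, the same grouping would more likely be expressed as joins over the Parquet files than as Python loops.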

---

## Workflow (5-Phase Pattern)

### Phase 1: Requirements Gathering

Use **AskUserQuestion** to gather:

| # | Question | Options |
|---|----------|---------|
| 1 | Target org | Org alias from `sf org list` |
| 2 | Time range | Last N days / Date range |
| 3 | Agent filter | All agents / Specific API names |
| 4 | Output format | Parquet (default) / CSV |
| 5 | Analysis type | Summary / Debug session / Full extraction |

### Phase 2: Auth Configuration

Verify JWT auth is configured:

```python
from scripts.auth import DataCloudAuth

auth = DataCloudAuth(
    org_alias="myorg",
    consumer_key="YOUR_CONSUMER_KEY"
)

# Test authentication
token = auth.get_token()
print(f"Auth successful: {token[:20]}...")
```

If auth fails, invoke:

```
Skill(skill="sf-connected-apps", args="Setup JWT Bearer for Data Cloud")
```

### Phase 3: Extraction

**Basic Extraction (last 7 days):**

```bash
python3 scripts/cli.py extract \
  --org prod \
  --days 7 \
  --output ./stdm_data
```

**Filtered Extraction:**

```bash
python3 scripts/cli.py extract \
  --org prod \
  --since 2026-01-01 \
  --until 2026-01-28 \
  --agent Customer_Support_Agent \
  --output ./stdm_data
```

**Session Tree (specific session):**

```bash
python3 scripts/cli.py extract-tree \
  --org prod \
  --session-id "a0x..." \
  --output ./debug_session
```

### Phase 4: Analysis

**Session Summary:**

```python
from scripts.analyzer import STDMAnalyzer
from pathlib import Path

analyzer = STDMAnalyzer(Path("./stdm_data"))

# High-level summary
summary = analyzer.session_summary()
print(summary)

# Step distribution by agent
steps = analyzer.step_distribution(agent_name="Customer_Support_Agent")
print(steps)

# Topic routing analysis
topics = analyzer.topic_analysis()
print(topics)
```

**Debug Specific Session:**

```bash
python3 scripts/cli.py debug-session \
  --data-dir ./stdm_data \
  --session-id "a0x..."
```

### Phase 5: Integration & Next Steps

Based on analysis findings:

| Finding | Next Step | Skill |
|---------|-----------|-------|
| Topic mismatch | Improve topic descriptions | `sf-ai-agentscript` |
| Action failures | Debug Flow/Apex | `sf-flow`, `sf-debug` |
| Slow responses | Optimize actions | `sf-apex` |
| Missing coverage | Add test cases | `sf-ai-agentforce-testing` |

---

## CLI Quick Reference

### Extraction Commands

| Command | Purpose | Example |
|---------|---------|---------|
| `extract` | Extract session data | `extract --org prod --days 7` |
| `extract-tree` | Extract full session tree | `extract-tree --org prod --session-id "a0x..."` |
| `extract-incremental` | Resume from last run | `extract-incremental --org prod` |

### Analysis Commands

| Command | Purpose | Example |
|---------|---------|---------|
| `analyze` | Generate summary stats | `analyze --data-dir ./stdm_data` |
| `debug-session` | Timeline view | `debug-session --session-id "a0x..."` |
| `topics` | Topic analysis | `topics --data-dir ./stdm_data` |

### Common Flags

| Flag | Description | Default |
|------|-------------|---------|
| `--org` | Target org alias | Required |
| `--days` | Last N days | 7 |
| `--since` | Start date (YYYY-MM-DD) | - |
| `--until` | End date (YYYY-MM-DD) | Today |
| `--agent` | Filter by agent API name | All |
| `--output` | Output directory | `./stdm_data` |
| `--verbose` | Detailed logging | False |
| `--format` | Output format (table/json/csv) | table |

See [docs/cli-reference.md](docs/cli-reference.md) for complete documentation.
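
The flag table gives defaults for `--days`, `--since`, and `--until` but not how they combine. The sketch below shows one plausible way to resolve them into a date window. The `resolve_window` function and its precedence rule (an explicit `--since` overrides `--days`) are assumptions for illustration, not documented behavior of `scripts/cli.py`.

```python
from datetime import date, timedelta


def resolve_window(days=7, since=None, until=None, today=None):
    """Return (start, end) dates for an extraction run.

    Assumed precedence: an explicit --since overrides --days, and
    --until falls back to today, mirroring the defaults in the flag table.
    """
    today = today or date.today()
    end = date.fromisoformat(until) if until else today
    start = date.fromisoformat(since) if since else end - timedelta(days=days)
    if start > end:
        raise ValueError(f"--since {start} is after --until {end}")
    return start, end


# Default window: the 7 days ending on a fixed "today" (fixed for reproducibility).
print(resolve_window(today=date(2026, 1, 28)))

# Explicit range, as in the filtered extraction example.
print(resolve_window(since="2026-01-01", until="2026-01-28"))
```

Rejecting an inverted range up front is cheaper than discovering it as an empty result after a Data Cloud query has already run.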

---

## Analysis Examples

### Session Summary

```
📊 SESSION SUMMARY
════════════════════════════════════════════════════════════════
Period: 2026-01-21 to 2026-01-28
Total Sessions: 15,234
Unique Agents: 3

SESSIONS BY AGENT
────────────────────────────────────────────────────────────────
Agent                          │ Sessions │ Avg Turns │ Avg Duration
───────────────────────────────┼──────────┼───────────┼─────────────
Customer_Support_Agent         │ 8,502    │ 4.2       │ 3m 15s
Order_Tracking_Agent           │ 4,128    │ 2.8       │ 1m 45s
Product_FAQ_Agent              │ 2,604    │ 1.9       │ 45s

END TYPE DISTRIBUTION
────────────────────────────────────────────────────────────────
✅ Completed: 12,890 (84.6%)
🔄 Escalated: 1,523 (10.0%)
❌ Abandoned: 821 (5.4%)
```

### Debug Session Timeline

```
🔍 SESSION DEBUG: a0x1234567890ABC
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
Started: 2026-01-28 10:15:23 UTC
Duration: 4m 32s
End Type: Completed
Turns: 5

TIMELINE
────────────────────────────────────────────────────────────────
10:15:23 │ [INPUT]  "I need help with my order #12345"
10:15:24 │ [TOPIC]  → Order_Tracking (confidence: 0.95)
10:15:24 │ [STEP]   LLM_STEP: Identify intent
10:15:25 │ [STEP]   ACTION_STEP: Get_Order_Status
         │          Input: {"orderId": "12345"}
         │          Output: {"status": "Shipped", "eta": "2026-01-30"}
10:15:26 │ [OUTPUT] "Your order #12345 has shipped and will arrive by Jan 30."

10:16:01 │ [INPUT]  "Can I change the delivery address?"
10:16:02 │ [TOPIC]  → Order_Tracking (same topic)
10:16:02 │ [STEP]   LLM_STEP: Clarify request
10:16:03 │ [STEP]   ACTION_STEP: Check_Modification_Eligibility
         │          Input: {"orderId": "12345", "type": "address_change"}
         │          Output: {"eligible": false, "reason": "Already shipped"}
10:16:04 │ [OUTPUT] "I'm sorry, the order has already shipped..."
```

---

## Cross-Skill Integration

### Prerequisite Skills

| Skill | When | How to Invoke |
|-------|------|---------------|
| `sf-connected-apps` | Auth setup | `Skill(skill="sf-connected-apps", args="JWT Bearer for Data Cloud")` |

### Follow-up Skills

| Finding | Skill | How to Invoke |
|---------|-------|---------------|
| Topic routing issues | `sf-ai-agentscript` | `Skill(skill="sf-ai-agentscript", args="Fix topic: [issue]")` |
| Action failures | `sf-flow` / `sf-debug` | `Skill(skill="sf-debug", args="Analyze agent action failure")` |
| Test coverage gaps | `sf-ai-agentforce-testing` | `Skill(skill="sf-ai-agentforce-testing", args="Add test cases")` |

### Commonly Used With

| Skill | Use Case | Confidence |
|-------|----------|------------|
| `sf-ai-agentscript` | Fix agent based on trace analysis | ⭐⭐⭐ Required |
| `sf-ai-agentforce-testing` | Create test cases from observed patterns | ⭐⭐ Recommended |
| `sf-debug` | Deep-dive into action failures | ⭐⭐ Recommended |

---

## Key Insights

| Insight | Description | Action |
|---------|-------------|--------|
| **STDM is read-only** | Data Cloud stores traces; cannot modify | Use for analysis only |
| **Session lag** | Data may lag 5-15 minutes | Don't expect real-time |
| **Volume limits** | Query API: 10M records/day | Use incremental extraction |
| **Parquet efficiency** | 10x smaller than JSON | Always use Parquet for storage |
| **Lazy evaluation** | Polars scans without loading | Handles 100M+ rows |

---

## Common Issues & Fixes

| Error | Cause | Fix |
|-------|-------|-----|
| `401 Unauthorized` | JWT auth expired/invalid | Refresh token or reconfigure ECA |
| `No session data` | Tracing not enabled | Enable Session Tracing in Agent Settings |
| `Query timeout` | Too much data | Add date filters, use incremental |
| `Memory error` | Loading all data | Use Polars lazy frames |
| `Missing DMO` | Wrong API version | Use API v60.0+ |

See [resources/troubleshooting.md](resources/troubleshooting.md) for detailed
solutions.

---

## Output Directory Structure

After extraction:

```
stdm_data/
├── sessions/
│   └── date=2026-01-28/
│       └── part-0000.parquet
├── interactions/
│   └── date=2026-01-28/
│       └── part-0000.parquet
├── steps/
│   └── date=2026-01-28/
│       └── part-0000.parquet
├── messages/
│   └── date=2026-01-28/
│       └── part-0000.parquet
└── metadata/
    ├── extraction.json      # Extraction parameters
    └── watermark.json       # For incremental extraction
```

---

## Dependencies

**Python 3.10+** with:

```
polars>=1.0.0        # DataFrame library (lazy evaluation)
pyarrow>=15.0.0      # Parquet support
pyjwt>=2.8.0         # JWT generation
cryptography>=42.0.0 # Certificate handling
httpx>=0.27.0        # HTTP client
rich>=13.0.0         # CLI progress bars
click>=8.1.0         # CLI framework
pydantic>=2.6.0      # Data validation
```

Install: `pip install -r requirements.txt`

---

## License

MIT License. See [LICENSE](LICENSE) file.

Copyright (c) 2024-2026 Jag Valaiyapathy