# Adaptive Load Balancing Guide AxonHub provides an intelligent adaptive load balancing system that automatically selects optimal AI channels based on multiple dimensions, ensuring high availability and optimal performance. ## 🎯 Core Features ### Intelligent Channel Selection - **Priority Grouping** - Candidates are first grouped by model association priority (Lower value = Higher priority) - **Session Consistency** - Requests from the same conversation are prioritized to route to previously successful channels - **Health Awareness** - Automatically avoids channels with high error rates - **Fair Distribution** - Uses Weighted Round Robin to distribute requests proportionally based on channel weights - **Real-time Load** - Dynamically adjusts based on current connection count ### Multi-Strategy Scoring System Load balancing follows a hierarchical process: first by **Association Priority**, then by **Strategy Scoring** within each priority group. | Level | Strategy | Score Range | Description | |-------|----------|-------------|-------------| | **1** | **Association Priority** | 0-N (Lower is better) | Hard grouping defined in model associations | | **2** | **Trace Aware** | 0-1000 points | Same session priority, ensures conversation continuity | | **3** | **Error Aware** | 0-200 points | Based on success rate and error history | | **4** | **Weight Round Robin** | 10-150 points | Proportional distribution based on weight and history | | **5** | **Connection Load** | 0-50 points | Current connection utilization | ## 🚀 Quick Start ### 1. Configure Multiple Channels Add multiple channels for the same model in the management interface: ```yaml # Channel A - Primary channel name: "openai-primary" type: "openai" weight: 100 # High priority base_url: "https://api.openai.com/v1" # Channel B - Backup channel name: "openai-backup" type: "openai" weight: 50 # Medium priority base_url: "https://api.openai.com/v1" # Channel C - Third-party channel name: "azure-openai" type: "azure" weight: 30 # Low priority base_url: "https://your-resource.openai.azure.com" ``` ### 2. Enable Load Balancing Load balancing is automatically enabled, no additional configuration needed. The system will: - Automatically detect channel health status - Sort channels based on strategy scores - Intelligently select the optimal channel - Automatically switch to the next channel on failure ### 3. Send Requests Use standard OpenAI API format: ```python from openai import OpenAI client = OpenAI( api_key="your-axonhub-api-key", base_url="http://localhost:8090/v1" ) # System will automatically select the optimal channel response = client.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": "Hello!"}] ) ``` ## 📊 Load Balancing Strategy Details ### Model Association Priority - **Purpose**: High-level traffic control and hard grouping. - **Mechanism**: Candidates are first sorted by the `priority` field in the Model Association. - **Rule**: Lower values have higher priority. All candidates in priority group `N` will be exhausted before any candidate in group `N+1` is considered. - **Use Case**: Primary/Secondary channel separation, A/B testing (by setting same priority). ### Trace Aware Strategy - **Purpose**: Maintain channel consistency for multi-turn conversations - **Mechanism**: If request contains trace ID, prioritize previously successful channel - **Advantage**: Avoids initialization delays from channel switching - **Scoring**: Matching channel gets 1000 points, otherwise 0 points ### Error Aware Strategy - **Purpose**: Avoid unhealthy channels - **Base Score**: 200 points for healthy channels - **Scoring Factors**: - Consecutive failures: -50 points per failure - Recent failure (within 5 min): up to -100 points (time-decaying) - Success rate >90%: +30 points - Success rate <50%: -50 points - **Recovery**: Failed channels automatically recover priority over time as the time-decay penalty decreases. ### Weight Round Robin Strategy - **Purpose**: Proportional distribution based on weight and historical load. - **Algorithm**: Normalizes historical request counts by channel weight. Higher weight channels can handle more requests before their score drops. - **Scoring**: `150 * exp(-normalized_request_count / 150)` - **Range**: 10-150 points ### Connection Strategy - **Purpose**: Prevent individual channel overload (saturation protection) - **Scoring**: Based on current connection utilization - **Mechanism**: `50 * (1 - active_connections / max_connections)` ## 🔧 Advanced Configuration ### Enable Debug Mode View detailed load balancing decision process by setting the environment variable. ```bash # Set environment variable export AXONHUB_DEBUG_LOAD_BALANCER_ENABLED=true ``` ### View Decision Logs ```bash # View load balancing decisions tail -f axonhub.log | grep "Load balancing decision" # View specific channel scoring tail -f axonhub.log | grep "Channel load balancing details" # Use jq to format JSON logs tail -f axonhub.log | jq 'select(.msg | contains("Load balancing"))' ``` ## 📈 Monitoring and Troubleshooting ### Key Metrics - **Channel switching frequency** - Should be relatively low under normal conditions - **Error rate distribution** - High error rate on a channel may indicate configuration issues - **Response time** - Load balancing should optimize overall response time ### Common Issues **Q: Why do requests always route to the same channel?** A: Check if session consistency is enabled. Requests with the same trace ID will prioritize the same channel. **Q: What to do if channels don't switch?** A: Check Error Aware strategy scoring. The channel may still be healthy or needs time to recover. **Q: How to verify load balancing is working?** A: Enable debug mode and view channel scoring and sorting in logs. ## 🎛️ Best Practices ### 1. Channel Configuration - Set different weight values to reflect priorities - Configure multiple different provider channels for higher availability - Regularly check channel health status ### 2. Monitoring Setup - Monitor error rates and response times for each channel - Set alerts when a channel continuously fails - Regularly analyze load balancing decision logs ### 3. Performance Optimization - Set higher weights for geographically closer channels - Adjust channel priorities based on cost considerations - Use session consistency to improve user experience ## 🔗 Related Documentation - [OpenAI API](../api-reference/openai-api.md) - [Anthropic API](../api-reference/anthropic-api.md) - [Gemini API](../api-reference/gemini-api.md) - [Channel Management Guide](../getting-started/quick-start.md) - [Tracing and Debugging](tracing.md)