---
name: optimize-llm
description: Get LLM serving optimization recommendations to reduce latency, cut inference costs, and improve throughput
allowed-tools: Read, Glob, Grep, Task
argument-hint: "[focus: latency|cost|throughput]"
---

# Optimize LLM Command

Get quick, actionable recommendations for LLM serving optimization.

## Usage

```text
/sd:optimize-llm [focus]
```

## Arguments

- `focus` (optional): Optimization priority
  - `latency` - Focus on reducing response time
  - `cost` - Focus on reducing inference costs
  - `throughput` - Focus on maximizing requests/second
  - If omitted: provide balanced recommendations across latency, cost, and throughput
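
A command host or wrapper script might normalize the `focus` argument before handing it to the agent. The sketch below is a minimal, hypothetical helper: `Focus` and `parse_focus` are illustrative names, not part of any existing API. It assumes the argument is matched case-insensitively and that an unrecognized value should be rejected rather than silently treated as balanced.

```python
from enum import Enum
from typing import Optional


class Focus(Enum):
    """Optimization priorities for /sd:optimize-llm (hypothetical helper type)."""

    LATENCY = "latency"
    COST = "cost"
    THROUGHPUT = "throughput"
    BALANCED = "balanced"  # chosen only when no focus argument is given


# Values a user may pass explicitly; BALANCED is only the default.
EXPLICIT_FOCUSES = {Focus.LATENCY.value, Focus.COST.value, Focus.THROUGHPUT.value}


def parse_focus(arg: Optional[str]) -> Focus:
    """Map the optional `focus` argument to a Focus value.

    A missing or blank argument means balanced recommendations.
    Matching is case-insensitive (an assumption, not stated by the command spec).
    """
    if arg is None or not arg.strip():
        return Focus.BALANCED
    value = arg.strip().lower()
    if value not in EXPLICIT_FOCUSES:
        raise ValueError(
            f"unknown focus {arg!r}; expected one of: latency, cost, throughput"
        )
    return Focus(value)
```

Rejecting unknown values surfaces typos such as `latancy` instead of quietly producing balanced output the user did not ask for.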

## Examples

```text
/sd:optimize-llm
/sd:optimize-llm latency
/sd:optimize-llm cost
/sd:optimize-llm throughput
```

## Workflow

1. **Gather Context**
   - Use Glob and Grep to find LLM-related files: model configs, serving configs, inference scripts
   - Identify the current serving stack (vLLM, TGI, TensorRT-LLM, etc.)
   - Note the model and target hardware if they appear in the files; otherwise ask the user

2. **Spawn LLM Optimization Advisor Agent**
   Use the Task tool to spawn the `llm-optimization-advisor` agent, passing it the context gathered in step 1 and the requested focus. The agent specializes in:
   - Precision and quantization strategies (FP16, INT8, INT4)
   - Batching optimization (continuous, dynamic)
   - KV cache optimization (PagedAttention)
   - Serving framework selection
   - Cost reduction strategies

3. **Present Recommendations**
   Display optimization opportunities organized by:
   - **Quick Wins** - Low effort, high impact changes
   - **Medium Effort** - Moderate changes with significant benefits
   - **Advanced** - Architectural changes for maximum performance
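
When the agent estimates the impact of quantization or KV cache changes, back-of-envelope memory arithmetic helps ground the numbers. The sketch below assumes weights dominate model memory (it ignores quantization scales, activations, and framework overhead) and uses a hypothetical 7B-parameter model with 32 layers, 32 KV heads, and a head dimension of 128; substitute the detected model's real configuration. The function names are illustrative only.

```python
# Approximate storage per weight at each precision (assumption: no quantization metadata).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_bytes_per_token(
    num_layers: int, num_kv_heads: int, head_dim: int, bytes_per_value: int = 2
) -> int:
    """Bytes of KV cache one token occupies: a key and a value vector
    per layer per KV head. For grouped-query attention, pass the number
    of KV heads, not the number of query heads."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical 7B model: 32 layers, 32 KV heads, head_dim 128, FP16 cache.
    for precision in ("fp16", "int8", "int4"):
        print(precision, "weights (GB):", weight_memory_gb(7e9, precision))
    per_token = kv_cache_bytes_per_token(32, 32, 128)
    print("KV cache per token (MiB):", per_token / 2**20)
    # Eight concurrent sequences, each 4096 tokens long.
    print("KV cache for 8 x 4096 tokens (GiB):", per_token * 4096 * 8 / 2**30)
```

Under these assumptions, FP16 weights need about 14 GB, INT8 about 7 GB, and INT4 about 3.5 GB, while each token of FP16 KV cache takes 0.5 MiB, so eight concurrent 4096-token sequences reserve 16 GiB. Memory freed by quantizing weights, or reclaimed from fragmented KV cache by PagedAttention, can go toward larger batches.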

## Output Format

```text
## LLM Optimization Report

### Current Setup
- Model: [detected or ask]
- Framework: [detected or unknown]
- Hardware: [detected or ask]

### Quick Wins
1. [Optimization] - [Expected impact]
2. ...

### Medium Effort Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Advanced Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Estimated Total Impact
- Latency: [X]% reduction
- Cost: [X]% reduction
- Throughput: [X]x increase
```
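
The per-item percentages in the report should not simply be added to produce the "Estimated Total Impact": two 50% latency cuts cannot remove 100% of latency. The sketch below assumes each optimization acts on whatever remains after the others, so remaining fractions multiply and throughput multipliers multiply. Real gains often overlap (for example, two changes that both relieve the same memory bottleneck), so treat the result as an optimistic upper bound. The function names are hypothetical.

```python
from math import prod
from typing import Iterable


def combined_reduction(reductions: Iterable[float]) -> float:
    """Combine fractional reductions (0.2 means 20%) into one overall reduction.

    Assumes each optimization removes its fraction of whatever latency or cost
    remains after the others, so the remaining fractions multiply.
    Two 50% reductions therefore give 75%, not 100%.
    """
    values = list(reductions)
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reduction must be between 0 and 1, got {r}")
    remaining = prod(1.0 - r for r in values)
    return 1.0 - remaining


def combined_speedup(multipliers: Iterable[float]) -> float:
    """Combine throughput multipliers (1.5 means 1.5x), assuming they are independent."""
    values = list(multipliers)
    for m in values:
        if m <= 0:
            raise ValueError(f"multiplier must be positive, got {m}")
    return prod(values)
```

For example, a 30% and a 20% latency reduction combine to 44% (0.7 × 0.8 = 0.56 of the original latency remains), and a 2x and a 1.5x throughput gain combine to 3x.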