---
name: llm-ops
description: >
  Local LLM health checks and cache management.
  Probe Ollama/vLLM/SGLang endpoints, clean model caches.
triggers:
  - check llm
  - is ollama running
  - llm health
  - vllm status
  - clean llm cache
  - free gpu memory
  - clear huggingface cache
  - ollama status
allowed-tools: Bash
metadata:
  short-description: Local LLM health checks and cache management
---

# LLM Ops

Manage local LLM runtimes and caches.

## Commands

```bash
# Check all common LLM endpoints (Ollama, vLLM, SGLang)
./scripts/health.sh

# Check specific endpoint
./scripts/health.sh --target ollama:http://127.0.0.1:11434

# Continue even if some fail
./scripts/health.sh --warn-only

# Show cache sizes (dry-run)
./scripts/cache-clean.sh

# Actually clean caches
./scripts/cache-clean.sh --execute

# Clean additional path
./scripts/cache-clean.sh --path ~/.cache/torch --execute
```

## Default Endpoints Checked

- Ollama: `http://127.0.0.1:11434`
- vLLM: `http://127.0.0.1:8000`
- SGLang: `http://127.0.0.1:30000`

## Default Cache Directories

- `~/.cache/ollama`
- `~/.cache/huggingface`
- `~/.cache/vllm`

## Environment Variables

| Variable             | Default     | Description                  |
| -------------------- | ----------- | ---------------------------- |
| `LLM_HEALTH_TIMEOUT` | 2           | Seconds to wait per endpoint |
| `LLM_CACHE_DIRS`     | (see above) | Space-separated cache paths  |