---
name: external-llm
description: Multi-turn conversation with an external reasoning model (DeepSeek Reasoner, o1, o3) via any OpenAI-compatible API. Session persistence, pushback workflow, and structured reports. Use for proof verification, reasoning tasks, or getting a second opinion.
argument-hint: "<message> | /report | /sessions | /new [name] | /transcript"
compatibility: Requires LLM_API_KEY environment variable. Optionally set LLM_BASE_URL and LLM_MODEL.
license: Apache-2.0
metadata:
  version: "1.0"
  category: ai-integration
---

# External LLM

Multi-turn conversation with an external LLM via any OpenAI-compatible API. Designed primarily for reasoning models (DeepSeek Reasoner, o1, o3), where multi-turn verification and pushback are valuable, but it works with any OpenAI-compatible provider.

## Setup

Set your API credentials as environment variables:

```bash
# Required
export LLM_API_KEY="your-key-here"

# Optional (defaults shown)
export LLM_BASE_URL="https://api.deepseek.com"  # any OpenAI-compatible endpoint
export LLM_MODEL="deepseek-reasoner"            # model to use
```

### Common configurations

| Provider | LLM_BASE_URL | LLM_MODEL | Get a key |
|----------|--------------|-----------|-----------|
| DeepSeek | `https://api.deepseek.com` | `deepseek-reasoner` | https://platform.deepseek.com |
| OpenAI | `https://api.openai.com/v1` | `o1`, `o3-mini`, `gpt-4o` | https://platform.openai.com |
| Groq | `https://api.groq.com/openai/v1` | `llama-3.3-70b-versatile` | https://console.groq.com |
| Together | `https://api.together.xyz/v1` | `meta-llama/Llama-3-70b-chat-hf` | https://api.together.xyz |
| Ollama (local) | `http://localhost:11434/v1` | `llama3` | -- |

## Invocation

```
/external-llm <message>     # Send message to external LLM
/external-llm /report       # Get final report from conversation
/external-llm /new [name]   # Start fresh session
/external-llm /sessions     # List sessions
/external-llm /transcript   # Show full conversation
```

## How Agents Should Use This

### Starting a Conversation

```
/external-llm /new verify-theorem3
```

Then send your prompt:

```
/external-llm "You are verifying a mathematical proof. Be SKEPTICAL.
Show explicit work, not vague assurances.

## Content to Verify
[content]

## Concerns
[list]

For each concern, show the explicit verification or flag as ISSUE."
```

### Analyzing and Pushing Back

After each response:

1. **Check for vagueness** — "clearly", "obviously", "by inspection"
2. **Check for unverified claims** — results stated without justification
3. **Check for brief "ALL CLEAR"** — needs explicit verification

If any of these appear, send pushback:

```
/external-llm "You said X but didn't show work. Please compute Y explicitly.
Don't just say it works."
```

### Getting the Report

After the conversation is complete (typically 3-15 turns):

```
/external-llm /report
```

Then synthesize the conversation into a structured report.

## Session Management

Sessions are stored as JSON files in `~/.cache/external-llm/sessions/`.

| Command | Description |
|---------|-------------|
| `/new [name]` | Start fresh session |
| `/sessions` | List all sessions with turn counts |
| `/transcript` | Show full conversation |
| `/report` | Mark report requested (agent should generate) |

## Implementation

Requires `requests` (`pip install requests`).
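Before reading the script, it helps to know the on-disk session format: each session is one JSON file with `name`, `created`, `messages`, and `metadata` fields. A minimal sketch of the round trip (the session name and message text here are illustrative, not part of the skill):

```python
import json
import os
import tempfile
from datetime import datetime

# Shape of one session file (field names match the implementation script)
session = {
    "name": "verify-theorem3",
    "created": datetime.now().isoformat(),
    "messages": [
        {"role": "user", "content": "Verify the tail bound."},
        {"role": "assistant", "content": "Bound holds; work shown above."},
    ],
    "metadata": {},
}

# /sessions reports turn counts as user/assistant pairs
turns = len(session["messages"]) // 2

# Round-trip through disk, as the script does between invocations
path = os.path.join(tempfile.mkdtemp(), f"{session['name']}.json")
with open(path, "w") as f:
    json.dump(session, f, indent=2)
with open(path) as f:
    loaded = json.load(f)

print(turns)              # 1
print(loaded == session)  # True
```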
```bash
# "$ARGUMENTS" carries the slash-command argument; "-" makes python3 read
# the script from stdin while still passing the argument through.
python3 - "$ARGUMENTS" << 'PYEOF'
import json
import os
import sys
from datetime import datetime

import requests

SESSIONS_DIR = os.path.expanduser("~/.cache/external-llm/sessions")
os.makedirs(SESSIONS_DIR, exist_ok=True)


def load_session(name="default"):
    path = f"{SESSIONS_DIR}/{name}.json"
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"name": name, "created": datetime.now().isoformat(),
            "messages": [], "metadata": {}}


def save_session(session):
    path = f"{SESSIONS_DIR}/{session['name']}.json"
    with open(path, 'w') as f:
        json.dump(session, f, indent=2)


def get_current_session_name():
    marker = f"{SESSIONS_DIR}/.current"
    if os.path.exists(marker):
        with open(marker) as f:
            return f.read().strip()
    return "default"


def set_current_session(name):
    with open(f"{SESSIONS_DIR}/.current", 'w') as f:
        f.write(name)


def call_llm(messages):
    api_key = os.environ.get("LLM_API_KEY")
    if not api_key:
        return "ERROR: LLM_API_KEY environment variable not set."
    base_url = os.environ.get("LLM_BASE_URL", "https://api.deepseek.com").rstrip("/")
    model = os.environ.get("LLM_MODEL", "deepseek-reasoner")
    r = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        json={"model": model, "messages": messages, "max_tokens": 16384},
        timeout=600
    )
    if r.status_code == 200:
        msg = r.json()["choices"][0]["message"]
        content = msg.get("content", "") or ""
        # DeepSeek Reasoner returns reasoning_content; other providers may not
        reasoning = msg.get("reasoning_content", "") or ""
        if reasoning and content:
            return f"[REASONING]\n{reasoning}\n\n[OUTPUT]\n{content}"
        return reasoning or content
    return f"ERROR {r.status_code}: {r.text}"


arg = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else ""

if arg.startswith("/new"):
    parts = arg.split(maxsplit=1)
    name = parts[1] if len(parts) > 1 else "default"
    session = {"name": name, "created": datetime.now().isoformat(),
               "messages": [], "metadata": {}}
    save_session(session)
    set_current_session(name)
    print(f"Started new session: {name}")
    sys.exit(0)
elif arg == "/sessions":
    files = sorted(f for f in os.listdir(SESSIONS_DIR) if f.endswith('.json'))
    if not files:
        print("No sessions found.")
    for fname in files:
        with open(f"{SESSIONS_DIR}/{fname}") as f:
            s = json.load(f)
        turns = len(s['messages']) // 2
        print(f"  {s['name']}: {turns} turns")
    sys.exit(0)
elif arg == "/transcript":
    name = get_current_session_name()
    session = load_session(name)
    print(f"=== TRANSCRIPT: {name} ({len(session['messages']) // 2} turns) ===\n")
    for msg in session['messages']:
        print(f"\n[{msg['role'].upper()}]:\n{msg['content']}\n")
        print("-" * 40)
    sys.exit(0)
elif arg == "/report":
    name = get_current_session_name()
    session = load_session(name)
    turns = len(session['messages']) // 2
    print(f"Session: {name} ({turns} turns)")
    print("REPORT_REQUESTED")
    sys.exit(0)

# Regular message
if not arg:
    print("Usage: /external-llm <message> | /report | /sessions | /new [name] | /transcript")
    sys.exit(1)

current_name = get_current_session_name()
session = load_session(current_name)
session['messages'].append({"role": "user", "content": arg})
reply = call_llm(session['messages'])
if not reply.startswith("ERROR"):
    session['messages'].append({"role": "assistant", "content": reply})
    save_session(session)
print(reply)
PYEOF
```

## Example Multi-Turn Verification

```
# Turn 1: Initial prompt
/external-llm /new verify-convergence
/external-llm "Verify this claim: the series converges for all p > 2.
Show the explicit bound and check the boundary case p = 2."

# LLM responds with analysis

# Turn 2: Push back on vague step
/external-llm "You claimed the tail bound works but didn't show the
computation. Write out the sum explicitly for N = 10, 100, 1000."

# LLM works through the cases

# Turn 3: Check edge case
/external-llm "What happens at p = 2 exactly? Does it diverge? How fast?"

# LLM addresses edge case

# Turn 4: Request report
/external-llm /report
```

## Notes

- **Response time:** Varies by provider — reasoning models (DeepSeek, o1) take 30-120s; chat models are faster
- **Max turns:** Aim for 3-6 turns per topic
- **Push back:** Don't accept vague answers — ask for explicit work
- **Session files:** Inspect the `.json` files in `~/.cache/external-llm/sessions/` to debug conversations
- **`reasoning_content`:** DeepSeek Reasoner returns this field; other providers may not. The script handles both cases.
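The `reasoning_content` handling in the last note can be isolated for a quick check. This sketch mirrors the response-merging logic inside the script's `call_llm()`; the function name `merge_reply` is mine, not part of the skill:

```python
def merge_reply(msg: dict) -> str:
    """Combine an optional reasoning_content field with content,
    mirroring the response handling in the implementation's call_llm()."""
    content = msg.get("content", "") or ""
    reasoning = msg.get("reasoning_content", "") or ""
    if reasoning and content:
        return f"[REASONING]\n{reasoning}\n\n[OUTPUT]\n{content}"
    return reasoning or content

# DeepSeek Reasoner style reply: both fields present, so both are labeled
print(merge_reply({"content": "Converges for p > 2.",
                   "reasoning_content": "Compare against the integral test..."}))

# Plain chat model: only content, returned unchanged
print(merge_reply({"content": "Converges for p > 2."}))
```

The `or ""` guards matter: some providers return `"content": null` while streaming reasoning, and `msg.get("content", "")` alone would pass that `None` through.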