---
name: zhipuai-glm-cli-and-agents
description: CLI invocation, direct API usage (curl/Python), non-interactive Claude Code with GLM backend, sub-agent model configuration, tool calling, streaming, and multi-agent orchestration patterns on the Z.ai GLM Coding Plan.
type: reference
---

# Z.ai GLM — CLI Invocation & Agent Communication

## API Endpoints

| Format | Endpoint |
|--------|----------|
| Native Z.ai | `https://api.z.ai/api/paas/v4/chat/completions` |
| OpenAI-compatible | `https://api.z.ai/v1/chat/completions` |
| Anthropic-compatible | `https://api.z.ai/api/anthropic` |

Authentication: `Authorization: Bearer <your-z.ai-api-key>` header on all endpoints.

---

## Direct curl Invocation

### Basic completion (OpenAI-compatible endpoint)

```bash
curl -s https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-4.7",
    "messages": [
      {"role": "system", "content": "You are a senior software engineer."},
      {"role": "user", "content": "Write a Python function to parse ISO 8601 dates."}
    ],
    "temperature": 1.0,
    "max_tokens": 4096
  }' | jq '.choices[0].message.content'
```

### Streaming (SSE)

```bash
curl -sN https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-4.7",
    "messages": [{"role": "user", "content": "Implement a binary search in Go."}],
    "stream": true,
    "temperature": 1.0
  }'
# -N disables curl buffering — required for SSE streams
# Parse: each line starting with "data:" contains a JSON chunk; the stream ends at data: [DONE]
```
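To consume that stream programmatically rather than through curl, here is a minimal sketch using the third-party `requests` library. It assumes the chunk format matches the OpenAI-compatible delta schema shown in the SDK examples below:

```python
import json
import os

import requests  # pip install requests

# Open the SSE stream; stream=True keeps the HTTP connection open for iteration
resp = requests.post(
    "https://api.z.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['ZAI_API_KEY']}"},
    json={
        "model": "GLM-4.7",
        "messages": [{"role": "user", "content": "Implement a binary search in Go."}],
        "stream": True,
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line or not line.startswith(b"data:"):
        continue  # skip blank separator lines and keep-alives
    payload = line[len(b"data:"):].strip()
    if payload == b"[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if choices and choices[0]["delta"].get("content"):
        print(choices[0]["delta"]["content"], end="", flush=True)
```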
### With thinking/reasoning mode

```bash
curl -s https://api.z.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1",
    "messages": [{"role": "user", "content": "Design a distributed rate limiter."}],
    "thinking": {"type": "enabled"},
    "temperature": 1.0,
    "max_tokens": 8192
  }'
```

---

## Official Python SDK (`zai`)

```bash
pip install zai
```

### Basic usage

```python
from zai import ZaiClient

client = ZaiClient(api_key="your-z.ai-api-key")

response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Refactor this function for clarity."}],
    temperature=1.0,
    max_tokens=4096
)
print(response.choices[0].message.content)
```

### Streaming

```python
stream = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Implement a Fibonacci generator."}],
    stream=True,
    temperature=1.0
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

### Thinking/reasoning mode

```python
response = client.chat.completions.create(
    model="GLM-5.1",
    messages=[{"role": "user", "content": "Debug this race condition."}],
    thinking={"type": "enabled"},
    temperature=1.0,
    max_tokens=8192
)
```

### Via OpenAI SDK (drop-in)

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-z.ai-api-key",
    base_url="https://api.z.ai/v1"
)
response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Write unit tests for this module."}],
    temperature=1.0
)
```

---

## Tool Calling (Function Calling)

Standard OpenAI-compatible schema. Supported on `GLM-4.6`, `GLM-4.7`, and `GLM-5`.

```python
from openai import OpenAI

client = OpenAI(api_key="your-key", base_url="https://api.z.ai/v1")

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Execute the test suite and return results",
            "parameters": {
                "type": "object",
                "properties": {
                    "test_path": {"type": "string", "description": "Path to test file or directory"},
                    "verbose": {"type": "boolean", "default": False}
                },
                "required": ["test_path"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Run the auth tests and summarize failures."}],
    tools=tools,
    tool_choice="auto"
)

if response.choices[0].message.tool_calls:
    for call in response.choices[0].message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")
```
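The call above only surfaces the model's tool request. Closing the loop means executing the tool locally, appending its output as a `role: "tool"` message keyed by `tool_call_id`, and calling the model again. A sketch of that standard OpenAI-compatible round trip, continuing from the block above (the local `run_tests` stub is hypothetical):

```python
import json

def run_tests(test_path, verbose=False):
    """Hypothetical local tool. Replace with a real test runner."""
    return {"passed": 12, "failed": 1, "failures": ["test_token_refresh"]}

messages = [{"role": "user", "content": "Run the auth tests and summarize failures."}]
response = client.chat.completions.create(
    model="GLM-4.7", messages=messages, tools=tools, tool_choice="auto"
)
msg = response.choices[0].message

if msg.tool_calls:
    # Echo the assistant turn (including its tool_calls) back into the history
    messages.append({
        "role": "assistant",
        "content": msg.content or "",
        "tool_calls": [
            {"id": tc.id, "type": "function",
             "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
            for tc in msg.tool_calls
        ],
    })
    # Execute each requested tool and attach its result by tool_call_id
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)
        result = run_tests(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": json.dumps(result),
        })
    # Second call: the model now answers using the tool results
    final = client.chat.completions.create(model="GLM-4.7", messages=messages)
    print(final.choices[0].message.content)
```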
### Streaming tool calls

```python
response = client.chat.completions.create(
    model="GLM-4.7",
    messages=[{"role": "user", "content": "Check the weather in Beijing."}],
    tools=tools,
    stream=True,
    tool_stream=True  # enables streaming of tool call parameters
                      # (with the OpenAI SDK this may need to be passed via extra_body)
)

calls = {}  # accumulate streamed tool calls, keyed by index
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    if delta.tool_calls:
        for tc in delta.tool_calls:
            # arguments arrive as incremental JSON fragments: concatenate them
            entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function.name:
                entry["name"] = tc.function.name
            if tc.function.arguments:
                entry["arguments"] += tc.function.arguments
```

`tool_stream=True` reduces latency by streaming tool argument JSON as it's generated rather than buffering the full call.

---

## Non-Interactive Claude Code (CLI Automation)

Claude Code supports non-interactive mode via the `-p` flag — useful for scripts, CI, and agent-to-agent pipelines.

### One-shot task

```bash
# Using native claude with GLM env vars pre-set
ANTHROPIC_AUTH_TOKEN=$ZAI_API_KEY \
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic \
API_TIMEOUT_MS=3000000 \
claude -p "Add docstrings to all public functions in src/api.py"
```

### Using the `z`/`glm` community wrappers

Two community projects provide `z` and `glm` as drop-in wrappers:

**geoh/z.ai-powered-claude-code** — installs `z` and `glm` commands:

```bash
bash scripts/install.sh            # Linux/macOS
# Windows: .\scripts\install.ps1

z "Implement the pagination feature described in JIRA-123"
glm --model opus "Refactor the auth module for testability"
z -p "Run the test suite and fix all failures"   # non-interactive
```

**ankurkakroo2/claude-code-glm-setup** — adds a `glm` shell function:

```bash
glm -p "what is 2+2"   # non-interactive single prompt
glm --debug api        # enables API request logging
```

Both wrappers use `--settings ~/.claude/glm-settings.json` to inject Z.ai credentials without touching the default Claude config.

---

## Sub-Agent Model Configuration

When Claude Code spawns sub-agents (background tasks, parallel tool execution), you can control which GLM model they use independently of the main model.

### `.zai.json` config (geoh wrapper)

```json
{
  "apiKey": "your-z.ai-key",
  "opusModel": "GLM-5.1",
  "sonnetModel": "GLM-4.7",
  "haikuModel": "GLM-4.5-Air",
  "subagentModel": "GLM-4.7",
  "defaultModel": "opus",
  "enableThinking": "true",
  "reasoningEffort": "high"
}
```

**`subagentModel`** — the model used for sub-agent tasks. Recommended: `GLM-4.7` (1× quota cost) to prevent sub-agents from burning through the advanced-model budget while the main agent uses GLM-5.1.

### Per-project config

Place `.zai.json` in a project root to override global settings for that project:

```json
{
  "defaultModel": "sonnet",
  "subagentModel": "GLM-4.5-Air",
  "reasoningEffort": "medium"
}
```

Config search order (first found wins): `./.zai.json` → `$ZAI_CONFIG_PATH` → `~/.config/zai/config.json` → `~/.zai.json`

### Env var override (highest priority)

```bash
export ZAI_API_KEY="your-key"   # overrides apiKey in any config file
```
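Precedence rules like these are easy to get wrong in automation. A minimal sketch of the lookup logic as this reference describes it (the geoh wrapper's actual implementation may differ):

```python
import json
import os
from pathlib import Path


def resolve_zai_config():
    """First config file found wins; ZAI_API_KEY always overrides apiKey."""
    candidates = [
        Path("./.zai.json"),
        Path(os.environ["ZAI_CONFIG_PATH"]) if os.environ.get("ZAI_CONFIG_PATH") else None,
        Path.home() / ".config" / "zai" / "config.json",
        Path.home() / ".zai.json",
    ]
    config = {}
    for path in candidates:
        if path and path.is_file():
            config = json.loads(path.read_text())
            break  # first found wins; later files are ignored entirely
    if os.environ.get("ZAI_API_KEY"):
        # Env var beats every config file
        config["apiKey"] = os.environ["ZAI_API_KEY"]
    return config
```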
---

## Agent-to-Agent Patterns

### Pattern: Orchestrator + Worker agents

The 1-concurrent-request limit means true parallel fan-out won't work. Use a serial dispatch queue:

```python
import time

from openai import OpenAI

client = OpenAI(api_key="your-key", base_url="https://api.z.ai/v1")

def glm_call(prompt, model="GLM-4.7"):
    """Single agent call — serialize these, don't parallelize."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0
    )
    return response.choices[0].message.content

def parse_subtasks(plan):
    """Naive split: treat each non-empty line of the plan as a subtask."""
    return [line.strip() for line in plan.splitlines() if line.strip()]

# Orchestrator plans, then dispatches workers serially
plan = glm_call("Break this task into 3 subtasks: implement OAuth login", model="GLM-5.1")
results = []
for subtask in parse_subtasks(plan):
    result = glm_call(subtask, model="GLM-4.7")  # workers use the cheaper model
    results.append(result)
    time.sleep(0.5)  # optional: be gentle on rate limits
```

### Pattern: Shell script agent loop

```bash
#!/usr/bin/env bash
# Non-interactive agentic loop — each step fed to the next
set -e

CONTEXT="Fix all failing tests in ./tests/"

# Step 1: analyze
ANALYSIS=$(ANTHROPIC_AUTH_TOKEN=$ZAI_API_KEY \
  ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic \
  API_TIMEOUT_MS=3000000 \
  claude -p "Analyze: $CONTEXT. List specific failures.")

# Step 2: fix based on the analysis
ANTHROPIC_AUTH_TOKEN=$ZAI_API_KEY \
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic \
API_TIMEOUT_MS=3000000 \
claude -p "Given this analysis: $ANALYSIS — fix the failures."
```

### Pattern: Claude Code as sub-agent caller

Within a Claude Code session backed by GLM, you can spawn Claude Code subprocesses pointed at the same Z.ai backend:

```python
import os
import subprocess

result = subprocess.run(
    ["claude", "-p", "Implement the data validation layer"],
    env={
        **os.environ,
        "ANTHROPIC_AUTH_TOKEN": os.getenv("ZAI_API_KEY"),
        "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
        "API_TIMEOUT_MS": "3000000"
    },
    capture_output=True,
    text=True,
    timeout=300
)
print(result.stdout)
```

---

## LiteLLM Integration (Multi-Provider Orchestration)

LiteLLM provides a unified interface for routing across providers. Prefix Z.ai models with `zai/`:

```python
import os

import litellm

# Route to Z.ai GLM
response = litellm.completion(
    model="zai/GLM-4.7",
    messages=[{"role": "user", "content": "Implement error handling."}],
    api_key=os.getenv("ZAI_API_KEY")
)

# Fallback: try GLM-5.1 first, fall back to GLM-4.7
response = litellm.completion(
    model="zai/GLM-5.1",
    messages=[{"role": "user", "content": "Solve this hard bug."}],
    api_key=os.getenv("ZAI_API_KEY"),
    fallbacks=["zai/GLM-4.7"]
)
```

---

## Reasoning Effort Levels (geoh wrapper)

The `reasoningEffort` field in `.zai.json` maps to model thinking depth:

| Value | Use Case |
|-------|----------|
| `auto` | Let the model decide |
| `low` | Simple tasks, fast response |
| `medium` | Balanced (default) |
| `high` | Complex reasoning, architecture |
| `max` | Maximum — most quota-intensive |

---

## Windows Notes

| Shell | Wrapper | Limitations |
|-------|---------|-------------|
| Git Bash | `z` / `glm` | Full functionality, status line works |
| PowerShell | `z.ps1` / `glm.ps1` | Core features, no status line |
| CMD | `z.cmd` / `glm.cmd` | Core features, no status line |

The status line requires bash + stdin piping; on Windows, use Git Bash for the full experience. The wrapper scripts require `jq` installed and on PATH for JSON processing.
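Since the wrappers depend on external binaries, a short preflight check can save a confusing failure later. A minimal, hypothetical sketch (the wrapper installers may already perform equivalent checks):

```python
import shutil
import sys

# Hypothetical preflight: confirm the binaries this reference depends on.
# jq is required by the wrapper scripts for JSON processing.
required = ["claude", "jq"]
missing = [tool for tool in required if shutil.which(tool) is None]
if missing:
    sys.exit(f"Missing prerequisites: {', '.join(missing)}. Install them and re-run.")
print("Preflight OK: all required tools found on PATH.")
```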