---
name: z-ai-api
description: |
  Z.ai API integration for building applications with GLM models. Use when working with
  Z.ai/ZhipuAI APIs for: (1) Chat completions with GLM-4.7/4.6/4.5 models, (2) Vision/multimodal
  tasks with GLM-4.6V, (3) Image generation with GLM-Image or CogView-4, (4) Video generation
  with CogVideoX-3 or Vidu models, (5) Audio transcription with GLM-ASR-2512, (6) Function
  calling and tool use, (7) Web search integration, (8) Translation, slide/poster generation agents.
  Triggers: Z.ai, ZhipuAI, GLM, BigModel, Zhipu, CogVideoX, CogView, Vidu.
---

# Z.ai API Skill

## Quick Reference

**Base URL:** `https://api.z.ai/api/paas/v4`

**Coding Plan URL:** `https://api.z.ai/api/coding/paas/v4`

**Auth:** `Authorization: Bearer YOUR_API_KEY`

## Core Endpoints

| Endpoint | Purpose |
|----------|---------|
| `/chat/completions` | Text/vision chat |
| `/images/generations` | Image generation |
| `/videos/generations` | Video generation (async) |
| `/audio/transcriptions` | Speech-to-text |
| `/web_search` | Web search |
| `/async-result/{id}` | Poll async tasks |
| `/v1/agents` | Translation, slides, effects |

## Model Selection

**Chat (pick by need):**
- `glm-4.7`: Latest flagship, best quality, agentic coding
- `glm-4.7-flash`: Fast, high quality
- `glm-4.6`: Reliable general use
- `glm-4.5-flash`: Fastest, lower cost

**Vision:**
- `glm-4.6v`: Best multimodal (images, video, files)
- `glm-4.6v-flash`: Fast vision

**Media:**
- `glm-image`: High-quality images (HD, ~20s)
- `cogview-4-250304`: Fast images (~5-10s)
- `cogvideox-3`: Video, up to 4K, 5-10s
- `viduq1-text/image`: Vidu video generation

## Implementation Patterns

### Basic Chat

```python
from zai import ZaiClient

client = ZaiClient(api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

### OpenAI SDK Compatibility

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_KEY",
    base_url="https://api.z.ai/api/paas/v4/"
)
# Use exactly like OpenAI SDK
```

### Streaming

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    stream=True
)
for chunk in response:
    # Some chunks carry no text (e.g. role or finish markers), so skip empty deltas
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
```

### Function Calling

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
# Handle tool_calls in response.choices[0].message.tool_calls
```

### Vision (Images/Video/Files)

```python
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```

### Image Generation

```python
response = client.images.generate(
    model="glm-image",
    prompt="A serene mountain at sunset",
    size="1280x1280",
    quality="hd"
)
print(response.data[0].url)  # Expires in 30 days
```

### Video Generation (Async)

```python
import time

# Submit the generation task
response = client.videos.generate(
    model="cogvideox-3",
    prompt="A cat playing with yarn",
    size="1920x1080",
    duration=5
)
task_id = response.id

# Poll until the task finishes
while True:
    result = client.async_result.get(task_id)
    if result.task_status == "SUCCESS":
        print(result.video_result[0].url)
        break
    if result.task_status == "FAIL":
        # Stop polling on failure instead of looping forever
        raise RuntimeError(f"Video generation failed for task {task_id}")
    time.sleep(5)
```

### Web Search Integration

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Latest AI news?"}],
    tools=[{
        "type": "web_search",
        "web_search": {
            "enable": True,
            "search_result": True
        }
    }]
)
# Access response.web_search for sources
```

### Thinking Mode (Chain-of-Thought)

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    thinking={"type": "enabled"},
    stream=True  # Recommended with thinking
)
# Access reasoning_content in response
```

## Key Parameters

| Parameter | Values | Notes |
|-----------|--------|-------|
| `temperature` | 0.0-1.0 | Default: 1.0 for GLM-4.7, 0.6 for GLM-4.5 |
| `top_p` | 0.01-1.0 | Default ~0.95 |
| `max_tokens` | varies | Max: 128K for GLM-4.7, 96K for GLM-4.5 |
| `stream` | bool | Enable SSE streaming |
| `response_format` | `{"type": "json_object"}` | Force JSON output |

## Error Handling

- **429**: Rate limited; implement exponential backoff
- **401**: Bad API key; verify credentials
- **sensitive**: Content filtered (reported as a `finish_reason`); modify input

```python
finish_reason = response.choices[0].finish_reason

if finish_reason == "tool_calls":
    # Execute function and continue conversation
    pass
elif finish_reason == "length":
    # Increase max_tokens or truncate
    pass
elif finish_reason == "sensitive":
    # Content was filtered
    pass
```

## Reference Files

For detailed API specifications, consult:
- `references/chat-completions.md`: Full chat API, parameters, models
- `references/tools-and-functions.md`: Function calling, web search, retrieval
- `references/media-generation.md`: Image, video, audio APIs
- `references/agents.md`: Translation, slides, effects agents
- `references/error-codes.md`: Error handling, rate limits
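
The Error Handling section recommends exponential backoff for 429 responses but does not show one. Here is a minimal sketch of such a retry wrapper. `RateLimitError` and `call_with_backoff` are hypothetical names introduced for illustration and are not part of the Z.ai SDK; the retry count, delays, and jitter are assumed defaults, not documented values.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever exception your client raises on HTTP 429."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter on RateLimitError.

    All defaults here are illustrative assumptions; tune them to your rate limits.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller
            # Delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add up to 10% random jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

To use it, wrap the request in a callable, for example `call_with_backoff(lambda: client.chat.completions.create(...))`, and replace `RateLimitError` with the rate-limit exception your client actually raises. With the OpenAI SDK setup shown earlier, that would be `openai.RateLimitError`.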