# Sogni SDK - LLM Context Index

> AI-friendly documentation for the Sogni SDK (JavaScript/Node.js)
> Full documentation: llms-full.txt | TypeScript types: dist/index.d.ts
> npm: @sogni-ai/sogni-client | GitHub: https://github.com/Sogni-AI/sogni-client
> Current package: 4.2.0-alpha.24 | Runtime: Node.js >=22 or modern browser

## Quick Reference

### Installation

```bash
npm install @sogni-ai/sogni-client
```

The SDK publishes CommonJS (`dist/index.js`), ESM (`dist-esm/index.js`), and TypeScript declarations (`dist/index.d.ts`). Public entry point: `SogniClient.createInstance()`.

### Minimal Image Generation

```javascript
import { SogniClient } from '@sogni-ai/sogni-client';

// Option 1: API key auth (recommended) - no login() needed
const sogni = await SogniClient.createInstance({
  appId: 'my-app-uuid',
  apiKey: 'your-api-key'
});

// Option 2: Username/password auth
// const sogni = await SogniClient.createInstance({ appId: 'my-app-uuid' });
// await sogni.account.login('username', 'password');

await sogni.projects.waitForModels();

const project = await sogni.projects.create({
  type: 'image',
  modelId: 'flux1-schnell-fp8',
  positivePrompt: 'A cat wearing a hat',
  numberOfMedia: 1,
  steps: 4,
  guidance: 1
});

const urls = await project.waitForCompletion();
console.log(urls[0]); // Image URL (valid 24 hours)
```

### Minimal Video Generation

```javascript
const project = await sogni.projects.create({
  type: 'video',
  network: 'fast', // Required for video
  modelId: 'wan_v2.2-14b-fp8_t2v_lightx2v',
  positivePrompt: 'Ocean waves at sunset',
  numberOfMedia: 1,
  duration: 5, // seconds
  fps: 16
});

const urls = await project.waitForCompletion();
console.log(urls[0]); // Video URL (valid 24 hours)
```

---

## Index of Topics

1. **Authentication & Setup** - Client initialization, login, network types
2. **Image Generation** - Text-to-image, img2img, ControlNets
3. **Video Generation (WAN 2.2)** - t2v, i2v, s2v, animate-move, animate-replace
4. **Video Generation (LTX-2.3)** - Recommended video models with different fps behavior
5. **Video Generation (Seedance 2.0)** - External API T2V/I2V/V2V at 24fps
6. **Audio Generation (ACE-Step 1.5)** - Text-to-music with optional lyrics
7. **LLM Text Generation** - Chat completions, streaming, multi-turn conversations
8. **LLM Tool Calling** - Function calling with custom tools and Sogni platform tools
9. **Vision Chat** - Multimodal image understanding with VLM (scene description, OCR, object detection, visual analysis)
10. **Project Parameters** - Complete parameter reference
11. **Events & Progress** - Real-time tracking, completion handling
12. **Models & Presets** - Discovering available models, size presets, samplers
13. **Error Handling** - Common errors and recovery
14. **API Reference** - Full method signatures

---

## 1. Authentication & Setup

### Client Creation with API Key (Recommended)

Get your API key: Log in to dashboard.sogni.ai and click your Username dropdown in the top-right corner. Each email-verified account is allowed 400 free Spark render credits per month. On the API, free credits can be used with Z-Image Turbo; paid credits can access all models and features.

```javascript
const sogni = await SogniClient.createInstance({
  appId: 'unique-uuid',  // Required - identifies your app
  network: 'fast',       // 'fast' (GPU) or 'relaxed' (Mac)
  apiKey: 'your-api-key' // Auto-authenticates, no login() needed
});
```

### Client Creation with Username/Password

```javascript
const sogni = await SogniClient.createInstance({
  appId: 'unique-uuid',
  network: 'fast'
});
await sogni.account.login(username, password);
```

### API Key vs Username/Password

- **API key**: Pass `apiKey` to `createInstance()`. Auto-authenticates via WebSocket. No `login()` call needed. Most REST API calls (balance, profile, etc.) available. Sensitive operations (withdrawals, staking, 2FA) not available.
- **Username/password**: Call `sogni.account.login()` after creating instance.
Full REST API access.

### Network Types

- `fast` - High-end GPUs, faster, more expensive. **Required for video.**
- `relaxed` - Mac devices, cheaper. Image only.

---

## 2. Image Generation

### Basic Parameters

```javascript
{
  type: 'image',
  modelId: string,          // e.g., 'flux1-schnell-fp8'
  positivePrompt: string,
  negativePrompt?: string,
  stylePrompt?: string,
  numberOfMedia: number,    // How many images
  steps?: number,           // 4 for Flux, 20-40 for SD
  guidance?: number,        // 1 for Flux, 7.5 for SD
  sizePreset?: string,      // or 'custom' with width/height
  width?: number,
  height?: number,
  seed?: number,
  sampler?: string,
  scheduler?: string,
  outputFormat?: 'png' | 'jpg' | 'webp'
}
```

### With Starting Image (img2img)

```javascript
{
  type: 'image',
  startingImage: fs.readFileSync('./input.png'),
  startingImageStrength: 0.5 // 0-1, higher = more influence
}
```

### With ControlNet

```javascript
{
  type: 'image',
  controlNet: {
    name: 'canny' | 'depth' | 'openpose' | 'lineart' | ...,
    image: imageBuffer,
    strength: 0.8,
    mode: 'balanced' | 'prompt_priority' | 'cn_priority'
  }
}
```

---

## 3. Video Generation (WAN 2.2)

### CRITICAL: WAN 2.2 FPS Behavior

WAN models always generate at 16fps internally.
The `fps` parameter only controls post-render interpolation:

- `fps: 16` - No interpolation, output is 16fps
- `fps: 32` - Frames doubled via interpolation

Frame calculation: `duration * 16 + 1`

Example: 5 seconds = 81 frames (regardless of fps setting)

### Model IDs

| Workflow        | Speed Model                               | Quality Model        |
| --------------- | ----------------------------------------- | -------------------- |
| Text-to-Video   | wan_v2.2-14b-fp8_t2v_lightx2v             | wan_v2.2-14b-fp8_t2v |
| Image-to-Video  | wan_v2.2-14b-fp8_i2v_lightx2v             | wan_v2.2-14b-fp8_i2v |
| Sound-to-Video  | wan_v2.2-14b-fp8_s2v_lightx2v             | wan_v2.2-14b-fp8_s2v |
| Animate-Move    | wan_v2.2-14b-fp8_animate-move_lightx2v    | -                    |
| Animate-Replace | wan_v2.2-14b-fp8_animate-replace_lightx2v | -                    |

### Workflow Asset Requirements

| Workflow        | referenceImage | referenceAudio | referenceVideo |
| --------------- | -------------- | -------------- | -------------- |
| t2v             | -              | -              | -              |
| i2v             | Required       | -              | -              |
| s2v             | Required       | Required       | -              |
| animate-move    | Required       | -              | Required       |
| animate-replace | Required       | -              | Required       |

### Image-to-Video Example

```javascript
import fs from 'node:fs'; // Node.js only; `sogni` is the client from section 1

const project = await sogni.projects.create({
  type: 'video',
  network: 'fast',
  modelId: 'wan_v2.2-14b-fp8_i2v_lightx2v',
  positivePrompt: 'camera slowly zooms in',
  referenceImage: fs.readFileSync('./image.png'),
  duration: 5,
  fps: 16,
  numberOfMedia: 1
});
```

### Sound-to-Video (Lip Sync)

```javascript
import fs from 'node:fs'; // Node.js only; `sogni` is the client from section 1

const project = await sogni.projects.create({
  type: 'video',
  network: 'fast',
  modelId: 'wan_v2.2-14b-fp8_s2v_lightx2v',
  referenceImage: fs.readFileSync('./face.jpg'),
  referenceAudio: fs.readFileSync('./speech.m4a'),
  audioStart: 0,    // Start position in audio
  audioDuration: 5, // Seconds of audio to use
  duration: 5,
  fps: 16,
  numberOfMedia: 1
});
```

---

## 4. Video Generation (LTX-2.3)

### LTX FPS Behavior (Different from WAN!)

LTX models generate at the actual specified FPS (1-60 range). No interpolation.
- Frame calculation: `duration * fps + 1`
- Frame count must follow: `1 + n*8` (1, 9, 17, 25, 33, ...)

Example: 5 seconds at 24fps = 121 frames

### LTX Model IDs

- Speed models (`_distilled` suffix): 8-step, faster
- Quality models (`_dev` suffix or no suffix): 20-step, best quality

**LTX-2.3 22B (Recommended):**

| Workflow | Fast | Quality |
|----------|------|---------|
| Text-to-Video | `ltx23-22b-fp8_t2v_distilled` | `ltx23-22b-fp8_t2v_dev` |
| Image-to-Video | `ltx23-22b-fp8_i2v_distilled` | `ltx23-22b-fp8_i2v_dev` |
| Audio-to-Video | `ltx23-22b-fp8_a2v_distilled` | `ltx23-22b-fp8_a2v_dev` |
| Image+Audio-to-Video | `ltx23-22b-fp8_ia2v_distilled` | `ltx23-22b-fp8_ia2v_dev` |

**Video-to-Video ControlNet (LTX-2.3):**

| Workflow | Fast | Quality |
|----------|------|---------|
| Video-to-Video (ControlNet) | `ltx23-22b-fp8_v2v_distilled` | `ltx23-22b-fp8_v2v_dev` |

IMPORTANT: ControlNet (canny/pose/depth/detailer) requires a `_v2v` model, NOT `_i2v`.

---

## 5. Video Generation (Seedance 2.0)

Seedance models are external API-backed Spark-only video models. They generate at fixed 24fps and currently support 4-15 second direct SDK outputs.

Seedance can combine image, video, and audio reference assets in one request: up to 9 images, 3 videos, 3 audios, and 12 total assets. Text+audio-only is unsupported; include at least one image or video reference when using audio references.

In prompts, use `@Image1`, `@Video1`, and `@Audio1` tags counted independently by modality in attachment order. Assign each useful reference a role, prefer positive preservation language, and review exact text/logos, lip-sync, voice cloning, and real-human-reference behavior.
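The Seedance limits above (4-15 second duration, per-modality caps, 12 assets in total, and no audio-only references) can be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `checkSeedanceRequest` and its argument names are hypothetical, and it only encodes the numbers quoted in this section.

```javascript
// Hypothetical pre-flight check (not an SDK function). It encodes only the
// Seedance limits quoted in this section and returns a list of problems.
function checkSeedanceRequest({ duration, imageCount = 0, videoCount = 0, audioCount = 0 }) {
  const problems = [];
  if (duration < 4 || duration > 15) {
    problems.push('duration must be between 4 and 15 seconds');
  }
  if (imageCount > 9) problems.push('at most 9 image references');
  if (videoCount > 3) problems.push('at most 3 video references');
  if (audioCount > 3) problems.push('at most 3 audio references');
  if (imageCount + videoCount + audioCount > 12) {
    problems.push('at most 12 reference assets in total');
  }
  // Text + audio only is unsupported: audio needs an image or video alongside it.
  if (audioCount > 0 && imageCount + videoCount === 0) {
    problems.push('audio references need at least one image or video reference');
  }
  return problems;
}

// Usage: an empty array means the request fits the documented limits.
console.log(checkSeedanceRequest({ duration: 5, imageCount: 1, audioCount: 1 }));
```

Returning a list of problems rather than throwing on the first one lets a caller report every violated limit at once; the server remains the source of truth for what is actually accepted.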
| Model ID              | Context Assets                         | Notes                |
| --------------------- | -------------------------------------- | -------------------- |
| `seedance-2-0`        | Optional image, video, and audio refs  | 24fps, 4-15s, 1080p  |
| `seedance-2-0-fast`   | Optional image, video, and audio refs  | 24fps, 4-15s, 720p   |

```javascript
const project = await sogni.projects.create({
  type: 'video',
  network: 'fast',
  modelId: 'seedance-2-0',
  positivePrompt: 'A cinematic neon skyline time lapse',
  duration: 5,
  fps: 24,
  width: 1920,
  height: 1088,
  tokenType: 'spark'
});
```

Use `referenceImageUrls`, `referenceVideoUrls`, and `referenceAudioUrls` for Seedance HTTPS context URLs. Use `referenceImage`/`referenceImageEnd`, `referenceVideo`, and `referenceAudio` for single local file references.

Example prompt: `Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm. Keep the product silhouette and logo placement consistent.`

Aliases used by chat tools and examples: `seedance2` and `seedance2-fast`. V2V uses `seedance2` as the model selector and `seedance-v2v` as the control mode. Seedance Fast accepts optional image, video, and audio references and caps at 720p.

See `examples/workflow_partner_seedance_video.mjs` for focused endpoint coverage across Seedance T2V/I2V/IA2V/V2V and multimodal context prompt expansion.

---

## 6. Audio Generation (ACE-Step 1.5)

### Model Variants

| Model ID           | Name          | Description                          |
| ------------------ | ------------- | ------------------------------------ |
| ace_step_1.5_turbo | Fast & Catchy | Quick generation, best quality sound |
| ace_step_1.5_sft   | More Control  | More accurate lyrics, less stable    |

### Text-to-Music (Instrumental)

```javascript
const project = await sogni.projects.create({
  type: 'audio',
  modelId: 'ace_step_1.5_turbo',
  positivePrompt: 'Upbeat electronic dance music with synth leads',
  numberOfMedia: 1,
  duration: 30,       // 10-600 seconds
  bpm: 128,           // 30-300
  keyscale: 'C major',
  timesignature: '4', // 4/4 time
  steps: 8,
  outputFormat: 'mp3'
});

const urls = await project.waitForCompletion();
```

### Text-to-Music (With Lyrics)

```javascript
const project = await sogni.projects.create({
  type: 'audio',
  modelId: 'ace_step_1.5_sft',
  positivePrompt: 'Soft acoustic folk ballad',
  lyrics: 'Verse 1:\nWalking down a quiet road...',
  language: 'en',
  numberOfMedia: 2, // Generate 2 versions
  duration: 60,
  bpm: 90,
  keyscale: 'A minor',
  composerMode: true,
  creativity: 0.85,
  promptStrength: 2.0
});
```

### Key Parameters

| Parameter      | Range        | Default | Notes                       |
| -------------- | ------------ | ------- | --------------------------- |
| duration       | 10-600       | 30      | Seconds of audio            |
| bpm            | 30-300       | 120     | Beats per minute            |
| keyscale       | key + scale  | C major | e.g., "A minor", "F# major" |
| timesignature  | 2, 3, 4, 6   | 4       | Time signature              |
| lyrics         | string       | -       | Omit for instrumental       |
| language       | code         | en      | 51 languages supported      |
| composerMode   | boolean      | true    | AI composer mode            |
| promptStrength | 0-10         | 2.0     | Prompt adherence            |
| creativity     | 0-2          | 0.85    | Composition temperature     |
| steps          | 4-16         | 8       | Inference steps             |
| outputFormat   | mp3/wav/flac | mp3     | Audio format                |

---

## 7. LLM Text Generation

The Sogni SDK supports LLM text generation via the Supernet. Socket-native chat uses `sogni.chat.completions.create()`.
Hosted REST chat uses `sogni.chat.hosted.create()`. Durable server-side chat runs use `sogni.chat.runs`.

### Chat Completion (Non-Streaming)

```javascript
const response = await sogni.chat.completions.create({
  model: 'qwen3.6-35b-a3b-gguf-iq4xs',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing briefly.' }
  ],
  max_tokens: 4096,
  temperature: 0.7,
  top_p: 0.9
});

console.log(response.content);
```

### Streaming Chat Completion

```javascript
const stream = await sogni.chat.completions.create({
  model: 'qwen3.6-35b-a3b-gguf-iq4xs',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  max_tokens: 4096,
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content || '');
}
```

### Key Parameters

| Parameter         | Type           | Default                    | Notes                                            |
| ----------------- | -------------- | -------------------------- | ------------------------------------------------ |
| model             | string         | qwen3.6-35b-a3b-gguf-iq4xs | LLM model ID                                     |
| messages          | array          | -                          | Chat messages (system/user/assistant/tool roles) |
| max_tokens        | number         | 4096                       | Maximum output tokens                            |
| temperature       | number         | 0.7                        | Sampling temperature (0-2)                       |
| top_p             | number         | 0.9                        | Nucleus sampling (0-1)                           |
| frequency_penalty | number         | 0                          | Repetition penalty (-2 to 2)                     |
| presence_penalty  | number         | 0                          | Topic penalty (-2 to 2)                          |
| stream            | boolean        | false                      | Enable token-by-token streaming                  |
| think             | boolean        | server default             | Sends `chat_template_kwargs.enable_thinking`     |
| taskProfile       | string         | -                          | `general`, `coding`, or `reasoning` preset hint  |
| response_format   | object         | -                          | OpenAI-compatible JSON object/schema constraint  |
| tools             | array          | -                          | OpenAI-compatible function tools                 |
| tool_choice       | string/object  | auto                       | `auto`, `none`, `required`, or function choice   |
| sogni_tools       | boolean/string | -                          | Inject hosted creative tools server-side         |

### Thinking Mode

Control model reasoning/thinking with `think`:

- `think: true` sends `chat_template_kwargs: { enable_thinking: true }`.
- `think: false` sends `chat_template_kwargs: { enable_thinking: false }`.
- Omit `think` to use server defaults.

Streaming chunks and completion results expose generated text through `content` and tool invocations through `tool_calls`.

---

## 8. LLM Tool Calling (Function Calling)

Define custom tools the LLM can invoke. The LLM returns structured tool call arguments; you execute the function and feed results back.

### Tool Definition (OpenAI-compatible format)

```javascript
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name' }
        },
        required: ['location']
      }
    }
  }
];

const response = await sogni.chat.completions.create({
  model: 'qwen3.6-35b-a3b-gguf-iq4xs',
  messages: [{ role: 'user', content: "What's the weather in Austin?" }],
  tools,
  tool_choice: 'auto'
});

// Check response.tool_calls for tool invocations.
```

### Sogni Platform Tools: Media Generation via Chat

Combine LLM tool calling with Sogni's generation APIs to create media from natural language:

- **Image Generation** - "Create an image of a cyberpunk city at night"
- **Image Editing / Reference-Guided Images** - "Edit this portrait to look like a renaissance painting"
- **Video Generation** - "Generate a video of ocean waves at sunset"
- **Sound-to-Video** - "Turn this song into a music video"
- **Video-to-Video** - "Restyle this video as anime"
- **Music Generation** - "Compose a jazz song about the rain"

Canonical hosted creative-tool surface (`SogniTools.all`, 24 tools, executed server-side via `chat.hosted.create()` / `chat.runs.create()`):

- `generate_image`, `edit_image`, `generate_video`, `generate_music`, `sound_to_video`, `video_to_video` (generation)
- `restore_photo`, `apply_style`, `refine_result`, `change_angle` (image-edit adapters)
- `animate_photo` (image-to-video with multi-source fan-out via `sourceImageIndices`; composed into one MP4 by default; opt out with `stitched: false`)
- `stitch_video`, `orbit_video`, `dance_montage` (server-side composition into a single MP4)
- `extend_video`, `replace_video_segment` (append or replace bounded windows; replacements can trim source windows with `replacementStartSeconds` / `replacementEndSeconds`)
- `overlay_video`, `add_subtitles` (ffmpeg post-production on existing videos)
- `enhance_prompt` (model-ready image/video/music/edit prompt expansion)
- `compose_script` (scripts, storyboards, trailers, social shorts, campaign beats, and video prompts)
- `compose_lyrics`, `compose_instrumental` (vocal lyrics or instrumental music structures)
- `compose_workflow`, `compose_workflow_template` (durable workflow and template planners)

The default `creative-tools` surface includes the tools above. `sogni_tools: "creative-agent"` adds workflow control and asset-manifest tools.
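For custom tools (as opposed to the hosted tools listed above, which run server-side), the "execute the function and feed results back" step from the start of this section could look roughly like the sketch below. It assumes `tool_calls` entries follow the OpenAI-style shape `{ id, function: { name, arguments } }` with `arguments` as a JSON string, and that results go back as `role: 'tool'` messages carrying `tool_call_id`; neither detail is confirmed by this index, so check `dist/index.d.ts` before relying on them. `runToolCalls` and the handler map are hypothetical names.

```javascript
// Hypothetical helper (not an SDK function): runs local handlers for each
// tool call and builds the follow-up messages for the next chat request.
// Assumed call shape: { id, function: { name, arguments: '<JSON string>' } }.
async function runToolCalls(toolCalls, handlers) {
  const messages = [];
  for (const call of toolCalls ?? []) {
    const { name, arguments: rawArgs } = call.function;
    let result;
    try {
      const handler = handlers[name];
      if (!handler) throw new Error(`Unknown tool: ${name}`);
      // Accept either a JSON string or an already-parsed object.
      const args = typeof rawArgs === 'string' ? JSON.parse(rawArgs || '{}') : rawArgs ?? {};
      result = await handler(args);
    } catch (err) {
      // Report failures to the model instead of aborting the whole round.
      result = { error: err.message };
    }
    messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
  }
  return messages;
}
```

A caller would append the assistant turn and the returned messages to `messages`, then call `sogni.chat.completions.create()` again so the model can use the tool output in its answer.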
Per-request media context lets generated images / videos / audio be addressed by stable indices across tool rounds (each tool result envelope returns `startIndex` and `indices`). Media-conditioned tools accept explicit inline base64 `data:` URIs such as `source_image_url`, `reference_image_url`, `reference_audio_url`, and `reference_video_url`. Tool image inputs accept PNG/JPEG, tool audio inputs accept MP3/M4A/WAV, and tool video inputs accept MP4 or MOV/QuickTime. For uploaded or Sogni-hosted HTTP(S) assets, pass media through request-level `mediaReferences` / `mediaContext` or through `sogni.workflows` dependencies.

The `workflow_text_chat_sogni_tools.mjs` example demonstrates the core image/video/music composition flows. `workflow_creative_agent_tools.mjs` demonstrates server-side hosted Sogni tool injection through `/v1/chat/completions`. Pass `SogniTools.all` (or individual tool defs) to the LLM and execute calls through `sogni.chat.hosted.create()` / `sogni.chat.runs.create()`.

Durable creative workflows: the SDK exposes persistent multi-step workflows through `sogni.workflows` (`start`, `list`, `get`, `events`, `streamEvents`, `resume`, `reseed`, `cancel`) backed by `/v1/creative-agent/workflows`. `start()` accepts either an inline `input` plan or a saved `workflowId` + `inputs`. Saved workflow templates are managed via `sogni.workflows.templates` (`list`, `get`, `create`, `update`, `delete`, `fork`). All endpoints use the SDK's active auth. See `workflow_creative_agent_workflows.mjs` for start/list/get/events/stream/cancel coverage.
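The inline base64 `data:` URIs that media-conditioned tools accept can be built from raw bytes with a few lines of Node.js. This is a minimal sketch under stated assumptions: `toToolDataUri` and the `ACCEPTED_TYPES` table are hypothetical, and the MIME strings are the conventional ones for the formats listed above (PNG/JPEG, MP3/M4A/WAV, MP4/MOV), not values confirmed by the SDK.

```javascript
// Conventional MIME types for the formats listed above. The exact strings
// the server accepts are an assumption, not taken from the SDK.
const ACCEPTED_TYPES = {
  image: ['image/png', 'image/jpeg'],
  audio: ['audio/mpeg', 'audio/mp4', 'audio/wav'],
  video: ['video/mp4', 'video/quicktime']
};

// Hypothetical helper (not an SDK function): validates the MIME type for the
// given media kind and returns an inline base64 data: URI.
function toToolDataUri(kind, mimeType, bytes) {
  if (!ACCEPTED_TYPES[kind]?.includes(mimeType)) {
    throw new Error(`Unsupported ${kind} type: ${mimeType}`);
  }
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Usage: the two bytes of "hi" stand in for real image data here.
console.log(toToolDataUri('image', 'image/png', Buffer.from('hi')));
```

In practice the bytes would come from `fs.readFileSync()` or an upload, and the result would be passed as a field such as `source_image_url`. Larger or already-hosted assets are better passed through `mediaReferences` / `mediaContext`, as noted above.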
Default generation models used by platform tools:

| Tool | Model | Description |
|------|-------|-------------|
| `generate_image` | z_image_turbo_bf16 | Z-Image Turbo, ultra-fast 8-step |
| `edit_image` | workflow/model-dependent | Reference-guided image editing with edit-capable Qwen/Flux models |
| `generate_video` | ltx23-22b-fp8_t2v_distilled / ltx23-22b-fp8_i2v_distilled / Seedance selectors | Text-to-video or image-to-video |
| `sound_to_video` | ltx23-22b-fp8_a2v_distilled / ltx23-22b-fp8_ia2v_distilled | Audio-driven video generation |
| `video_to_video` | ltx23-22b-fp8_v2v_distilled / seedance-2-0 / WAN animate models | Video transformation / motion transfer |
| `generate_music` | ace_step_1.5_turbo | ACE-Step 1.5 Turbo, fast music |

---

## 9. Vision Chat (Multimodal Image Understanding)

The SDK supports multimodal vision chat via VLM (Vision-Language Model) workers on the Sogni network. Send images alongside text messages for scene description, OCR, object detection, visual analysis, and multi-image comparison.

### VLM Model

| Model ID                     | Name            | Description                                              |
| ---------------------------- | --------------- | -------------------------------------------------------- |
| `qwen3.6-35b-a3b-gguf-iq4xs` | Qwen3.6 35B VLM | Vision-language model with 262,144 native context length |

### Multimodal Message Format

```javascript
const messages = [
  { role: 'system', content: 'You are a visual analysis assistant.' },
  {
    role: 'user',
    content: [
      { type: 'image_url', image_url: { url: 'data:image/jpeg;base64,...' } },
      { type: 'text', text: 'Describe this image in detail.' }
    ]
  }
];

const stream = await sogni.chat.completions.create({
  model: 'qwen3.6-35b-a3b-gguf-iq4xs',
  messages,
  max_tokens: 4096,
  stream: true,
  think: false,
  taskProfile: 'general'
});
```

Vision `image_url` parts accept inline JPEG or PNG `data:` URIs only.
A maximum of 20 images is allowed per request, each image must be 10MB or smaller, and the longest side limit of 1024px applies only to this vision path.

QWEN3.6 PRESET HINTS:
- General thoughtful work: `think: true, taskProfile: 'general'`
- Precise coding: `think: true, taskProfile: 'coding'`
- Everyday direct responses: `think: false, taskProfile: 'general'`
- Analytical non-thinking tasks: `think: false, taskProfile: 'reasoning'`

OPTIONAL SAMPLING OVERRIDES:
- `temperature`, `top_p`, `top_k`, `min_p`, `presence_penalty`, `repetition_penalty`

### Supported Capabilities

- **Scene description** - Detailed image descriptions including subjects, colors, lighting, mood
- **OCR / Text extraction** - Extract visible text with layout preservation
- **Object detection** - Identify objects with location, size, and spatial relationships
- **Structured analysis** - Subject, composition, lighting, color, technical, mood, style, context
- **Multi-image comparison** - Compare two images across multiple aspects

See `workflow_text_chat_vision.mjs` for a complete interactive vision chat implementation.

---

## 10. Events & Progress

### Promise-based

```javascript
const urls = await project.waitForCompletion();
```

### Event-based

```javascript
project.on('progress', (percent) => console.log(`${percent}%`));
project.on('jobCompleted', (job) => console.log(job.resultUrl));
project.on('completed', (urls) => console.log('Done:', urls));
project.on('failed', (error) => console.error(error));
```

External API-backed jobs may not report diffusion steps. The SDK keeps project/job progress finite by using provider progress or ETA-derived progress, and direct provider result URLs are preserved on `job.resultUrl`.

### Job-level Events

```javascript
project.jobs[0].on('progress', ({ step, stepCount }) => {
  console.log(`Step ${step}/${stepCount}`);
});
```

---

## 11. Discovering Models & Options

### List Available Models

```javascript
const models = await sogni.projects.waitForModels();
// or, once models have loaded:
// const models = sogni.projects.availableModels;
```

### Get Size Presets

```javascript
const presets = await sogni.projects.getSizePresets('fast', 'flux1-schnell-fp8');
```

### Get Sampler/Scheduler Options

```javascript
const options = await sogni.projects.getModelOptions('flux1-schnell-fp8');
console.log(options.sampler.allowed);
console.log(options.scheduler.allowed);
```

---

## 12. Cost Estimation

```javascript
const cost = await sogni.projects.estimateCost({
  network: 'fast',
  tokenType: 'spark',
  model: 'flux1-schnell-fp8',
  imageCount: 1,
  stepCount: 4,
  previewCount: 0
});
console.log(cost.sogni, cost.usd);
```

Use `estimateVideoCost()` for video jobs and `estimateAudioCost()` for audio jobs.

---

## 13. Key Types Summary

```typescript
type ProjectParams = ImageProjectParams | VideoProjectParams | AudioProjectParams;

interface ImageProjectParams {
  type: 'image';
  modelId: string;
  positivePrompt: string;
  numberOfMedia: number;
  // ... see llms-full.txt for complete list
}

interface VideoProjectParams {
  type: 'video';
  modelId: string;
  positivePrompt: string;
  numberOfMedia: number;
  duration?: number;
  fps?: number;
  referenceImage?: File | Buffer | Blob;
  referenceAudio?: File | Buffer | Blob;
  referenceVideo?: File | Buffer | Blob;
  // ... see llms-full.txt for complete list
}

interface AudioProjectParams {
  type: 'audio';
  modelId: string;
  positivePrompt: string;
  numberOfMedia: number;
  duration?: number;
  bpm?: number;
  lyrics?: string;
  language?: string;
  // ... see llms-full.txt for complete list
}
```

---

## Full Documentation

For complete API reference, all parameters, and advanced usage:

- **llms-full.txt** - Comprehensive guide in this repository
- **https://sdk-docs.sogni.ai** - TypeDoc API documentation
- **https://github.com/Sogni-AI/sogni-client/tree/main/examples** - Working examples