---
name: cheaptokens
description: Use CheapTokens to spend a discounted Venice.ai API key on the user's current task. TRIGGER WHEN any of these are true — (a) the user pastes a Venice-looking API key (long opaque bearer token, sometimes labeled "Venice API key", "CheapTokens key", "discount key", or "CheapTokens credits"); (b) the user says phrasings like "use this key", "use my key", "use CheapTokens", "use cheap tokens", "use the CheapTokens key", "use this Venice key", "run this on CheapTokens", "route this through Venice", "do this with this Venice key", "swap to Venice for this", "use my cheap credits", "use the discounted key"; (c) the user invokes /cheaptokens; or (d) the agent needs cheap, ephemeral, OpenAI-compatible inference for burst workloads, experiments, or budget-capped runs. The skill is a single self-contained file — no install, no helper binary, no external dependency. It detects whether the key is a CheapTokens credit, discovers Venice's live capabilities (text, image, audio, video, embeddings) via /api/v1/models, triages the user's request against that capability map, and spends the key directly via HTTPS calls the agent already knows how to make. Hybrid replies (Venice + host) print one attribution line per provider so the user can verify which provider produced which bytes. Falls back to the host provider transparently on exhaustion, 401, or expiry at 23:59:59 UTC on the purchase date.
version: 3.0.0
author: CheapTokens.ai
homepage: https://cheaptokens.ai
source: https://github.com/alde1022/cheaptokens-skills
license: MIT
---

# CheapTokens — discounted Venice.ai credits for agents

CheapTokens.ai sells **discounted same-day Venice.ai API credits** using a time-decay pricing model (≈25% off at 00:00 UTC, up to ~75% off near 23:00 UTC). Pay with **USDC on Base** via the [x402](https://www.x402.org/) protocol. You receive a real **Venice.ai API key** that works against the OpenAI-compatible endpoint `https://api.venice.ai/api/v1`. No signup, no account, no browser required.

This skill is **one file**. There is nothing to install. Any agent that can read SKILL.md and make HTTPS calls (curl / fetch / OpenAI SDK / `web_fetch` / built-in HTTP tool) can use it.

## Why this skill exists

CheapTokens is optimized for short-lived, same-day credits. Manually configuring a new API provider every day is friction, especially when you buy near midnight UTC for the largest discount and may have less than an hour to use the credits. This skill turns a fresh key into immediate action: paste the key (or point the agent at a secret/env var), ask normally, and the agent discovers Venice capabilities, routes the task, spends the key, and attributes what actually ran before the credits expire.

---

## Honest execution model — read this first

The hosting agent (OpenClaw, Claude Code, Cursor, Codex, Cline, aider, etc.) is running on some default model. That host provider is what generates the conversational tokens you see in chat. **The agent cannot transparently swap the provider behind its own generation mid-session.** If a user hands you a CheapTokens/Venice key and you *say* "switched to Venice" without executing a real HTTPS call against `api.venice.ai`, every token you produced still came from the host provider. That is a silent lie.

The only mechanism that actually spends the user's key is an outbound HTTPS call to `https://api.venice.ai/api/v1/` made by the agent.
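For concreteness, the only action that counts as spending the key is a call like the minimal curl sketch below. It assumes the key is exported as `VENICE_API_KEY` (see Safe key handling) and that a real model ID has already been discovered (Step 2); `<model-id>` is a placeholder.

```bash
# A real key-spending call: the tokens in this response came from Venice,
# and the spend is verifiable against the key's remaining balance.
curl -s https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-id>", "messages": [{"role": "user", "content": "Say hi."}]}'
```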
This skill teaches the agent (1) when to make that call, (2) which Venice endpoint matches the user's ask, and (3) how to print an attribution line that lets the user verify after the fact which bytes came from Venice and which came from the host.

If the agent has no HTTP tool at all, it cannot use the key. Say so plainly and stop. Do not invent attribution.

---

## Step 1 — Detect: is this a CheapTokens key?

Run this **once** per key per session, not on every turn.

```http
GET https://cheaptokens.ai/api/status/<last6>
```

Where `<last6>` is the last 6 characters of the pasted key.

| Response | Meaning | Action |
|---|---|---|
| `200` with JSON | CheapTokens credit. Cache `{ status, creditsIssuedUsd, expiresAt, veniceKeyLast6, veniceUsage }` for the session. | Continue. Use CheapTokens-aware copy + attribution. |
| `404` | Plain Venice key (or a typo). | Continue. Use the key normally; just don't show CheapTokens-specific balance copy. |
| `429` | Rate-limited. | Wait ~2s and retry once. If still rate-limited, skip detection and proceed. |
| Anything else | Treat as unknown. | Skip detection and proceed. |

If `status !== "active"` or `expiresAt` is in the past, the key is dead. Tell the user once and stop. **Do not try to burn a dead key.**

Example success body:

```json
{
  "status": "active",
  "dates": ["2026-04-23"],
  "creditsIssuedUsd": 1.75,
  "expiresAt": "2026-04-23T23:59:59.999Z",
  "veniceKeyLast6": "abc123",
  "veniceUsage": { "lastUsedAt": "...", "trailingSevenDays": 0.5 }
}
```

---

## Safe key handling

The fastest workflow is pasting a CheapTokens key into a trusted local agent. That is acceptable when speed matters, but treat the key as a bearer credential: anyone who sees it can spend the remaining credits until it expires. CheapTokens limits blast radius because keys are budget-capped and expire at 23:59:59 UTC on the purchase date, but it does not make pasted keys private.

Recommended paths:

1. **Fast path:** paste the key directly into a trusted private agent session.
2. **Safer path:** store the key in an environment variable or local `.env` file and ask the agent to read it from there.
3. **Safest path:** store the key in a secret manager or runtime secret store and give the agent the secret name, not the raw key.

Example safer local workflow:

```bash
export VENICE_API_KEY="VENICE_INFERENCE_KEY_..."
# Then ask the agent: use CheapTokens with $VENICE_API_KEY for this task
```

Do not paste keys into public/shared agents, commit keys to repos, or include them in screenshots/logs. If a key is exposed, use wallet recovery/reissue to rotate it.

---

## Step 2 — Discover live capabilities

Before you tell a user what Venice can or can't do, ask Venice. **Never hardcode model IDs or modality assumptions.** Models rotate.

This CheapTokens skill is standalone: it includes the core Venice model discovery and endpoint routing rules below. Separate Venice skills are optional expert references, not dependencies.

```http
GET https://api.venice.ai/api/v1/models
Authorization: Bearer <VENICE_API_KEY>
```

Optional `?type=` filter. Cache results for the session. Prefer one `/models?type=all` call when budget/latency permits, then filter locally. For smaller probes, query only the modalities relevant to the ask.
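As a sketch of that flow, assuming the response is an OpenAI-style object whose `data` rows carry `id` plus the `model_spec` fields described below:

```bash
# One catalog call per session; filter locally afterwards.
curl -s "https://api.venice.ai/api/v1/models?type=all" \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  | jq -r '.data[].id'

# Example local filter: text models flagged as code-optimized
# (model_spec.capabilities.optimizedForCode, per the `code` type below).
curl -s "https://api.venice.ai/api/v1/models?type=text" \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  | jq -r '.data[] | select(.model_spec.capabilities.optimizedForCode == true) | .id'
```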
Model type filters:

| `type` | What it means | Main endpoint(s) |
|---|---|---|
| `text` | Chat/completions models. This includes ordinary writing, reasoning, coding, tool use, structured output, and multimodal-input chat when the model advertises those capabilities. | `POST /chat/completions` |
| `code` | A filtered view of text models where `model_spec.capabilities.optimizedForCode === true`. Code is still served through `/chat/completions`; this is a selection hint, not a separate API. | `POST /chat/completions` |
| `image` | Text-to-image generation. | `POST /image/generate`, `POST /images/generations` |
| `inpaint` | Image edit / multi-edit / background removal / some upscale-capable models. | `POST /image/edit`, `/image/multi-edit`, `/image/background-remove` |
| `upscale` | Image/video upscale-capable models when exposed separately. | `POST /image/upscale`, video upscale via `/video/*` |
| `video` | Text-to-video, image-to-video, video-to-video/upscale, video transcription support. | `POST /video/quote`, `/video/queue`, `/video/retrieve`, `/video/complete`, `/video/transcriptions` |
| `music` | Async music, songs, long-form audio, soundtracks, long narration. | `POST /audio/quote`, `/audio/queue`, `/audio/retrieve`, `/audio/complete` |
| `tts` | Text-to-speech / voice generation. | `POST /audio/speech` |
| `asr` | Speech-to-text transcription. | `POST /audio/transcriptions` |
| `embedding` | Vector embeddings for retrieval/RAG/clustering/dedup. | `POST /embeddings` |
| `all` | Full catalog. Use this when deciding across modalities. | All of the above |

Each row's `model_spec` exposes `capabilities`, `constraints`, and `pricing`. Treat that as the source of truth for what the model can do and what it costs.

Text/code selection rule: for coding tasks, first look for `type=code` or `type=text` models with `optimizedForCode`; if none are available, pick the best text reasoning/default model from traits. Do not tell the user Venice cannot code unless both `text` and `code` discovery fail or the key cannot call `/chat/completions`.

Trait shortcuts to avoid hardcoding IDs:

```http
GET https://api.venice.ai/api/v1/models/traits?type=text
GET https://api.venice.ai/api/v1/models/traits?type=code
GET https://api.venice.ai/api/v1/models/traits?type=image
GET https://api.venice.ai/api/v1/models/traits?type=video
GET https://api.venice.ai/api/v1/models/traits?type=tts
```

Returns a `data` map like `{ "default": "<model-id>", "fastest": "<model-id>", "default_reasoning": "<model-id>", "highest_quality": "<model-id>", "default_vision": "<model-id>", "function_calling_default": "<model-id>", ... }`. Use these instead of baking model names into prompts.

---

## Step 3 — Triage the user's ask against live capabilities

Classify the request, then route:

1. **Fully on Venice.** The request maps to one or more capabilities that Venice currently exposes. Spend the key directly. Print one Venice attribution line (format sketched after this list).
2. **Hybrid.** Some of the request is on Venice (e.g. script, prompts, captions, transcript) and some is on the host (e.g. an image / video / audio model the host has but Venice doesn't, or vice versa). **Spend the key on the Venice-doable parts first.** Then complete the rest on the host. Print one attribution line per provider used.
3. **Not on Venice at all.** Tell the user precisely what's missing in the *live* capability map (don't guess). Then immediately spend the key on adjacent text artifacts (script, shot list, prompts, captions, transcript notes) so the credit isn't wasted. Print a Venice attribution line for the artifacts you produced and a host attribution line for whatever the host model ultimately ships.
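This skill does not mandate an exact attribution format; the contract is one verifiable line per provider that produced bytes. An illustrative shape, assuming the `veniceKeyLast6` value cached in Step 1:

```bash
# Illustrative attribution lines; the exact wording is an assumption,
# the requirement is only one line per provider, verifiable after the fact.
echo "[venice] model=<model-id> endpoint=/chat/completions key=...abc123 (CheapTokens credit spent)"
echo "[host]   routing and glue wording only; Venice output returned verbatim"
```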
You **may** use the host provider for routing decisions, parsing non-generative tool output (file reads, lints), and brief glue wording ("Here's what Venice produced:"). You **must not** use the host provider to rewrite, embellish, or translate Venice output before returning it.

---

## Expiry-aware urgency

After detection you know `expiresAt`. Use it.

| Time remaining | Behavior |
|---|---|
| > 6h | Normal. Confirm before large host-side spends; act normally on Venice. |
| ≤ 6h | Bias toward acting now on anything Venice can satisfy. |
| ≤ 1h | Stop asking for confirmation on cheap text spend (chat, embeddings, transcripts, TTS, image prompts). Just do it. |
| ≤ 30m | Last-call mode. Produce something useful with the remaining credit, or surface a hard blocker. Do not let the key idle to zero. |

**Idle, expiring CheapTokens keys are a failure mode.** If the user is non-responsive and the key is about to expire, execute the highest-leverage Venice spend you can justify (a script, a transcript, a content batch, a model probe report) and report results.

---

## Step 4 — Pick endpoints based on the ask

Below are the endpoints the skill knows about. All are authenticated with `Authorization: Bearer <VENICE_API_KEY>` against `https://api.venice.ai/api/v1`. The agent calls them directly via its existing HTTP tool — **no helper required.**

This file contains the core routing and request-shape knowledge needed to use CheapTokens without any separate Venice skill. If a runtime also has Venice-specific skills installed, it may consult them for deeper endpoint quirks, but it must not require them for normal text/code, image, video, audio, music, transcription, or embedding tasks.

Optional deeper references, if available or if a rare edge case appears:

- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-models/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-chat/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-image-generate/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-image-edit/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-audio-speech/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-audio-music/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-audio-transcription/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-video/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-embeddings/SKILL.md
- https://raw.githubusercontent.com/alde1022/cheaptokens-skills/main/skills/venice-errors/SKILL.md

That's a one-time HTTP read, not an install.

### Text generation — `POST /chat/completions`

OpenAI-compatible. Sync. Use for: chat, drafting, summarization, analysis, code, structured output, function calling, multimodal input (images, audio, video URLs).

```http
POST /chat/completions
{
  "model": "<model-id>",
  "messages": [{"role": "user", "content": "..."}],
  "stream": false
}
```

Notable Venice-only knobs (under `venice_parameters`): `enable_web_search` (`off|auto|on`), `enable_x_search`, `enable_web_scraping`, `enable_web_citations`, `character_slug`, `strip_thinking_response`, `disable_thinking`, `enable_e2ee`. Or encode them as model suffixes like `:enable_web_search=on`.

Multimodal `messages[].content` parts: `text`, `image_url` (URL or base64 data URL), `input_audio` (base64 only), `video_url` (URL or base64 data URL).
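As a sketch, a web-search-enabled completion could look like this; `<model-id>` is a placeholder and only the documented `enable_web_search` values are used:

```bash
# Chat completion with a Venice-only knob under venice_parameters.
curl -s https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id>",
    "messages": [{"role": "user", "content": "What changed in Base fees this week?"}],
    "venice_parameters": {"enable_web_search": "auto"}
  }'
```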
### Embeddings — `POST /embeddings`

OpenAI-compatible. Sync. Use for: retrieval, RAG, clustering, dedup.

```http
POST /embeddings
{ "model": "<model-id>", "input": "..." }
```

Batch up to 2048 strings per call. `encoding_format: "base64"` shrinks the payload ~4×.

### Image generation — `POST /image/generate`

Sync. Venice-native, full control (negatives, CFG, seed, up to 4 variants). For an OpenAI-compatible drop-in, use `POST /images/generations`.

```http
POST /image/generate
{
  "model": "<model-id>",
  "prompt": "...",
  "negative_prompt": "...",
  "width": 1024,
  "height": 1024,
  "cfg_scale": 7.5,
  "steps": 8,
  "seed": 0,
  "variants": 1,
  "format": "webp",
  "style_preset": "",
  "safe_mode": true
}
```

Some models use `aspect_ratio` + `resolution` instead of width/height. Check `model_spec.constraints` on `/models?type=image`.

### Image edit — `/image/edit`, `/image/multi-edit`, `/image/upscale`, `/image/background-remove`

All sync. Return binary `image/png`. Inputs accept base64, file upload, or HTTPS URL. Max 25 MB; total pixel count 65,536–33,177,600 px.

```http
POST /image/edit
{ "model": "qwen-edit", "prompt": "...", "image": "<base64-or-url>", "aspect_ratio": "16:9" }

POST /image/multi-edit
// note: uses "modelId", not "model"
{ "modelId": "qwen-edit", "prompt": "...", "images": ["<base64-or-url>", ...] }

POST /image/upscale
{ "image": "<base64-or-url>", "scale": 2, "enhance": true, "enhanceCreativity": 0.5, "replication": 0.35 }

POST /image/background-remove
{ "image": "<base64>" }
// OR
{ "image_url": "https://..." }
```

### Text-to-speech — `POST /audio/speech`

Sync. OpenAI-compatible. Use for narration, voice replies, UI audio. Up to 4096 chars per call.

```http
POST /audio/speech
{ "model": "<model-id>", "voice": "<voice-id>", "input": "...", "response_format": "mp3", "speed": 1.0, "streaming": false }
```

Voices are model-specific. Wrong combo = `400`.

### Music / long-form audio — async

Quote → queue → poll → complete.

```http
POST /audio/quote
{ "model": "<model-id>", "duration_seconds": 60 }

POST /audio/queue
{ "model": "...", "prompt": "...", "duration_seconds": 60, "lyrics_prompt": "...", "voice": "...", "language_code": "en", "speed": 1.0, "force_instrumental": false }

POST /audio/retrieve
{ "model": "...", "queue_id": "..." }
// JSON while PROCESSING; binary audio when done

POST /audio/complete
{ "model": "...", "queue_id": "..." }
// frees server storage
```

### Speech-to-text — `POST /audio/transcriptions`

Sync. Multipart only (no base64).

```
file=@meeting.m4a
model=<model-id>
response_format=json|text|verbose_json|srt|vtt
timestamps=true|false
language=en
```

Max file size 25 MB on this endpoint.

### Video generation + transcription — async + sync

```http
POST /video/quote
{ "model": "
```
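All of the async endpoints above share that quote → queue → poll → complete shape. A minimal polling sketch for the music flow, under two stated assumptions: the `/audio/queue` response exposes the `queue_id` that `/audio/retrieve` and `/audio/complete` expect (inferred from the request shapes above), and status bodies are JSON starting with `{` while finished output is raw audio bytes.

```bash
# Minimal async polling sketch (music flow). Illustrative, not canonical.
QUEUE_ID=$(curl -s https://api.venice.ai/api/v1/audio/queue \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-id>", "prompt": "calm piano", "duration_seconds": 60}' \
  | jq -r '.queue_id')   # assumption: queue response echoes a queue_id

while :; do
  curl -s https://api.venice.ai/api/v1/audio/retrieve \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"<model-id>\", \"queue_id\": \"$QUEUE_ID\"}" \
    -o track.out
  # JSON status bodies begin with "{"; anything else is the finished audio.
  [ "$(head -c1 track.out)" = "{" ] || break
  sleep 10
done
mv track.out track.mp3

# Free the server-side copy once downloaded.
curl -s https://api.venice.ai/api/v1/audio/complete \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"<model-id>\", \"queue_id\": \"$QUEUE_ID\"}"
```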