---
title: Anam AI avatars with VideoSDK agents
description: "Add real-time, lip-synced Anam avatars to your VideoSDK AI voice agents. Works with both RealTimePipeline and CascadingPipeline."
tags: [python, videosdk, avatar, ai-agents, voice-assistant]
date: 2026-03-25
authors: [sebvanleuven]
---

VideoSDK's AI agent framework lets you build voice assistants that answer questions, call function tools, and handle real-time conversation. The [Anam AI Avatar plugin](https://docs.videosdk.live/ai_agents/plugins/avatar/anam-ai) gives those agents a face: a lip-synced avatar that moves with the speech.

You can add Anam avatars to **RealTimePipeline** (low-latency, native audio like Gemini Live) or **CascadingPipeline** (modular STT -> LLM -> TTS). Either way, it's a few lines of config. The complete code is at [examples/videosdk-anam-avatar](https://github.com/anam-org/anam-cookbook/tree/main/examples/videosdk-anam-avatar).

## What you'll build

A VideoSDK voice agent with an Anam avatar that:

- Speaks with lip-synced facial animation
- Works with either RealTimePipeline (e.g. Gemini Live) or CascadingPipeline (e.g. STT + LLM + TTS, in this case Deepgram + OpenAI + ElevenLabs)
- Supports function tools (e.g. weather lookup)
- Greets the user on join and says goodbye on exit

## Prerequisites

- Python 3.12+
- [uv](https://docs.astral.sh/uv/) for project management
- A VideoSDK auth token from [videosdk.live](https://videosdk.live) (required to join rooms and stream)
- An Anam API key from [lab.anam.ai](https://lab.anam.ai)
- An avatar ID from [lab.anam.ai/avatars](https://lab.anam.ai/avatars)
- For **RealTimePipeline**: a [Google AI API key](https://aistudio.google.com/apikey) (Gemini)
- For **CascadingPipeline**: [Deepgram](https://deepgram.com), [OpenAI](https://platform.openai.com), and [ElevenLabs](https://elevenlabs.io) API keys

## Project setup

If you want to follow along with the cookbook example, set up the project first:

```bash
git clone https://github.com/anam-org/anam-cookbook.git
cd anam-cookbook
cd examples/videosdk-anam-avatar
uv sync
cp .env.example .env
```

Edit `.env` with your credentials:

```bash
# VideoSDK (required to join rooms)
VIDEOSDK_AUTH_TOKEN=your_videosdk_auth_token

# Anam (required for avatar)
ANAM_API_KEY=your_anam_api_key
ANAM_AVATAR_ID=your_avatar_id

# RealTimePipeline (Gemini)
GOOGLE_API_KEY=your_google_api_key

# CascadingPipeline (STT, LLM, TTS)
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
```

The agent reads `VIDEOSDK_AUTH_TOKEN` from the environment to authenticate with VideoSDK when joining rooms.

Never expose your API keys in client-side code. The VideoSDK agent runs server-side. Use environment variables or a secrets manager.

## Installation

If you're wiring this into an existing project, the package to install is:

```bash
uv add "videosdk-plugins-anam"
```

If you're starting a new project, the installation is done as part of the project setup above.

## Adding the Anam avatar

The plugin exposes `AnamAvatar` to create a new avatar instance.
Use your API key and an Anam avatar ID, then pass it to the pipeline's `avatar` parameter:

```python
import os

from videosdk.plugins.anam import AnamAvatar

anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),
    avatar_id=os.getenv("ANAM_AVATAR_ID"),
)
```

The avatar ID is the unique identifier for the avatar you want to use. You can browse and create avatars at [lab.anam.ai/avatars](https://lab.anam.ai/avatars). The avatar returns a synchronised audio and video stream of the avatar speaking.

## CascadingPipeline (STT -> LLM -> TTS)

In the CascadingPipeline, the components run in sequence, one after the other, and you can plug in your own STT, LLM, and TTS. Add the avatar as part of the CascadingPipeline:

```python
import os

from videosdk.agents import Agent, AgentSession, CascadingPipeline, ConversationFlow, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.anam import AnamAvatar
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

# Download the turn-detector model weights ahead of time
pre_download_model()

stt = DeepgramSTT(model="nova-3", language="multi", api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLM(model="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY"))
tts = ElevenLabsTTS(api_key=os.getenv("ELEVENLABS_API_KEY"), enable_streaming=True)
vad = SileroVAD()
turn_detector = TurnDetector(threshold=0.8)

anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),
    avatar_id=os.getenv("ANAM_AVATAR_ID"),
)

pipeline = CascadingPipeline(
    stt=stt,
    llm=llm,
    tts=tts,
    vad=vad,
    turn_detector=turn_detector,
    avatar=anam_avatar,
)
```

In this example, the TTS output goes to the avatar, which returns a lip-synced video stream (synchronized audio/video) that is published in your VideoSDK room.
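The pipeline still needs an agent and a session around it before anything joins a room. The imports above hint at that glue (`Agent`, `AgentSession`, `ConversationFlow`, `JobContext`, `RoomOptions`, `WorkerJob`). A minimal sketch of the wiring, following the pattern used in the VideoSDK agent examples — treat names like `start_session` and `make_context` as illustrative and check the linked full example for the version-accurate API:

```python
import asyncio

from videosdk.agents import (
    Agent, AgentSession, ConversationFlow, JobContext, RoomOptions, WorkerJob,
)

class AnamVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful AI avatar assistant.")

    async def on_enter(self):
        # Greet the user when the agent joins the room
        await self.session.say("Hello! How can I help you today?")

    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(ctx: JobContext):
    agent = AnamVoiceAgent()
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,  # the CascadingPipeline built above
        conversation_flow=ConversationFlow(agent),
    )
    try:
        await ctx.connect()
        await session.start()
        await asyncio.Event().wait()  # keep serving until the job shuts down
    finally:
        await session.close()
        await ctx.shutdown()

def make_context() -> JobContext:
    # Omitting room_id lets VideoSDK auto-create a room;
    # playground=True prints a browser URL to try the agent
    return JobContext(
        room_options=RoomOptions(name="Anam Avatar Agent", playground=True)
    )

if __name__ == "__main__":
    WorkerJob(entrypoint=start_session, jobctx=make_context).start()
```

The `on_enter`/`on_exit` hooks are where the greeting and goodbye from "What you'll build" live; the same wrapper works for the RealTimePipeline below.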
Full example: [Anam Cascading Example on GitHub](https://github.com/videosdk-live/agents/blob/main/examples/avatar/anam_cascading_example.py).

## RealTimePipeline (Gemini Live)

RealTimePipeline uses native audio models like Gemini Live. Add the Anam avatar alongside your model and the audio is forwarded directly to Anam to render the avatar:

```python
import os

from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.anam import AnamAvatar

model = GeminiRealtime(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=GeminiLiveConfig(
        voice="Leda",
        response_modalities=["AUDIO"],
    ),
)

anam_avatar = AnamAvatar(
    api_key=os.getenv("ANAM_API_KEY"),
    avatar_id=os.getenv("ANAM_AVATAR_ID"),
)

pipeline = RealTimePipeline(model=model, avatar=anam_avatar)
```

The model's audio drives the avatar; the avatar video streams to participants. Full example: [Anam Realtime Example on GitHub](https://github.com/videosdk-live/agents/blob/main/examples/avatar/anam_realtime_example.py).

## RealTimePipeline with tool calling

Let's expand the RealTimePipeline with a tool-calling example: a basic function the agent can call to get the weather for a given location. Because the tool call might take some time to complete, we want to avoid the user interrupting it while wondering whether the session is still active. So we give the user immediate feedback that the agent is checking the weather, and the tool result is added afterwards. We can achieve this by tweaking the LLM prompt (`generate a short response first`):

```python
class AnamVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful AI avatar assistant powered by VideoSDK and Anam. "
            "You have a visual avatar that speaks with you. Answer questions about weather and other tasks. "
            "You know how to provide real time weather information. "
            "When the user asks about the weather, generate a short response first to indicate you are checking the weather. "
            "Consider your initial response when providing the weather information afterwards. "
            "Keep responses concise and conversational.",
            tools=[get_weather],
        )
```

The tool call can be any business logic that applies to your product, but for this demo, we'll use a simple weather API.

```python
import aiohttp

from videosdk.agents import function_tool

@function_tool
async def get_weather(location: str):
    """Called when the user asks about the weather. Returns the weather for the given location.

    Args:
        location: The location to get the weather for
    """
    geocode_url = (
        "https://geocoding-api.open-meteo.com/v1/search"
        f"?name={location}&count=1&language=en&format=json"
    )
    async with aiohttp.ClientSession() as session:
        async with session.get(geocode_url) as response:
            data = await response.json()
        results = data.get("results") or []
        if not results:
            raise Exception(f"Could not find coordinates for {location}")
        lat = results[0]["latitude"]
        lon = results[0]["longitude"]
        resolved_name = results[0]["name"]

        forecast_url = (
            "https://api.open-meteo.com/v1/forecast"
            f"?latitude={lat}&longitude={lon}&current=temperature_2m&timezone=auto"
        )
        async with session.get(forecast_url) as response:
            weather = await response.json()

    return {
        "location": resolved_name,
        "temperature": weather["current"]["temperature_2m"],
        "temperature_unit": "Celsius",
    }
```

The tool-call preamble is most relevant when tool calls take a non-trivial amount of time (e.g. fetching data from a database). To simulate this, add a sleep inside the tool call to delay the response:

```python
await asyncio.sleep(10)
```

You should see the avatar respond immediately, stay idle during the tool call, and seamlessly continue the conversation with the result afterwards.

## Running the agent

The example includes both pipeline types.
Run the RealTime (Gemini) agent:

```bash
uv run python realtime_agent.py
```

Or run the Cascading agent:

```bash
uv run python cascading_agent.py
```

When you run the agent, a playground URL is printed in the terminal. Open it in your browser to join the room and see the avatar.

The agent auto-creates a room when `room_id` is omitted in `RoomOptions`. To use a specific room, create it via the [Create Room API](https://docs.videosdk.live/api-reference/realtime-communication/create-room) and pass the `room_id`.

## Use cases

Typical fits: customer support agents that guide users through flows, sales demos and product FAQs, training and onboarding, accessibility (lip-synced video for users who prefer visual feedback), learning platforms, and meeting companions.

Docs: [VideoSDK AI Agents](https://docs.videosdk.live/ai_agents/overview), [Anam](https://docs.anam.ai).

## Terminology

- **Avatar** - The visual character (face, expressions, lip sync)
- **Persona** - In Anam's ecosystem, the full AI character (avatar + voice + LLM + system prompt)
- **RealTimePipeline** - Single-model pipeline with native audio (e.g. Gemini Live)
- **CascadingPipeline** - Modular pipeline with separate STT, LLM, and TTS components

The plugin handles avatar rendering. You write the agent logic; the avatar follows.