### YamlMime:FAQ
metadata:
  title: Voice Live frequently asked questions (FAQ)
  titleSuffix: Foundry Tools
  description: Get answers to frequently asked questions about the Voice Live API in Azure Speech in Foundry Tools.
  author: PatrickFarley
  reviewers: pafarley
  manager: nitinme
  ms.service: azure-ai-speech
  ms.topic: faq
  ms.date: 03/31/2026
  ms.author: pafarley
  ms.reviewer: pafarley
title: Voice Live FAQ
summary: |
  This article answers commonly asked questions about the Voice Live API. If you can't find answers to your questions here, check out [other support options](../cognitive-services-support-options.md?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).
sections:
  - name: General
    questions:
      - question: |
          What scenarios does Voice Live support?
        answer: |
          The Voice Live API supports a wide range of real-time, natural voice interaction scenarios, including contact centers, automotive assistants, accessibility applications, virtual tutors and learning companions, multilingual public service agents, HR support, and training. It's used by customers such as eClinicalWorks and the Government of Malta.
      - question: |
          How does Voice Live compare to the Azure OpenAI Realtime API? When should I choose which?
        answer: |
          The Voice Live API enhances the Azure OpenAI Realtime API by offering an expanded model selection (including GPT-Realtime, GPT-5, GPT-4.1, and Phi), more natural voice options, more supported speech languages, avatar integration, advanced semantic voice activity detection (VAD), seamless Microsoft Foundry Agent Service integration, and telephony integration via Azure Communication Services.
      - question: |
          Which regions does Voice Live support?
        answer: |
          Voice Live is available in more than 10 Azure regions. For more information, see [Region support](./regions.md?tabs=voice-live).
      - question: |
          What is the tokens-per-minute threshold?
        answer: |
          The current limit is 100,000 tokens per minute per resource. Customers can request an increase. For more information, see [Speech service quotas and limits](./speech-services-quotas-and-limits.md).
  - name: Generative AI Models
    questions:
      - question: |
          What generative AI models are supported?
        answer: |
          Voice Live supports OpenAI models in Microsoft Foundry, Phi-based LLMs, and SLMs. For more information, see the [Voice Live overview](./voice-live.md). Voice Live also provides a bring-your-own-model option (preview).
      - question: |
          How do I choose the LLM for my use case?
        answer: |
          Consider accuracy (Azure Speech-based models are more robust for noisy audio), existing LLM solutions (you can reuse prompts and grounding data), latency (text-based LLMs can have slightly higher latency), and inference cost (smaller models can be more cost-effective).
      - question: |
          What is response instruction?
        answer: |
          Response instruction guides model behavior and context. Use it to define the agent's personality, specify questions, and control response formatting. Responses should be concise and normalized for optimal audio synthesis.
      - question: |
          What is response temperature?
        answer: |
          Response temperature controls the randomness of the model's output. Lower values produce more deterministic responses; higher values produce more creative ones. Adjust either temperature or Top-P, but not both.
  - name: Speech Input
    questions:
      - question: |
          Which languages does Voice Live support?
        answer: |
          Voice Live supports 146 languages and locales for speech input, 151 for speech output, and more than 600 neural voices. For more information, see [Voice Live language support](./voice-live-language-support.md?tabs=speechinput).
      - question: |
          How do I get the live transcripts from the call?
        answer: |
          Use text output events. For details, see the [Voice Live API reference](./voice-live-api-reference-2025-10-01.md).
      - question: |
          What is a phrase list?
        answer: |
          A phrase list is a set of domain-specific terms that improve recognition accuracy. Limit the list to fewer than 500 words or phrases. For more information, see [How to customize Voice Live](./voice-live-how-to-customize.md).
      - question: |
          Are there other ways to improve speech input recognition accuracy?
        answer: |
          Use Azure AI Custom Speech models. You can configure multiple custom models per language. For more information, see [How to customize Voice Live](./voice-live-how-to-customize.md).
  - name: Speech Output
    questions:
      - question: |
          What voices does Voice Live support?
        answer: |
          Voice Live supports native audio output with your preferred model and Azure Speech in Foundry Tools text to speech voices (more than 600 voices across 150+ locales, including 30+ Neural HD voices). You can also use custom voice models via Professional Voice Fine-tuning. For more information, see [Voice Live API supported languages](./voice-live-language-support.md?tabs=speechoutput).
      - question: |
          How do I pick a voice?
        answer: |
          Use the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery) in the Microsoft Foundry Speech Playground. Consider gender, age, capability, style, and personality.
      - question: |
          What is voice temperature?
        answer: |
          Voice temperature controls expressiveness. Higher values produce more dynamic and emotive speech; lower values produce more neutral speech. It applies to Neural HD voices.
      - question: |
          What is speaking rate?
        answer: |
          Speaking rate controls the speed of the agent's speech.
      - question: |
          What is a custom lexicon?
        answer: |
          A custom lexicon defines pronunciation rules for specific words. For more information, see [How to customize Voice Live](./voice-live-how-to-customize.md#speech-output-customization).
      - question: |
          What is Custom Voice?
        answer: |
          Custom Voice lets you create brand-specific synthetic voices using your own audio data. For more information, see [How to customize Voice Live](./voice-live-how-to-customize.md#azure-custom-voices).
      - question: |
          What is Avatar support?
        answer: |
          Avatar support pairs speech output with visual avatars for multimodal experiences.
      - question: |
          What is Custom Avatar?
        answer: |
          A custom avatar is a photorealistic digital human powered by Azure AI text to speech. It's built from video recordings and tailored to a specific actor's appearance and voice.
  - name: Conversational Enhancements
    questions:
      - question: |
          What is the difference between Azure Semantic VAD and Basic Server VAD?
        answer: |
          Azure Semantic VAD is more robust to noise and more accurate at detecting utterance boundaries.
      - question: |
          What is EOU (End of Utterance) detection?
        answer: |
          End of utterance detection uses context to determine whether a user finished speaking or just paused.
      - question: |
          How does noise suppression work?
        answer: |
          Noise suppression filters background noise from the user's audio input.
      - question: |
          How does echo cancellation work?
        answer: |
          Echo cancellation removes the echo of the agent's own voice when it's picked up by the user's microphone.
  - name: Function Calling
    questions:
      - question: |
          Does Voice Live support function calling?
        answer: |
          Yes, including asynchronous function calling.
      - question: |
          Is there model context protocol (MCP) support?
        answer: |
          Currently, MCP is supported in model mode, with the exception of Phi models, and with Foundry (new) agents. It's not supported with Foundry (classic) agents.
  - name: Pricing
    questions:
      - question: |
          Where is the pricing listed?
        answer: |
          See the [Voice Live overview](./voice-live.md#pricing).
      - question: |
          How do I estimate the cost based on my use case?
        answer: |
          Estimate by audio minutes; tokens are the billing unit. See [pricing](./voice-live.md#pricing) and [token usage and cost estimation](./voice-live.md#token-usage-and-cost-estimation).
      - question: |
          Are there separate quota and throttling limits for Voice Live?
        answer: |
          Yes, a quota applies specifically to the Voice Live API (default: 100,000 tokens per minute).
  - name: Additional
    questions:
      - question: |
          Does this service provide an SDK?
        answer: |
          Yes, SDKs for Python, C#, Java (preview), and JavaScript/TypeScript (preview) are available. For more information, see the [Voice Live SDK reference](./voice-live-sdk.md).
      - question: |
          Does this service include content filtering?
        answer: |
          Yes, content filtering is included.
      - question: |
          Can you modify or disable the content filtering in Voice Live API?
        answer: |
          No. If you need custom content filtering, you can use the bring-your-own-model (preview) feature.
      - question: |
          Is SIP supported?
        answer: |
          SIP is currently not supported.
additionalContent: |
  ## Next steps

  - Learn more about [How to use the Voice Live API](./voice-live-how-to.md)
  - See the [Voice Live API reference](./voice-live-api-reference-2025-10-01.md)
  - [What's new](./releasenotes.md)
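  ## Example: configuring session settings

  Several of the answers above refer to session settings such as response instruction, response temperature, voice, and voice activity detection. As an illustrative sketch only (the field names below are assumptions based on Realtime-style event protocols; see the [Voice Live API reference](./voice-live-api-reference-2025-10-01.md) for the authoritative event schema and supported values), a `session.update` event that ties those settings together might be built like this:

  ```python
  import json

  # Hypothetical session configuration, for illustration only.
  # Field names and values are assumptions; consult the Voice Live
  # API reference for the authoritative schema.
  session_update = {
      "type": "session.update",
      "session": {
          # Response instruction: defines agent personality and formatting.
          "instructions": "You are a concise, friendly voice assistant.",
          # Response temperature: lower = more deterministic output.
          "temperature": 0.7,
          "voice": {
              "name": "en-US-Ava:DragonHDLatestNeural",  # example voice name
              "type": "azure-standard",
              # Voice temperature: higher = more expressive (Neural HD voices).
              "temperature": 0.8,
          },
          # Semantic VAD, per the conversational enhancements answers above.
          "turn_detection": {
              "type": "azure_semantic_vad",
          },
      },
  }

  # The event would be serialized to JSON and sent over the WebSocket connection.
  payload = json.dumps(session_update)
  ```

  The serialized `payload` would then be sent as a text message on the established WebSocket session; the service acknowledges accepted settings in its own events.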