---
manager: nitinme
author: goergenj
ms.author: jagoerge
ms.service: azure-ai-speech
ms.topic: include
ms.date: 01/31/2026
---

[Reference documentation](/python/api/overview/azure/ai-transcription-readme) | [Package (PyPI)](https://pypi.org/project/azure-ai-transcription/) | [GitHub samples](https://github.com/Azure/azure-sdk-for-python/tree/azure-ai-transcription_1.0.0b2/sdk/cognitiveservices/azure-ai-transcription/samples)

## Prerequisites

- An Azure subscription. [Create one for free](https://azure.microsoft.com/pricing/purchase-options/azure-account?cid=msft_learn).
- Python 3.9 or later. If you don't have a suitable version of Python installed, you can follow the instructions in the [Visual Studio Code Python tutorial](https://code.visualstudio.com/docs/python/python-tutorial#_install-a-python-interpreter). This tutorial shows you the easiest way of installing Python on your operating system.
- A [Microsoft Foundry resource](/azure/ai-services/multi-service-resource) created in one of the supported regions. For more information about region availability, see [Region support](/azure/ai-services/speech-service/regions?tabs=stt).
- A sample `.wav` audio file to transcribe.

### Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

- Install the [Azure CLI](/cli/azure/install-azure-cli) used for keyless authentication with Microsoft Entra ID.
- Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under **Access control (IAM)** > **Add role assignment**.

## Set up the environment

1. Create a new folder named `llm-speech-quickstart` and then go to the folder with the following command:

   ```shell
   mkdir llm-speech-quickstart && cd llm-speech-quickstart
   ```

1. To install the packages that you need for this article, create and activate a virtual Python environment.
   We recommend that you always use a virtual or conda environment when you install Python packages. Otherwise, you can break your global installation of Python. If you already have Python 3.9 or later installed, create a virtual environment by using the following commands:

   # [Windows](#tab/windows)

   ```powershell
   py -3 -m venv .venv
   .venv\Scripts\Activate.ps1
   ```

   # [Linux](#tab/linux)

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```

   # [macOS](#tab/macos)

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```

   ---

   When you activate the Python environment, running `python` or `pip` from the command line uses the Python interpreter in the `.venv` folder of your application. Use the `deactivate` command to exit the Python virtual environment. You can reactivate it later when needed.

1. Create a file named **requirements.txt**. Add the following packages to the file:

   ```txt
   azure-ai-transcription
   azure-identity
   ```

1. Install the packages:

   ```bash
   pip install -r requirements.txt
   ```

## Set environment variables

You need to retrieve your resource endpoint and API key for authentication.

1. Sign in to [Foundry portal (classic)](https://ai.azure.com).
1. Select **Management center** from the left menu.
1. Select **Connected resources**, and find your Microsoft Foundry resource (or add a connection if it isn't there). Then copy the **API Key** and **Target** (endpoint) values. Use these values to set environment variables.
1. Set the following environment variables. Replace the empty strings with your endpoint and API key values:

   # [Windows](#tab/windows)

   ```powershell
   $env:AZURE_SPEECH_ENDPOINT=""
   $env:AZURE_SPEECH_API_KEY=""
   ```

   # [Linux](#tab/linux)

   ```bash
   export AZURE_SPEECH_ENDPOINT=""
   export AZURE_SPEECH_API_KEY=""
   ```

   # [macOS](#tab/macos)

   ```bash
   export AZURE_SPEECH_ENDPOINT=""
   export AZURE_SPEECH_API_KEY=""
   ```

   ---

> [!NOTE]
> For Microsoft Entra ID authentication (recommended for production), install `azure-identity`.
> Configure authentication as described in the [Microsoft Entra ID prerequisites](#microsoft-entra-id-prerequisites) section.

## Transcribe audio with LLM Speech

LLM Speech uses the `EnhancedModeProperties` class to enable transcription that's enhanced by a large language model. The model automatically detects the language in your audio.

1. Create a file named `llm_speech_transcribe.py` with the following code:

   ```python
   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.transcription import TranscriptionClient
   from azure.ai.transcription.models import (
       TranscriptionContent,
       TranscriptionOptions,
       EnhancedModeProperties,
   )

   # Get configuration from environment variables
   endpoint = os.environ["AZURE_SPEECH_ENDPOINT"]

   # Use the API key if it's set. Otherwise, fall back to keyless authentication.
   # We recommend role-based access control (RBAC) for production scenarios.
   api_key = os.environ.get("AZURE_SPEECH_API_KEY")
   if api_key:
       credential = AzureKeyCredential(api_key)
   else:
       from azure.identity import DefaultAzureCredential
       credential = DefaultAzureCredential()

   # Create the transcription client
   client = TranscriptionClient(endpoint=endpoint, credential=credential)

   # Path to your audio file (replace the empty string with your own file path)
   audio_file_path = ""

   # Open and read the audio file
   with open(audio_file_path, "rb") as audio_file:
       # Create enhanced mode properties for LLM Speech transcription
       enhanced_mode = EnhancedModeProperties(
           task="transcribe",
           prompt=[],
       )

       # Create transcription options with enhanced mode
       options = TranscriptionOptions(enhanced_mode=enhanced_mode)

       # Create the request content
       request_content = TranscriptionContent(definition=options, audio=audio_file)

       # Transcribe the audio
       result = client.transcribe(request_content)

       # Print the transcription result
       print(f"Transcription: {result.combined_phrases[0].text}")

       # Print detailed phrase information
       if result.phrases:
           print("\nDetailed phrases:")
           for phrase in result.phrases:
               print(f"  [{phrase.offset_milliseconds}ms]: {phrase.text}")
   ```

   For more information, see the following references: [TranscriptionClient](/python/api/azure-ai-transcription/azure.ai.transcription.transcriptionclient), [TranscriptionContent](/python/api/azure-ai-transcription/azure.ai.transcription.models.transcriptioncontent), [TranscriptionOptions](/python/api/azure-ai-transcription/azure.ai.transcription.models.transcriptionoptions), and [EnhancedModeProperties](/python/api/azure-ai-transcription/azure.ai.transcription.models.enhancedmodeproperties).

1. Set `audio_file_path` to the path to your audio file. The service supports WAV, MP3, FLAC, OGG, and other common audio formats.

1. Run the Python script.

   ```bash
   python llm_speech_transcribe.py
   ```

### Transcription output

The script prints the transcription result to the console:

```console
Transcription: Hi there. This is a sample voice recording created for speech synthesis testing. The quick brown fox jumps over the lazy dog. Just a fun way to include every letter of the alphabet. Numbers, like one, two, three, are spoken clearly. Let's see how well this voice captures tone, timing, and natural rhythm. This audio is provided by samplefiles.com.

Detailed phrases:
  [40ms]: Hi there.
  [800ms]: This is a sample voice recording created for speech synthesis testing.
  [5440ms]: The quick brown fox jumps over the lazy dog.
  [9040ms]: Just a fun way to include every letter of the alphabet.
  [12720ms]: Numbers, like one, two, three, are spoken clearly.
  [17200ms]: Let's see how well this voice captures tone, timing, and natural rhythm.
  [22480ms]: This audio is provided by samplefiles.com.
```

## Translate audio with LLM Speech

You can also use LLM Speech to translate audio into a target language. Set the `task` to `translate`, and specify the `target_language`.

1.
   Create a file named `llm_speech_translate.py` with the following code:

   ```python
   import os

   from azure.core.credentials import AzureKeyCredential
   from azure.ai.transcription import TranscriptionClient
   from azure.ai.transcription.models import (
       TranscriptionContent,
       TranscriptionOptions,
       EnhancedModeProperties,
   )

   # Get configuration from environment variables
   endpoint = os.environ["AZURE_SPEECH_ENDPOINT"]

   # Use the API key if it's set. Otherwise, fall back to keyless authentication.
   # We recommend role-based access control (RBAC) for production scenarios.
   api_key = os.environ.get("AZURE_SPEECH_API_KEY")
   if api_key:
       credential = AzureKeyCredential(api_key)
   else:
       from azure.identity import DefaultAzureCredential
       credential = DefaultAzureCredential()

   # Create the transcription client
   client = TranscriptionClient(endpoint=endpoint, credential=credential)

   # Path to your audio file (replace the empty string with your own file path)
   audio_file_path = ""

   # Open and read the audio file
   with open(audio_file_path, "rb") as audio_file:
       # Create enhanced mode properties for LLM Speech translation
       # Translate to another language (German in this example)
       enhanced_mode = EnhancedModeProperties(
           task="translate",
           target_language="de",
           prompt=[
               "Translate the following audio to German.",
               "Convert number words to numbers."
           ],  # Optional prompts to guide the enhanced mode
       )

       # Create transcription options with enhanced mode
       options = TranscriptionOptions(locales=["en-US"], enhanced_mode=enhanced_mode)

       # Create the request content
       request_content = TranscriptionContent(definition=options, audio=audio_file)

       # Translate the audio
       result = client.transcribe(request_content)

       # Print the translation result
       print(f"Translation: {result.combined_phrases[0].text}")
   ```

   For more information, see the following references: [TranscriptionClient](/python/api/azure-ai-transcription/azure.ai.transcription.transcriptionclient) and [EnhancedModeProperties](/python/api/azure-ai-transcription/azure.ai.transcription.models.enhancedmodeproperties).

1.
   Set `audio_file_path` to the path to your audio file.

1. Run the Python script.

   ```bash
   python llm_speech_translate.py
   ```

## Use prompt-tuning

You can provide an optional prompt to guide the output style for transcription or translation tasks. Replace the `prompt` value in the `EnhancedModeProperties` object.

```python
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.transcription import TranscriptionClient
from azure.ai.transcription.models import (
    TranscriptionContent,
    TranscriptionOptions,
    EnhancedModeProperties,
)

# Get configuration from environment variables
endpoint = os.environ["AZURE_SPEECH_ENDPOINT"]

# Use the API key if it's set. Otherwise, fall back to keyless authentication.
# We recommend role-based access control (RBAC) for production scenarios.
api_key = os.environ.get("AZURE_SPEECH_API_KEY")
if api_key:
    credential = AzureKeyCredential(api_key)
else:
    from azure.identity import DefaultAzureCredential
    credential = DefaultAzureCredential()

# Create the transcription client
client = TranscriptionClient(endpoint=endpoint, credential=credential)

# Path to your audio file (replace the empty string with your own file path)
audio_file_path = ""

# Open and read the audio file
with open(audio_file_path, "rb") as audio_file:
    # Create enhanced mode properties for LLM Speech transcription
    enhanced_mode = EnhancedModeProperties(
        task="transcribe",
        prompt=[
            "Create lexical output only.",
            "Convert number words to numbers."
        ],  # Optional prompts to guide the enhanced mode
    )

    # Create transcription options with enhanced mode
    options = TranscriptionOptions(enhanced_mode=enhanced_mode)

    # Create the request content
    request_content = TranscriptionContent(definition=options, audio=audio_file)

    # Print request content for debugging
    print("Request Content:", request_content, "\n")

    # Transcribe the audio
    result = client.transcribe(request_content)

    # Print the transcription result
    print(f"Transcription: {result.combined_phrases[0].text}")

    # Print detailed phrase information
    if result.phrases:
        print("\nDetailed phrases:")
        for phrase in result.phrases:
            print(f"  [{phrase.offset_milliseconds}ms]: {phrase.text}")
```

### Best practices for prompts

- Prompts have a maximum length of 4,096 characters.
- Write prompts in English when possible.
- Use `Output must be in lexical format.` to enforce lexical formatting instead of the default display format.
- Use `Pay attention to *phrase1*, *phrase2*, …` to improve recognition of specific phrases or acronyms.

For more information, see the following reference: [EnhancedModeProperties](/python/api/azure-ai-transcription/azure.ai.transcription.models.enhancedmodeproperties).

### Output

After the request content debug line, the script prints the transcription result to the console:

```output
Transcription: Hello, this is a test of the LLM Speech transcription service.

Detailed phrases:
  [0ms]: Hello, this is a test
  [1500ms]: of the LLM Speech transcription service.
```
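
### Example: build a prompt list

The prompt best practices in this article can be applied with a small helper before you create the `EnhancedModeProperties` object. The following is a minimal sketch, not part of the SDK: `build_prompts` and `MAX_PROMPT_CHARS` are hypothetical names. It assumes that the 4,096-character limit applies to the combined length of all prompt strings, and that the asterisks in `*phrase1*` mark placeholders rather than literal prompt text.

```python
MAX_PROMPT_CHARS = 4096  # Limit stated in the best practices for prompts


def build_prompts(phrases=(), lexical=False):
    """Return a list of prompt strings (hypothetical helper, not an SDK function)."""
    prompts = []
    if lexical:
        # Ask for lexical formatting instead of the default display format
        prompts.append("Output must be in lexical format.")
    if phrases:
        # Highlight specific phrases or acronyms for recognition
        prompts.append("Pay attention to " + ", ".join(phrases) + ".")

    # Assumption: the limit covers the combined length of all prompt strings
    total = sum(len(prompt) for prompt in prompts)
    if total > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompts are {total} characters; the maximum is {MAX_PROMPT_CHARS}."
        )
    return prompts


# Example usage: pass the returned list as the prompt value
print(build_prompts(["Microsoft Foundry", "RBAC"], lexical=True))
```

You can pass the returned list as the `prompt` argument of `EnhancedModeProperties` in the scripts shown earlier. Checking the length locally surfaces an oversized prompt before the request is sent.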