---
name: azure-ai-voicelive-java
description: |
  Azure AI VoiceLive SDK for Java. Real-time bidirectional voice conversations with AI assistants using WebSocket. Triggers: "VoiceLiveClient java", "voice assistant java", "real-time voice java", "audio streaming java", "voice activity detection java".
package: com.azure:azure-ai-voicelive
---

# Azure AI VoiceLive SDK for Java

Real-time, bidirectional voice conversations with AI assistants using WebSocket technology.

## Installation

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-voicelive</artifactId>
    <version>1.0.0-beta.2</version>
</dependency>
```

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

## Authentication

### API Key

```java
import com.azure.ai.voicelive.VoiceLiveAsyncClient;
import com.azure.ai.voicelive.VoiceLiveClientBuilder;
import com.azure.core.credential.AzureKeyCredential;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new AzureKeyCredential(System.getenv("AZURE_VOICELIVE_API_KEY")))
    .buildAsyncClient();
```

### DefaultAzureCredential (Recommended)

```java
import com.azure.identity.DefaultAzureCredentialBuilder;

VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildAsyncClient();
```

## Key Concepts

| Concept | Description |
|---------|-------------|
| `VoiceLiveAsyncClient` | Main entry point for voice sessions |
| `VoiceLiveSessionAsyncClient` | Active WebSocket connection for streaming |
| `VoiceLiveSessionOptions` | Configuration for session behavior |

### Audio Requirements

- **Sample Rate**: 24kHz (24000 Hz)
- **Bit Depth**: 16-bit PCM
- **Channels**: Mono (1 channel)
- **Format**: Signed PCM, little-endian

## Core Workflow
### 1. Start Session

```java
import reactor.core.publisher.Mono;

client.startSession("gpt-4o-realtime-preview")
    .flatMap(session -> {
        System.out.println("Session started");

        // Subscribe to events
        session.receiveEvents()
            .subscribe(
                event -> System.out.println("Event: " + event.getType()),
                error -> System.err.println("Error: " + error.getMessage())
            );

        return Mono.just(session);
    })
    .block();
```

### 2. Configure Session Options

```java
import com.azure.ai.voicelive.models.*;
import com.azure.core.util.BinaryData;
import java.util.Arrays;

ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
    .setThreshold(0.5)           // Sensitivity (0.0-1.0)
    .setPrefixPaddingMs(300)     // Audio captured before speech starts
    .setSilenceDurationMs(500)   // Silence required to end a turn
    .setInterruptResponse(true)  // Allow interruptions
    .setAutoTruncate(true)
    .setCreateResponse(true);

AudioInputTranscriptionOptions transcription =
    new AudioInputTranscriptionOptions(AudioInputTranscriptionOptionsModel.WHISPER_1);

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setInstructions("You are a helpful AI voice assistant.")
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)))
    .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
    .setInputAudioFormat(InputAudioFormat.PCM16)
    .setOutputAudioFormat(OutputAudioFormat.PCM16)
    .setInputAudioSamplingRate(24000)
    .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
    .setInputAudioEchoCancellation(new AudioEchoCancellation())
    .setInputAudioTranscription(transcription)
    .setTurnDetection(turnDetection);

// Send the configuration to the service
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
session.sendEvent(updateEvent).subscribe();
```

### 3. Send Audio Input

```java
byte[] audioData = readAudioChunk(); // Your PCM16 audio data (24 kHz, mono)
session.sendInputAudio(BinaryData.fromBytes(audioData)).subscribe();
```
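`readAudioChunk()` above is an application-supplied placeholder. One way to feed prerecorded PCM data is to split it into fixed-size chunks and send each one with `session.sendInputAudio(...)`. A minimal sketch of the chunking arithmetic; the `AudioChunker` class name and the 100 ms chunk size are illustrative assumptions, not part of the SDK:

```java
import java.util.ArrayList;
import java.util.List;

public class AudioChunker {
    // 24,000 samples/s * 2 bytes (16-bit) * 1 channel = 48,000 bytes per second,
    // so 100 ms of audio is 4,800 bytes (an assumed, tunable chunk size).
    static final int CHUNK_SIZE = 4800;

    // Split raw PCM16 audio into fixed-size chunks for streaming.
    static List<byte[]> chunk(byte[] pcm, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += chunkSize) {
            int len = Math.min(chunkSize, pcm.length - off);
            byte[] c = new byte[len];
            System.arraycopy(pcm, off, c, 0, len);
            chunks.add(c);
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] oneSecond = new byte[48000]; // 1 s of silence at 24 kHz PCM16 mono
        List<byte[]> chunks = chunk(oneSecond, CHUNK_SIZE);
        System.out.println(chunks.size() + " chunks of up to " + CHUNK_SIZE + " bytes");
        // Each chunk would then be sent with:
        // session.sendInputAudio(BinaryData.fromBytes(c)).subscribe();
    }
}
```

Pacing matters in practice: when streaming from a file rather than a live microphone, send chunks at roughly real-time rate so server-side turn detection behaves naturally.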
### 4. Handle Events

```java
session.receiveEvents().subscribe(event -> {
    ServerEventType eventType = event.getType();

    if (ServerEventType.SESSION_CREATED.equals(eventType)) {
        System.out.println("Session created");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED.equals(eventType)) {
        System.out.println("User started speaking");
    } else if (ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED.equals(eventType)) {
        System.out.println("User stopped speaking");
    } else if (ServerEventType.RESPONSE_AUDIO_DELTA.equals(eventType)) {
        if (event instanceof SessionUpdateResponseAudioDelta) {
            SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
            playAudioChunk(audioEvent.getDelta());
        }
    } else if (ServerEventType.RESPONSE_DONE.equals(eventType)) {
        System.out.println("Response complete");
    } else if (ServerEventType.ERROR.equals(eventType)) {
        if (event instanceof SessionUpdateError) {
            SessionUpdateError errorEvent = (SessionUpdateError) event;
            System.err.println("Error: " + errorEvent.getError().getMessage());
        }
    }
});
```

## Voice Configuration

### OpenAI Voices

```java
// Available: ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE
VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
    .setVoice(BinaryData.fromObject(new OpenAIVoice(OpenAIVoiceName.ALLOY)));
```

### Azure Voices

```java
// Azure Standard Voice
options.setVoice(BinaryData.fromObject(new AzureStandardVoice("en-US-JennyNeural")));

// Azure Custom Voice
options.setVoice(BinaryData.fromObject(new AzureCustomVoice("myVoice", "endpointId")));

// Azure Personal Voice
options.setVoice(BinaryData.fromObject(
    new AzurePersonalVoice("speakerProfileId", PersonalVoiceModels.PHOENIX_LATEST_NEURAL)));
```

## Function Calling

```java
VoiceLiveFunctionDefinition weatherFunction = new VoiceLiveFunctionDefinition("get_weather")
    .setDescription("Get current weather for a location")
    .setParameters(BinaryData.fromObject(parametersSchema));

VoiceLiveSessionOptions options = new
    VoiceLiveSessionOptions()
    .setTools(Arrays.asList(weatherFunction))
    .setInstructions("You have access to weather information.");
```

## Best Practices

1. **Use the async client** — VoiceLive requires reactive patterns
2. **Configure turn detection** for natural conversation flow
3. **Enable noise reduction** for better speech recognition
4. **Handle interruptions** gracefully with `setInterruptResponse(true)`
5. **Use Whisper** (`WHISPER_1`) to transcribe input audio
6. **Close sessions** properly when the conversation ends

## Error Handling

```java
import reactor.core.publisher.Flux;

session.receiveEvents()
    .doOnError(error -> System.err.println("Connection error: " + error.getMessage()))
    .onErrorResume(error -> {
        // Attempt reconnection or clean up
        return Flux.empty();
    })
    .subscribe();
```

## Reference Links

| Resource | URL |
|----------|-----|
| GitHub Source | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive |
| Samples | https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-voicelive/src/samples |
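The event-handling example calls a `playAudioChunk` helper that the SDK does not provide. A minimal sketch using the JDK's `javax.sound.sampled`, assuming the service's documented output format (24 kHz, 16-bit signed PCM, mono, little-endian); the `AudioPlayback` class is illustrative, and extracting raw bytes from the delta event (e.g. via `BinaryData.toBytes()`) depends on the event's actual return type:

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class AudioPlayback {
    // Matches the documented audio requirements:
    // 24 kHz sample rate, 16-bit signed PCM, mono, little-endian.
    static final AudioFormat FORMAT = new AudioFormat(24000f, 16, 1, true, false);

    private final SourceDataLine line;

    AudioPlayback() throws LineUnavailableException {
        line = AudioSystem.getSourceDataLine(FORMAT);
        line.open(FORMAT);
        line.start();
    }

    // Called for each RESPONSE_AUDIO_DELTA event, e.g.
    // playback.playAudioChunk(audioEvent.getDelta().toBytes());
    void playAudioChunk(byte[] pcm) {
        line.write(pcm, 0, pcm.length); // blocks until the chunk is buffered
    }

    void close() {
        line.drain(); // let any queued audio finish playing
        line.close();
    }
}
```

Because `receiveEvents()` delivers deltas on a Reactor thread, keeping playback in a dedicated helper like this (rather than doing audio I/O inline in the subscriber) makes it easier to avoid blocking the event stream for long periods.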