Note
This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Note
Foundry agent integration currently supports only agents available on public endpoints. Foundry agents deployed in a private virtual network (VNet) aren't supported.
Learn how to use Voice Live with Microsoft Foundry Agent Service by using the VoiceLive SDK for Python. This article builds on Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live, adding advanced features and integration options.
Reference documentation | Package (PyPI) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Prerequisites
Note
This document refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- Python 3.10 or later. If you don't have a suitable version of Python installed, follow the instructions in the VS Code Python Tutorial for the easiest way to install Python on your operating system.
- The required language runtimes, global tools, and Visual Studio Code extensions as described in Prepare your development environment.
- A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
- A model deployed in Microsoft Foundry. If you don't have a model, first complete Quickstart: Set up Microsoft Foundry resources.
- Assign the Azure AI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to set up your environment, configure the agent with Voice Live settings, and test your first conversation.
Agent integration concepts
Use these concepts to understand how Voice Live and Foundry Agent Service work together in the Python sample.
Agent configuration contract
Set agent_config in your session setup to identify the target agent and project. At minimum, include agent_name and project_name. Add agent_version when you want to pin behavior to a specific version.
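The contract can be sketched as a small helper that assembles this dictionary, assuming the dictionary-shaped AgentSessionConfig used in the Python sample later in this article; build_agent_config is an illustrative helper name, not part of the SDK.

```python
from typing import Any, Dict, Optional

def build_agent_config(
    agent_name: str,
    project_name: str,
    agent_version: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the minimal agent configuration dictionary.

    agent_name and project_name are always required; agent_version is only
    included when you want to pin behavior to a specific version.
    """
    config: Dict[str, Any] = {
        "agent_name": agent_name,
        "project_name": project_name,
    }
    if agent_version:
        config["agent_version"] = agent_version
    return config

# Unpinned: Voice Live connects to the latest agent version.
unpinned = build_agent_config("support-agent", "my-project")
# Pinned: behavior is fixed to one version.
pinned = build_agent_config("support-agent", "my-project", agent_version="3")
```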
Authentication model for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation in this flow doesn't support key-based authentication, so configure AzureCliCredential (or another Entra token credential) for local development and deployment.
API version pinning
Pin a supported api_version in the client to keep behavior predictable across preview updates. Use the same version consistently across quickstart and how-to samples to avoid schema drift.
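One common way to keep the version consistent across samples is a single module-level constant with an environment override; the version string below is a placeholder, not a confirmed Voice Live API version, and resolve_api_version is an illustrative helper name.

```python
import os

# Placeholder value; substitute an api_version documented as supported
# for Voice Live in the version of the SDK you use.
DEFAULT_API_VERSION = "2025-05-19-preview"

def resolve_api_version() -> str:
    """Allow an environment override, otherwise fall back to the pinned value
    so every sample in the project uses the same api_version."""
    return os.environ.get("VOICELIVE_API_VERSION", DEFAULT_API_VERSION)
```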
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling so persisted history better matches what the user actually heard.
Connect to a specific agent version
Pin your agent to a specific version to enable controlled deployments. This lets production use stable versions while development tests newer iterations.
Set the AGENT_VERSION environment variable or pass the agent_version parameter when initializing the assistant:
This sample uses the new AgentSessionConfig type for strongly typed agent configuration at connection time. It also demonstrates how to collect a conversation log of user and agent interactions.
# <agent_config>
def __init__(
self,
endpoint: str,
credential: Union[AzureKeyCredential, AsyncTokenCredential],
voice: str,
agent_name: str,
project_name: str,
agent_version: Optional[str] = None,
conversation_id: Optional[str] = None,
foundry_resource_override: Optional[str] = None,
agent_authentication_identity_client_id: Optional[str] = None,
):
self.endpoint = endpoint
self.credential = credential
self.voice = voice
# Build AgentSessionConfig internally
self.agent_config: AgentSessionConfig = {
    "agent_name": agent_name,
    "project_name": project_name,
    "agent_version": agent_version,
    "conversation_id": conversation_id,
    "foundry_resource_override": foundry_resource_override,
    "authentication_identity_client_id": agent_authentication_identity_client_id,
}
else:
logger.error("VoiceLive error: %s", msg)
print(f"Error: {msg}")
elif event.type == ServerEventType.CONVERSATION_ITEM_CREATED:
logger.debug("Conversation item created: %s", event.item.id)
else:
logger.debug("Unhandled event type: %s", event.type)
# </handle_events>
# </voice_assistant>
async def write_conversation_log(message: str) -> None:
"""Write a message to the conversation log."""
log_path = os.path.join(_script_dir, 'logs', logfilename)
await asyncio.to_thread(
lambda: open(log_path, 'a', encoding='utf-8').write(message + "\n")
)
# <main>
def main() -> None:
"""Main function."""
endpoint = os.environ.get("VOICELIVE_ENDPOINT", "")
voice_name = os.environ.get("VOICE_NAME", "en-US-Ava:DragonHDLatestNeural")
agent_name = os.environ.get("AGENT_NAME", "")
agent_version = os.environ.get("AGENT_VERSION")
project_name = os.environ.get("PROJECT_NAME", "")
conversation_id = os.environ.get("CONVERSATION_ID")
foundry_resource_override = os.environ.get("FOUNDRY_RESOURCE_OVERRIDE")
agent_authentication_identity_client_id = os.environ.get("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID")
print("Environment variables:")
print(f"VOICELIVE_ENDPOINT: {endpoint}")
print(f"VOICE_NAME: {voice_name}")
print(f"AGENT_NAME: {agent_name}")
print(f"AGENT_VERSION: {agent_version}")
print(f"PROJECT_NAME: {project_name}")
In this sample, the version configuration is applied in three places:
- In main(), AGENT_VERSION is read from the environment.
- In the BasicVoiceAssistant(...) call, agent_version is passed into the class constructor.
- In BasicVoiceAssistant.__init__, the value is added to self.agent_config, and then sent to Voice Live via connect(..., agent_config=self.agent_config).
The agent_version value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest version of the agent.
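That precedence (explicit parameter first, then the environment variable, then the latest version) can be sketched as a small helper; resolve_agent_version is an illustrative name, not an SDK function.

```python
import os
from typing import Optional

def resolve_agent_version(explicit: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed agent_version, then the AGENT_VERSION
    environment variable. Returning None lets Voice Live connect to the
    latest version of the agent."""
    if explicit:
        return explicit
    # An empty or unset variable means "no pin".
    return os.environ.get("AGENT_VERSION") or None
```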
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent on a different Foundry resource for audio processing. This is useful when:
- The agent is deployed in a region that has different feature availability
- You want to separate development/staging environments from production
- Your organization uses different resources for different workloads
To connect to an agent on a different resource, configure two additional environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
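Because the identity client ID is required whenever the resource override is used, it can help to validate the pair when reading the environment. The helper below is an illustrative sketch (read_cross_resource_settings is not an SDK function).

```python
import os
from typing import Optional, Tuple

def read_cross_resource_settings() -> Tuple[Optional[str], Optional[str]]:
    """Read the optional cross-resource settings, enforcing that a resource
    override always comes with the managed identity client ID it needs."""
    override = os.environ.get("FOUNDRY_RESOURCE_OVERRIDE") or None
    identity = os.environ.get("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID") or None
    if override and not identity:
        raise ValueError(
            "FOUNDRY_RESOURCE_OVERRIDE is set, but "
            "AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID is required for "
            "cross-resource authentication."
        )
    return override, identity
```

Failing fast here surfaces a misconfiguration before the session connect attempt, where the error would otherwise appear as an authentication failure.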
credential: Union[AzureKeyCredential, AsyncTokenCredential],
voice: str,
agent_name: str,
project_name: str,
agent_version: Optional[str] = None,
conversation_id: Optional[str] = None,
foundry_resource_override: Optional[str] = None,
agent_authentication_identity_client_id: Optional[str] = None,
):
self.endpoint = endpoint
self.credential = credential
self.voice = voice
# Build AgentSessionConfig internally
self.agent_config: AgentSessionConfig = {
    "agent_name": agent_name,
    "project_name": project_name,
    "agent_version": agent_version,
    "conversation_id": conversation_id,
    "foundry_resource_override": foundry_resource_override,
    "authentication_identity_client_id": agent_authentication_identity_client_id,
}
else:
logger.debug("Unhandled event type: %s", event.type)
# </handle_events>
# </voice_assistant>
async def write_conversation_log(message: str) -> None:
"""Write a message to the conversation log."""
log_path = os.path.join(_script_dir, 'logs', logfilename)
await asyncio.to_thread(
lambda: open(log_path, 'a', encoding='utf-8').write(message + "\n")
)
# <main>
def main() -> None:
"""Main function."""
endpoint = os.environ.get("VOICELIVE_ENDPOINT", "")
voice_name = os.environ.get("VOICE_NAME", "en-US-Ava:DragonHDLatestNeural")
agent_name = os.environ.get("AGENT_NAME", "")
agent_version = os.environ.get("AGENT_VERSION")
project_name = os.environ.get("PROJECT_NAME", "")
conversation_id = os.environ.get("CONVERSATION_ID")
foundry_resource_override = os.environ.get("FOUNDRY_RESOURCE_OVERRIDE")
agent_authentication_identity_client_id = os.environ.get("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID")
print("Environment variables:")
print(f"VOICELIVE_ENDPOINT: {endpoint}")
print(f"VOICE_NAME: {voice_name}")
print(f"AGENT_NAME: {agent_name}")
print(f"AGENT_VERSION: {agent_version}")
print(f"PROJECT_NAME: {project_name}")
print(f"CONVERSATION_ID: {conversation_id}")
print(f"FOUNDRY_RESOURCE_OVERRIDE: {foundry_resource_override}")
This configuration is resolved in main() and then applied when the assistant is created:
- FOUNDRY_RESOURCE_OVERRIDE and AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID are read from environment variables.
- Both values are passed to BasicVoiceAssistant(...).
- In BasicVoiceAssistant.__init__, the values are added to self.agent_config, which is sent in connect(..., agent_config=self.agent_config).
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Send a proactive message to initiate conversations as soon as the session is ready. This sample checks a one-time flag in the SESSION_UPDATED event handler, sends a greeting prompt, and triggers a response.
}
except Exception:
logger.exception("Error processing events")
raise
# </process_events>
# <handle_events>
async def _handle_event(self, event: Any) -> None:
"""Handle different types of events from VoiceLive."""
logger.debug("Received event: %s", event.type)
ap = self.audio_processor
conn = self.connection
if ap is None or conn is None:
raise RuntimeError("AudioProcessor and Connection must be initialized")
if event.type == ServerEventType.SESSION_UPDATED:
# <session_updated_metadata>
logger.info("Session ready: %s", event.session.id)
s, a, v = event.session, event.session.agent, event.session.voice
await write_conversation_log("\n".join([
f"SessionID: {s.id}", f"Agent Name: {a.name}",
f"Agent Description: {a.description}", f"Agent ID: {a.agent_id}",
f"Voice Name: {v['name']}", f"Voice Type: {v['type']}",
f"Voice Temperature: {v['temperature']}", ""
]))
# </session_updated_metadata>
self.session_ready = True
# <proactive_greeting>
# Invoke Proactive greeting
if not self.greeting_sent:
self.greeting_sent = True
logger.info("Sending proactive greeting request")
In this sample, proactive messaging is applied in three steps:
- self.greeting_sent = False initializes one-time greeting state.
- In the SESSION_UPDATED branch, if not self.greeting_sent: gates proactive execution to run once per session.
- conn.conversation.item.create(...) adds the greeting instruction to conversation context, and conn.response.create() generates spoken output.
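The one-shot gating pattern can be sketched independently of the SDK. In this sketch, FakeConnection and GreetingGate are illustrative stand-ins for the Voice Live connection and the assistant class, not SDK types.

```python
import asyncio
from typing import List

class FakeConnection:
    """Stand-in for the Voice Live connection; records the commands the real
    sample sends via conversation.item.create and response.create."""
    def __init__(self) -> None:
        self.commands: List[str] = []

    async def send(self, command: str) -> None:
        self.commands.append(command)

class GreetingGate:
    """Mirrors the sample's self.greeting_sent flag."""
    def __init__(self, conn: FakeConnection) -> None:
        self.conn = conn
        self.greeting_sent = False

    async def on_session_updated(self) -> None:
        if not self.greeting_sent:  # gate: runs once per session
            self.greeting_sent = True
            await self.conn.send("conversation.item.create")  # greeting prompt
            await self.conn.send("response.create")           # spoken output

async def demo() -> List[str]:
    conn = FakeConnection()
    gate = GreetingGate(conn)
    await gate.on_session_updated()
    await gate.on_session_updated()  # second SESSION_UPDATED: no duplicate
    return conn.commands

print(asyncio.run(demo()))  # ['conversation.item.create', 'response.create']
```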
Improve tool calling and latency wait times
Use Voice Live's interim_response feature to bridge wait times during tool calling or when generating agent responses with high latency.
This feature supports two modes:
- LlmInterimResponseConfig: LLM-generated interim response - best for dynamic and adaptive starts
- InterimResponseTrigger: Pre-generated interim response - best for deterministic or branded messaging
The voice-live-agents-quickstart.py created with the quickstart shows the required code additions to configure this feature as follows:
from azure.ai.voicelive.aio import connect, AgentSessionConfig
from azure.ai.voicelive.models import (
InputAudioFormat,
Modality,
OutputAudioFormat,
RequestSession,
ServerEventType,
MessageItem,
InputTextContentPart,
LlmInterimResponseConfig,
InterimResponseTrigger,
AzureStandardVoice,
AudioNoiseReduction,
AudioEchoCancellation,
AzureSemanticVadMultilingual,
)
# Process events
await self._process_events()
finally:
if self.audio_processor:
self.audio_processor.shutdown()
# </start_session>
# <setup_session>
async def _setup_session(self) -> None:
"""Configure the VoiceLive session for audio conversation."""
logger.info("Setting up voice conversation session...")
# Set up interim response configuration to bridge latency gaps during processing
interim_response_config = LlmInterimResponseConfig(
triggers=[InterimResponseTrigger.TOOL, InterimResponseTrigger.LATENCY],
latency_threshold_ms=100,
instructions="""Create friendly interim responses indicating wait time due to ongoing processing, if any. Do not include
in all responses! Do not say you don't have real-time access to information when calling tools!"""
)
# Create session configuration
session_config = RequestSession(
modalities=[Modality.TEXT, Modality.AUDIO],
In this sample, the interim response setup is applied inside BasicVoiceAssistant._setup_session():
- LlmInterimResponseConfig(...) defines when interim responses trigger and what style they use.
- RequestSession(...) attaches that config through the interim_response field.
- conn.session.update(session=session_config) sends the session configuration to Voice Live.
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation helps keep session context aligned with delivered audio, which improves follow-up response quality after barge-in and keeps voice conversation history logging more accurate.
This sample currently shows interruption handling with response.cancel() during speech start, but it doesn't configure auto_truncate in turn_detection.
Note
In Foundry Agent Service, thread messages and tracing agent threads are based on text content in the thread. Without auto truncation, those records can differ from the exact portion of audio the user actually heard before interruption.
For setup details and supported options, see Handle voice interruptions in chat history (preview).
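If you add auto truncation yourself, it is a single setting inside the turn detection configuration. The dictionary below is only a sketch of that shape, with field names assumed from the linked interruption-handling article; verify them against the SDK version you use.

```python
# Sketch only: truncation-aware turn detection settings.
# "azure_semantic_vad" and "auto_truncate" are assumed field values here,
# not copied from this article's sample code.
turn_detection = {
    "type": "azure_semantic_vad",
    "auto_truncate": True,  # align persisted text with audio actually heard
}
```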
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves history and context, allowing users to continue where they left off.
Voice Live returns session metadata in the SESSION_UPDATED event when a session connects successfully:
logger.exception("Error processing events")
raise
# </process_events>
# <handle_events>
async def _handle_event(self, event: Any) -> None:
"""Handle different types of events from VoiceLive."""
logger.debug("Received event: %s", event.type)
ap = self.audio_processor
In this event handler, session and agent metadata is logged when the session is ready.
The sample code automatically writes session details to a conversation log file in the logs/ folder (for example, logs/2026-02-19_14-30-00_conversation.log). You can retrieve the session ID from this file after running a session.
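Since the log records a line of the form "SessionID: <id>", you can pull the ID back out programmatically; read_session_id below is an illustrative helper, assuming only that log format.

```python
import re
from pathlib import Path

def read_session_id(log_path: str) -> str:
    """Extract the session ID from a conversation log written by the sample,
    which records a line such as 'SessionID: <id>'."""
    text = Path(log_path).read_text(encoding="utf-8")
    match = re.search(r"^SessionID:\s*(\S+)", text, flags=re.MULTILINE)
    if not match:
        raise ValueError(f"No SessionID line found in {log_path}")
    return match.group(1)
```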
To reconnect to that conversation, pass the conversation ID as the CONVERSATION_ID environment variable (or the conversation_id parameter). The value flows into self.agent_config through the constructor shown earlier in this article.
In this sample, conversation reconnect is applied in three places:
- In main(), CONVERSATION_ID is read from the environment.
- In the BasicVoiceAssistant(...) call, conversation_id is passed into the class constructor.
- In BasicVoiceAssistant.__init__, the value is assigned into self.agent_config as conversation_id.
When a valid conversation_id is provided, the agent retrieves the previous conversation context and can reference earlier exchanges in its responses.
Note
Conversation IDs are tied to the agent and project. Attempting to use a conversation ID with a different agent results in a new conversation being created.
Log session metadata for continuity and diagnostics
Log key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support scenarios.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when SESSION_UPDATED is received:
load_dotenv(os.path.join(_script_dir, './.env'), override=True)
# Set up logging
## Add folder for logging
os.makedirs(os.path.join(_script_dir, 'logs'), exist_ok=True)
## Add timestamp for logfiles
# </process_events>
# <handle_events>
async def _handle_event(self, event: Any) -> None:
"""Handle different types of events from VoiceLive."""
logger.debug("Received event: %s", event.type)
ap = self.audio_processor
self._active_response = False
self._response_api_done = True
elif event.type == ServerEventType.ERROR:
msg = event.error.message
if "Cancellation failed: no active response" in msg:
In this sample, session metadata logging is applied in three places:
- A timestamped conversation log file is created per run.
- On SESSION_UPDATED, metadata including the session ID, agent name, and voice configuration is appended.
- write_conversation_log(...) appends entries to the same file throughout the conversation lifecycle.
Use the logged session metadata with CONVERSATION_ID to resume the same agent conversation in a later session.
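The two pieces of this logging setup, a timestamped filename per run and a non-blocking append, can be sketched with the standard library alone; make_log_path and append_line are illustrative names that follow the naming shown earlier in this article.

```python
import asyncio
import os
from datetime import datetime

def make_log_path(base_dir: str) -> str:
    """Create logs/<timestamp>_conversation.log under base_dir, matching the
    per-run naming shown earlier (for example,
    logs/2026-02-19_14-30-00_conversation.log)."""
    os.makedirs(os.path.join(base_dir, "logs"), exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(base_dir, "logs", f"{stamp}_conversation.log")

async def append_line(log_path: str, message: str) -> None:
    """Append a line without blocking the event loop, as in the sample's
    write_conversation_log helper."""
    def _write() -> None:
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(message + "\n")
    await asyncio.to_thread(_write)
```

Running file I/O through asyncio.to_thread keeps the audio event loop responsive while log entries are written.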
In this article, you'll learn how to use Voice Live with Microsoft Foundry Agent Service using the VoiceLive SDK for C#. This article extends the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live with more details on features and integration options.
Reference documentation | Package (NuGet) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Prerequisites
Note
This guide refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- .NET 8.0 SDK or later.
- The required language runtimes, global tools, and Visual Studio Code extensions. See Prepare your development environment.
- A Microsoft Foundry resource created in a supported region. See Voice Live overview documentation for region availability.
- A deployed model in Microsoft Foundry. If you don't have one, first complete Quickstart: Set up Microsoft Foundry resources.
- The Azure AI User role assigned to your user account. Assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to prepare your environment, set up the agent with Voice Live settings, and run your first test.
Agent integration concepts
These concepts help you understand how Voice Live and Foundry Agent Service work together in the C# sample.
Agent configuration contract
Set AgentSessionConfig in your session setup to identify the target agent and project. Include at minimum agentName and projectName. Add AgentVersion when you want to pin behavior to a specific version.
Authentication for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation doesn't support key-based authentication, so configure AzureCliCredential (or another Entra token credential) for local development and deployment.
API version pinning
Use a consistent SDK version (Azure.AI.VoiceLive 1.1.0-beta.3) in your project file. Consistent versioning keeps behavior predictable across preview updates and avoids schema drift.
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling. This ensures persisted history better matches what users actually heard.
Connect to a specific agent version
Voice Live lets you connect to a specific version of your agent. This enables controlled deployments where production uses a stable version while development tests newer iterations.
To connect to a specific agent version, set the AGENT_VERSION environment variable or pass the agentVersion parameter when initializing the assistant:
// <agent_config>
public BasicVoiceAssistant(string endpoint, string agentName, string projectName,
string? agentVersion = null, string? conversationId = null,
string? foundryResourceOverride = null, string? authIdentityClientId = null)
{
_endpoint = endpoint;
// Build the agent session configuration
var config = new AgentSessionConfig(agentName, projectName);
if (!string.IsNullOrEmpty(agentVersion))
{
config.AgentVersion = agentVersion;
}
if (!string.IsNullOrEmpty(conversationId))
{
config.ConversationId = conversationId;
}
if (!string.IsNullOrEmpty(foundryResourceOverride))
{
config.FoundryResourceOverride = foundryResourceOverride;
if (!string.IsNullOrEmpty(authIdentityClientId))
{
config.AuthenticationIdentityClientId = authIdentityClientId;
}
}
_agentConfig = config;
}
// </agent_config>
// <main>
class Program
{
static async Task Main(string[] args)
{
var endpoint = Environment.GetEnvironmentVariable("VOICELIVE_ENDPOINT");
var agentName = Environment.GetEnvironmentVariable("AGENT_NAME");
var projectName = Environment.GetEnvironmentVariable("PROJECT_NAME");
var agentVersion = Environment.GetEnvironmentVariable("AGENT_VERSION");
var conversationId = Environment.GetEnvironmentVariable("CONVERSATION_ID");
var foundryResourceOverride = Environment.GetEnvironmentVariable("FOUNDRY_RESOURCE_OVERRIDE");
var authIdentityClientId = Environment.GetEnvironmentVariable("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");
Console.WriteLine("Environment variables:");
Console.WriteLine($"VOICELIVE_ENDPOINT: {endpoint}");
Console.WriteLine($"AGENT_NAME: {agentName}");
Console.WriteLine($"PROJECT_NAME: {projectName}");
Console.WriteLine($"AGENT_VERSION: {agentVersion}");
Console.WriteLine($"CONVERSATION_ID: {conversationId}");
Console.WriteLine($"FOUNDRY_RESOURCE_OVERRIDE: {foundryResourceOverride}");
if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(agentName)
|| string.IsNullOrEmpty(projectName))
{
Console.Error.WriteLine("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
return;
}
// Verify audio devices
CheckAudioDevices();
Console.WriteLine("Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
Console.WriteLine(new string('=', 65));
using var assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
// Handle graceful shutdown
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (sender, e) =>
{
e.Cancel = true;
cts.Cancel();
};
try
{
await assistant.StartAsync(cts.Token);
}
catch (OperationCanceledException)
{
Console.WriteLine("\nVoice assistant shut down. Goodbye!");
}
catch (Exception ex)
{
Console.Error.WriteLine($"Fatal Error: {ex.Message}");
}
The version configuration is applied in three places:
- In Main(), read AGENT_VERSION from the environment.
- Pass agentVersion to the BasicVoiceAssistant(...) constructor.
- In the constructor, set the value on AgentSessionConfig via config.AgentVersion. Send it to Voice Live via StartSessionAsync(SessionTarget.FromAgent(agentConfig)).
The agentVersion value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest agent version.
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent hosted on a different Foundry resource than the one used for audio processing.
This is useful in these scenarios:
- The agent is deployed in a region with different feature availability.
- You want to separate development and staging from production.
- Your organization uses different resources for different workloads.
To connect to an agent on a different resource, configure two environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
// <agent_config>
public BasicVoiceAssistant(string endpoint, string agentName, string projectName,
string? agentVersion = null, string? conversationId = null,
string? foundryResourceOverride = null, string? authIdentityClientId = null)
{
_endpoint = endpoint;
// Build the agent session configuration
var config = new AgentSessionConfig(agentName, projectName);
if (!string.IsNullOrEmpty(agentVersion))
{
config.AgentVersion = agentVersion;
}
if (!string.IsNullOrEmpty(conversationId))
{
config.ConversationId = conversationId;
}
if (!string.IsNullOrEmpty(foundryResourceOverride))
{
config.FoundryResourceOverride = foundryResourceOverride;
if (!string.IsNullOrEmpty(authIdentityClientId))
{
config.AuthenticationIdentityClientId = authIdentityClientId;
}
}
_agentConfig = config;
}
// </agent_config>
// <main>
class Program
{
static async Task Main(string[] args)
{
var endpoint = Environment.GetEnvironmentVariable("VOICELIVE_ENDPOINT");
var agentName = Environment.GetEnvironmentVariable("AGENT_NAME");
var projectName = Environment.GetEnvironmentVariable("PROJECT_NAME");
var agentVersion = Environment.GetEnvironmentVariable("AGENT_VERSION");
var conversationId = Environment.GetEnvironmentVariable("CONVERSATION_ID");
var foundryResourceOverride = Environment.GetEnvironmentVariable("FOUNDRY_RESOURCE_OVERRIDE");
var authIdentityClientId = Environment.GetEnvironmentVariable("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");
Console.WriteLine("Environment variables:");
Console.WriteLine($"VOICELIVE_ENDPOINT: {endpoint}");
Console.WriteLine($"AGENT_NAME: {agentName}");
Console.WriteLine($"PROJECT_NAME: {projectName}");
Console.WriteLine($"AGENT_VERSION: {agentVersion}");
Console.WriteLine($"CONVERSATION_ID: {conversationId}");
Console.WriteLine($"FOUNDRY_RESOURCE_OVERRIDE: {foundryResourceOverride}");
if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(agentName)
|| string.IsNullOrEmpty(projectName))
{
Console.Error.WriteLine("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
return;
}
// Verify audio devices
CheckAudioDevices();
Console.WriteLine("Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
Console.WriteLine(new string('=', 65));
using var assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
// Handle graceful shutdown
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (sender, e) =>
{
e.Cancel = true;
cts.Cancel();
};
try
{
await assistant.StartAsync(cts.Token);
}
catch (OperationCanceledException)
{
Console.WriteLine("\nVoice assistant shut down. Goodbye!");
}
catch (Exception ex)
{
Console.Error.WriteLine($"Fatal Error: {ex.Message}");
}
The configuration is resolved in Main() and applied when the assistant is created:
- Read FOUNDRY_RESOURCE_OVERRIDE and AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID from environment variables.
- Pass both values to the BasicVoiceAssistant(...) constructor.
- In the constructor, set both values on AgentSessionConfig via config.FoundryResourceOverride and config.AuthenticationIdentityClientId. Send them in StartSessionAsync(SessionTarget.FromAgent(agentConfig)).
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Send a proactive message to initiate conversations when the session is ready. The assistant checks a one-time flag in the SessionUpdateSessionUpdated event handler, sends a greeting prompt, and triggers a response.
// <proactive_greeting>
private async Task SendProactiveGreetingAsync(CancellationToken cancellationToken)
{
Console.WriteLine("Sending proactive greeting request");
try
{
// Create a system message to trigger greeting
await _session!.SendCommandAsync(
BinaryData.FromObjectAsJson(new
{
type = "conversation.item.create",
item = new
{
type = "message",
role = "system",
content = new[]
{
new { type = "input_text", text = "Say something to welcome the user in English." }
}
}
}), cancellationToken).ConfigureAwait(false);
// Request a response
await _session!.SendCommandAsync(
BinaryData.FromObjectAsJson(new { type = "response.create" }),
cancellationToken).ConfigureAwait(false);
}
catch (Exception ex)
{
Console.Error.WriteLine($"Failed to send proactive greeting: {ex.Message}");
}
}
// </proactive_greeting>
Proactive messaging is applied in three steps:
- _greetingSent is a bool initialized to false to track one-time greeting state.
- In the SessionUpdateSessionUpdated branch, if (!_greetingSent) gates execution to run once per session.
- SendCommandAsync(...) with a conversation.item.create payload adds the greeting to conversation context. A response.create command generates spoken output.
Improve tool calling and latency wait times
Voice Live offers InterimResponse to bridge wait times during tool calling or when generating responses with high latency.
The feature supports two modes:
- LlmInterimResponseConfig: LLM-generated interim response; best for dynamic starts.
- InterimResponseTrigger: Pre-generated interim response; best for deterministic or branded messaging.
The quickstart voice assistant shows the required code additions:
// <setup_session>
private async Task SetupSessionAsync(CancellationToken cancellationToken)
{
Console.WriteLine("Setting up voice conversation session...");
// Create session configuration with interim response to bridge latency gaps
var interimConfig = new LlmInterimResponseConfig
{
Instructions = "Create friendly interim responses indicating wait time due to "
+ "ongoing processing, if any. Do not include in all responses! Do not "
+ "say you don't have real-time access to information when calling tools!",
};
interimConfig.Triggers.Add(InterimResponseTrigger.Tool);
interimConfig.Triggers.Add(InterimResponseTrigger.Latency);
interimConfig.LatencyThresholdMs = 100;
var options = new VoiceLiveSessionOptions
{
InputAudioFormat = InputAudioFormat.Pcm16,
OutputAudioFormat = OutputAudioFormat.Pcm16,
InterimResponse = BinaryData.FromObjectAsJson(interimConfig)
};
// Send session configuration
await _session!.ConfigureSessionAsync(options, cancellationToken).ConfigureAwait(false);
Console.WriteLine("Session configuration sent");
}
// </setup_session>
The interim response setup is applied inside SetupSessionAsync():
- A LlmInterimResponseConfig is created with custom instructions and triggers for Tool and Latency events.
- The config is serialized via BinaryData.FromObjectAsJson() and assigned to VoiceLiveSessionOptions.InterimResponse.
- ConfigureSessionAsync(options) sends the complete session configuration, including the interim response, to Voice Live.
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation keeps session context aligned with delivered audio. This improves follow-up responses after barge-in and keeps conversation logging more accurate.
The sample currently shows interruption handling with CancelResponseAsync() during speech start, but it doesn't configure auto_truncate in turn_detection.
Note
In Foundry Agent Service, thread messages and trace records are based on text content. Without auto truncation, these records can differ from the exact portion of audio users heard before interruption.
See Handle voice interruptions in chat history (preview) for setup details and supported options.
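As a sketch, enabling the feature means adding auto_truncate to the turn_detection block of the session configuration sent to Voice Live. The surrounding field values below are illustrative assumptions, not the sample's exact settings:

```json
{
  "type": "session.update",
  "session": {
    "turn_detection": {
      "type": "server_vad",
      "auto_truncate": true
    }
  }
}
```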
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves history and context, allowing users to continue where they left off.
When a session connects successfully, Voice Live returns session metadata in the SessionUpdateSessionUpdated event. Extract the session ID and log it to the conversation file:
// <handle_events>
private async Task HandleEventAsync(SessionUpdate serverEvent, CancellationToken cancellationToken)
{
switch (serverEvent)
{
case SessionUpdateSessionUpdated sessionUpdated:
Console.WriteLine("Session updated and ready");
var sessionId = sessionUpdated.Session?.Id;
WriteLog($"SessionID: {sessionId}\n");
// Send a proactive greeting
if (!_greetingSent)
{
_greetingSent = true;
await SendProactiveGreetingAsync(cancellationToken).ConfigureAwait(false);
}
// Start audio capture once session is ready
_audioProcessor?.StartCapture();
break;
In this event handler, the session ID is extracted from sessionUpdated.Session?.Id and written to the conversation log.
The sample writes session details to a conversation log file in the logs/ folder (for example, logs/conversation_20260219_143000.log).
To reconnect, pass the conversation ID as the CONVERSATION_ID environment variable or the conversationId parameter:
var conversationId = Environment.GetEnvironmentVariable("CONVERSATION_ID");
using var assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
Conversation reconnect is applied in three places:
- In Main(), read CONVERSATION_ID from the environment (line 458).
- Pass the value to the BasicVoiceAssistant(...) constructor (lines 483-486).
- In the constructor, set the value on AgentSessionConfig via config.ConversationId.
When a valid conversationId is provided, the agent retrieves the previous conversation context and can reference earlier exchanges.
Note
Conversation IDs are tied to the agent and project. Using a conversation ID with a different agent creates a new conversation.
Log session metadata for continuity and diagnostics
The sample logs key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when SessionUpdateSessionUpdated fires:
private static readonly string LogFilename = $"conversation_{DateTime.Now:yyyyMMdd_HHmmss}.log";
// <handle_events>
private async Task HandleEventAsync(SessionUpdate serverEvent, CancellationToken cancellationToken)
{
switch (serverEvent)
{
case SessionUpdateSessionUpdated sessionUpdated:
Console.WriteLine("Session updated and ready");
var sessionId = sessionUpdated.Session?.Id;
WriteLog($"SessionID: {sessionId}\n");
// Send a proactive greeting
if (!_greetingSent)
{
_greetingSent = true;
await SendProactiveGreetingAsync(cancellationToken).ConfigureAwait(false);
}
// Start audio capture once session is ready
_audioProcessor?.StartCapture();
break;
private static void WriteLog(string message)
{
try
{
var logDir = Path.Combine(Directory.GetCurrentDirectory(), "logs");
Directory.CreateDirectory(logDir);
File.AppendAllText(Path.Combine(logDir, LogFilename), message + Environment.NewLine);
}
catch (IOException ex)
{
Console.Error.WriteLine($"Failed to write conversation log: {ex.Message}");
}
}
Session metadata logging is applied in three places:
- A timestamped conversation log file (conversation_YYYYMMDD_HHmmss.log) is created per run (lines 188-189).
- On SessionUpdateSessionUpdated, the handler extracts the session ID and writes it to the log (lines 309-310).
- WriteLog(...) appends entries throughout the conversation lifecycle (lines 427-439).
Use the logged session metadata with CONVERSATION_ID to resume the same agent conversation later. Use the session ID alongside your conversation ID for diagnostics and reconnect scenarios.
Learn how to use Voice Live with Microsoft Foundry Agent Service using the VoiceLive SDK for JavaScript. This article builds on the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live with advanced features and integration options.
Reference documentation | Package (npm) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Note
The JavaScript Voice Live SDK is designed for browser-based applications with built-in WebSocket and Web Audio support. This how-to guide uses Node.js with node-record-lpcm16 and speaker for a console experience. For a full browser-based voice UI, see the Voice Live universal assistant sample.
Prerequisites
Note
This document refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- Node.js version 18 or later.
- SoX installed on your system (required by node-record-lpcm16 for microphone capture).
- The required language runtimes, global tools, and Visual Studio Code extensions as described in Prepare your development environment.
- A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
- A model deployed in Microsoft Foundry. If you don't have a model, first complete Quickstart: Set up Microsoft Foundry resources.
- Assign the Azure AI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to set up your environment, configure the agent with Voice Live settings, and test your first conversation.
Agent integration concepts
Use these concepts to understand how Voice Live and Foundry Agent Service work together in the JavaScript sample.
Agent configuration contract
Set the agent property with an AgentSessionConfig object in your createSession(...) call to identify the target agent and project. At minimum, include agentName and projectName. Add agentVersion when you want to pin behavior to a specific version.
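A minimal sketch of that contract, using placeholder agent and project names; only the object shape matters here:

```javascript
// Build the agent configuration object passed as the `agent` property
// to createSession(...). agentName and projectName are required;
// agentVersion is optional and pins behavior to a specific version.
const agentVersion = process.env.AGENT_VERSION; // may be undefined

const agentConfig = {
  agentName: "my-voice-agent", // placeholder name
  projectName: "my-project",   // placeholder name
  // Spread agentVersion in only when it's set, so Voice Live
  // falls back to the latest agent version otherwise.
  ...(agentVersion && { agentVersion }),
};

console.log(agentConfig);
```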
Authentication model for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation in this flow doesn't support key-based authentication, so configure DefaultAzureCredential (or another Entra token credential) for local development and deployment.
API version pinning
Use a consistent SDK version (@azure/ai-voicelive@1.0.0-beta.3) in your package.json to keep behavior predictable across preview updates. Use the same version consistently across quickstart and how-to samples to avoid schema drift.
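For example, pin the exact preview version in package.json (no caret range, so npm won't float to a newer beta):

```json
{
  "dependencies": {
    "@azure/ai-voicelive": "1.0.0-beta.3"
  }
}
```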
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling so persisted history better matches what the user actually heard.
Connect to a specific agent version
Pin your agent to a specific version to enable controlled deployments. This lets production use stable versions while development tests newer iterations.
Set the AGENT_VERSION environment variable or pass the agentVersion property when initializing the assistant:
* @param {string} [opts.greetingText]
* @param {boolean} [opts.noAudio]
*/
// <agent_config>
constructor(opts) {
this.endpoint = opts.endpoint;
this.credential = opts.credential;
this.greetingText = opts.greetingText;
this.noAudio = opts.noAudio;
this.agentConfig = {
agentName: opts.agentName,
projectName: opts.projectName,
...(opts.agentVersion && { agentVersion: opts.agentVersion }),
...(opts.conversationId && { conversationId: opts.conversationId }),
...(opts.foundryResourceOverride && {
foundryResourceOverride: opts.foundryResourceOverride,
}),
...(opts.foundryResourceOverride &&
opts.authenticationIdentityClientId && {
authenticationIdentityClientId: opts.authenticationIdentityClientId,
}),
};
this._session = null;
this._audio = new AudioProcessor(!opts.noAudio, opts.audioInputDevice);
console.log("");
console.log("Options:");
console.log(" --endpoint <url> VoiceLive endpoint URL");
console.log(" --agent-name <name> Foundry agent name");
console.log(" --project-name <name> Foundry project name");
console.log(" --agent-version <ver> Agent version");
console.log(" --conversation-id <id> Conversation ID to resume");
console.log(" --foundry-resource <name> Foundry resource override");
console.log(" --auth-client-id <id> Authentication identity client ID");
console.log(" --audio-input-device <name> Explicit SoX input device name (Windows)");
console.log(" --list-audio-devices List available audio input devices and exit");
console.log(" --greeting-text <text> Send a pre-defined greeting instead of LLM-generated");
console.log(" --no-audio Connect and configure session without mic/speaker");
console.log(" -h, --help Show this help text");
}
console.log(` CONVERSATION_ID: ${args.conversationId ?? "(not set)"}`);
console.log(
` FOUNDRY_RESOURCE_OVERRIDE: ${args.foundryResourceOverride ?? "(not set)"}`,
);
console.log(
` AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: ${args.authenticationIdentityClientId ?? "(not set)"}`,
);
console.log(` AUDIO_INPUT_DEVICE: ${args.audioInputDevice ?? "(not set)"}`);
if (args.greetingText) {
console.log(` Proactive greeting: pre-defined`);
} else {
console.log(` Proactive greeting: LLM-generated (default)`);
}
In this sample, the version configuration is applied in three places:
- In main(), AGENT_VERSION is read from process.env.
- In the BasicVoiceAssistant constructor, agentVersion is spread into the agentConfig object.
- The config is passed to client.createSession({ agent: this.agentConfig }), which sends it to Voice Live.
The agentVersion value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest version of the agent.
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent on a different Foundry resource for audio processing. This is useful when:
- The agent is deployed in a region that has different feature availability
- You want to separate development/staging environments from production
- Your organization uses different resources for different workloads
To connect to an agent on a different resource, configure two additional environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
* @param {string} [opts.greetingText]
* @param {boolean} [opts.noAudio]
*/
// <agent_config>
constructor(opts) {
this.endpoint = opts.endpoint;
this.credential = opts.credential;
this.greetingText = opts.greetingText;
this.noAudio = opts.noAudio;
this.agentConfig = {
agentName: opts.agentName,
projectName: opts.projectName,
...(opts.agentVersion && { agentVersion: opts.agentVersion }),
...(opts.conversationId && { conversationId: opts.conversationId }),
...(opts.foundryResourceOverride && {
foundryResourceOverride: opts.foundryResourceOverride,
}),
...(opts.foundryResourceOverride &&
opts.authenticationIdentityClientId && {
authenticationIdentityClientId: opts.authenticationIdentityClientId,
}),
};
this._session = null;
this._audio = new AudioProcessor(!opts.noAudio, opts.audioInputDevice);
console.log("");
console.log("Options:");
console.log(" --endpoint <url> VoiceLive endpoint URL");
console.log(" --agent-name <name> Foundry agent name");
console.log(" --project-name <name> Foundry project name");
console.log(" --agent-version <ver> Agent version");
console.log(" --conversation-id <id> Conversation ID to resume");
console.log(" --foundry-resource <name> Foundry resource override");
console.log(" --auth-client-id <id> Authentication identity client ID");
console.log(" --audio-input-device <name> Explicit SoX input device name (Windows)");
console.log(" --list-audio-devices List available audio input devices and exit");
console.log(" --greeting-text <text> Send a pre-defined greeting instead of LLM-generated");
console.log(" --no-audio Connect and configure session without mic/speaker");
console.log(" -h, --help Show this help text");
}
console.log(` CONVERSATION_ID: ${args.conversationId ?? "(not set)"}`);
console.log(
` FOUNDRY_RESOURCE_OVERRIDE: ${args.foundryResourceOverride ?? "(not set)"}`,
);
console.log(
` AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: ${args.authenticationIdentityClientId ?? "(not set)"}`,
);
console.log(` AUDIO_INPUT_DEVICE: ${args.audioInputDevice ?? "(not set)"}`);
if (args.greetingText) {
console.log(` Proactive greeting: pre-defined`);
} else {
console.log(` Proactive greeting: LLM-generated (default)`);
}
This configuration is resolved in main() and then applied when the assistant is created:
- FOUNDRY_RESOURCE_OVERRIDE and AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID are read from process.env.
- Both values are spread into the constructor options.
- In the constructor, the values are conditionally set on the agentConfig object, which is sent in client.createSession({ agent: this.agentConfig }).
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Voice Live can initiate the conversation by sending a proactive message as soon as the session is ready. In this sample, the assistant checks a one-time flag in the onSessionUpdated handler, sends a greeting prompt, and then triggers a response.
await session.dispose();
} catch {
// ignore dispose errors during shutdown
}
}
// </start_session>
// <proactive_greeting>
/**
* Send a proactive greeting when the session starts.
* Supports pre-defined (--greeting-text) or LLM-generated (default).
*/
async _sendProactiveGreeting() {
const session = this._session;
if (this.greetingText) {
// Pre-generated assistant message (deterministic)
console.log("[session] Sending pre-generated greeting ...");
try {
await session.sendEvent({
type: "response.create",
response: {
preGeneratedAssistantMessage: {
content: [{ type: "text", text: this.greetingText }],
},
},
});
} catch (err) {
console.error("[session] Failed to send pre-generated greeting:", err.message);
}
} else {
// LLM-generated greeting (default)
console.log("[session] Sending proactive greeting ...");
try {
await session.addConversationItem({
type: "message",
role: "system",
content: [
In this sample, proactive messaging is applied in three steps:
- _greetingSent is a boolean initialized to false to track one-time greeting state.
- In the onSessionUpdated handler, if (!this._greetingSent) gates proactive execution to run once per session.
- session.addConversationItem(...) adds the greeting instruction to conversation context, and session.sendEvent({ type: "response.create" }) generates spoken output.
Improve tool calling and latency wait times
Voice Live provides interimResponse to bridge wait times during tool calling or when generating responses with high latency.
The quickstart voice assistant shows the required code additions to configure this feature:
text: "Say something to welcome the user in English.",
},
],
});
await session.sendEvent({ type: "response.create" });
} catch (err) {
console.error("[session] Failed to send greeting:", err.message);
}
}
}
// </proactive_greeting>
// <setup_session>
/** Configure session modalities, audio format, and interim response. */
async _setupSession() {
console.log("[session] Configuring session ...");
await this._session.updateSession({
In this sample, the interim response setup is applied inside _setupSession():
- interimResponse defines when interim responses trigger and what style they use.
- session.updateSession(...) sends the session configuration to Voice Live, including the interim response settings.
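A sketch of what that configuration might look like, with property names mirrored from the C# LlmInterimResponseConfig shape (instructions, triggers, latencyThresholdMs). Treat the exact JavaScript property names as assumptions and confirm them against the SDK typings:

```javascript
// Illustrative session options with an interim response config.
// Property names are assumptions based on the SDK's camelCase
// conventions, not verified typings.
const sessionOptions = {
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
  interimResponse: {
    instructions:
      "Create friendly interim responses indicating wait time due to " +
      "ongoing processing, if any.",
    triggers: ["tool", "latency"], // fire on tool calls and high latency
    latencyThresholdMs: 100,       // threshold for the latency trigger
  },
};

// The options would then be sent with:
// await session.updateSession(sessionOptions);
console.log(sessionOptions);
```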
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation helps keep session context aligned with delivered audio, which improves follow-up response quality after barge-in and keeps voice conversation history logging more accurate.
This sample currently shows interruption handling with response.cancel during speech start, but it doesn't configure auto_truncate in turn_detection.
Note
In Foundry Agent Service, thread messages and tracing agent threads are based on text content in the thread. Without auto truncation, those records can differ from the exact portion of audio the user actually heard before interruption.
For setup details and supported options, see Handle voice interruptions in chat history (preview).
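A sketch of what enabling it might look like via updateSession. The camelCase property names are assumptions based on the SDK's conventions; the underlying wire protocol field is auto_truncate inside turn_detection:

```javascript
// Illustrative turn detection options enabling auto truncation;
// property names are assumptions, check the SDK typings.
const turnDetectionOptions = {
  turnDetection: {
    type: "server_vad", // assumed voice activity detection mode
    autoTruncate: true, // trim context to the audio actually played
  },
};

// The options would then be sent with:
// await session.updateSession(turnDetectionOptions);
console.log(turnDetectionOptions);
```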
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves the conversation history and context, allowing users to continue where they left off.
When a session connects successfully, Voice Live returns session metadata in the onSessionUpdated handler. The sample extracts the session ID from the context and logs it to the conversation file:
`for project "${this.agentConfig.projectName}" ...`,
);
// Subscribe to VoiceLive events BEFORE connecting, so the
// SESSION_UPDATED event is not missed.
// <handle_events>
const subscription = session.subscribe({
// <session_updated_metadata>
onSessionUpdated: async (event, context) => {
const s = event.session;
const agent = s?.agent;
const voice = s?.voice;
console.log(`[session] Session ready: ${context.sessionId}`);
writeConversationLog(
[
`SessionID: ${context.sessionId}`,
`Agent Name: ${agent?.name ?? ""}`,
In this event handler, the session ID is extracted from context.sessionId and written to the conversation log along with agent metadata.
The sample code writes session details to a conversation log file in the logs/ folder (for example, logs/conversation_20260219_143000.log).
To reconnect to that conversation, pass the conversation ID as the CONVERSATION_ID environment variable (or the conversationId property):
console.log(" --conversation-id <id> Conversation ID to resume");
In this sample, conversation reconnect is applied in three places:
- In main(), CONVERSATION_ID is read from process.env (line 542).
- The value is passed to the BasicVoiceAssistant constructor.
- In the constructor, conversationId is conditionally spread into the agentConfig object.
When a valid conversationId is provided, the agent retrieves the previous conversation context and can reference earlier exchanges in its responses.
Note
Conversation IDs are tied to the agent and project. Using a conversation ID with a different agent creates a new conversation.
Log session metadata for continuity and diagnostics
The sample logs key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support scenarios.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when onSessionUpdated is received:
// ---------------------------------------------------------------------------
const logsDir = join(__dirname, "logs");
if (!existsSync(logsDir)) mkdirSync(logsDir, { recursive: true });
const timestamp = new Date()
.toISOString()
.replace(/[:.]/g, "-")
.replace("T", "_")
.slice(0, 19);
const conversationLogFile = join(logsDir, `conversation_${timestamp}.log`);
function writeConversationLog(message) {
appendFileSync(conversationLogFile, message + "\n", "utf-8");
}
`for project "${this.agentConfig.projectName}" ...`,
);
// Subscribe to VoiceLive events BEFORE connecting, so the
// SESSION_UPDATED event is not missed.
// <handle_events>
const subscription = session.subscribe({
// <session_updated_metadata>
onSessionUpdated: async (event, context) => {
const s = event.session;
const agent = s?.agent;
const voice = s?.voice;
console.log(`[session] Session ready: ${context.sessionId}`);
writeConversationLog(
[
`SessionID: ${context.sessionId}`,
`Agent Name: ${agent?.name ?? ""}`,
In this sample, session metadata logging is applied in three places:
- A logs/ directory is created if it doesn't exist, and a timestamped conversation log file (conversation_YYYYMMDD_HHmmss.log) is created per run (lines 20-28).
- On onSessionUpdated, the handler extracts the session ID from context.sessionId and writes it along with agent metadata to the log (lines 302-305).
- writeConversationLog(...) appends entries to the same log file throughout the conversation lifecycle (lines 30-33).
Use the logged session metadata with CONVERSATION_ID to resume the same agent conversation in a later session.
Use the session ID value alongside your conversation ID for diagnostics and reconnect scenarios.
Learn how to use Voice Live with Microsoft Foundry Agent Service using the VoiceLive SDK for Java. This article builds on the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live with advanced features and integration options.
Reference documentation | Package (Maven) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Prerequisites
Note
This document refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- Java Development Kit (JDK) version 11 or later.
- Apache Maven installed.
- The required language runtimes, global tools, and Visual Studio Code extensions as described in Prepare your development environment.
- A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
- A model deployed in Microsoft Foundry. If you don't have a model, first complete Quickstart: Set up Microsoft Foundry resources.
- Assign the Azure AI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to set up your environment, configure the agent with Voice Live settings, and test your first conversation.
Agent integration concepts
Use these concepts to understand how Voice Live and Foundry Agent Service work together in the Java sample.
Agent configuration contract
Set AgentSessionConfig in your session setup to identify the target agent and project. At minimum, include agentName and projectName. Add agentVersion when you want to pin behavior to a specific version.
Authentication model for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation in this flow doesn't support key-based authentication, so configure AzureCliCredential (or another Entra token credential) for local development and deployment.
API version pinning
Use a consistent SDK version (azure-ai-voicelive:1.0.0-beta.5) in the Maven POM to keep behavior predictable across preview updates. Use the same version consistently across quickstart and how-to samples to avoid schema drift.
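For example, pin the exact preview version in the POM. The com.azure group ID below is an assumption based on Azure SDK for Java conventions; confirm the coordinates against the package listing:

```xml
<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-ai-voicelive</artifactId>
  <version>1.0.0-beta.5</version>
</dependency>
```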
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling so persisted history better matches what the user actually heard.
Connect to a specific agent version
Pin your agent to a specific version to enable controlled deployments. This lets production use stable versions while development tests newer iterations.
Set the AGENT_VERSION environment variable or pass the agentVersion parameter when initializing the assistant:
// <agent_config>
BasicVoiceAssistant(String endpoint, String agentName, String projectName,
String agentVersion, String conversationId,
String foundryResourceOverride, String authIdentityClientId) {
this.endpoint = endpoint;
// Build the agent session configuration
AgentSessionConfig config = new AgentSessionConfig(agentName, projectName);
if (agentVersion != null && !agentVersion.isEmpty()) {
config.setAgentVersion(agentVersion);
}
if (conversationId != null && !conversationId.isEmpty()) {
config.setConversationId(conversationId);
}
if (foundryResourceOverride != null && !foundryResourceOverride.isEmpty()) {
config.setFoundryResourceOverride(foundryResourceOverride);
if (authIdentityClientId != null && !authIdentityClientId.isEmpty()) {
config.setAuthenticationIdentityClientId(authIdentityClientId);
}
}
this.agentConfig = config;
}
// </agent_config>
// <main>
public static void main(String[] args) {
String endpoint = System.getenv("VOICELIVE_ENDPOINT");
String agentName = System.getenv("AGENT_NAME");
String projectName = System.getenv("PROJECT_NAME");
String agentVersion = System.getenv("AGENT_VERSION");
String conversationId = System.getenv("CONVERSATION_ID");
String foundryResourceOverride = System.getenv("FOUNDRY_RESOURCE_OVERRIDE");
String authIdentityClientId = System.getenv("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");
System.out.println("Environment variables:");
System.out.println("VOICELIVE_ENDPOINT: " + endpoint);
System.out.println("AGENT_NAME: " + agentName);
System.out.println("PROJECT_NAME: " + projectName);
System.out.println("AGENT_VERSION: " + agentVersion);
System.out.println("CONVERSATION_ID: " + conversationId);
System.out.println("FOUNDRY_RESOURCE_OVERRIDE: " + foundryResourceOverride);
if (endpoint == null || endpoint.isEmpty()
|| agentName == null || agentName.isEmpty()
|| projectName == null || projectName.isEmpty()) {
System.err.println("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
System.exit(1);
}
// Verify audio devices
checkAudioDevices();
System.out.println("🎙️ Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
System.out.println("=".repeat(65));
BasicVoiceAssistant assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
// Handle graceful shutdown
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
System.out.println("\n👋 Voice assistant shut down. Goodbye!");
}));
try {
assistant.start();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
System.out.println("\n👋 Voice assistant shut down. Goodbye!");
} catch (Exception e) {
System.err.println("Fatal Error: " + e.getMessage());
e.printStackTrace();
}
}
// </main>
In this sample, the version configuration is applied in three places:
- In main(), AGENT_VERSION is read from the environment.
- The value is passed to the BasicVoiceAssistant(...) constructor.
- In the constructor, the value is set on AgentSessionConfig via config.setAgentVersion(agentVersion), and then sent to Voice Live via client.startSession(agentConfig).
The agentVersion value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest version of the agent.
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent on a different Foundry resource for audio processing. This is useful when:
- The agent is deployed in a region that has different feature availability
- You want to separate development/staging environments from production
- Your organization uses different resources for different workloads
To connect to an agent on a different resource, configure two additional environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
// <agent_config>
BasicVoiceAssistant(String endpoint, String agentName, String projectName,
String agentVersion, String conversationId,
String foundryResourceOverride, String authIdentityClientId) {
this.endpoint = endpoint;
// Build the agent session configuration
AgentSessionConfig config = new AgentSessionConfig(agentName, projectName);
if (agentVersion != null && !agentVersion.isEmpty()) {
config.setAgentVersion(agentVersion);
}
if (conversationId != null && !conversationId.isEmpty()) {
config.setConversationId(conversationId);
}
if (foundryResourceOverride != null && !foundryResourceOverride.isEmpty()) {
config.setFoundryResourceOverride(foundryResourceOverride);
if (authIdentityClientId != null && !authIdentityClientId.isEmpty()) {
config.setAuthenticationIdentityClientId(authIdentityClientId);
}
}
this.agentConfig = config;
}
// </agent_config>
```java
// <main>
public static void main(String[] args) {
    String endpoint = System.getenv("VOICELIVE_ENDPOINT");
    String agentName = System.getenv("AGENT_NAME");
    String projectName = System.getenv("PROJECT_NAME");
    String agentVersion = System.getenv("AGENT_VERSION");
    String conversationId = System.getenv("CONVERSATION_ID");
    String foundryResourceOverride = System.getenv("FOUNDRY_RESOURCE_OVERRIDE");
    String authIdentityClientId = System.getenv("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");

    System.out.println("Environment variables:");
    System.out.println("VOICELIVE_ENDPOINT: " + endpoint);
    System.out.println("AGENT_NAME: " + agentName);
    System.out.println("PROJECT_NAME: " + projectName);
    System.out.println("AGENT_VERSION: " + agentVersion);
    System.out.println("CONVERSATION_ID: " + conversationId);
    System.out.println("FOUNDRY_RESOURCE_OVERRIDE: " + foundryResourceOverride);

    if (endpoint == null || endpoint.isEmpty()
            || agentName == null || agentName.isEmpty()
            || projectName == null || projectName.isEmpty()) {
        System.err.println("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
        System.exit(1);
    }

    // Verify audio devices
    checkAudioDevices();

    System.out.println("🎙️ Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
    System.out.println("=".repeat(65));

    BasicVoiceAssistant assistant = new BasicVoiceAssistant(
            endpoint, agentName, projectName,
            agentVersion, conversationId,
            foundryResourceOverride, authIdentityClientId);

    // Handle graceful shutdown
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        System.out.println("\n👋 Voice assistant shut down. Goodbye!");
    }));

    try {
        assistant.start();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        System.out.println("\n👋 Voice assistant shut down. Goodbye!");
    } catch (Exception e) {
        System.err.println("Fatal Error: " + e.getMessage());
        e.printStackTrace();
    }
}
// </main>
```
This configuration is resolved in `main()` and then applied when the assistant is created:
- `FOUNDRY_RESOURCE_OVERRIDE` and `AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID` are read from environment variables.
- Both values are passed to the `BasicVoiceAssistant(...)` constructor.
- In the constructor, the values are set on `AgentSessionConfig` via `config.setFoundryResourceOverride(...)` and `config.setAuthenticationIdentityClientId(...)`, which is sent in `client.startSession(agentConfig)`.
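The constructor only applies the override when a resource name is present, and only sets the identity client ID inside that branch. A minimal, self-contained sketch of that guard logic (the class and method names here are illustrative, not part of the sample or the SDK):

```java
// Illustrative helper: mirrors the constructor's nested guards for
// cross-resource configuration. Not part of the Voice Live SDK.
public class CrossResourceGuard {
    /** True when a cross-resource override should be applied at all. */
    public static boolean shouldOverride(String resourceOverride) {
        return resourceOverride != null && !resourceOverride.isEmpty();
    }

    /** True when the managed identity client ID should also be set. */
    public static boolean shouldSetIdentity(String resourceOverride, String clientId) {
        // The identity client ID only matters when an override is in effect.
        return shouldOverride(resourceOverride)
                && clientId != null && !clientId.isEmpty();
    }

    public static void main(String[] args) {
        // Override applies; identity is set for cross-resource authentication.
        System.out.println(shouldSetIdentity("my-agent-resource", "00000000-0000-0000-0000-000000000000")); // true
        // No override: the identity client ID is ignored entirely.
        System.out.println(shouldSetIdentity(null, "00000000-0000-0000-0000-000000000000")); // false
    }
}
```

The nesting matters: setting the identity client ID without an override has no effect, so the sample never reads it outside the override branch.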
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Send a proactive message to initiate conversations as soon as the session is ready. This sample checks a one-time flag in the SESSION_UPDATED event handler, sends a greeting prompt, and triggers a response.
```java
// <proactive_greeting>
private void sendProactiveGreeting() {
    logger.info("Sending proactive greeting request");
    try {
        // Create a system message to trigger greeting
        SystemMessageItem greetingMessage = new SystemMessageItem(
                Arrays.asList(new InputTextContentPart("Say something to welcome the user in English.")));
        ClientEventConversationItemCreate createEvent = new ClientEventConversationItemCreate()
                .setItem(greetingMessage);
        session.sendEvent(createEvent).block();

        // Request a response
        session.sendEvent(new ClientEventResponseCreate()).block();
    } catch (Exception e) {
        logger.log(Level.WARNING, "Failed to send proactive greeting", e);
    }
}
// </proactive_greeting>
```
In this sample, proactive messaging is applied in three steps:
- `greetingSent` is a `boolean` initialized to `false` to track one-time greeting state.
- In the `SESSION_UPDATED` branch, `if (!greetingSent)` gates proactive execution to run once per session.
- `sendEvent(new ClientEventConversationItemCreate()...)` adds the greeting instruction to the conversation context, and `sendEvent(new ClientEventResponseCreate())` generates spoken output.
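The one-time gate can be factored into a tiny self-contained class, which also makes the once-per-session behavior easy to unit test. This is an illustrative refactoring, not code from the sample, which keeps the flag as a field on `BasicVoiceAssistant`:

```java
// Illustrative: a reusable one-shot gate for the proactive greeting.
public class GreetingGate {
    private boolean greetingSent = false;

    /** Returns true exactly once; every later call returns false. */
    public synchronized boolean tryAcquire() {
        if (greetingSent) {
            return false;
        }
        greetingSent = true;
        return true;
    }

    public static void main(String[] args) {
        GreetingGate gate = new GreetingGate();
        System.out.println(gate.tryAcquire()); // true: first SESSION_UPDATED sends the greeting
        System.out.println(gate.tryAcquire()); // false: later session updates skip it
    }
}
```

`SESSION_UPDATED` can fire more than once per connection, which is why the sample gates the greeting rather than sending it unconditionally.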
Improve tool calling and latency wait times
Use Voice Live's `interimResponse` feature to bridge wait times during tool calling or when generating agent responses with high latency.
This feature supports two modes:
- `LlmInterimResponseConfig`: An LLM-generated interim response, best for dynamic and adaptive starts.
- `InterimResponseTrigger`: A pre-generated interim response, best for deterministic or branded messaging.
The following code shows the additions to the quickstart voice assistant that configure this feature:
```java
// <setup_session>
private void setupSession() {
    logger.info("Setting up voice conversation session...");

    // Configure interim responses to bridge latency gaps during processing
    LlmInterimResponseConfig interimResponseConfig = new LlmInterimResponseConfig()
            .setTriggers(Arrays.asList(
                    InterimResponseTrigger.TOOL,
                    InterimResponseTrigger.LATENCY))
            .setLatencyThresholdMs(100)
            .setInstructions("Create friendly interim responses indicating wait time due to "
                    + "ongoing processing, if any. Do not include in all responses! Do not "
                    + "say you don't have real-time access to information when calling tools!");

    // Create session configuration
    VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
            .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
            .setInputAudioFormat(InputAudioFormat.PCM16)
            .setOutputAudioFormat(OutputAudioFormat.PCM16)
            .setInterimResponse(BinaryData.fromObject(interimResponseConfig));

    // Send session update
    session.sendEvent(new ClientEventSessionUpdate(sessionOptions)).block();
    logger.info("Session configuration sent");
}
// </setup_session>
```
In this sample, the interim response setup is applied inside `BasicVoiceAssistant.setupSession()`:
- `LlmInterimResponseConfig` defines when interim responses trigger and what style they use.
- `VoiceLiveSessionOptions` attaches that config through the `interimResponse` field (serialized via `BinaryData.fromObject(...)`).
- `session.sendEvent(new ClientEventSessionUpdate(sessionOptions))` sends the session configuration to Voice Live.
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation helps keep session context aligned with delivered audio, which improves follow-up response quality after barge-in and keeps voice conversation history logging more accurate.
This sample currently shows interruption handling with `ClientEventResponseCancel` during speech start, but it doesn't configure `auto_truncate` in `turn_detection`.
Note
In Foundry Agent Service, thread messages and tracing agent threads are based on text content in the thread. Without auto truncation, those records can differ from the exact portion of audio the user actually heard before interruption.
For setup details and supported options, see Handle voice interruptions in chat history (preview).
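As a rough orientation only, enabling auto truncation might look like the following session update. The type name `ServerVadTurnDetection` and the `setAutoTruncate(...)` setter are assumptions, not confirmed Java SDK API; use the linked article as the authoritative reference:

```java
// Hypothetical sketch: type and setter names are assumptions, not confirmed API.
// Auto truncation trims thread text to the audio the user actually heard.
ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
        .setAutoTruncate(true);

VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
        .setTurnDetection(turnDetection);

// Apply the updated configuration to the active session.
session.sendEvent(new ClientEventSessionUpdate(sessionOptions)).block();
```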
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves history and context, allowing users to continue where they left off.
When a session connects successfully, Voice Live returns session metadata in the SESSION_UPDATED event. The sample extracts the session ID and logs it to the conversation file:
```java
if (type == ServerEventType.SESSION_UPDATED) {
    logger.info("Session updated and ready");
    sessionReady = true;

    String sessionId = extractField(event, "id");
    writeLog(String.format("SessionID: %s\n", sessionId));

    // Send a proactive greeting
    if (!greetingSent) {
        greetingSent = true;
        sendProactiveGreeting();
    }

    // Start audio capture once session is ready
    try {
        audioProcessor.startCapture();
    } catch (LineUnavailableException e) {
        logger.log(Level.SEVERE, "Failed to start audio capture", e);
    }
}
```
In this event handler, the session ID is extracted from the event JSON using extractField(event, "id") and written to the conversation log.
The sample code writes session details to a conversation log file in the logs/ folder (for example, logs/conversation_20260219_143000.log).
To reconnect to that conversation, pass the conversation ID as the CONVERSATION_ID environment variable (or the conversationId parameter):
```java
String conversationId = System.getenv("CONVERSATION_ID");

BasicVoiceAssistant assistant = new BasicVoiceAssistant(
        endpoint, agentName, projectName,
        agentVersion, conversationId,
        foundryResourceOverride, authIdentityClientId);
```
In this sample, conversation reconnect is applied in three places:
- In `main()`, `CONVERSATION_ID` is read from the environment (line 512).
- The value is passed to the `BasicVoiceAssistant(...)` constructor (lines 537-540).
- In the constructor, the value is set on `AgentSessionConfig` via `config.setConversationId(conversationId)`.
When a valid `conversationId` is provided, the agent retrieves the previous conversation context and can reference earlier exchanges in its responses.
Note
Conversation IDs are tied to the agent and project. Attempting to use a conversation ID with a different agent results in a new conversation being created.
Log session metadata for continuity and diagnostics
Log key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support scenarios.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when SESSION_UPDATED is received:
```java
// Conversation log
private static final String LOG_FILENAME = "conversation_"
        + LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss")) + ".log";

if (type == ServerEventType.SESSION_UPDATED) {
    logger.info("Session updated and ready");
    sessionReady = true;

    String sessionId = extractField(event, "id");
    writeLog(String.format("SessionID: %s\n", sessionId));

    // Send a proactive greeting
    if (!greetingSent) {
        greetingSent = true;
        sendProactiveGreeting();
    }

    // Start audio capture once session is ready
    try {
        audioProcessor.startCapture();
    } catch (LineUnavailableException e) {
        logger.log(Level.SEVERE, "Failed to start audio capture", e);
    }
}

private void writeLog(String message) {
    try {
        Path logDir = Paths.get("logs");
        Files.createDirectories(logDir);
        try (PrintWriter writer = new PrintWriter(
                new FileWriter(logDir.resolve(LOG_FILENAME).toString(), true))) {
            writer.println(message);
        }
    } catch (IOException e) {
        logger.warning("Failed to write conversation log: " + e.getMessage());
    }
}
```
In this sample, session metadata logging is applied in three places:
- A timestamped conversation log file (`conversation_YYYYMMDD_HHmmss.log`) is created per run (lines 92-95).
- On `SESSION_UPDATED`, the handler extracts the session ID from the event JSON and writes it to the log (lines 365-366).
- `writeLog(...)` appends entries to the same log file throughout the conversation lifecycle (lines 471-482).
Use the logged session ID alongside the conversation ID (`CONVERSATION_ID`) for diagnostics and to resume the same agent conversation in a later session.
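To resume a conversation later, you need to read the identifier back out of the log file. A small illustrative parser for the `SessionID: ...` line that `writeLog(...)` records (the class is not part of the sample; whether you feed the session ID or a separate conversation ID into `CONVERSATION_ID` depends on your logging):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper: recover the logged identifier so it can be
// exported (for example, as CONVERSATION_ID) before the next run.
public class SessionIdParser {
    private static final Pattern SESSION_ID = Pattern.compile("SessionID: (\\S+)");

    public static Optional<String> parse(String logContents) {
        Matcher m = SESSION_ID.matcher(logContents);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("SessionID: sess_abc123\n").orElse("not found")); // sess_abc123
        System.out.println(parse("no session line").orElse("not found"));          // not found
    }
}
```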
Migrate from Agent Service (classic)
If you're using Voice Live with Agent Service (classic), we recommend you migrate to the new Foundry Agent Service. For general Agent Service migration steps, see Migrate from Agent Service (classic) to Foundry Agent Service.
Voice Live SDK changes
The Voice Live SDK introduces typed configuration classes that replace the raw query parameters used in the classic integration:
| Classic (v1) | New (v2) |
|---|---|
| `agent-id` query parameter | `agent_name` in `AgentConfig` / `AgentSessionConfig` |
| `agent-project-name` query parameter | Project endpoint in client constructor |
| `agent-access-token` query parameter | Handled automatically by the SDK |
| Manual `connect()` with query dict | Strongly typed `AgentSessionConfig` passed to session options |
Minimum SDK versions
| Language | Package | Minimum version |
|---|---|---|
| Python | `azure-ai-voicelive` | 1.0.0b5 |
| C# | `Azure.AI.VoiceLive` | 1.1.0-beta.2 |
| Java | `azure-ai-voicelive` | 1.0.0-beta.5 |
| JavaScript | `@azure/ai-voicelive` | 1.0.0-beta.3 |
Before and after: Python connection setup
Classic (v1): raw query parameters in `connect()`:

```python
async with connect(
    endpoint=self.endpoint,
    credential=self.credential,
    query={
        "agent-id": self.agent_id,
        "agent-project-name": self.foundry_project_name,
        "agent-access-token": agent_access_token
    },
) as connection:
```
New (v2): strongly typed `AgentSessionConfig`:

```python
from azure.ai.voicelive import AgentConfig, AgentSessionConfig

agent_config = AgentConfig(agent_name=agent_name)
agent_session_config = AgentSessionConfig(agent_config=agent_config)

session_options = VoiceLiveSessionOptions(
    agent_session_config=agent_session_config,
    # ... other options
)
```
For complete code examples, see the new agent quickstart. The classic quickstart remains available.
Related content
- Explore How to add proactive messages
- Explore How to improve tool calling and latency wait times
- Learn more about How to use the Voice Live API
- See the Voice Live API reference