Note
This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Note
Foundry agent integration currently supports only agents available on public endpoints. Foundry agents deployed in a private virtual network (VNet) aren't supported.
Learn how to use Voice Live with Microsoft Foundry Agent Service by using the VoiceLive SDK for Python. This article builds on Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live, adding advanced features and integration options.
Reference documentation | Package (PyPI) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Prerequisites
Note
This document refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- Python 3.10 or later. If you don't have a suitable version of Python installed, follow the instructions in the VS Code Python Tutorial for the easiest way to install Python on your operating system.
- The required language runtimes, global tools, and Visual Studio Code extensions as described in Prepare your development environment.
- A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
- A model deployed in Microsoft Foundry. If you don't have a model, first complete Quickstart: Set up Microsoft Foundry resources.
- Assign the Azure AI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to set up your environment, configure the agent with Voice Live settings, and test your first conversation.
Agent integration concepts
Use these concepts to understand how Voice Live and Foundry Agent Service work together in the Python sample.
Agent configuration contract
Set agent_config in your session setup to identify the target agent and project. At minimum, include agent_name and project_name. Add agent_version when you want to pin behavior to a specific version.
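The contract can be sketched as a small helper that assembles this dictionary, assuming the dictionary-shaped AgentSessionConfig used in the Python sample later in this article; build_agent_config is an illustrative helper name, not part of the SDK.

```python
from typing import Any, Dict, Optional

def build_agent_config(
    agent_name: str,
    project_name: str,
    agent_version: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the minimal agent configuration dictionary.

    agent_name and project_name are always required; agent_version is only
    included when you want to pin behavior to a specific version.
    """
    config: Dict[str, Any] = {
        "agent_name": agent_name,
        "project_name": project_name,
    }
    if agent_version:
        config["agent_version"] = agent_version
    return config

# Unpinned: Voice Live connects to the latest agent version.
unpinned = build_agent_config("support-agent", "my-project")
# Pinned: behavior is fixed to one version.
pinned = build_agent_config("support-agent", "my-project", agent_version="3")
```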
Authentication model for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation in this flow doesn't support key-based authentication, so configure AzureCliCredential (or another Entra token credential) for local development and deployment.
API version pinning
Pin a supported api_version in the client to keep behavior predictable across preview updates. Use the same version consistently across quickstart and how-to samples to avoid schema drift.
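One common way to keep the version consistent across samples is a single module-level constant with an environment override; the version string below is a placeholder, not a confirmed Voice Live API version, and resolve_api_version is an illustrative helper name.

```python
import os

# Placeholder value; substitute an api_version documented as supported
# for Voice Live in the version of the SDK you use.
DEFAULT_API_VERSION = "2025-05-19-preview"

def resolve_api_version() -> str:
    """Allow an environment override, otherwise fall back to the pinned value
    so every sample in the project uses the same api_version."""
    return os.environ.get("VOICELIVE_API_VERSION", DEFAULT_API_VERSION)
```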
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling so persisted history better matches what the user actually heard.
Connect to a specific agent version
Pin your agent to a specific version to enable controlled deployments. This lets production use stable versions while development tests newer iterations.
Set the AGENT_VERSION environment variable or pass the agent_version parameter when initializing the assistant:
This sample uses the new AgentSessionConfig type for strongly typed agent configuration at connection time. It also demonstrates how to collect a conversation log of user and agent interactions.
# <agent_config>
def __init__(
self,
endpoint: str,
credential: Union[AzureKeyCredential, AsyncTokenCredential],
voice: str,
agent_name: str,
project_name: str,
agent_version: Optional[str] = None,
conversation_id: Optional[str] = None,
foundry_resource_override: Optional[str] = None,
agent_authentication_identity_client_id: Optional[str] = None,
):
self.endpoint = endpoint
self.credential = credential
self.voice = voice
# Build AgentSessionConfig internally
self.agent_config: AgentSessionConfig = {
    "agent_name": agent_name,
    "project_name": project_name,
    "agent_version": agent_version,
    "conversation_id": conversation_id,
    "foundry_resource_override": foundry_resource_override,
    "authentication_identity_client_id": agent_authentication_identity_client_id,
}
else:
logger.error("VoiceLive error: %s", msg)
print(f"Error: {msg}")
elif event.type == ServerEventType.CONVERSATION_ITEM_CREATED:
logger.debug("Conversation item created: %s", event.item.id)
else:
logger.debug("Unhandled event type: %s", event.type)
# </handle_events>
# </voice_assistant>
async def write_conversation_log(message: str) -> None:
"""Write a message to the conversation log."""
log_path = os.path.join(_script_dir, 'logs', logfilename)
await asyncio.to_thread(
lambda: open(log_path, 'a', encoding='utf-8').write(message + "\n")
)
# <main>
def main() -> None:
"""Main function."""
endpoint = os.environ.get("VOICELIVE_ENDPOINT", "")
voice_name = os.environ.get("VOICE_NAME", "en-US-Ava:DragonHDLatestNeural")
agent_name = os.environ.get("AGENT_NAME", "")
agent_version = os.environ.get("AGENT_VERSION")
project_name = os.environ.get("PROJECT_NAME", "")
conversation_id = os.environ.get("CONVERSATION_ID")
foundry_resource_override = os.environ.get("FOUNDRY_RESOURCE_OVERRIDE")
agent_authentication_identity_client_id = os.environ.get("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID")
print("Environment variables:")
print(f"VOICELIVE_ENDPOINT: {endpoint}")
print(f"VOICE_NAME: {voice_name}")
print(f"AGENT_NAME: {agent_name}")
print(f"AGENT_VERSION: {agent_version}")
print(f"PROJECT_NAME: {project_name}")
In this sample, the version configuration is applied in three places:
- In main(), AGENT_VERSION is read from the environment.
- In the BasicVoiceAssistant(...) call, agent_version is passed into the class constructor.
- In BasicVoiceAssistant.__init__, the value is added to self.agent_config, and then sent to Voice Live via connect(..., agent_config=self.agent_config).
The agent_version value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest version of the agent.
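That precedence (explicit parameter first, then the environment variable, then the latest version) can be sketched as a small helper; resolve_agent_version is an illustrative name, not an SDK function.

```python
import os
from typing import Optional

def resolve_agent_version(explicit: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed agent_version, then the AGENT_VERSION
    environment variable. Returning None lets Voice Live connect to the
    latest version of the agent."""
    if explicit:
        return explicit
    # An empty or unset variable means "no pin".
    return os.environ.get("AGENT_VERSION") or None
```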
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent on a different Foundry resource for audio processing. This is useful when:
- The agent is deployed in a region that has different feature availability
- You want to separate development/staging environments from production
- Your organization uses different resources for different workloads
To connect to an agent on a different resource, configure two additional environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
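Because the identity client ID is required whenever the resource override is used, it can help to validate the pair when reading the environment. The helper below is an illustrative sketch (read_cross_resource_settings is not an SDK function).

```python
import os
from typing import Optional, Tuple

def read_cross_resource_settings() -> Tuple[Optional[str], Optional[str]]:
    """Read the optional cross-resource settings, enforcing that a resource
    override always comes with the managed identity client ID it needs."""
    override = os.environ.get("FOUNDRY_RESOURCE_OVERRIDE") or None
    identity = os.environ.get("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID") or None
    if override and not identity:
        raise ValueError(
            "FOUNDRY_RESOURCE_OVERRIDE is set, but "
            "AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID is required for "
            "cross-resource authentication."
        )
    return override, identity
```

Failing fast here surfaces a misconfiguration before the session connect attempt, where the error would otherwise appear as an authentication failure.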
credential: Union[AzureKeyCredential, AsyncTokenCredential],
voice: str,
agent_name: str,
project_name: str,
agent_version: Optional[str] = None,
conversation_id: Optional[str] = None,
foundry_resource_override: Optional[str] = None,
agent_authentication_identity_client_id: Optional[str] = None,
):
self.endpoint = endpoint
self.credential = credential
self.voice = voice
# Build AgentSessionConfig internally
self.agent_config: AgentSessionConfig = {
    "agent_name": agent_name,
    "project_name": project_name,
    "agent_version": agent_version,
    "conversation_id": conversation_id,
    "foundry_resource_override": foundry_resource_override,
    "authentication_identity_client_id": agent_authentication_identity_client_id,
}
else:
logger.debug("Unhandled event type: %s", event.type)
# </handle_events>
# </voice_assistant>
async def write_conversation_log(message: str) -> None:
"""Write a message to the conversation log."""
log_path = os.path.join(_script_dir, 'logs', logfilename)
await asyncio.to_thread(
lambda: open(log_path, 'a', encoding='utf-8').write(message + "\n")
)
# <main>
def main() -> None:
"""Main function."""
endpoint = os.environ.get("VOICELIVE_ENDPOINT", "")
voice_name = os.environ.get("VOICE_NAME", "en-US-Ava:DragonHDLatestNeural")
agent_name = os.environ.get("AGENT_NAME", "")
agent_version = os.environ.get("AGENT_VERSION")
project_name = os.environ.get("PROJECT_NAME", "")
conversation_id = os.environ.get("CONVERSATION_ID")
foundry_resource_override = os.environ.get("FOUNDRY_RESOURCE_OVERRIDE")
agent_authentication_identity_client_id = os.environ.get("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID")
print("Environment variables:")
print(f"VOICELIVE_ENDPOINT: {endpoint}")
print(f"VOICE_NAME: {voice_name}")
print(f"AGENT_NAME: {agent_name}")
print(f"AGENT_VERSION: {agent_version}")
print(f"PROJECT_NAME: {project_name}")
print(f"CONVERSATION_ID: {conversation_id}")
print(f"FOUNDRY_RESOURCE_OVERRIDE: {foundry_resource_override}")
This configuration is resolved in main() and then applied when the assistant is created:
- FOUNDRY_RESOURCE_OVERRIDE and AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID are read from environment variables.
- Both values are passed to BasicVoiceAssistant(...).
- In BasicVoiceAssistant.__init__, the values are added to self.agent_config, which is sent in connect(..., agent_config=self.agent_config).
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Send a proactive message to initiate conversations as soon as the session is ready. This sample checks a one-time flag in the SESSION_UPDATED event handler, sends a greeting prompt, and triggers a response.
}
except Exception:
logger.exception("Error processing events")
raise
# </process_events>
# <handle_events>
async def _handle_event(self, event: Any) -> None:
"""Handle different types of events from VoiceLive."""
logger.debug("Received event: %s", event.type)
ap = self.audio_processor
conn = self.connection
if ap is None or conn is None:
raise RuntimeError("AudioProcessor and Connection must be initialized")
if event.type == ServerEventType.SESSION_UPDATED:
# <session_updated_metadata>
logger.info("Session ready: %s", event.session.id)
s, a, v = event.session, event.session.agent, event.session.voice
await write_conversation_log("\n".join([
f"SessionID: {s.id}", f"Agent Name: {a.name}",
f"Agent Description: {a.description}", f"Agent ID: {a.agent_id}",
f"Voice Name: {v['name']}", f"Voice Type: {v['type']}",
f"Voice Temperature: {v['temperature']}", ""
]))
# </session_updated_metadata>
self.session_ready = True
# <proactive_greeting>
# Invoke Proactive greeting
if not self.greeting_sent:
self.greeting_sent = True
logger.info("Sending proactive greeting request")
In this sample, proactive messaging is applied in three steps:
- self.greeting_sent = False initializes one-time greeting state.
- In the SESSION_UPDATED branch, if not self.greeting_sent: gates proactive execution to run once per session.
- conn.conversation.item.create(...) adds the greeting instruction to conversation context, and conn.response.create() generates spoken output.
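The one-shot gating pattern can be sketched independently of the SDK. In this sketch, FakeConnection and GreetingGate are illustrative stand-ins for the Voice Live connection and the assistant class, not SDK types.

```python
import asyncio
from typing import List

class FakeConnection:
    """Stand-in for the Voice Live connection; records the commands the real
    sample sends via conversation.item.create and response.create."""
    def __init__(self) -> None:
        self.commands: List[str] = []

    async def send(self, command: str) -> None:
        self.commands.append(command)

class GreetingGate:
    """Mirrors the sample's self.greeting_sent flag."""
    def __init__(self, conn: FakeConnection) -> None:
        self.conn = conn
        self.greeting_sent = False

    async def on_session_updated(self) -> None:
        if not self.greeting_sent:  # gate: runs once per session
            self.greeting_sent = True
            await self.conn.send("conversation.item.create")  # greeting prompt
            await self.conn.send("response.create")           # spoken output

async def demo() -> List[str]:
    conn = FakeConnection()
    gate = GreetingGate(conn)
    await gate.on_session_updated()
    await gate.on_session_updated()  # second SESSION_UPDATED: no duplicate
    return conn.commands

print(asyncio.run(demo()))  # ['conversation.item.create', 'response.create']
```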
Improve tool calling and latency wait times
Use Voice Live's interim_response feature to bridge wait times during tool calling or when generating agent responses with high latency.
This feature supports two modes:
- LlmInterimResponseConfig: LLM-generated interim response - best for dynamic and adaptive starts
- InterimResponseTrigger: Pre-generated interim response - best for deterministic or branded messaging
The voice-live-agents-quickstart.py created with the quickstart shows the required code additions to configure this feature as follows:
from azure.ai.voicelive.aio import connect, AgentSessionConfig
from azure.ai.voicelive.models import (
InputAudioFormat,
Modality,
OutputAudioFormat,
RequestSession,
ServerEventType,
MessageItem,
InputTextContentPart,
LlmInterimResponseConfig,
InterimResponseTrigger,
AzureStandardVoice,
AudioNoiseReduction,
AudioEchoCancellation,
AzureSemanticVadMultilingual,
)
# Process events
await self._process_events()
finally:
if self.audio_processor:
self.audio_processor.shutdown()
# </start_session>
# <setup_session>
async def _setup_session(self) -> None:
"""Configure the VoiceLive session for audio conversation."""
logger.info("Setting up voice conversation session...")
# Set up interim response configuration to bridge latency gaps during processing
interim_response_config = LlmInterimResponseConfig(
triggers=[InterimResponseTrigger.TOOL, InterimResponseTrigger.LATENCY],
latency_threshold_ms=100,
instructions="""Create friendly interim responses indicating wait time due to ongoing processing, if any. Do not include
in all responses! Do not say you don't have real-time access to information when calling tools!"""
)
# Create session configuration
session_config = RequestSession(
modalities=[Modality.TEXT, Modality.AUDIO],
In this sample, the interim response setup is applied inside BasicVoiceAssistant._setup_session():
- LlmInterimResponseConfig(...) defines when interim responses trigger and what style they use.
- RequestSession(...) attaches that config through the interim_response field.
- conn.session.update(session=session_config) sends the session configuration to Voice Live.
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation helps keep session context aligned with delivered audio, which improves follow-up response quality after barge-in and keeps voice conversation history logging more accurate.
This sample currently shows interruption handling with response.cancel() during speech start, but it doesn't configure auto_truncate in turn_detection.
Note
In Foundry Agent Service, thread messages and tracing agent threads are based on text content in the thread. Without auto truncation, those records can differ from the exact portion of audio the user actually heard before interruption.
For setup details and supported options, see Handle voice interruptions in chat history (preview).
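If you add auto truncation yourself, it is a single setting inside the turn detection configuration. The dictionary below is only a sketch of that shape, with field names assumed from the linked interruption-handling article; verify them against the SDK version you use.

```python
# Sketch only: truncation-aware turn detection settings.
# "azure_semantic_vad" and "auto_truncate" are assumed field values here,
# not copied from this article's sample code.
turn_detection = {
    "type": "azure_semantic_vad",
    "auto_truncate": True,  # align persisted text with audio actually heard
}
```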
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves history and context, allowing users to continue where they left off.
Voice Live returns session metadata in the SESSION_UPDATED event when a session connects successfully:
logger.exception("Error processing events")
raise
# </process_events>
# <handle_events>
async def _handle_event(self, event: Any) -> None:
"""Handle different types of events from VoiceLive."""
logger.debug("Received event: %s", event.type)
ap = self.audio_processor
In this event handler, session and agent metadata is logged when the session is ready.
The sample code automatically writes session details to a conversation log file in the logs/ folder (for example, logs/2026-02-19_14-30-00_conversation.log). You can retrieve the session ID from this file after running a session.
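Since the log records a line of the form "SessionID: <id>", you can pull the ID back out programmatically; read_session_id below is an illustrative helper, assuming only that log format.

```python
import re
from pathlib import Path

def read_session_id(log_path: str) -> str:
    """Extract the session ID from a conversation log written by the sample,
    which records a line such as 'SessionID: <id>'."""
    text = Path(log_path).read_text(encoding="utf-8")
    match = re.search(r"^SessionID:\s*(\S+)", text, flags=re.MULTILINE)
    if not match:
        raise ValueError(f"No SessionID line found in {log_path}")
    return match.group(1)
```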
To reconnect to that conversation, pass the conversation ID as the CONVERSATION_ID environment variable (or the conversation_id parameter). The value flows into self.agent_config through the constructor shown earlier in this article.
In this sample, conversation reconnect is applied in three places:
- In main(), CONVERSATION_ID is read from the environment.
- In the BasicVoiceAssistant(...) call, conversation_id is passed into the class constructor.
- In BasicVoiceAssistant.__init__, the value is assigned into self.agent_config as conversation_id.
When a valid conversation_id is provided, the agent retrieves the previous conversation context and can reference earlier exchanges in its responses.
Note
Conversation IDs are tied to the agent and project. Attempting to use a conversation ID with a different agent results in a new conversation being created.
Log session metadata for continuity and diagnostics
Log key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support scenarios.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when SESSION_UPDATED is received:
load_dotenv(os.path.join(_script_dir, './.env'), override=True)
# Set up logging
## Add folder for logging
os.makedirs(os.path.join(_script_dir, 'logs'), exist_ok=True)
## Add timestamp for logfiles
# </process_events>
# <handle_events>
async def _handle_event(self, event: Any) -> None:
"""Handle different types of events from VoiceLive."""
logger.debug("Received event: %s", event.type)
ap = self.audio_processor
self._active_response = False
self._response_api_done = True
elif event.type == ServerEventType.ERROR:
msg = event.error.message
if "Cancellation failed: no active response" in msg:
In this sample, session metadata logging is applied in three places:
- A timestamped conversation log file is created per run.
- On SESSION_UPDATED, metadata including the session ID, agent name, and voice configuration is appended.
- write_conversation_log(...) appends entries to the same file throughout the conversation lifecycle.
Use the logged session metadata with CONVERSATION_ID to resume the same agent conversation in a later session.
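The two pieces of this logging setup, a timestamped filename per run and a non-blocking append, can be sketched with the standard library alone; make_log_path and append_line are illustrative names that follow the naming shown earlier in this article.

```python
import asyncio
import os
from datetime import datetime

def make_log_path(base_dir: str) -> str:
    """Create logs/<timestamp>_conversation.log under base_dir, matching the
    per-run naming shown earlier (for example,
    logs/2026-02-19_14-30-00_conversation.log)."""
    os.makedirs(os.path.join(base_dir, "logs"), exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(base_dir, "logs", f"{stamp}_conversation.log")

async def append_line(log_path: str, message: str) -> None:
    """Append a line without blocking the event loop, as in the sample's
    write_conversation_log helper."""
    def _write() -> None:
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(message + "\n")
    await asyncio.to_thread(_write)
```

Running file I/O through asyncio.to_thread keeps the audio event loop responsive while log entries are written.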
In this article, you'll learn how to use Voice Live with Microsoft Foundry Agent Service using the VoiceLive SDK for C#. This article extends the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live with more details on features and integration options.
Reference documentation | Package (NuGet) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Prerequisites
Note
This guide refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- .NET 8.0 SDK or later.
- The required language runtimes, global tools, and Visual Studio Code extensions. See Prepare your development environment.
- A Microsoft Foundry resource created in a supported region. See Voice Live overview documentation for region availability.
- A deployed model in Microsoft Foundry. If you don't have one, first complete Quickstart: Set up Microsoft Foundry resources.
- The Azure AI User role assigned to your user account. Assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to prepare your environment, set up the agent with Voice Live settings, and run your first test.
Agent integration concepts
These concepts help you understand how Voice Live and Foundry Agent Service work together in the C# sample.
Agent configuration contract
Set AgentSessionConfig in your session setup to identify the target agent and project. Include at minimum agentName and projectName. Add AgentVersion when you want to pin behavior to a specific version.
Authentication for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation doesn't support key-based authentication, so configure AzureCliCredential (or another Entra token credential) for local development and deployment.
API version pinning
Use a consistent SDK version (Azure.AI.VoiceLive 1.1.0-beta.3) in your project file. Consistent versioning keeps behavior predictable across preview updates and avoids schema drift.
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling. This ensures persisted history better matches what users actually heard.
Connect to a specific agent version
Voice Live lets you connect to a specific version of your agent. This enables controlled deployments where production uses a stable version while development tests newer iterations.
To connect to a specific agent version, set the AGENT_VERSION environment variable or pass the agentVersion parameter when initializing the assistant:
// <agent_config>
public BasicVoiceAssistant(string endpoint, string agentName, string projectName,
string? agentVersion = null, string? conversationId = null,
string? foundryResourceOverride = null, string? authIdentityClientId = null)
{
_endpoint = endpoint;
// Build the agent session configuration
var config = new AgentSessionConfig(agentName, projectName);
if (!string.IsNullOrEmpty(agentVersion))
{
config.AgentVersion = agentVersion;
}
if (!string.IsNullOrEmpty(conversationId))
{
config.ConversationId = conversationId;
}
if (!string.IsNullOrEmpty(foundryResourceOverride))
{
config.FoundryResourceOverride = foundryResourceOverride;
if (!string.IsNullOrEmpty(authIdentityClientId))
{
config.AuthenticationIdentityClientId = authIdentityClientId;
}
}
_agentConfig = config;
}
// </agent_config>
// <main>
class Program
{
static async Task Main(string[] args)
{
var endpoint = Environment.GetEnvironmentVariable("VOICELIVE_ENDPOINT");
var agentName = Environment.GetEnvironmentVariable("AGENT_NAME");
var projectName = Environment.GetEnvironmentVariable("PROJECT_NAME");
var agentVersion = Environment.GetEnvironmentVariable("AGENT_VERSION");
var conversationId = Environment.GetEnvironmentVariable("CONVERSATION_ID");
var foundryResourceOverride = Environment.GetEnvironmentVariable("FOUNDRY_RESOURCE_OVERRIDE");
var authIdentityClientId = Environment.GetEnvironmentVariable("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");
Console.WriteLine("Environment variables:");
Console.WriteLine($"VOICELIVE_ENDPOINT: {endpoint}");
Console.WriteLine($"AGENT_NAME: {agentName}");
Console.WriteLine($"PROJECT_NAME: {projectName}");
Console.WriteLine($"AGENT_VERSION: {agentVersion}");
Console.WriteLine($"CONVERSATION_ID: {conversationId}");
Console.WriteLine($"FOUNDRY_RESOURCE_OVERRIDE: {foundryResourceOverride}");
if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(agentName)
|| string.IsNullOrEmpty(projectName))
{
Console.Error.WriteLine("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
return;
}
// Verify audio devices
CheckAudioDevices();
Console.WriteLine("Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
Console.WriteLine(new string('=', 65));
using var assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
// Handle graceful shutdown
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (sender, e) =>
{
e.Cancel = true;
cts.Cancel();
};
try
{
await assistant.StartAsync(cts.Token);
}
catch (OperationCanceledException)
{
Console.WriteLine("\nVoice assistant shut down. Goodbye!");
}
catch (Exception ex)
{
Console.Error.WriteLine($"Fatal Error: {ex.Message}");
}
The version configuration is applied in three places:
- In Main(), read AGENT_VERSION from the environment.
- Pass agentVersion to the BasicVoiceAssistant(...) constructor.
- In the constructor, set the value on AgentSessionConfig via config.AgentVersion. Send it to Voice Live via StartSessionAsync(SessionTarget.FromAgent(agentConfig)).
The agentVersion value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest agent version.
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent hosted on a different Foundry resource than the one used for audio processing.
This is useful in these scenarios:
- The agent is deployed in a region with different feature availability.
- You want to separate development and staging from production.
- Your organization uses different resources for different workloads.
To connect to an agent on a different resource, configure two environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
// <agent_config>
public BasicVoiceAssistant(string endpoint, string agentName, string projectName,
string? agentVersion = null, string? conversationId = null,
string? foundryResourceOverride = null, string? authIdentityClientId = null)
{
_endpoint = endpoint;
// Build the agent session configuration
var config = new AgentSessionConfig(agentName, projectName);
if (!string.IsNullOrEmpty(agentVersion))
{
config.AgentVersion = agentVersion;
}
if (!string.IsNullOrEmpty(conversationId))
{
config.ConversationId = conversationId;
}
if (!string.IsNullOrEmpty(foundryResourceOverride))
{
config.FoundryResourceOverride = foundryResourceOverride;
if (!string.IsNullOrEmpty(authIdentityClientId))
{
config.AuthenticationIdentityClientId = authIdentityClientId;
}
}
_agentConfig = config;
}
// </agent_config>
// <main>
class Program
{
static async Task Main(string[] args)
{
var endpoint = Environment.GetEnvironmentVariable("VOICELIVE_ENDPOINT");
var agentName = Environment.GetEnvironmentVariable("AGENT_NAME");
var projectName = Environment.GetEnvironmentVariable("PROJECT_NAME");
var agentVersion = Environment.GetEnvironmentVariable("AGENT_VERSION");
var conversationId = Environment.GetEnvironmentVariable("CONVERSATION_ID");
var foundryResourceOverride = Environment.GetEnvironmentVariable("FOUNDRY_RESOURCE_OVERRIDE");
var authIdentityClientId = Environment.GetEnvironmentVariable("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");
Console.WriteLine("Environment variables:");
Console.WriteLine($"VOICELIVE_ENDPOINT: {endpoint}");
Console.WriteLine($"AGENT_NAME: {agentName}");
Console.WriteLine($"PROJECT_NAME: {projectName}");
Console.WriteLine($"AGENT_VERSION: {agentVersion}");
Console.WriteLine($"CONVERSATION_ID: {conversationId}");
Console.WriteLine($"FOUNDRY_RESOURCE_OVERRIDE: {foundryResourceOverride}");
if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(agentName)
|| string.IsNullOrEmpty(projectName))
{
Console.Error.WriteLine("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
return;
}
// Verify audio devices
CheckAudioDevices();
Console.WriteLine("Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
Console.WriteLine(new string('=', 65));
using var assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
// Handle graceful shutdown
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (sender, e) =>
{
e.Cancel = true;
cts.Cancel();
};
try
{
await assistant.StartAsync(cts.Token);
}
catch (OperationCanceledException)
{
Console.WriteLine("\nVoice assistant shut down. Goodbye!");
}
catch (Exception ex)
{
Console.Error.WriteLine($"Fatal Error: {ex.Message}");
}
The configuration is resolved in Main() and applied when the assistant is created:
- Read FOUNDRY_RESOURCE_OVERRIDE and AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID from environment variables.
- Pass both values to the BasicVoiceAssistant(...) constructor.
- In the constructor, set both values on AgentSessionConfig via config.FoundryResourceOverride and config.AuthenticationIdentityClientId. Send them in StartSessionAsync(SessionTarget.FromAgent(agentConfig)).
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Send a proactive message to initiate conversations when the session is ready. The assistant checks a one-time flag in the SessionUpdateSessionUpdated event handler, sends a greeting prompt, and triggers a response.
// <proactive_greeting>
private async Task SendProactiveGreetingAsync(CancellationToken cancellationToken)
{
Console.WriteLine("Sending proactive greeting request");
try
{
// Create a system message to trigger greeting
await _session!.SendCommandAsync(
BinaryData.FromObjectAsJson(new
{
type = "conversation.item.create",
item = new
{
type = "message",
role = "system",
content = new[]
{
new { type = "input_text", text = "Say something to welcome the user in English." }
}
}
}), cancellationToken).ConfigureAwait(false);
// Request a response
await _session!.SendCommandAsync(
BinaryData.FromObjectAsJson(new { type = "response.create" }),
cancellationToken).ConfigureAwait(false);
}
catch (Exception ex)
{
Console.Error.WriteLine($"Failed to send proactive greeting: {ex.Message}");
}
}
// </proactive_greeting>
Proactive messaging is applied in three steps:
- _greetingSent is a bool initialized to false to track one-time greeting state.
- In the SessionUpdateSessionUpdated branch, if (!_greetingSent) gates execution to run once per session.
- SendCommandAsync(...) with a conversation.item.create payload adds the greeting to conversation context. A response.create command generates spoken output.
Improve tool calling and latency wait times
Voice Live offers InterimResponse to bridge wait times during tool calling or when generating responses with high latency.
The feature supports two modes:
- LlmInterimResponseConfig: LLM-generated interim response; best for dynamic starts.
- InterimResponseTrigger: Pre-generated interim response; best for deterministic or branded messaging.
The quickstart voice assistant shows the required code additions:
// <setup_session>
private async Task SetupSessionAsync(CancellationToken cancellationToken)
{
Console.WriteLine("Setting up voice conversation session...");
// Create session configuration with interim response to bridge latency gaps
var interimConfig = new LlmInterimResponseConfig
{
Instructions = "Create friendly interim responses indicating wait time due to "
+ "ongoing processing, if any. Do not include in all responses! Do not "
+ "say you don't have real-time access to information when calling tools!",
};
interimConfig.Triggers.Add(InterimResponseTrigger.Tool);
interimConfig.Triggers.Add(InterimResponseTrigger.Latency);
interimConfig.LatencyThresholdMs = 100;
var options = new VoiceLiveSessionOptions
{
InputAudioFormat = InputAudioFormat.Pcm16,
OutputAudioFormat = OutputAudioFormat.Pcm16,
InterimResponse = BinaryData.FromObjectAsJson(interimConfig)
};
// Send session configuration
await _session!.ConfigureSessionAsync(options, cancellationToken).ConfigureAwait(false);
Console.WriteLine("Session configuration sent");
}
// </setup_session>
The interim response setup is applied inside SetupSessionAsync():
- A LlmInterimResponseConfig is created with custom instructions and triggers for Tool and Latency events.
- The config is serialized via BinaryData.FromObjectAsJson() and assigned to VoiceLiveSessionOptions.InterimResponse.
- ConfigureSessionAsync(options) sends the complete session configuration, including the interim response, to Voice Live.
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation keeps session context aligned with delivered audio. This improves follow-up responses after barge-in and keeps conversation logging more accurate.
The sample currently shows interruption handling with CancelResponseAsync() during speech start, but it doesn't configure auto_truncate in turn_detection.
Note
In Foundry Agent Service, thread messages and trace records are based on text content. Without auto truncation, these records can differ from the exact portion of audio users heard before interruption.
See Handle voice interruptions in chat history (preview) for setup details and supported options.
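As a sketch, enabling the feature means adding auto_truncate to the turn_detection block of the session configuration sent to Voice Live. The surrounding field values below are illustrative assumptions, not the sample's exact settings:

```json
{
  "type": "session.update",
  "session": {
    "turn_detection": {
      "type": "server_vad",
      "auto_truncate": true
    }
  }
}
```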
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves history and context, allowing users to continue where they left off.
When a session connects successfully, Voice Live returns session metadata in the SessionUpdateSessionUpdated event. Extract the session ID and log it to the conversation file:
// <handle_events>
private async Task HandleEventAsync(SessionUpdate serverEvent, CancellationToken cancellationToken)
{
switch (serverEvent)
{
case SessionUpdateSessionUpdated sessionUpdated:
Console.WriteLine("Session updated and ready");
var sessionId = sessionUpdated.Session?.Id;
WriteLog($"SessionID: {sessionId}\n");
// Send a proactive greeting
if (!_greetingSent)
{
_greetingSent = true;
await SendProactiveGreetingAsync(cancellationToken).ConfigureAwait(false);
}
// Start audio capture once session is ready
_audioProcessor?.StartCapture();
break;
In this event handler, the session ID is extracted from sessionUpdated.Session?.Id and written to the conversation log.
The sample writes session details to a conversation log file in the logs/ folder (for example, logs/conversation_20260219_143000.log).
To reconnect, pass the conversation ID as the CONVERSATION_ID environment variable or the conversationId parameter:
var conversationId = Environment.GetEnvironmentVariable("CONVERSATION_ID");
using var assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
Conversation reconnect is applied in three places:
- In Main(), read CONVERSATION_ID from the environment (line 458).
- Pass the value to the BasicVoiceAssistant(...) constructor (lines 483-486).
- In the constructor, set the value on AgentSessionConfig via config.ConversationId.
When a valid conversationId is provided, the agent retrieves the previous conversation context and can reference earlier exchanges.
Note
Conversation IDs are tied to the agent and project. Using a conversation ID with a different agent creates a new conversation.
Log session metadata for continuity and diagnostics
The sample logs key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when SessionUpdateSessionUpdated fires:
private static readonly string LogFilename = $"conversation_{DateTime.Now:yyyyMMdd_HHmmss}.log";
// <handle_events>
private async Task HandleEventAsync(SessionUpdate serverEvent, CancellationToken cancellationToken)
{
switch (serverEvent)
{
case SessionUpdateSessionUpdated sessionUpdated:
Console.WriteLine("Session updated and ready");
var sessionId = sessionUpdated.Session?.Id;
WriteLog($"SessionID: {sessionId}\n");
// Send a proactive greeting
if (!_greetingSent)
{
_greetingSent = true;
await SendProactiveGreetingAsync(cancellationToken).ConfigureAwait(false);
}
// Start audio capture once session is ready
_audioProcessor?.StartCapture();
break;
private static void WriteLog(string message)
{
try
{
var logDir = Path.Combine(Directory.GetCurrentDirectory(), "logs");
Directory.CreateDirectory(logDir);
File.AppendAllText(Path.Combine(logDir, LogFilename), message + Environment.NewLine);
}
catch (IOException ex)
{
Console.Error.WriteLine($"Failed to write conversation log: {ex.Message}");
}
}
Session metadata logging is applied in three places:
- A timestamped conversation log file (conversation_YYYYMMDD_HHmmss.log) is created per run (lines 188-189).
- On SessionUpdateSessionUpdated, the handler extracts the session ID and writes it to the log (lines 309-310).
- WriteLog(...) appends entries throughout the conversation lifecycle (lines 427-439).
Use the logged session metadata with CONVERSATION_ID to resume the same agent conversation later. Use the session ID alongside your conversation ID for diagnostics and reconnect scenarios.
Learn how to use Voice Live with Microsoft Foundry Agent Service using the VoiceLive SDK for JavaScript. This article builds on the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live with advanced features and integration options.
Reference documentation | Package (npm) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Note
The JavaScript Voice Live SDK is designed for browser-based applications with built-in WebSocket and Web Audio support. This how-to guide uses Node.js with node-record-lpcm16 and speaker for a console experience. For a full browser-based voice UI, see the Voice Live universal assistant sample.
Prerequisites
Note
This document refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- Node.js version 18 or later.
- SoX installed on your system (required by node-record-lpcm16 for microphone capture).
- The required language runtimes, global tools, and Visual Studio Code extensions as described in Prepare your development environment.
- A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
- A model deployed in Microsoft Foundry. If you don't have a model, first complete Quickstart: Set up Microsoft Foundry resources.
- Assign the Azure AI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to set up your environment, configure the agent with Voice Live settings, and test your first conversation.
Agent integration concepts
Use these concepts to understand how Voice Live and Foundry Agent Service work together in the JavaScript sample.
Agent configuration contract
Set the agent property with an AgentSessionConfig object in your createSession(...) call to identify the target agent and project. At minimum, include agentName and projectName. Add agentVersion when you want to pin behavior to a specific version.
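A minimal sketch of that contract, using placeholder agent and project names; only the object shape matters here:

```javascript
// Build the agent configuration object passed as the `agent` property
// to createSession(...). agentName and projectName are required;
// agentVersion is optional and pins behavior to a specific version.
const agentVersion = process.env.AGENT_VERSION; // may be undefined

const agentConfig = {
  agentName: "my-voice-agent", // placeholder name
  projectName: "my-project",   // placeholder name
  // Spread agentVersion in only when it's set, so Voice Live
  // falls back to the latest agent version otherwise.
  ...(agentVersion && { agentVersion }),
};

console.log(agentConfig);
```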
Authentication model for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation in this flow doesn't support key-based authentication, so configure DefaultAzureCredential (or another Entra token credential) for local development and deployment.
API version pinning
Use a consistent SDK version (@azure/ai-voicelive@1.0.0-beta.3) in your package.json to keep behavior predictable across preview updates. Use the same version consistently across quickstart and how-to samples to avoid schema drift.
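For example, pin the exact preview version in package.json (no caret range, so npm won't float to a newer beta):

```json
{
  "dependencies": {
    "@azure/ai-voicelive": "1.0.0-beta.3"
  }
}
```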
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling so persisted history better matches what the user actually heard.
Connect to a specific agent version
Pin your agent to a specific version to enable controlled deployments. This lets production use stable versions while development tests newer iterations.
Set the AGENT_VERSION environment variable or pass the agentVersion property when initializing the assistant:
* @param {string} [opts.greetingText]
* @param {boolean} [opts.noAudio]
*/
// <agent_config>
constructor(opts) {
this.endpoint = opts.endpoint;
this.credential = opts.credential;
this.greetingText = opts.greetingText;
this.noAudio = opts.noAudio;
this.agentConfig = {
agentName: opts.agentName,
projectName: opts.projectName,
...(opts.agentVersion && { agentVersion: opts.agentVersion }),
...(opts.conversationId && { conversationId: opts.conversationId }),
...(opts.foundryResourceOverride && {
foundryResourceOverride: opts.foundryResourceOverride,
}),
...(opts.foundryResourceOverride &&
opts.authenticationIdentityClientId && {
authenticationIdentityClientId: opts.authenticationIdentityClientId,
}),
};
this._session = null;
this._audio = new AudioProcessor(!opts.noAudio, opts.audioInputDevice);
console.log("");
console.log("Options:");
console.log(" --endpoint <url> VoiceLive endpoint URL");
console.log(" --agent-name <name> Foundry agent name");
console.log(" --project-name <name> Foundry project name");
console.log(" --agent-version <ver> Agent version");
console.log(" --conversation-id <id> Conversation ID to resume");
console.log(" --foundry-resource <name> Foundry resource override");
console.log(" --auth-client-id <id> Authentication identity client ID");
console.log(" --audio-input-device <name> Explicit SoX input device name (Windows)");
console.log(" --list-audio-devices List available audio input devices and exit");
console.log(" --greeting-text <text> Send a pre-defined greeting instead of LLM-generated");
console.log(" --no-audio Connect and configure session without mic/speaker");
console.log(" -h, --help Show this help text");
}
console.log(` CONVERSATION_ID: ${args.conversationId ?? "(not set)"}`);
console.log(
` FOUNDRY_RESOURCE_OVERRIDE: ${args.foundryResourceOverride ?? "(not set)"}`,
);
console.log(
` AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: ${args.authenticationIdentityClientId ?? "(not set)"}`,
);
console.log(` AUDIO_INPUT_DEVICE: ${args.audioInputDevice ?? "(not set)"}`);
if (args.greetingText) {
console.log(` Proactive greeting: pre-defined`);
} else {
console.log(` Proactive greeting: LLM-generated (default)`);
}
In this sample, the version configuration is applied in three places:
- In main(), AGENT_VERSION is read from process.env.
- In the BasicVoiceAssistant constructor, agentVersion is spread into the agentConfig object.
- The config is passed to client.createSession({ agent: this.agentConfig }), which sends it to Voice Live.
The agentVersion value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest version of the agent.
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent on a different Foundry resource for audio processing. This is useful when:
- The agent is deployed in a region that has different feature availability
- You want to separate development/staging environments from production
- Your organization uses different resources for different workloads
To connect to an agent on a different resource, configure two additional environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
* @param {string} [opts.greetingText]
* @param {boolean} [opts.noAudio]
*/
// <agent_config>
constructor(opts) {
this.endpoint = opts.endpoint;
this.credential = opts.credential;
this.greetingText = opts.greetingText;
this.noAudio = opts.noAudio;
this.agentConfig = {
agentName: opts.agentName,
projectName: opts.projectName,
...(opts.agentVersion && { agentVersion: opts.agentVersion }),
...(opts.conversationId && { conversationId: opts.conversationId }),
...(opts.foundryResourceOverride && {
foundryResourceOverride: opts.foundryResourceOverride,
}),
...(opts.foundryResourceOverride &&
opts.authenticationIdentityClientId && {
authenticationIdentityClientId: opts.authenticationIdentityClientId,
}),
};
this._session = null;
this._audio = new AudioProcessor(!opts.noAudio, opts.audioInputDevice);
console.log("");
console.log("Options:");
console.log(" --endpoint <url> VoiceLive endpoint URL");
console.log(" --agent-name <name> Foundry agent name");
console.log(" --project-name <name> Foundry project name");
console.log(" --agent-version <ver> Agent version");
console.log(" --conversation-id <id> Conversation ID to resume");
console.log(" --foundry-resource <name> Foundry resource override");
console.log(" --auth-client-id <id> Authentication identity client ID");
console.log(" --audio-input-device <name> Explicit SoX input device name (Windows)");
console.log(" --list-audio-devices List available audio input devices and exit");
console.log(" --greeting-text <text> Send a pre-defined greeting instead of LLM-generated");
console.log(" --no-audio Connect and configure session without mic/speaker");
console.log(" -h, --help Show this help text");
}
console.log(` CONVERSATION_ID: ${args.conversationId ?? "(not set)"}`);
console.log(
` FOUNDRY_RESOURCE_OVERRIDE: ${args.foundryResourceOverride ?? "(not set)"}`,
);
console.log(
` AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: ${args.authenticationIdentityClientId ?? "(not set)"}`,
);
console.log(` AUDIO_INPUT_DEVICE: ${args.audioInputDevice ?? "(not set)"}`);
if (args.greetingText) {
console.log(` Proactive greeting: pre-defined`);
} else {
console.log(` Proactive greeting: LLM-generated (default)`);
}
This configuration is resolved in main() and then applied when the assistant is created:
- FOUNDRY_RESOURCE_OVERRIDE and AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID are read from process.env.
- Both values are spread into the constructor options.
- In the constructor, the values are conditionally set on the agentConfig object, which is sent in client.createSession({ agent: this.agentConfig }).
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Voice Live can initiate the conversation by sending a proactive message as soon as the session is ready. In this sample, the assistant checks a one-time flag in the onSessionUpdated handler, sends a greeting prompt, and then triggers a response.
await session.dispose();
} catch {
// ignore dispose errors during shutdown
}
}
// </start_session>
// <proactive_greeting>
/**
* Send a proactive greeting when the session starts.
* Supports pre-defined (--greeting-text) or LLM-generated (default).
*/
async _sendProactiveGreeting() {
const session = this._session;
if (this.greetingText) {
// Pre-generated assistant message (deterministic)
console.log("[session] Sending pre-generated greeting ...");
try {
await session.sendEvent({
type: "response.create",
response: {
preGeneratedAssistantMessage: {
content: [{ type: "text", text: this.greetingText }],
},
},
});
} catch (err) {
console.error("[session] Failed to send pre-generated greeting:", err.message);
}
} else {
// LLM-generated greeting (default)
console.log("[session] Sending proactive greeting ...");
try {
await session.addConversationItem({
type: "message",
role: "system",
content: [
In this sample, proactive messaging is applied in three steps:
- _greetingSent is a boolean initialized to false to track one-time greeting state.
- In the onSessionUpdated handler, if (!this._greetingSent) gates proactive execution to run once per session.
- session.addConversationItem(...) adds the greeting instruction to conversation context, and session.sendEvent({ type: "response.create" }) generates spoken output.
Improve tool calling and latency wait times
Voice Live provides interimResponse to bridge wait times during tool calling or when generating responses with high latency.
The quickstart voice assistant shows the required code additions to configure this feature:
text: "Say something to welcome the user in English.",
},
],
});
await session.sendEvent({ type: "response.create" });
} catch (err) {
console.error("[session] Failed to send greeting:", err.message);
}
}
}
// </proactive_greeting>
// <setup_session>
/** Configure session modalities, audio format, and interim response. */
async _setupSession() {
console.log("[session] Configuring session ...");
await this._session.updateSession({
In this sample, the interim response setup is applied inside _setupSession():
- interimResponse defines when interim responses trigger and what style they use.
- session.updateSession(...) sends the session configuration to Voice Live, including the interim response settings.
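A sketch of what that configuration might look like, with property names mirrored from the C# LlmInterimResponseConfig shape (instructions, triggers, latencyThresholdMs). Treat the exact JavaScript property names as assumptions and confirm them against the SDK typings:

```javascript
// Illustrative session options with an interim response config.
// Property names are assumptions based on the SDK's camelCase
// conventions, not verified typings.
const sessionOptions = {
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
  interimResponse: {
    instructions:
      "Create friendly interim responses indicating wait time due to " +
      "ongoing processing, if any.",
    triggers: ["tool", "latency"], // fire on tool calls and high latency
    latencyThresholdMs: 100,       // threshold for the latency trigger
  },
};

// The options would then be sent with:
// await session.updateSession(sessionOptions);
console.log(sessionOptions);
```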
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation helps keep session context aligned with delivered audio, which improves follow-up response quality after barge-in and keeps voice conversation history logging more accurate.
This sample currently shows interruption handling with response.cancel during speech start, but it doesn't configure auto_truncate in turn_detection.
Note
In Foundry Agent Service, thread messages and tracing agent threads are based on text content in the thread. Without auto truncation, those records can differ from the exact portion of audio the user actually heard before interruption.
For setup details and supported options, see Handle voice interruptions in chat history (preview).
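A sketch of what enabling it might look like via updateSession. The camelCase property names are assumptions based on the SDK's conventions; the underlying wire protocol field is auto_truncate inside turn_detection:

```javascript
// Illustrative turn detection options enabling auto truncation;
// property names are assumptions, check the SDK typings.
const turnDetectionOptions = {
  turnDetection: {
    type: "server_vad", // assumed voice activity detection mode
    autoTruncate: true, // trim context to the audio actually played
  },
};

// The options would then be sent with:
// await session.updateSession(turnDetectionOptions);
console.log(turnDetectionOptions);
```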
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves the conversation history and context, allowing users to continue where they left off.
When a session connects successfully, Voice Live returns session metadata in the onSessionUpdated handler. The sample extracts the session ID from the context and logs it to the conversation file:
`for project "${this.agentConfig.projectName}" ...`,
);
// Subscribe to VoiceLive events BEFORE connecting, so the
// SESSION_UPDATED event is not missed.
// <handle_events>
const subscription = session.subscribe({
// <session_updated_metadata>
onSessionUpdated: async (event, context) => {
const s = event.session;
const agent = s?.agent;
const voice = s?.voice;
console.log(`[session] Session ready: ${context.sessionId}`);
writeConversationLog(
[
`SessionID: ${context.sessionId}`,
`Agent Name: ${agent?.name ?? ""}`,
In this event handler, the session ID is extracted from context.sessionId and written to the conversation log along with agent metadata.
The sample code writes session details to a conversation log file in the logs/ folder (for example, logs/conversation_20260219_143000.log).
To reconnect to that conversation, pass the conversation ID as the CONVERSATION_ID environment variable (or the conversationId property):
console.log(" --conversation-id <id> Conversation ID to resume");
In this sample, conversation reconnect is applied in three places:
- In main(), CONVERSATION_ID is read from process.env (line 542).
- The value is passed to the BasicVoiceAssistant constructor.
- In the constructor, conversationId is conditionally spread into the agentConfig object.
When a valid conversationId is provided, the agent retrieves the previous conversation context and can reference earlier exchanges in its responses.
Note
Conversation IDs are tied to the agent and project. Using a conversation ID with a different agent creates a new conversation.
Log session metadata for continuity and diagnostics
The sample logs key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support scenarios.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when onSessionUpdated is received:
// ---------------------------------------------------------------------------
const logsDir = join(__dirname, "logs");
if (!existsSync(logsDir)) mkdirSync(logsDir, { recursive: true });
const timestamp = new Date()
.toISOString()
.replace(/[:.]/g, "-")
.replace("T", "_")
.slice(0, 19);
const conversationLogFile = join(logsDir, `conversation_${timestamp}.log`);
function writeConversationLog(message) {
appendFileSync(conversationLogFile, message + "\n", "utf-8");
}
`for project "${this.agentConfig.projectName}" ...`,
);
// Subscribe to VoiceLive events BEFORE connecting, so the
// SESSION_UPDATED event is not missed.
// <handle_events>
const subscription = session.subscribe({
// <session_updated_metadata>
onSessionUpdated: async (event, context) => {
const s = event.session;
const agent = s?.agent;
const voice = s?.voice;
console.log(`[session] Session ready: ${context.sessionId}`);
writeConversationLog(
[
`SessionID: ${context.sessionId}`,
`Agent Name: ${agent?.name ?? ""}`,
In this sample, session metadata logging is applied in three places:
- A logs/ directory is created if it doesn't exist, and a timestamped conversation log file (conversation_YYYYMMDD_HHmmss.log) is created per run (lines 20-28).
- On onSessionUpdated, the handler extracts the session ID from context.sessionId and writes it along with agent metadata to the log (lines 302-305).
- writeConversationLog(...) appends entries to the same log file throughout the conversation lifecycle (lines 30-33).
Use the logged session metadata with CONVERSATION_ID to resume the same agent conversation in a later session.
Use the session ID value alongside your conversation ID for diagnostics and reconnect scenarios.
Learn how to use Voice Live with Microsoft Foundry Agent Service using the VoiceLive SDK for Java. This article builds on the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live with advanced features and integration options.
Reference documentation | Package (Maven) | Additional samples on GitHub
Create and run applications to use Voice Live with agents for real-time conversations.
Agents provide several advantages:
- Use centralized configuration in the agent itself instead of session code.
- Handle complex logic and conversational behaviors for easier updates.
- Connect automatically by using your agent ID.
- Support multiple variations without changing client code.
To use Voice Live without Foundry agents, see the Voice Live API quickstart.
Tip
You don't need to deploy an audio model with Microsoft Foundry to use Voice Live. Voice Live is fully managed and automatically deploys the model for you. For model availability, see the Voice Live overview documentation.
Prerequisites
Note
This document refers to the Microsoft Foundry (new) portal and the latest Foundry Agent Service version.
- An Azure subscription. Create one for free.
- Java Development Kit (JDK) version 11 or later.
- Apache Maven installed.
- The required language runtimes, global tools, and Visual Studio Code extensions as described in Prepare your development environment.
- A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
- A model deployed in Microsoft Foundry. If you don't have a model, first complete Quickstart: Set up Microsoft Foundry resources.
- Assign the Azure AI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
Prepare the environment and create the agent
Complete the Quickstart: Create a Voice Agent with Foundry Agent Service and Voice Live to set up your environment, configure the agent with Voice Live settings, and test your first conversation.
Agent integration concepts
Use these concepts to understand how Voice Live and Foundry Agent Service work together in the Java sample.
Agent configuration contract
Set AgentSessionConfig in your session setup to identify the target agent and project. At minimum, include agentName and projectName. Add agentVersion when you want to pin behavior to a specific version.
Authentication model for agent mode
Use Microsoft Entra ID credentials for agent mode. Agent invocation in this flow doesn't support key-based authentication, so configure AzureCliCredential (or another Entra token credential) for local development and deployment.
API version pinning
Use a consistent SDK version (azure-ai-voicelive:1.0.0-beta.5) in the Maven POM to keep behavior predictable across preview updates. Use the same version consistently across quickstart and how-to samples to avoid schema drift.
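For example, pin the exact preview version in the POM. The com.azure group ID below is an assumption based on Azure SDK for Java conventions; confirm the coordinates against the package listing:

```xml
<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-ai-voicelive</artifactId>
  <version>1.0.0-beta.5</version>
</dependency>
```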
Conversation and trace alignment
Treat agent thread and trace records as text-turn history, not exact playback history. If your app allows interruption or truncation, enable truncation-aware handling so persisted history better matches what the user actually heard.
Connect to a specific agent version
Pin your agent to a specific version to enable controlled deployments. This lets production use stable versions while development tests newer iterations.
Set the AGENT_VERSION environment variable or pass the agentVersion parameter when initializing the assistant:
// <agent_config>
BasicVoiceAssistant(String endpoint, String agentName, String projectName,
String agentVersion, String conversationId,
String foundryResourceOverride, String authIdentityClientId) {
this.endpoint = endpoint;
// Build the agent session configuration
AgentSessionConfig config = new AgentSessionConfig(agentName, projectName);
if (agentVersion != null && !agentVersion.isEmpty()) {
config.setAgentVersion(agentVersion);
}
if (conversationId != null && !conversationId.isEmpty()) {
config.setConversationId(conversationId);
}
if (foundryResourceOverride != null && !foundryResourceOverride.isEmpty()) {
config.setFoundryResourceOverride(foundryResourceOverride);
if (authIdentityClientId != null && !authIdentityClientId.isEmpty()) {
config.setAuthenticationIdentityClientId(authIdentityClientId);
}
}
this.agentConfig = config;
}
// </agent_config>
// <main>
public static void main(String[] args) {
String endpoint = System.getenv("VOICELIVE_ENDPOINT");
String agentName = System.getenv("AGENT_NAME");
String projectName = System.getenv("PROJECT_NAME");
String agentVersion = System.getenv("AGENT_VERSION");
String conversationId = System.getenv("CONVERSATION_ID");
String foundryResourceOverride = System.getenv("FOUNDRY_RESOURCE_OVERRIDE");
String authIdentityClientId = System.getenv("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");
System.out.println("Environment variables:");
System.out.println("VOICELIVE_ENDPOINT: " + endpoint);
System.out.println("AGENT_NAME: " + agentName);
System.out.println("PROJECT_NAME: " + projectName);
System.out.println("AGENT_VERSION: " + agentVersion);
System.out.println("CONVERSATION_ID: " + conversationId);
System.out.println("FOUNDRY_RESOURCE_OVERRIDE: " + foundryResourceOverride);
if (endpoint == null || endpoint.isEmpty()
|| agentName == null || agentName.isEmpty()
|| projectName == null || projectName.isEmpty()) {
System.err.println("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
System.exit(1);
}
// Verify audio devices
checkAudioDevices();
System.out.println("🎙️ Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
System.out.println("=".repeat(65));
BasicVoiceAssistant assistant = new BasicVoiceAssistant(
endpoint, agentName, projectName,
agentVersion, conversationId,
foundryResourceOverride, authIdentityClientId);
// Handle graceful shutdown
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
System.out.println("\n👋 Voice assistant shut down. Goodbye!");
}));
try {
assistant.start();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
System.out.println("\n👋 Voice assistant shut down. Goodbye!");
} catch (Exception e) {
System.err.println("Fatal Error: " + e.getMessage());
e.printStackTrace();
}
}
// </main>
In this sample, the version configuration is applied in three places:
- In main(), AGENT_VERSION is read from the environment.
- The value is passed to the BasicVoiceAssistant(...) constructor.
- In the constructor, the value is set on AgentSessionConfig via config.setAgentVersion(agentVersion), and then sent to Voice Live via client.startSession(agentConfig).
The agentVersion value corresponds to the version string returned when you create or update an agent using the Foundry Agent SDK. If not specified, Voice Live connects to the latest version of the agent.
Connect to an agent on a different Foundry resource
Configure Voice Live to connect to an agent on a different Foundry resource for audio processing. This is useful when:
- The agent is deployed in a region that has different feature availability
- You want to separate development/staging environments from production
- Your organization uses different resources for different workloads
To connect to an agent on a different resource, configure two additional environment variables:
- FOUNDRY_RESOURCE_OVERRIDE: The Foundry resource name hosting the agent project (for example, my-agent-resource).
- AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID: The managed identity client ID of the Voice Live resource, required for cross-resource authentication.
// <agent_config>
BasicVoiceAssistant(String endpoint, String agentName, String projectName,
String agentVersion, String conversationId,
String foundryResourceOverride, String authIdentityClientId) {
this.endpoint = endpoint;
// Build the agent session configuration
AgentSessionConfig config = new AgentSessionConfig(agentName, projectName);
if (agentVersion != null && !agentVersion.isEmpty()) {
config.setAgentVersion(agentVersion);
}
if (conversationId != null && !conversationId.isEmpty()) {
config.setConversationId(conversationId);
}
if (foundryResourceOverride != null && !foundryResourceOverride.isEmpty()) {
config.setFoundryResourceOverride(foundryResourceOverride);
if (authIdentityClientId != null && !authIdentityClientId.isEmpty()) {
config.setAuthenticationIdentityClientId(authIdentityClientId);
}
}
this.agentConfig = config;
}
// </agent_config>
```java
// <main>
public static void main(String[] args) {
    String endpoint = System.getenv("VOICELIVE_ENDPOINT");
    String agentName = System.getenv("AGENT_NAME");
    String projectName = System.getenv("PROJECT_NAME");
    String agentVersion = System.getenv("AGENT_VERSION");
    String conversationId = System.getenv("CONVERSATION_ID");
    String foundryResourceOverride = System.getenv("FOUNDRY_RESOURCE_OVERRIDE");
    String authIdentityClientId = System.getenv("AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID");

    System.out.println("Environment variables:");
    System.out.println("VOICELIVE_ENDPOINT: " + endpoint);
    System.out.println("AGENT_NAME: " + agentName);
    System.out.println("PROJECT_NAME: " + projectName);
    System.out.println("AGENT_VERSION: " + agentVersion);
    System.out.println("CONVERSATION_ID: " + conversationId);
    System.out.println("FOUNDRY_RESOURCE_OVERRIDE: " + foundryResourceOverride);

    if (endpoint == null || endpoint.isEmpty()
            || agentName == null || agentName.isEmpty()
            || projectName == null || projectName.isEmpty()) {
        System.err.println("Set VOICELIVE_ENDPOINT, AGENT_NAME, and PROJECT_NAME environment variables.");
        System.exit(1);
    }

    // Verify audio devices
    checkAudioDevices();

    System.out.println("🎙️ Basic Foundry Voice Agent with Azure VoiceLive SDK (Agent Mode)");
    System.out.println("=".repeat(65));

    BasicVoiceAssistant assistant = new BasicVoiceAssistant(
            endpoint, agentName, projectName,
            agentVersion, conversationId,
            foundryResourceOverride, authIdentityClientId);

    // Handle graceful shutdown
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        System.out.println("\n👋 Voice assistant shut down. Goodbye!");
    }));

    try {
        assistant.start();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        System.out.println("\n👋 Voice assistant shut down. Goodbye!");
    } catch (Exception e) {
        System.err.println("Fatal Error: " + e.getMessage());
        e.printStackTrace();
    }
}
// </main>
```
This configuration is resolved in `main()` and then applied when the assistant is created:
- `FOUNDRY_RESOURCE_OVERRIDE` and `AGENT_AUTHENTICATION_IDENTITY_CLIENT_ID` are read from environment variables.
- Both values are passed to the `BasicVoiceAssistant(...)` constructor.
- In the constructor, the values are set on `AgentSessionConfig` via `config.setFoundryResourceOverride(...)` and `config.setAuthenticationIdentityClientId(...)`, which is sent in `client.startSession(agentConfig)`.
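The constructor only applies the override when a resource name is present, and only sets the identity client ID inside that branch. A minimal, self-contained sketch of that guard logic (the class and method names here are illustrative, not part of the sample or the SDK):

```java
// Illustrative helper: mirrors the constructor's nested guards for
// cross-resource configuration. Not part of the Voice Live SDK.
public class CrossResourceGuard {
    /** True when a cross-resource override should be applied at all. */
    public static boolean shouldOverride(String resourceOverride) {
        return resourceOverride != null && !resourceOverride.isEmpty();
    }

    /** True when the managed identity client ID should also be set. */
    public static boolean shouldSetIdentity(String resourceOverride, String clientId) {
        // The identity client ID only matters when an override is in effect.
        return shouldOverride(resourceOverride)
                && clientId != null && !clientId.isEmpty();
    }

    public static void main(String[] args) {
        // Override applies; identity is set for cross-resource authentication.
        System.out.println(shouldSetIdentity("my-agent-resource", "00000000-0000-0000-0000-000000000000")); // true
        // No override: the identity client ID is ignored entirely.
        System.out.println(shouldSetIdentity(null, "00000000-0000-0000-0000-000000000000")); // false
    }
}
```

The nesting matters: setting the identity client ID without an override has no effect, so the sample never reads it outside the override branch.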
Important
Cross-resource connections require proper role assignments. Ensure the Voice Live resource's managed identity has the Azure AI User role on the target agent resource.
Add a proactive message at session start
Send a proactive message to initiate conversations as soon as the session is ready. This sample checks a one-time flag in the SESSION_UPDATED event handler, sends a greeting prompt, and triggers a response.
```java
// <proactive_greeting>
private void sendProactiveGreeting() {
    logger.info("Sending proactive greeting request");
    try {
        // Create a system message to trigger greeting
        SystemMessageItem greetingMessage = new SystemMessageItem(
                Arrays.asList(new InputTextContentPart("Say something to welcome the user in English.")));
        ClientEventConversationItemCreate createEvent = new ClientEventConversationItemCreate()
                .setItem(greetingMessage);
        session.sendEvent(createEvent).block();

        // Request a response
        session.sendEvent(new ClientEventResponseCreate()).block();
    } catch (Exception e) {
        logger.log(Level.WARNING, "Failed to send proactive greeting", e);
    }
}
// </proactive_greeting>
```
In this sample, proactive messaging is applied in three steps:
- `greetingSent` is a `boolean` initialized to `false` to track one-time greeting state.
- In the `SESSION_UPDATED` branch, `if (!greetingSent)` gates proactive execution to run once per session.
- `sendEvent(new ClientEventConversationItemCreate()...)` adds the greeting instruction to the conversation context, and `sendEvent(new ClientEventResponseCreate())` generates spoken output.
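The one-time gate can be factored into a tiny self-contained class, which also makes the once-per-session behavior easy to unit test. This is an illustrative refactoring, not code from the sample, which keeps the flag as a field on `BasicVoiceAssistant`:

```java
// Illustrative: a reusable one-shot gate for the proactive greeting.
public class GreetingGate {
    private boolean greetingSent = false;

    /** Returns true exactly once; every later call returns false. */
    public synchronized boolean tryAcquire() {
        if (greetingSent) {
            return false;
        }
        greetingSent = true;
        return true;
    }

    public static void main(String[] args) {
        GreetingGate gate = new GreetingGate();
        System.out.println(gate.tryAcquire()); // true: first SESSION_UPDATED sends the greeting
        System.out.println(gate.tryAcquire()); // false: later session updates skip it
    }
}
```

`SESSION_UPDATED` can fire more than once per connection, which is why the sample gates the greeting rather than sending it unconditionally.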
Improve tool calling and latency wait times
Use Voice Live's `interimResponse` feature to bridge wait times during tool calling or when generating agent responses with high latency.
This feature supports two modes:
- `LlmInterimResponseConfig`: An LLM-generated interim response, best for dynamic and adaptive starts.
- `InterimResponseTrigger`: A pre-generated interim response, best for deterministic or branded messaging.
The following code shows the additions to the quickstart voice assistant that configure this feature:
```java
// <setup_session>
private void setupSession() {
    logger.info("Setting up voice conversation session...");

    // Configure interim responses to bridge latency gaps during processing
    LlmInterimResponseConfig interimResponseConfig = new LlmInterimResponseConfig()
            .setTriggers(Arrays.asList(
                    InterimResponseTrigger.TOOL,
                    InterimResponseTrigger.LATENCY))
            .setLatencyThresholdMs(100)
            .setInstructions("Create friendly interim responses indicating wait time due to "
                    + "ongoing processing, if any. Do not include in all responses! Do not "
                    + "say you don't have real-time access to information when calling tools!");

    // Create session configuration
    VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
            .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
            .setInputAudioFormat(InputAudioFormat.PCM16)
            .setOutputAudioFormat(OutputAudioFormat.PCM16)
            .setInterimResponse(BinaryData.fromObject(interimResponseConfig));

    // Send session update
    session.sendEvent(new ClientEventSessionUpdate(sessionOptions)).block();
    logger.info("Session configuration sent");
}
// </setup_session>
```
In this sample, the interim response setup is applied inside `BasicVoiceAssistant.setupSession()`:
- `LlmInterimResponseConfig` defines when interim responses trigger and what style they use.
- `VoiceLiveSessionOptions` attaches that config through the `interimResponse` field (serialized via `BinaryData.fromObject(...)`).
- `session.sendEvent(new ClientEventSessionUpdate(sessionOptions))` sends the session configuration to Voice Live.
Use auto truncation for interrupted responses
When users interrupt agent audio, conversation text can drift from what users actually heard. Auto truncation helps keep session context aligned with delivered audio, which improves follow-up response quality after barge-in and keeps voice conversation history logging more accurate.
This sample currently shows interruption handling with `ClientEventResponseCancel` during speech start, but it doesn't configure `auto_truncate` in `turn_detection`.
Note
In Foundry Agent Service, thread messages and tracing agent threads are based on text content in the thread. Without auto truncation, those records can differ from the exact portion of audio the user actually heard before interruption.
For setup details and supported options, see Handle voice interruptions in chat history (preview).
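As a rough orientation only, enabling auto truncation might look like the following session update. The type name `ServerVadTurnDetection` and the `setAutoTruncate(...)` setter are assumptions, not confirmed Java SDK API; use the linked article as the authoritative reference:

```java
// Hypothetical sketch: type and setter names are assumptions, not confirmed API.
// Auto truncation trims thread text to the audio the user actually heard.
ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
        .setAutoTruncate(true);

VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
        .setTurnDetection(turnDetection);

// Apply the updated configuration to the active session.
session.sendEvent(new ClientEventSessionUpdate(sessionOptions)).block();
```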
Reconnect to a previous agent conversation
Reconnect to a previous conversation by specifying the conversation ID. This preserves history and context, allowing users to continue where they left off.
When a session connects successfully, Voice Live returns session metadata in the SESSION_UPDATED event. The sample extracts the session ID and logs it to the conversation file:
```java
if (type == ServerEventType.SESSION_UPDATED) {
    logger.info("Session updated and ready");
    sessionReady = true;

    String sessionId = extractField(event, "id");
    writeLog(String.format("SessionID: %s\n", sessionId));

    // Send a proactive greeting
    if (!greetingSent) {
        greetingSent = true;
        sendProactiveGreeting();
    }

    // Start audio capture once session is ready
    try {
        audioProcessor.startCapture();
    } catch (LineUnavailableException e) {
        logger.log(Level.SEVERE, "Failed to start audio capture", e);
    }
}
```
In this event handler, the session ID is extracted from the event JSON using extractField(event, "id") and written to the conversation log.
The sample code writes session details to a conversation log file in the logs/ folder (for example, logs/conversation_20260219_143000.log).
To reconnect to that conversation, pass the conversation ID as the CONVERSATION_ID environment variable (or the conversationId parameter):
```java
String conversationId = System.getenv("CONVERSATION_ID");

BasicVoiceAssistant assistant = new BasicVoiceAssistant(
        endpoint, agentName, projectName,
        agentVersion, conversationId,
        foundryResourceOverride, authIdentityClientId);
```
In this sample, conversation reconnect is applied in three places:
- In `main()`, `CONVERSATION_ID` is read from the environment (line 512).
- The value is passed to the `BasicVoiceAssistant(...)` constructor (lines 537-540).
- In the constructor, the value is set on `AgentSessionConfig` via `config.setConversationId(conversationId)`.
When a valid `conversationId` is provided, the agent retrieves the previous conversation context and can reference earlier exchanges in its responses.
Note
Conversation IDs are tied to the agent and project. Attempting to use a conversation ID with a different agent results in a new conversation being created.
Log session metadata for continuity and diagnostics
Log key session metadata, including the session ID, to a timestamped conversation log file under logs/. This helps you:
- Identify the session for debugging and support scenarios.
- Correlate user-reported behavior with session metadata.
- Track runs over time by preserving per-session log files.
The following code creates the log filename and writes session metadata when SESSION_UPDATED is received:
```java
// Conversation log
private static final String LOG_FILENAME = "conversation_"
        + LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss")) + ".log";

if (type == ServerEventType.SESSION_UPDATED) {
    logger.info("Session updated and ready");
    sessionReady = true;

    String sessionId = extractField(event, "id");
    writeLog(String.format("SessionID: %s\n", sessionId));

    // Send a proactive greeting
    if (!greetingSent) {
        greetingSent = true;
        sendProactiveGreeting();
    }

    // Start audio capture once session is ready
    try {
        audioProcessor.startCapture();
    } catch (LineUnavailableException e) {
        logger.log(Level.SEVERE, "Failed to start audio capture", e);
    }
}

private void writeLog(String message) {
    try {
        Path logDir = Paths.get("logs");
        Files.createDirectories(logDir);
        try (PrintWriter writer = new PrintWriter(
                new FileWriter(logDir.resolve(LOG_FILENAME).toString(), true))) {
            writer.println(message);
        }
    } catch (IOException e) {
        logger.warning("Failed to write conversation log: " + e.getMessage());
    }
}
```
In this sample, session metadata logging is applied in three places:
- A timestamped conversation log file (`conversation_YYYYMMDD_HHmmss.log`) is created per run (lines 92-95).
- On `SESSION_UPDATED`, the handler extracts the session ID from the event JSON and writes it to the log (lines 365-366).
- `writeLog(...)` appends entries to the same log file throughout the conversation lifecycle (lines 471-482).
Use the logged session ID alongside the conversation ID (`CONVERSATION_ID`) for diagnostics and to resume the same agent conversation in a later session.
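To resume a conversation later, you need to read the identifier back out of the log file. A small illustrative parser for the `SessionID: ...` line that `writeLog(...)` records (the class is not part of the sample; whether you feed the session ID or a separate conversation ID into `CONVERSATION_ID` depends on your logging):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper: recover the logged identifier so it can be
// exported (for example, as CONVERSATION_ID) before the next run.
public class SessionIdParser {
    private static final Pattern SESSION_ID = Pattern.compile("SessionID: (\\S+)");

    public static Optional<String> parse(String logContents) {
        Matcher m = SESSION_ID.matcher(logContents);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("SessionID: sess_abc123\n").orElse("not found")); // sess_abc123
        System.out.println(parse("no session line").orElse("not found"));          // not found
    }
}
```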
Migrate from Agent Service (classic)
If you're using Voice Live with Agent Service (classic), we recommend you migrate to the new Foundry Agent Service. For general Agent Service migration steps, see Migrate from Agent Service (classic) to Foundry Agent Service.
Voice Live SDK changes
The Voice Live SDK introduces typed configuration classes that replace the raw query parameters used in the classic integration:
| Classic (v1) | New (v2) |
|---|---|
| `agent-id` query parameter | `agent_name` in `AgentConfig` / `AgentSessionConfig` |
| `agent-project-name` query parameter | Project endpoint in client constructor |
| `agent-access-token` query parameter | Handled automatically by the SDK |
| Manual `connect()` with query dict | Strongly typed `AgentSessionConfig` passed to session options |
Minimum SDK versions
| Language | Package | Minimum version |
|---|---|---|
| Python | `azure-ai-voicelive` | 1.0.0b5 |
| C# | `Azure.AI.VoiceLive` | 1.1.0-beta.2 |
| Java | `azure-ai-voicelive` | 1.0.0-beta.5 |
| JavaScript | `@azure/ai-voicelive` | 1.0.0-beta.3 |
Before and after: Python connection setup
Classic (v1): raw query parameters in `connect()`:

```python
async with connect(
    endpoint=self.endpoint,
    credential=self.credential,
    query={
        "agent-id": self.agent_id,
        "agent-project-name": self.foundry_project_name,
        "agent-access-token": agent_access_token
    },
) as connection:
```
New (v2): strongly typed `AgentSessionConfig`:

```python
from azure.ai.voicelive import AgentConfig, AgentSessionConfig

agent_config = AgentConfig(agent_name=agent_name)
agent_session_config = AgentSessionConfig(agent_config=agent_config)

session_options = VoiceLiveSessionOptions(
    agent_session_config=agent_session_config,
    # ... other options
)
```
For complete code examples, see the new agent quickstart. The classic quickstart remains available.
Related content
- Explore How to add proactive messages
- Explore How to improve tool calling and latency wait times
- Learn more about How to use the Voice Live API
- See the Voice Live API reference