Important
This feature is in Beta. Account admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
The Supervisor API simplifies building custom agents on Azure Databricks with support for background mode for long-running tasks. You define the model, tools, and instructions in one request to an OpenResponses-compatible endpoint (POST /mlflow/v1/responses), and Azure Databricks runs the agent loop for you: repeatedly calling the model, selecting and executing tools, and synthesizing a final response.
There are three approaches to build a customized tool-calling agent on Azure Databricks:
- Agent Bricks Supervisor Agent (recommended): Fully declarative with human feedback optimization for highest quality.
- Supervisor API: Build a custom agent programmatically—choose models at runtime, control which tools to use per request, or iterate during development. Also the right choice when you need control over model choice while offloading agent loop management to Azure Databricks.
- AI Gateway unified or native APIs: Write your own agent loop. Azure Databricks provides only the LLM inference layer. Use unified APIs where possible to enable switching models, or provider-specific native APIs (`/openai`, `/anthropic`, `/gemini`) when porting existing code to Azure Databricks or using provider-specific features.
Requirements
- AI Gateway for LLM endpoints enabled for your account. See Manage Azure Databricks previews.
- Because the Supervisor API runs through AI Gateway (Beta), AI Gateway features such as inference tables, rate limits, and fallbacks apply. Usage tracking is not supported in this beta.
- Store MLflow traces in Unity Catalog enabled for your account. See Manage Azure Databricks previews.
  - Stores traces from the Supervisor API agent loop in Unity Catalog tables.
- An Azure Databricks workspace in a supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
- The tools you pass (Genie spaces, Unity Catalog functions, MCP servers, knowledge assistants, Apps) must already be configured and accessible.
- The `databricks-openai` package installed: `pip install databricks-openai`
Step 1: Create a single-turn LLM call
Start with a basic call with no tools. The DatabricksOpenAI client automatically configures the base URL and authentication for your workspace:
```python
from databricks_openai import DatabricksOpenAI

client = DatabricksOpenAI(use_ai_gateway=True)

response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    stream=False,
)

print(response.output_text)
```
Step 2: Add hosted tools to run the agent loop
When you include tools in the request, Azure Databricks manages a multi-turn loop on your behalf: the model decides which tools to call, Azure Databricks executes them, feeds the results back to the model, and repeats until the model produces a final answer.
```python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Summarize recent customer reviews and flag any urgent issues."}],
    tools=[
        {
            "type": "genie_space",
            "genie_space": {
                "id": "<genie-space-id>",
                "description": "Answers customer review questions using SQL",
            },
        },
        {
            "type": "uc_function",
            "uc_function": {
                "name": "<catalog>.<schema>.<function_name>",
                "description": "Flags a review as requiring urgent attention",
            },
        },
        {
            "type": "knowledge_assistant",
            "knowledge_assistant": {
                "knowledge_assistant_id": "<knowledge-assistant-id>",
                "description": "Answers questions from internal documentation",
            },
        },
        {
            "type": "app",
            "app": {
                "name": "<app-name>",
                "description": "Custom application endpoint",
            },
        },
        {
            "type": "uc_connection",
            "uc_connection": {
                "name": "<uc-connection-name>",
                "description": "Searches the web for current information",
            },
        },
    ],
    stream=True,
)

for event in response:
    print(event)
```
Step 3: Enable tracing
Pass a trace_destination in the request body to send traces from the agent loop to Unity Catalog tables. Each request generates a trace capturing the full sequence of model calls and tool executions. If you don't set trace_destination, no traces are written. For setup details, see Store MLflow traces in Unity Catalog.
Using the databricks-openai Python client, pass it via extra_body:
```python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    extra_body={
        "trace_destination": {
            "catalog_name": "<catalog>",
            "schema_name": "<schema>",
            "table_prefix": "<table-prefix>",
        }
    },
)
```
To also return the trace directly in the API response, pass "databricks_options": {"return_trace": True} in extra_body.
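For instance, both options can be combined in a single `extra_body` payload (the catalog, schema, and table prefix values are placeholders):

```python
# Store traces in Unity Catalog AND return the trace inline in the response.
extra_body = {
    "trace_destination": {
        "catalog_name": "<catalog>",
        "schema_name": "<schema>",
        "table_prefix": "<table-prefix>",
    },
    # Include the trace directly in the API response as well.
    "databricks_options": {"return_trace": True},
}

# Then pass it to the request, e.g.:
# response = client.responses.create(model=..., input=[...], tools=[...], extra_body=extra_body)
```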
You can also use MLflow distributed tracing to combine traces from your application code and the Supervisor API agent loop into a single end-to-end trace. Propagate trace context headers using the extra_headers field:
```python
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root") as root_span:
    root_span.set_inputs({"input": "Tell me about Databricks"})
    trace_headers = get_tracing_context_headers_for_http_request()

    response = client.responses.create(
        model="databricks-claude-sonnet-4-5",
        input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
        tools=[...],
        extra_body={
            "trace_destination": {
                "catalog_name": "<catalog>",
                "schema_name": "<schema>",
                "table_prefix": "<table-prefix>",
            }
        },
        extra_headers=trace_headers,
    )
```
Background mode
Background mode enables you to run long-running agent workflows that involve multiple tool calls and complex reasoning without waiting for them to finish synchronously. Submit your request with background=True, receive a response ID immediately, and poll for the result when it's ready. This is especially useful for agents that query multiple data sources or chain several tools together in a single request.
Create a background request
```python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    background=True,
)

print(response.id)      # Use this ID to poll for the result
print(response.status)  # "queued" or "in_progress"
```
Poll for the result
Use responses.retrieve() to check the status until it reaches a terminal state:
```python
from time import sleep

while response.status in {"queued", "in_progress"}:
    sleep(2)
    response = client.responses.retrieve(response.id)

print(response.output_text)
```
Background mode with MCP
For security, the Supervisor API requires explicit user approval before executing any MCP tool call in background mode. When the agent loop selects an MCP tool, the response completes with an mcp_approval_request. You can review the tool name, server label, and arguments the model intends to pass:
```json
{
  "type": "mcp_approval_request",
  "id": "<tool-call-id>",
  "arguments": "{\"query\": \"what is Databricks\", \"count\": 5}",
  "name": "you-search",
  "server_label": "<server-label>",
  "status": "completed"
}
```
To approve the tool call and continue the agent loop, pass an mcp_approval_response back in the input field with the full conversation history:
```json
{
  "type": "mcp_approval_response",
  "id": "<tool-call-id>",
  "approval_request_id": "<tool-call-id>",
  "approve": true
}
```
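In practice, continuing the loop means scanning the completed response's output items for approval requests and echoing back a matching approval for each. A sketch operating on plain dicts (the helper name is illustrative, not part of the API):

```python
def build_approval_responses(output_items, approve=True):
    """Create an mcp_approval_response for each mcp_approval_request
    found in a completed response's output items."""
    approvals = []
    for item in output_items:
        if item.get("type") == "mcp_approval_request":
            approvals.append({
                "type": "mcp_approval_response",
                "id": item["id"],
                "approval_request_id": item["id"],
                "approve": approve,
            })
    return approvals


# Append the approvals to the full conversation history and resubmit, e.g.:
# client.responses.create(model=..., tools=[...], background=True,
#                         input=previous_input + previous_output + approvals)
```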
Note
Background mode responses are retained in the database for a maximum of 30 days.
Supported tools
You define tools in the tools array of your request. Each entry specifies a type and a configuration object with the same key. For example, a Genie space tool has "type": "genie_space" and a "genie_space": {...} object. The API supports the following tool types:
| Tool type | Description | Scope |
|---|---|---|
| `genie_space` | Queries a Genie space to answer questions about your data. Parameters: `id`, `description`. | genie |
| `uc_function` | Calls a Unity Catalog function as an agent tool. Parameters: `name`, `description`. | unity-catalog |
| `uc_connection` | Connects to an external MCP server through a Unity Catalog connection. Parameters: `name`, `description`. Note: custom MCP servers on Apps are not yet supported. | unity-catalog |
| `app` | Calls an Azure Databricks App endpoint. Parameters: `name`, `description`. | apps |
| `knowledge_assistant` | Calls a Knowledge Assistant endpoint. Parameters: `knowledge_assistant_id`, `description`. | model-serving |
Supported parameters
Each request to the Supervisor API accepts the following parameters.
- `model`: one of the following supported models. Change this field to switch providers without changing the rest of your code.
  - Claude-Haiku-4.5 (`databricks-claude-haiku-4-5`)
  - Claude-Opus-4.1 (`databricks-claude-opus-4-1`)
  - Claude-Opus-4.5 (`databricks-claude-opus-4-5`)
  - Claude-Opus-4.6 (`databricks-claude-opus-4-6`)
  - Claude-Sonnet-4 (`databricks-claude-sonnet-4`)
  - Claude-Sonnet-4.5 (`databricks-claude-sonnet-4-5`)
  - Claude-Sonnet-4.6 (`databricks-claude-sonnet-4-6`)
- `input`: the conversation messages to send.
- `tools`: hosted tool definitions (`genie_space`, `uc_function`, `knowledge_assistant`, `app`, `uc_connection`).
- `instructions`: a system prompt to guide the supervisor's behavior.
- `stream`: set to `true` to stream responses.
- `background`: set to `true` to run the request asynchronously. Returns a response ID that you poll with `responses.retrieve()`. See Background mode.
- `trace_destination`: optional object with `catalog_name`, `schema_name`, and `table_prefix` fields. When set, the Supervisor API writes a trace of the full agent loop to the specified Unity Catalog tables. Pass via `extra_body` in the Python client.
The API doesn't support inference parameters such as temperature. The server manages these internally.
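Putting these parameters together, a request that sets `instructions` alongside a hosted tool might look like the following sketch (the Genie space ID and prompt text are placeholders):

```python
# Assemble a full Supervisor API request; values in angle brackets are placeholders.
request_kwargs = {
    "model": "databricks-claude-sonnet-4-5",
    "instructions": "You are a support analyst. Prefer the Genie space for data questions.",
    "input": [{"type": "message", "role": "user", "content": "Which region had the most complaints last week?"}],
    "tools": [
        {
            "type": "genie_space",
            "genie_space": {
                "id": "<genie-space-id>",
                "description": "Answers customer review questions using SQL",
            },
        }
    ],
    "stream": False,
}

# response = client.responses.create(**request_kwargs)
```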
Limitations
The Supervisor API has the following limitations:
- Background mode runtime: Background mode requests have a maximum execution time of 30 minutes.
- Client-side function calling: Only hosted tools are supported. You can't pass `function` tool definitions for the client to execute, and you can't mix hosted tools with client-side `function` tools in the same request.
- Streaming in background mode: `stream` and `background` can't both be `true` in the same request.
- Durable execution: Automatic recovery from failures or interruptions with exactly-once execution guarantees for the agent loop is not supported.
- Azure Databricks Apps OBO not supported: On-behalf-of-user authorization is not supported for the Supervisor API. To use the Supervisor API in Azure Databricks Apps, use system authorization and grant permissions for your tools.