Important
This feature is in Beta. Account admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
The Supervisor API simplifies building custom agents on Azure Databricks with support for background mode for long-running tasks. You define the model, tools, and instructions in one request to an OpenResponses-compatible endpoint (POST /mlflow/v1/responses), and Azure Databricks runs the agent loop for you: repeatedly calling the model, selecting and executing tools, and synthesizing a final response.
There are three approaches to build a customized tool-calling agent on Azure Databricks:
- Agent Bricks Supervisor Agent (recommended): Fully declarative with human feedback optimization for highest quality.
- Supervisor API: Build a custom agent programmatically—choose models at runtime, control which tools to use per request, or iterate during development. Also the right choice when you need control over model choice while offloading agent loop management to Azure Databricks.
- AI Gateway unified or native APIs: Write your own agent loop. Azure Databricks provides only the LLM inference layer. Use unified APIs where possible to enable switching models, or provider-specific native APIs (`/openai`, `/anthropic`, `/gemini`) when porting existing code to Azure Databricks or using provider-specific features.
Requirements
- AI Gateway for LLM endpoints enabled for your account. See Manage Azure Databricks previews.
- Because the Supervisor API runs through AI Gateway (Beta), AI Gateway features such as inference tables, rate limits, and fallbacks apply. Usage tracking is not supported in this beta.
- Store MLflow traces in Unity Catalog enabled for your account. See Manage Azure Databricks previews.
  - Stores traces from the Supervisor API agent loop in Unity Catalog tables.
- An Azure Databricks workspace in a supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
- The tools you pass (Genie spaces, Unity Catalog functions, MCP servers, knowledge assistants, Apps) must already be configured and accessible.
- The `databricks-openai` package installed: `pip install databricks-openai`
Step 1: Create a single-turn LLM call
Start with a basic call with no tools. The DatabricksOpenAI client automatically configures the base URL and authentication for your workspace:
```python
from databricks_openai import DatabricksOpenAI

client = DatabricksOpenAI(use_ai_gateway=True)

response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    stream=False,
)

print(response.output_text)
```
Step 2: Add hosted tools to run the agent loop
When you include tools in the request, Azure Databricks manages a multi-turn loop on your behalf: the model decides which tools to call, Azure Databricks executes them, feeds the results back to the model, and repeats until the model produces a final answer.
```python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Summarize recent customer reviews and flag any urgent issues."}],
    tools=[
        {
            "type": "genie_space",
            "genie_space": {
                "id": "<genie-space-id>",
                "description": "Answers customer review questions using SQL",
            },
        },
        {
            "type": "uc_function",
            "uc_function": {
                "name": "<catalog>.<schema>.<function_name>",
                "description": "Flags a review as requiring urgent attention",
            },
        },
        {
            "type": "knowledge_assistant",
            "knowledge_assistant": {
                "knowledge_assistant_id": "<knowledge-assistant-id>",
                "description": "Answers questions from internal documentation",
            },
        },
        {
            "type": "app",
            "app": {
                "name": "<app-name>",
                "description": "Custom application endpoint",
            },
        },
        {
            "type": "uc_connection",
            "uc_connection": {
                "name": "<uc-connection-name>",
                "description": "Searches the web for current information",
            },
        },
    ],
    stream=True,
)

for event in response:
    print(event)
```
Step 3: Enable tracing
Pass a trace_destination in the request body to send traces from the agent loop to Unity Catalog tables. Each request generates a trace capturing the full sequence of model calls and tool executions. If you don't set trace_destination, no traces are written. For setup details, see Store MLflow traces in Unity Catalog.
Using the databricks-openai Python client, pass it via extra_body:
```python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    extra_body={
        "trace_destination": {
            "catalog_name": "<catalog>",
            "schema_name": "<schema>",
            "table_prefix": "<table-prefix>",
        }
    },
)
```
To also return the trace directly in the API response, pass "databricks_options": {"return_trace": True} in extra_body.
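For instance, both options can be combined in a single `extra_body` payload (the catalog, schema, and table prefix values are placeholders):

```python
# Store traces in Unity Catalog AND return the trace inline in the response.
extra_body = {
    "trace_destination": {
        "catalog_name": "<catalog>",
        "schema_name": "<schema>",
        "table_prefix": "<table-prefix>",
    },
    # Include the trace directly in the API response as well.
    "databricks_options": {"return_trace": True},
}

# Then pass it to the request, e.g.:
# response = client.responses.create(model=..., input=[...], tools=[...], extra_body=extra_body)
```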
You can also use MLflow distributed tracing to combine traces from your application code and the Supervisor API agent loop into a single end-to-end trace. Propagate trace context headers using the extra_headers field:
```python
import mlflow
from mlflow.tracing import get_tracing_context_headers_for_http_request

with mlflow.start_span("client-root") as root_span:
    root_span.set_inputs({"input": "Tell me about Databricks"})
    trace_headers = get_tracing_context_headers_for_http_request()

    response = client.responses.create(
        model="databricks-claude-sonnet-4-5",
        input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
        tools=[...],
        extra_body={
            "trace_destination": {
                "catalog_name": "<catalog>",
                "schema_name": "<schema>",
                "table_prefix": "<table-prefix>",
            }
        },
        extra_headers=trace_headers,
    )
```
Background mode
Background mode enables you to run long-running agent workflows that involve multiple tool calls and complex reasoning without waiting for them to finish synchronously. Submit your request with background=True, receive a response ID immediately, and poll for the result when it's ready. This is especially useful for agents that query multiple data sources or chain several tools together in a single request.
Create a background request
```python
response = client.responses.create(
    model="databricks-claude-sonnet-4-5",
    input=[{"type": "message", "role": "user", "content": "Tell me about Databricks"}],
    tools=[...],
    background=True,
)

print(response.id)      # Use this ID to poll for the result
print(response.status)  # "queued" or "in_progress"
```
Poll for the result
Use responses.retrieve() to check the status until it reaches a terminal state:
```python
from time import sleep

while response.status in {"queued", "in_progress"}:
    sleep(2)
    response = client.responses.retrieve(response.id)

print(response.output_text)
```
Background mode with MCP
For security, the Supervisor API requires explicit user approval before executing any MCP tool call in background mode. When the agent loop selects an MCP tool, the response completes with an mcp_approval_request. You can review the tool name, server label, and arguments the model intends to pass:
```json
{
  "type": "mcp_approval_request",
  "id": "<tool-call-id>",
  "arguments": "{\"query\": \"what is Databricks\", \"count\": 5}",
  "name": "you-search",
  "server_label": "<server-label>",
  "status": "completed"
}
```
To approve the tool call and continue the agent loop, pass an mcp_approval_response back in the input field with the full conversation history:
```json
{
  "type": "mcp_approval_response",
  "id": "<tool-call-id>",
  "approval_request_id": "<tool-call-id>",
  "approve": true
}
```
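In practice, continuing the loop means scanning the completed response's output items for approval requests and echoing back a matching approval for each. A sketch operating on plain dicts (the helper name is illustrative, not part of the API):

```python
def build_approval_responses(output_items, approve=True):
    """Create an mcp_approval_response for each mcp_approval_request
    found in a completed response's output items."""
    approvals = []
    for item in output_items:
        if item.get("type") == "mcp_approval_request":
            approvals.append({
                "type": "mcp_approval_response",
                "id": item["id"],
                "approval_request_id": item["id"],
                "approve": approve,
            })
    return approvals


# Append the approvals to the full conversation history and resubmit, e.g.:
# client.responses.create(model=..., tools=[...], background=True,
#                         input=previous_input + previous_output + approvals)
```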
Note
Background mode responses are retained in the database for a maximum of 30 days.
Supported tools
You define tools in the tools array of your request. Each entry specifies a type and a configuration object with the same key. For example, a Genie space tool has "type": "genie_space" and a "genie_space": {...} object. The API supports the following tool types:
| Tool type | Description | Scope |
|---|---|---|
| `genie_space` | Queries a Genie space to answer questions about your data. Parameters: `id`, `description`. | genie |
| `uc_function` | Calls a Unity Catalog function as an agent tool. Parameters: `name`, `description`. | unity-catalog |
| `uc_connection` | Connects to an external MCP server through a Unity Catalog connection. Parameters: `name`, `description`. Note: custom MCP servers on Apps are not yet supported. | unity-catalog |
| `app` | Calls an Azure Databricks App endpoint. Parameters: `name`, `description`. | apps |
| `knowledge_assistant` | Calls a Knowledge Assistant endpoint. Parameters: `knowledge_assistant_id`, `description`. | model-serving |
Supported parameters
Each request to the Supervisor API accepts the following parameters.
- `model`: one of the following supported models. Change this field to switch providers without changing the rest of your code.
  - Claude-Haiku-4.5 (`databricks-claude-haiku-4-5`)
  - Claude-Opus-4.1 (`databricks-claude-opus-4-1`)
  - Claude-Opus-4.5 (`databricks-claude-opus-4-5`)
  - Claude-Opus-4.6 (`databricks-claude-opus-4-6`)
  - Claude-Sonnet-4 (`databricks-claude-sonnet-4`)
  - Claude-Sonnet-4.5 (`databricks-claude-sonnet-4-5`)
  - Claude-Sonnet-4.6 (`databricks-claude-sonnet-4-6`)
- `input`: the conversation messages to send.
- `tools`: hosted tool definitions (`genie_space`, `uc_function`, `knowledge_assistant`, `app`, `uc_connection`).
- `instructions`: a system prompt to guide the supervisor's behavior.
- `stream`: set to `true` to stream responses.
- `background`: set to `true` to run the request asynchronously. Returns a response ID that you poll with `responses.retrieve()`. See Background mode.
- `trace_destination`: optional object with `catalog_name`, `schema_name`, and `table_prefix` fields. When set, the Supervisor API writes a trace of the full agent loop to the specified Unity Catalog tables. Pass via `extra_body` in the Python client.
The API doesn't support inference parameters such as temperature. The server manages these internally.
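Putting these parameters together, a request that sets `instructions` alongside a hosted tool might look like the following sketch (the Genie space ID and prompt text are placeholders):

```python
# Assemble a full Supervisor API request; values in angle brackets are placeholders.
request_kwargs = {
    "model": "databricks-claude-sonnet-4-5",
    "instructions": "You are a support analyst. Prefer the Genie space for data questions.",
    "input": [{"type": "message", "role": "user", "content": "Which region had the most complaints last week?"}],
    "tools": [
        {
            "type": "genie_space",
            "genie_space": {
                "id": "<genie-space-id>",
                "description": "Answers customer review questions using SQL",
            },
        }
    ],
    "stream": False,
}

# response = client.responses.create(**request_kwargs)
```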
Limitations
The Supervisor API has the following limitations:
- Background mode runtime: Background mode requests have a maximum execution time of 30 minutes.
- Client-side function calling: Only hosted tools are supported. You can't pass `function` tool definitions for the client to execute, and you can't mix hosted tools with client-side `function` tools in the same request.
- Streaming in background mode: `stream` and `background` can't both be `true` in the same request.
- Durable execution: Automatic recovery from failures or interruptions with exactly-once execution guarantees for the agent loop is not supported.
- Azure Databricks Apps OBO not supported: On-behalf-of-user authorization is not supported for the Supervisor API. To use the Supervisor API in Azure Databricks Apps, use system authorization and grant permissions for your tools.