Use model router for Microsoft Foundry (classic)


Model router for Microsoft Foundry is a deployable AI chat model that selects the best large language model (LLM) to respond to each prompt in real time. It routes among preexisting models to deliver high performance while saving on compute costs, all through a single model deployment. To learn more about how model router works, including its advantages and limitations, see the Model router concepts guide.

Use model router through the Chat Completions API like you'd use a single base model such as GPT-5. Follow the same steps as in the Chat completions guide.

Tip

The Microsoft Foundry (new) portal offers enhanced configuration options for model router. Switch to the Microsoft Foundry (new) documentation to see the latest features.

Supported underlying models

With the 2025-11-18 version, model router adds nine new models, including Anthropic Claude, DeepSeek, Llama, and Grok models, bringing the total to 18 models available for routing your prompts.

Note

You don't need to deploy the supported LLMs separately for use with model router, with the exception of the Claude models. To use model router with Claude models, first deploy them from the model catalog. Model router then invokes those deployments when it selects them for routing.

| Model router version | Underlying models | Underlying model version |
| --- | --- | --- |
| 2025-11-18 | gpt-4.1 | 2025-04-14 |
| | gpt-4.1-mini | 2025-04-14 |
| | gpt-4.1-nano | 2025-04-14 |
| | o4-mini | 2025-04-16 |
| | gpt-5-nano | 2025-08-07 |
| | gpt-5-mini | 2025-08-07 |
| | gpt-5 | 2025-08-07 |
| | gpt-5-chat | 2025-08-07 |
| | gpt-5.2 | 2025-12-11 |
| | gpt-5.2-chat | 2025-12-11 |
| | Deepseek-v3.1² | N/A |
| | Deepseek-v3.2² | N/A |
| | gpt-oss-120b² | N/A |
| | llama4-maverick-instruct² | N/A |
| | grok-4² | N/A |
| | grok-4-fast² | N/A |
| | claude-haiku-4-5³ | 2025-09-29 |
| | claude-sonnet-4-5³ | 2025-08-05 |
| | claude-opus-4-1³ | 2025-09-29 |
| | claude-opus-4-6³ | 2025-08-05 |
| 2025-08-07 | gpt-4.1 | 2025-04-14 |
| | gpt-4.1-mini | 2025-04-14 |
| | gpt-4.1-nano | 2025-04-14 |
| | o4-mini | 2025-04-16 |
| | gpt-5¹ | 2025-08-07 |
| | gpt-5-mini | 2025-08-07 |
| | gpt-5-nano | 2025-08-07 |
| | gpt-5-chat | 2025-08-07 |
| 2025-05-19 | gpt-4.1 | 2025-04-14 |
| | gpt-4.1-mini | 2025-04-14 |
| | gpt-4.1-nano | 2025-04-14 |
| | o4-mini | 2025-04-16 |

  • ¹ Requires registration.
  • ² Model router support is in preview.
  • ³ Model router support is in preview. Requires deployment of the model for use with model router.

Deploy a model router model

Model router is packaged as a single Foundry model that you deploy. Start by following the steps in the resource deployment guide.

In the Create new deployment window, find model-router in the Models list and select it.

Note

Your deployment settings apply to all underlying chat models that model router uses.

  • Don't deploy the underlying chat models separately (the Claude models, noted earlier, are the exception). Model router works independently of your other deployed models.
  • Select a content filter when you deploy the model router model or apply a filter later. The content filter applies to all content passed to and from the model router; don't set content filters for each underlying chat model.
  • Your tokens-per-minute rate limit setting applies to all activity to and from the model router; don't set rate limits for each underlying chat model.

Test model router with the Chat Completions API

You can use model router through the Chat Completions API in the same way you'd use other OpenAI chat models. Set the model parameter to the name of your model router deployment, and set the messages parameter to the messages you want to send to the model.
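As a sketch, the request can also be made over plain REST. The endpoint, key, and deployment values below are placeholders, and the api-version value is an assumption; substitute the version your resource supports.

```python
import json
import os
import urllib.request

# Placeholder configuration: substitute your own resource values.
ENDPOINT = os.environ.get("AZURE_OPENAI_ENDPOINT", "https://YOUR-RESOURCE.openai.azure.com")
API_KEY = os.environ.get("AZURE_OPENAI_API_KEY", "YOUR-KEY")
DEPLOYMENT = os.environ.get("ROUTER_DEPLOYMENT", "model-router")

def build_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Chat completions URL for a deployment on an Azure OpenAI resource."""
    return (f"{endpoint}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

def chat(messages: list, api_version: str = "2024-10-21") -> dict:
    """POST a chat completions request to the model router deployment."""
    request = urllib.request.Request(
        build_url(ENDPOINT, DEPLOYMENT, api_version),
        data=json.dumps({"messages": messages}).encode("utf-8"),
        headers={"api-key": API_KEY, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Example call (requires a live deployment):
# reply = chat([{"role": "user", "content": "Say hello."}])
# print(reply["model"])  # the underlying model the router selected
```

The same call works through the official OpenAI client libraries; only the transport differs.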

Test model router in the playground

In the Foundry portal, go to your model router deployment on the Models + endpoints page and select it to open the model playground. In the playground, enter messages and see the model's responses. Each response shows which underlying model the router selected.

Important

You can set the Temperature and Top_P parameters to the values you prefer (see the concepts guide), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the Temperature and Top_P input parameters.

The parameters stop, presence_penalty, frequency_penalty, logit_bias, and logprobs are similarly dropped for o-series models but used otherwise.

Important

Starting with the 2025-11-18 version, model router supports the reasoning_effort parameter (see the Reasoning models guide). If model router selects a reasoning model for your prompt, it passes your reasoning_effort input value to the underlying model.
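For illustration, a request body that includes reasoning_effort might look like the following sketch; the prompt text and the choice of "high" are arbitrary.

```python
import json

# Sketch of a chat completions request body for a model router deployment.
# "reasoning_effort" is only honored when the router picks a reasoning model.
body = {
    "messages": [
        {"role": "user", "content": "Walk through this proof step by step."}
    ],
    "reasoning_effort": "high",  # accepted values include "low", "medium", "high"
}

payload = json.dumps(body)  # serialize for the HTTP request
```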

Output format

The JSON response you receive from a model router model is identical to the standard chat completions API response. Note that the "model" field reveals which underlying model was selected to respond to the prompt.

The following example response was generated using API version 2025-11-18:


{
    "success": true,
    "data": {
        "choices": [
            {
                "content_filter_results": {
                    "hate": {
                        "filtered": false,
                        "severity": "safe"
                    },
                    "protected_material_code": {
                        "filtered": false,
                        "detected": false
                    },
                    "protected_material_text": {
                        "filtered": false,
                        "detected": false
                    },
                    "self_harm": {
                        "filtered": false,
                        "severity": "safe"
                    },
                    "sexual": {
                        "filtered": false,
                        "severity": "safe"
                    },
                    "violence": {
                        "filtered": false,
                        "severity": "safe"
                    }
                },
                "finish_reason": "stop",
                "index": 0,
                "logprobs": null,
                "message": {
                    "annotations": [],
                    "content": "Charismatic and bold—combining brash showmanship and poetic wit with fierce competitiveness, moral conviction, and unwavering activism.",
                    "refusal": null,
                    "role": "assistant"
                }
            }
        ],
        "created": 1774543376,
        "id": "xxxx-yyyy-zzzz",
        "model": "gpt-5-mini-2025-08-07",
        "object": "chat.completion",
        "prompt_filter_results": [
            {
                "prompt_index": 0,
                "content_filter_results": {
                    "hate": {
                        "filtered": false,
                        "severity": "safe"
                    },
                    "jailbreak": {
                        "filtered": false,
                        "detected": false
                    },
                    "self_harm": {
                        "filtered": false,
                        "severity": "safe"
                    },
                    "sexual": {
                        "filtered": false,
                        "severity": "safe"
                    },
                    "violence": {
                        "filtered": false,
                        "severity": "safe"
                    }
                }
            }
        ],
        "system_fingerprint": null,
        "usage": {
            "completion_tokens": 163,
            "completion_tokens_details": {
                "accepted_prediction_tokens": 0,
                "audio_tokens": 0,
                "reasoning_tokens": 128,
                "rejected_prediction_tokens": 0
            },
            "prompt_tokens": 3254,
            "prompt_tokens_details": {
                "audio_tokens": 0,
                "cached_tokens": 3200
            },
            "total_tokens": 3417
        }
    }
}
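Small helpers can pull the routed model and token counts out of a response. The sample above wraps the completion object in a data envelope, so unwrap it first if your client returns the raw JSON shown; the function names here are illustrative.

```python
def routed_model(completion: dict) -> str:
    """Return the underlying model that the router selected."""
    return completion["model"]

def token_usage(completion: dict) -> dict:
    """Summarize token counts, including tokens spent on reasoning."""
    usage = completion["usage"]
    return {
        "prompt": usage["prompt_tokens"],
        "completion": usage["completion_tokens"],
        "reasoning": usage["completion_tokens_details"]["reasoning_tokens"],
        "total": usage["total_tokens"],
    }

# With the sample response above (after unwrapping the "data" envelope),
# routed_model(completion) returns "gpt-5-mini-2025-08-07".
```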

Monitor model router metrics

Monitor performance

Monitor the performance of your model router deployment in Azure Monitor (AzMon) in the Azure portal.

  1. Go to the Monitoring > Metrics page for your Azure OpenAI resource in the Azure portal.
  2. Filter by the deployment name of your model router model.
  3. Split the metrics by underlying models if needed.

Monitor costs

You can monitor the costs of model router, which is the sum of the costs incurred by the underlying models.

  1. Go to the Resource Management > Cost analysis page in the Azure portal.
  2. If needed, filter by Azure resource.
  3. Filter by deployment name: filter by Tag, select Deployment as the tag type, and then select your model router deployment name as the value.

Troubleshoot model router

Common issues

| Issue | Cause | Resolution |
| --- | --- | --- |
| Rate limit exceeded | Too many requests to the model router deployment | Increase the tokens-per-minute quota, or implement retry with exponential backoff |
| Unexpected model selection | The routing logic selected a different model than expected | Review routing mode settings; consider using a model subset to constrain options |
| High latency | Router overhead plus underlying model processing | Use Cost mode for latency-sensitive workloads; smaller models respond faster |
| Claude model not routing | Claude models require separate deployment | Deploy Claude models from the model catalog before enabling them in the subset |
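For the rate-limit case, a retry wrapper with exponential backoff can look like this sketch. The exception type to catch depends on your client library (for raw HTTP it would be an HTTP 429 error), so RuntimeError below is a stand-in.

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Run `call`, retrying on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for your client's 429 / rate-limit error
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Double the delay each attempt, plus jitter to spread out retries.
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, base_delay))

# Usage: with_backoff(lambda: chat_client.create(...))
```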

Error codes

For API error codes and troubleshooting, see the Azure OpenAI REST API reference.

Resources

The following open-source repositories demonstrate model router in different scenarios. Each repository is on GitHub; fork and extend it to accelerate your learning. Most samples require an existing model router deployment; see Deploy a model router model to get started.

| Resource | Learn | Extend |
| --- | --- | --- |
| Model Router Capabilities Interactive Demo (Python) | Compare Balanced, Cost, and Quality routing modes with custom prompts. View live benchmark data for cost savings, latency, and routing distribution. | Add your own prompt sets, integrate with your CI pipeline, or connect to your deployment for A/B testing. |
| Routed Models Distribution Analysis (Python) | Run batches of prompts across routing profiles and model subsets. See which models the router selects and in what proportions. | Plug in representative prompt logs to evaluate tradeoffs before adopting a routing policy at scale. |
| Multi-team scenarios with Quality & Cost benchmarking (Python, workshop) | Deploy model router, run benchmarks against fixed-model deployments, and analyze cost and latency optimization in a multi-team enterprise scenario. | Swap in your own models, prompts, and routing profiles to benchmark against your workload patterns. |
| On-Call Copilot Multi-Agent Demo (Python) | See how model router dynamically selects the right model per agent step: a fast, low-cost model for classification and a reasoning model for root-cause analysis. | Adapt the multi-agent architecture, agent roles, and escalation paths for your own operations or support scenarios. |

Important

These samples are intended for learning and experimentation only and are not production-ready. Before deploying any code derived from these repositories, review it against your organization's security, compliance, and responsible AI policies. See the Microsoft Responsible AI principles for guidance.

Next steps