MAI-Transcribe-1 in Azure Speech (preview)

Note

This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

MAI‑Transcribe‑1 is a speech recognition model developed by the Microsoft AI (MAI) Superintelligence team with a dual focus: high accuracy and high efficiency. You can use the MAI‑Transcribe‑1 model with the LLM Speech API.

Prerequisites

  • An Azure subscription. You can create one for free.
  • Create a Foundry resource for Speech in the Azure portal.
  • Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For the current list of supported regions, see Speech service regions.
  • An audio file (less than 300 MB in size) in one of the following formats: WAV, MP3, or FLAC.
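Before uploading, you can check the file constraints listed above locally. The following is a minimal sketch using only the Python standard library; `is_uploadable` is a hypothetical helper, not part of the Speech API:

```python
import os

# Documented preview limits: WAV, MP3, or FLAC, and under 300 MB.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac"}
MAX_SIZE_BYTES = 300 * 1024 * 1024  # 300 MB

def is_uploadable(path: str) -> bool:
    """Return True if the file matches the documented format and size limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return False
    return os.path.getsize(path) < MAX_SIZE_BYTES
```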

Use the MAI-Transcribe-1 model

The following languages are currently supported for the mai-transcribe-1 model:

  • Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Thai, Turkish, and Vietnamese.

Upload audio

You can provide audio data in the following ways:

  • Pass inline audio data.
  --form 'audio=@"YourAudioFile"'
  • Upload an audio file from a publicly accessible audioUrl.
  --form 'definition="{\"audioUrl\": \"https://crbn.us/hello.wav\"}"'
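The definition form field carries a JSON string, and hand-escaping its quotes inside a curl command is error-prone. A minimal Python sketch (standard library only) that builds the same payload for the audioUrl variant:

```python
import json

def build_audio_url_definition(audio_url: str) -> str:
    # The 'definition' form field is a JSON object; here it points the
    # service at a publicly reachable audio file instead of inline data.
    return json.dumps({"audioUrl": audio_url})

definition = build_audio_url_definition("https://crbn.us/hello.wav")
# definition == '{"audioUrl": "https://crbn.us/hello.wav"}'
```

Letting `json.dumps` serialize the object guarantees valid JSON regardless of what the URL contains.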

The following sections use inline audio upload as an example.

Create transcription

To use the MAI-Transcribe-1 model, set the model property accordingly in the request.

curl --location 'https://<YourServiceRegion>.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2025-10-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: <YourSpeechResourceKey>' \
--form 'audio=@"YourAudioFile.wav"' \
--form 'definition={
  "enhancedMode": {
    "enabled": true,
    "model":"mai-transcribe-1"
  }
}'
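The same request can be issued from Python. The sketch below mirrors the curl example; `transcribe_request_parts` and `send_transcribe_request` are illustrative helper names, and sending the request assumes the third-party requests package plus a real Speech resource key and region:

```python
import json

API_VERSION = "2025-10-15"

def transcribe_request_parts(region: str, key: str):
    """Build the URL, headers, and 'definition' field matching the curl example."""
    url = (f"https://{region}.api.cognitive.microsoft.com"
           f"/speechtotext/transcriptions:transcribe?api-version={API_VERSION}")
    # Don't set Content-Type yourself; the HTTP client adds the multipart boundary.
    headers = {"Ocp-Apim-Subscription-Key": key}
    definition = json.dumps({
        "enhancedMode": {"enabled": True, "model": "mai-transcribe-1"},
    })
    return url, headers, definition

def send_transcribe_request(region: str, key: str, audio_path: str) -> dict:
    # Requires the third-party 'requests' package and a deployed Speech resource.
    import requests
    url, headers, definition = transcribe_request_parts(region, key)
    with open(audio_path, "rb") as audio:
        response = requests.post(
            url,
            headers=headers,
            files={"audio": audio},           # inline audio upload
            data={"definition": definition},  # selects the MAI-Transcribe-1 model
        )
    response.raise_for_status()
    return response.json()
```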

Note the following limitations when using the MAI-Transcribe-1 model:

  • Diarization isn't supported.

Tip

For more information about using the LLM Speech API, see LLM Speech API.