
It seems there is a GC bug in the Azure Speech Python SDK; how can I solve it?

datou ai 0 Reputation points
2026-04-05T05:14:59.8166667+00:00

I would like to use the Azure TTS input text streaming capability described in this documentation: https://learn.microsoft.com/zh-cn/azure/ai-services/speech-service/how-to-lower-speech-synthesis-latency?pivots=programming-language-python&source=docs#input-text-streaming

However, there seem to be bugs in the Python SDK. I am using Python 3.13. In a FastAPI project, I found that implementing streaming TTS with the Azure SDK almost invariably leaves the event loop thread permanently stuck. The issue appears to be triggered by garbage collection of objects in the Azure SDK. Could you tell me how to solve it? When the hang occurs, the thread trace is printed as follows:

Current thread 0x000070bced8a3340 (most recent call first):
  Garbage-collecting
  File "/app/.venv/lib/python3.13/site-packages/azure/cognitiveservices/speech/interop.py", line 133 in 
  File "/usr/local/lib/python3.13/weakref.py", line 428 in 
  File "/app/.venv/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 427 in 
  File "/app/.venv/lib/python3.13/site-packages/httpcore/_synchronization.py", line 214 in 
  File "/app/.venv/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 343 in _close_connections
  File "/app/.venv/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 229 in handle_async_request
  File "/app/.venv/lib/python3.13/site-packages/httpx/_transports/default.py", line 394 in handle_async_request
  File "/app/.venv/lib/python3.13/site-packages/httpx/_client.py", line 1730 in _send_single_request
  File "/app/.venv/lib/python3.13/site-packages/httpx/_client.py", line 1694 in _send_handling_redirects
  File "/app/.venv/lib/python3.13/site-packages/httpx/_client.py", line 1657 in _send_handling_auth
  File "/app/.venv/lib/python3.13/site-packages/httpx/_client.py", line 1629 in send
  File "/app/.venv/lib/python3.13/site-packages/openai/_base_client.py", line 1604 in request
  File "/app/.venv/lib/python3.13/site-packages/openinference/instrumentation/openai/_request.py", line 398 in 
  File "/app/.venv/lib/python3.13/site-packages/openai/_base_client.py", line 1884 in post
  File "/app/.venv/lib/python3.13/site-packages/openai/resources/chat/completions/completions.py", line 2714 in create
  File "/app/app/chain/general/component/agent/chain_chat_agent/agent.py", line 414 in agent_loop
  File "/usr/local/lib/python3.13/asyncio/runners.py", line 118 in run
  File "/usr/local/lib/python3.13/asyncio/runners.py", line 195 in run
  File "/app/.venv/lib/python3.13/site-packages/uvicorn/workers.py", line 107 in run
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/workers/base.py", line 144 in init_process
  File "/app/.venv/lib/python3.13/site-packages/uvicorn/workers.py", line 75 in init_process
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/arbiter.py", line 684 in spawn_worker
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/arbiter.py", line 719 in spawn_workers
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/arbiter.py", line 634 in manage_workers
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/arbiter.py", line 206 in run
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/app/base.py", line 71 in run
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/app/base.py", line 235 in run
  File "/app/.venv/lib/python3.13/site-packages/gunicorn/app/wsgiapp.py", line 66 in run
  File "/app/.venv/bin/gunicorn", line 10 in <module>

I simply followed https://learn.microsoft.com/zh-cn/azure/ai-services/speech-service/how-to-lower-speech-synthesis-latency?pivots=programming-language-python#input-text-streaming. I have already accounted for calling synchronous functions in an asynchronous context: I use FastAPI's run_in_threadpool to submit the SDK's synchronous calls to a thread pool for execution off the event loop.

Azure AI Speech

An Azure service that integrates speech processing into apps and services.

