Hello Ashish,
Thank you for the detailed explanation of your workflow and the issue observed during VM provisioning.
Based on your description, the behavior you are encountering is expected and aligns with how Azure handles compute capacity allocation.
The validation step using the Resource SKUs API confirms whether a VM size is supported in a region and whether there are any subscription-level restrictions. However, it does not provide real-time capacity availability. Azure capacity is dynamically allocated across regions and customers, and availability can change between the validation step and the actual VM creation request.
As a result, even if a VM size appears available during validation, the deployment may still fail with a SkuNotAvailable error at the time of allocation due to temporary capacity constraints in the selected region or availability zone.
At present, there is no Azure API that exposes real-time VM capacity availability prior to deployment. This is a known platform behavior and not an issue with your implementation.
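To make the scope of the validation step concrete, here is a minimal sketch of what a Resource SKUs check can and cannot tell you. It assumes the SKU entry is a plain dict shaped like the Resource SKUs REST response (`locationInfo`, `restrictions`, `restrictionInfo`); the function name `unrestricted_zones` is hypothetical and not part of any Azure SDK.

```python
from typing import Optional, Set


def unrestricted_zones(sku: dict, region: str) -> Optional[Set[str]]:
    """Return the zones of `region` that are not restricted for this subscription.

    Returns None when the size is not offered in the region or is restricted
    for the whole region. An empty set means no usable zone was listed.
    Assumption: `sku` mirrors one entry of the Resource SKUs REST response.
    """
    region = region.lower()
    zones: Set[str] = set()
    offered = False

    # Collect the zones in which the size is offered for this region.
    for info in sku.get("locationInfo", []):
        if info.get("location", "").lower() == region:
            offered = True
            zones.update(info.get("zones", []))
    if not offered:
        return None

    # Remove anything blocked by a subscription-level restriction.
    for restriction in sku.get("restrictions", []):
        info = restriction.get("restrictionInfo", {})
        locations = {loc.lower() for loc in info.get("locations", [])}
        if region not in locations:
            continue
        if restriction.get("type") == "Location":
            return None
        if restriction.get("type") == "Zone":
            zones -= set(info.get("zones", []))
    return zones
```

A non-None result only means the size is offered and unrestricted for the subscription. It says nothing about whether capacity will be available when the VM is actually allocated.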
Given this, the recommended approach is to design the automation workflow to handle such allocation failures gracefully.
You may consider attempting On-Demand Capacity Reservation at the beginning of the workflow so that capacity is validated upfront. If the reservation is unsuccessful, you can immediately try alternate options without creating dependent resources: https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview
It is also recommended to maintain a fallback strategy with alternate VM sizes, availability zones, and regions. In case of a SkuNotAvailable or AllocationFailed error, the system can automatically retry using the next available option: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/allocation-failure
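A fallback strategy of this kind could look roughly like the sketch below. The `DeploymentError` exception, the `Candidate` type, and the `create_vm` callable are all hypothetical placeholders for whatever your automation uses to submit a deployment and surface the Azure error code; only the two error codes come from the discussion above.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List, Optional

# Error codes that mean "no capacity here right now, try the next option".
CAPACITY_ERROR_CODES = {"SkuNotAvailable", "AllocationFailed"}


class DeploymentError(Exception):
    """Hypothetical exception carrying the Azure error code of a failed deployment."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(f"{code}: {message}")
        self.code = code


@dataclass(frozen=True)
class Candidate:
    region: str
    zone: Optional[str]
    size: str


def build_candidates(regions: Iterable[str], zones: Iterable[Optional[str]],
                     sizes: Iterable[str]) -> List[Candidate]:
    """Order candidates so the size changes first, then the zone, then the region."""
    return [Candidate(r, z, s) for r, z, s in product(regions, zones, sizes)]


def deploy_with_fallback(candidates: Iterable[Candidate],
                         create_vm: Callable[[Candidate], None]) -> Optional[Candidate]:
    """Try each candidate in order; return the first that deploys, or None."""
    for candidate in candidates:
        try:
            create_vm(candidate)
            return candidate
        except DeploymentError as err:
            # Only capacity errors trigger a fallback; anything else is a real bug.
            if err.code not in CAPACITY_ERROR_CODES:
                raise
    return None
```

The ordering in `build_candidates` is a design choice: it prefers staying in the same region and zone with a different size before moving the workload elsewhere. Reorder the arguments to `product` if region or zone matters less than size for your workload.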
Additionally, implementing retry logic with exponential backoff is important, as these failures are often temporary and may succeed on subsequent attempts: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/allocation-failure
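As an illustration, retry with exponential backoff can be sketched as follows. The function name, the default delays, and the `is_transient` predicate are assumptions chosen for the example, not values recommended by Azure; the random jitter is an optional addition that spreads out retries from parallel workers.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       is_transient: Callable[[Exception], bool],
                       max_attempts: int = 5,
                       base_delay: float = 2.0,
                       max_delay: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation` until it succeeds, doubling the delay ceiling each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as err:
            # Give up on non-transient errors or when attempts are exhausted.
            if not is_transient(err) or attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Sleep a random time up to the ceiling ("full jitter").
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

In practice `is_transient` would check for the SkuNotAvailable or AllocationFailed codes, and this wrapper would typically sit around a single candidate before the workflow moves on to the next fallback option.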
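To limit partial resource creation, you may consider deploying with ARM or Bicep templates, which let you define the VM and its dependent resources as a single deployment. Note that a failed deployment does not automatically delete resources that were already created. To clean up, you can deploy into a dedicated resource group and delete it on failure, or use the rollback-on-error option to redeploy an earlier successful deployment.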
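If the workflow creates resources step by step through SDK or CLI calls rather than a single template, one way to avoid leaving partial resources behind is to register a cleanup action after each successful step and run those actions in reverse order when a later step fails. The sketch below uses Python's standard `contextlib.ExitStack`; the `provision` function and the (create, delete) step pairs are hypothetical stand-ins for your own resource calls.

```python
from contextlib import ExitStack
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[Callable[[], Any], Callable[[Any], None]]


def provision(steps: Iterable[Step]) -> List[Any]:
    """Run (create, delete) pairs in order; undo completed steps if one fails."""
    with ExitStack() as stack:
        created = []
        for create, delete in steps:
            resource = create()
            # Registered only after the create succeeded; runs in LIFO order on failure.
            stack.callback(delete, resource)
            created.append(resource)
        # Every step succeeded: detach the callbacks so nothing is deleted.
        stack.pop_all()
        return created
```

If the VM step raises an allocation error, the NIC, disk, and other resources created before it are removed in reverse order and the original exception propagates, so the caller can move on to the next fallback option with a clean slate.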
For scenarios where capacity must be guaranteed, Azure Capacity Reservations can be used to reserve compute resources in advance, ensuring availability at deployment time: https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview
Hope this helps! Please let me know if you have any queries.