Unhealthy scale set instance for scale set "FE"

Harini Gopinath 0 Reputation points
2026-03-11T05:24:57.91+00:00

Hi Team,

All scale set instances under the VMSS "FE" are unhealthy. I tried restarting the instances to see if it would help, but it didn't. What can be done here?

Tenant is GME. It is in prod environment.

Because of this, automatic upgrades are failing, which is causing issues.

Thanks

Azure Virtual Machine Scale Sets

Azure compute resources that are used to create and manage groups of heterogeneous load-balanced virtual machines.


2 answers

  1. Manish Deshpande 5,255 Reputation points Microsoft External Staff Moderator
    2026-03-11T06:30:23.5366667+00:00

    Hello Harini

    We understand that all instances under the Virtual Machine Scale Set “FE” are currently marked as Unhealthy, and restarting the instances did not resolve the issue. As a result, automatic upgrades are failing, which is impacting the production workload.

    In Azure Virtual Machine Scale Sets (VMSS), automatic OS upgrades and instance repairs rely on health signals from either:

    • An Application Health Extension, or
    • An Azure Load Balancer health probe

    If these health checks are misconfigured, not responding with expected results (for example, HTTP 2xx responses), or not enabled at all, the platform marks the instances as Unhealthy. When all instances are unhealthy, Azure intentionally blocks automatic upgrades to prevent potential service downtime.

    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/restart-stop/instances-not-repaired
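
To make the health-signal behaviour described above concrete, here is a minimal Python sketch that probes an HTTP endpoint the same way a health check would and treats only a 2xx response as healthy. It is an illustration, not the platform's actual probe logic: the function names, the `/health` and `/broken` paths, and the throwaway local demo server are all assumptions made for this example. Run something like `classify_probe` from inside an instance against the real port and path configured for the scale set.

```python
import http.server
import threading
import urllib.error
import urllib.request

# Connect directly, ignoring any proxy settings, as a local health probe would.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def classify_probe(url: str, timeout: float = 5.0) -> str:
    """Return "Healthy" for a 2xx response, otherwise "Unhealthy".

    Note: urllib follows redirects, so a 3xx that ends in a 2xx counts as
    healthy here. A real probe may treat redirects differently.
    """
    try:
        with _opener.open(url, timeout=timeout) as response:
            return "Healthy" if 200 <= response.status < 300 else "Unhealthy"
    except (urllib.error.URLError, OSError):
        # 4xx/5xx statuses, refused connections and timeouts all end up here.
        return "Unhealthy"


class _DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical application: 200 on /health, 503 everywhere else."""

    def do_GET(self):
        self.send_response(200 if self.path == "/health" else 503)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_demo() -> dict:
    """Start the demo server on a free local port and probe two paths."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return {path: classify_probe(base + path) for path in ("/health", "/broken")}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

If every instance reports Unhealthy while the application is actually running, a check like this against the configured protocol, port, and path is a quick way to tell a misconfigured probe apart from a real outage.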

    Please review and validate the following configuration:

    1. Confirm Health Monitoring is Enabled
    2. Validate the Health Endpoint
    3. Check Automatic Repair / Upgrade Eligibility

    If the issue persists after validating the health configuration, we are happy to assist further. Please add your comments in the Comment section, and we will respond promptly to help you move forward.

    Thanks,
    Manish.


  2. Q&A Assist
    2026-03-11T05:25:13.9266667+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    For a VM scale set where all instances show as Unhealthy and automatic OS image upgrade is failing, use the following checks and fixes:

    1. Confirm automatic repairs are actually enabled and running
      • Check the VMSS orchestrationServices property and verify the automatic repairs serviceState is Running, not NotRunning or Suspended. If it is not Running, enable or re-enable automatic instance repairs on the scale set.
    2. Verify health monitoring configuration
      • If all instances are Unhealthy, this often indicates a misconfigured health probe rather than a real outage.
      • Confirm whether the VMSS is using the Application Health extension or Load Balancer health probes.
      • For either method, ensure the configured endpoint returns a 2xx HTTP(S) status (for HTTP/HTTPS probes) or completes a TCP handshake (for TCP probes). Any other result, including an unreachable endpoint, marks the instance Unhealthy.
      • Validate protocol, port, and path (for example, /) configured under Health and repair for the VMSS, and that the application in each instance is listening and responding correctly.
    3. Check for provisioning failures
      • Use Get Instance View on the VMSS with API version 2019-12-01 or later and inspect virtualMachine.statusesSummary.
      • If any instance shows ProvisioningState/failed, automatic instance repairs will not fix those VMs. Manually:
        • Remove the failed instance from the scale set.
        • Increase capacity (or add a new instance) so the scale set recreates a healthy VM.
    4. Consider grace period and repair batching
      • Automatic repairs only act after the configured automaticRepairsPolicy.gracePeriod (10–90 minutes). If instances were recently updated or restarted, wait for the grace period to elapse.
      • Repairs are batched (max 5% of instances at a time, or one at a time if fewer than 20 instances). If many instances are Unhealthy, full recovery can take time.
    5. Reconfigure or enable automatic repairs via portal (if needed)
      • In the Azure portal, open the VMSS → Settings → Health and repair.
      • Enable Monitor application health and configure either:
        • Application Health extension: set protocol (HTTP/HTTPS/TCP), port, and health path; ensure the endpoint returns 200 OK when healthy.
        • Load balancer probe: select or create a probe with correct protocol, port, and path.
      • Turn Automatic repairs to On, set an appropriate Grace period (min), and save.
    6. Re-run or resume automatic OS image upgrade after health is fixed
      • Once instances start reporting Healthy and automatic repairs are functioning, re-trigger or resume the automatic OS image upgrade.
      • If upgrade errors persist, use Get Instance View, Rolling Upgrades – Get Latest, and Get OS Upgrade History to inspect detailed error codes such as MaxUnhealthyInstancePercentExceededInRollingUpgrade or MaxUnhealthyUpgradedInstancePercentExceededInRollingUpgrade. Then adjust the health thresholds or fix the underlying application issues accordingly.
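
Steps 1 and 3 above both come down to reading the Get Instance View response. The Python sketch below shows one way to pull out the two values of interest from that JSON once it has been fetched. The helper names and the sample payload are hypothetical. The `code`/`count` entries under `virtualMachine.statusesSummary` and the `serviceName`/`serviceState` entries under `orchestrationServices` reflect my understanding of the response shape, so verify them against the actual output for the scale set.

```python
from typing import Any, Dict, Optional


def failed_provisioning_count(instance_view: Dict[str, Any]) -> int:
    """Sum the instance counts reported under ProvisioningState/failed."""
    summary = instance_view.get("virtualMachine", {}).get("statusesSummary", [])
    return sum(
        entry.get("count", 0)
        for entry in summary
        if entry.get("code", "").lower() == "provisioningstate/failed"
    )


def automatic_repairs_state(instance_view: Dict[str, Any]) -> Optional[str]:
    """Return the AutomaticRepairs serviceState, or None if it is not listed."""
    for service in instance_view.get("orchestrationServices", []):
        if service.get("serviceName") == "AutomaticRepairs":
            return service.get("serviceState")
    return None


# Hypothetical payload, for illustration only (not real data from "FE").
SAMPLE_INSTANCE_VIEW = {
    "virtualMachine": {
        "statusesSummary": [
            {"code": "ProvisioningState/succeeded", "count": 4},
            {"code": "ProvisioningState/failed", "count": 2},
        ]
    },
    "orchestrationServices": [
        {"serviceName": "AutomaticRepairs", "serviceState": "Suspended"}
    ],
}

if __name__ == "__main__":
    print("Failed instances:", failed_provisioning_count(SAMPLE_INSTANCE_VIEW))
    print("Automatic repairs:", automatic_repairs_state(SAMPLE_INSTANCE_VIEW))
```

A non-zero failed count points to the manual remove-and-recreate path in step 3, while any state other than Running points back to step 1.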

    These steps should restore instance health, allow automatic repairs to replace or reimage bad instances, and unblock the auto upgrade process in the GME prod environment.
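
The grace period and batching limits from step 4 can be sanity-checked with a few lines of Python. This is a rough sketch under stated assumptions: it assumes the grace period is written as an ISO 8601 duration such as `PT30M` (hours and minutes only), and it rounds the 5% batch down to a whole instance, which the answer above does not specify. All function names are hypothetical.

```python
import re


def grace_period_minutes(value: str) -> int:
    """Convert a duration such as "PT30M" or "PT1H30M" to minutes."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", value)
    if not match or not any(match.groups()):
        raise ValueError(f"Unsupported duration: {value!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


def grace_period_in_range(value: str) -> bool:
    """Check the 10 to 90 minute range quoted in step 4."""
    return 10 <= grace_period_minutes(value) <= 90


def repair_batch_size(instance_count: int) -> int:
    """Estimate how many instances are repaired at once.

    Assumption: 5% is rounded down, with a floor of one instance, and scale
    sets with fewer than 20 instances are repaired one at a time.
    """
    if instance_count <= 0:
        return 0
    if instance_count < 20:
        return 1
    return max(1, instance_count * 5 // 100)


if __name__ == "__main__":
    print(grace_period_in_range("PT30M"))
    print(repair_batch_size(100))
```

With a small scale set where every instance is unhealthy, repairs proceed one instance at a time, so full recovery takes several repair cycles even after the health probe is fixed.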


