This document provides basic troubleshooting information for Bare Metal Machine (BMM) resources that are reporting a Warning message in the BMM detailed status message.
Prerequisites
- Access to the Azure portal or Azure CLI
- Permissions to view and manage Bare Metal Machine resources
- For diagnostic commands: SSH access via BareMetalMachineKeySet (see Manage emergency access to a Bare Metal Machine)
Symptoms
The Detailed status message of the Bare Metal Machine (Operator Nexus) resource includes one or more of the following.
| Detailed status message | Details and mitigation |
|---|---|
| Warning: PXE port is unhealthy | Warning: PXE port is unhealthy |
| Warning: BMM power state doesn't match expected state | Warning: BMM power state doesn't match expected state |
| Warning: This machine has failed hardware validation | Warning: This machine has failed hardware validation |
Troubleshooting
To evaluate the current status of all BMMs in a resource group and check for any that are reporting Warning messages, run the following command. Any active Warning conditions are visible in the DetailedStatusMessage column of the output, as seen in the example that follows it.
az networkcloud baremetalmachine list -g <ResourceGroup_Name> -o table
Name ResourceGroup DetailedStatus DetailedStatusMessage
-------------- ---------------------------------- ---------------- -------------------------------------------------------------------------------------------
rack1control01 cluster-1-HostedResources-3EA53DF9 Provisioned The OS is provisioned to the machine.
rack1control02 cluster-1-HostedResources-3EA53DF9 Available Available to participate in the cluster.
rack1compute02 cluster-1-HostedResources-3EA53DF9 Provisioned The OS is provisioned to the machine. Warning: PXE port is unhealthy
rack1compute01 cluster-1-HostedResources-3EA53DF9 Provisioned The OS is provisioned to the machine. Warning: BMM power state doesn't match expected state
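When many machines are listed, it can help to filter the result programmatically. The following Python sketch assumes the list was saved with -o json instead of -o table, and that each record exposes name and detailedStatusMessage fields. These field names are inferred from the table columns above and should be checked against your actual output. The function name bmms_with_warnings is illustrative only.

```python
def bmms_with_warnings(bmm_list):
    """Return (name, message) pairs for records whose status message contains a Warning."""
    flagged = []
    for bmm in bmm_list:
        # Field names are an assumption based on the table columns shown above.
        message = bmm.get("detailedStatusMessage") or ""
        if "Warning:" in message:
            flagged.append((bmm.get("name"), message))
    return flagged


# Sample records mirroring the example table output above.
sample = [
    {"name": "rack1control01",
     "detailedStatusMessage": "The OS is provisioned to the machine."},
    {"name": "rack1control02",
     "detailedStatusMessage": "Available to participate in the cluster."},
    {"name": "rack1compute02",
     "detailedStatusMessage": "The OS is provisioned to the machine. "
                              "Warning: PXE port is unhealthy"},
    {"name": "rack1compute01",
     "detailedStatusMessage": "The OS is provisioned to the machine. "
                              "Warning: BMM power state doesn't match expected state"},
]

for name, message in bmms_with_warnings(sample):
    print(f"{name}: {message}")
```

In practice, the sample list would be replaced by the parsed contents of the saved JSON file, for example with json.load.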
For more information, use the Azure CLI Bare Metal Machine run-read-command, as in the following example, to inspect the conditions status of the corresponding Kubernetes BMM object.
az networkcloud baremetalmachine run-read-command \
-g <ResourceGroup_Name> \
-n rack1control01 \
--limit-time-seconds 60 \
--commands "[{command:'kubectl get',arguments:[-n,nc-system,bmm,rack1compute01,-o,json]}]" \
--output-directory .
- Replace <ResourceGroup_Name> with the name of the resource group containing the BMM resources.
- Replace rack1control01 with the name of a BMM resource for a healthy Kubernetes control plane node, from which to execute the kubectl get command.
- Replace rack1compute01 with the name of the affected BMM.
For more information about the run-read-command feature and available diagnostic commands, see Troubleshoot Bare-Metal Machines by Using the run-read Command.
Review the lastTransitionTime and message fields for more information about the corresponding error condition, as shown in the following example output.
Example run-read-command output (kubectl get bmm):
{
"status": {
"conditions": [
{
"lastTransitionTime": "2025-03-04T01:57:06Z",
"status": "True",
"type": "BmmInExpectedNodeReadiness"
},
{
"lastTransitionTime": "2025-03-04T15:59:36Z",
"message": "BareMetalMachine expected to be powered on",
"reason": "BmmPoweredOnExpected",
"severity": "Error",
"status": "False",
"type": "BmmInExpectedPowerState"
},
{
"lastTransitionTime": "2025-03-04T02:48:54Z",
"message": "PXE network port (pxe) is up and stable",
"reason": "PxePortsHealthy",
"status": "True",
"type": "BmmPxePortHealthy"
}
],
"detailedStatus": "Provisioned",
"detailedStatusMessage": "The OS is provisioned to the machine. Warning: BMM power state doesn't match expected state"
}
}
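To pick out the failing conditions from a larger object, the JSON can be post-processed. The following Python sketch assumes the kubectl get output has been saved and parsed into a dictionary with the same structure as the example above. The helper names failing_conditions and describe are illustrative and not part of any Azure tooling.

```python
def failing_conditions(bmm_object):
    """Return every condition whose status is anything other than "True"."""
    conditions = bmm_object.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("status") != "True"]


def describe(condition):
    """Format one condition as a single line; optional fields default to '-'."""
    return "{} ({}) since {}: {}".format(
        condition.get("type", "-"),
        condition.get("reason", "-"),
        condition.get("lastTransitionTime", "-"),
        condition.get("message", "-"),
    )


# Trimmed copy of the example output shown above.
sample_bmm = {
    "status": {
        "conditions": [
            {"lastTransitionTime": "2025-03-04T01:57:06Z",
             "status": "True",
             "type": "BmmInExpectedNodeReadiness"},
            {"lastTransitionTime": "2025-03-04T15:59:36Z",
             "message": "BareMetalMachine expected to be powered on",
             "reason": "BmmPoweredOnExpected",
             "severity": "Error",
             "status": "False",
             "type": "BmmInExpectedPowerState"},
            {"lastTransitionTime": "2025-03-04T02:48:54Z",
             "message": "PXE network port (pxe) is up and stable",
             "reason": "PxePortsHealthy",
             "status": "True",
             "type": "BmmPxePortHealthy"},
        ]
    }
}

for condition in failing_conditions(sample_bmm):
    print(describe(condition))
```

For the sample above, only the BmmInExpectedPowerState condition is reported, which matches the Warning in detailedStatusMessage.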
You can also check for any potentially related recent lifecycle actions (such as Restart or Power off actions) in the Azure portal. See Monitor status in Bare Metal Machine JSON properties. If available, this information is also visible in the output of the previous run-read-command in the actionStates status field.
Warning: PXE port is unhealthy
This message in the BMM Detailed status message field indicates a problem with network connectivity on the Preboot Execution Environment (PXE) Ethernet port on the underlying compute host. The PXE port is used during provisioning and upgrades to download the operating system image and other software components. PXE connectivity issues shouldn't directly affect customer workloads running on a compute host. However, they can cause failures in BMM lifecycle operations such as the following.
- Cluster Provisioning
- Cluster Upgrade
- BMM Reimage
- BMM Replace
Either of the following conditions can trigger this Warning. These conditions can be due to hardware, cabling, or network configuration issues.
- PXE network port is down (physical link is down)
- PXE network port is flapping (more than two changes in physical link state in the previous 15 minutes)
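The flapping rule above can be illustrated with a small worked example. The following Python sketch applies the stated threshold (more than two link state changes in the previous 15 minutes) to a list of change timestamps. It is only an illustration of the rule as described here, not the platform's actual detection logic, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FLAP_WINDOW = timedelta(minutes=15)  # "the previous 15 minutes"
FLAP_THRESHOLD = 2                   # flapping means MORE than two changes


def is_flapping(link_change_times, now):
    """Return True if more than two link state changes fall inside the window ending at `now`."""
    window_start = now - FLAP_WINDOW
    recent = [t for t in link_change_times if window_start <= t <= now]
    return len(recent) > FLAP_THRESHOLD


now = datetime(2025, 3, 4, 16, 45, tzinfo=timezone.utc)
changes = [
    datetime(2025, 3, 4, 16, 20, tzinfo=timezone.utc),  # outside the window
    datetime(2025, 3, 4, 16, 32, tzinfo=timezone.utc),
    datetime(2025, 3, 4, 16, 38, tzinfo=timezone.utc),
    datetime(2025, 3, 4, 16, 43, tzinfo=timezone.utc),
]

print(is_flapping(changes, now))      # three changes inside the window
print(is_flapping(changes[:3], now))  # only two changes inside the window
```

With these sample timestamps, three changes fall inside the window in the first call, so the port counts as flapping. The second call sees only two and does not.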
To troubleshoot this issue:
- review the conditions status of the Kubernetes bmm object, as described in the Troubleshooting section; this information should identify the specific root cause (port down or port flapping) and approximate time of the issue
- check the Ethernet cabling and Top Of Rack (TOR) switch for the affected PXE port
- check for any other BMMs that are also reporting unhealthy PXE status or other network-related problems
- check for any recent deployment or infrastructure changes that coincide with the time of failure.
Example conditions output for PXE warning
"conditions": [
{
"lastTransitionTime": "2025-03-04T16:43:29Z",
"message": "Physical link down on PXE interface: pxe",
"reason": "PxePortUnhealthy",
"status": "False",
"type": "BmmPxePortHealthy"
},
],
Warning: BMM power state doesn't match expected state
This message in the BMM Detailed status message field indicates that either:
- the underlying host is powered off when it should be on, or
- the underlying host is powered on when it should be off.
This message can indicate an issue with the underlying compute host or baseboard management controller (BMC).
To troubleshoot this issue:
- review the conditions status of the Kubernetes bmm object, as described in the Troubleshooting section
- review the actionStates status field of the Kubernetes bmm object for any recently initiated lifecycle actions (such as a Restart or Power off), as described in the Troubleshooting section; this information should identify the approximate time of the issue and any other available details
- check the power feed, power cables, and physical hardware for the specified BMM
- check whether any other BMMs are also reporting an unexpected power state Warning, which might indicate a broader issue with the underlying infrastructure
- check for any recent deployment or infrastructure changes that coincide with the time of failure
- review the power state and logs on the BMC for the affected host.
For more information about logging in to the BMC, see Troubleshoot Hardware Validation Failure.
Warning
In versions 2502.1 and 2502.3, there's a known issue where the Warning: BMM power state doesn't match expected state message is incorrectly reported during deprovisioning and provisioning.
For example, the issue can happen when running the BMM Reimage or Replace actions. This issue is fixed in version 2504.1.
Example conditions output for unexpected power state
"conditions": [
{
"lastTransitionTime": "2025-03-04T15:59:36Z",
"message": "BareMetalMachine expected to be powered on",
"reason": "BmmPoweredOnExpected",
"severity": "Error",
"status": "False",
"type": "BmmInExpectedPowerState"
},
],
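When checking whether a lifecycle action or infrastructure change coincides with the failure, it can help to compare its timestamp against the condition's lastTransitionTime. The following Python sketch shows one way to do that comparison. The helper names, the 10-minute tolerance, and the sample event times are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_transition_time(value):
    """Parse a UTC timestamp in the format used by lastTransitionTime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def transition_near(condition, event_time, tolerance=timedelta(minutes=10)):
    """Return True if event_time is within `tolerance` of the condition's last transition."""
    transition = parse_transition_time(condition["lastTransitionTime"])
    return abs(transition - event_time) <= tolerance


# Condition copied from the example output above.
condition = {
    "lastTransitionTime": "2025-03-04T15:59:36Z",
    "message": "BareMetalMachine expected to be powered on",
    "reason": "BmmPoweredOnExpected",
    "severity": "Error",
    "status": "False",
    "type": "BmmInExpectedPowerState",
}

# Hypothetical event times, for illustration only.
restart_time = datetime(2025, 3, 4, 15, 55, tzinfo=timezone.utc)
earlier_change = datetime(2025, 3, 4, 12, 0, tzinfo=timezone.utc)

print(transition_near(condition, restart_time))    # about 4.5 minutes apart
print(transition_near(condition, earlier_change))  # about 4 hours apart
```

A match within the tolerance is only a hint that the two events are related. Confirm it against the actionStates field and the BMC logs.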
Warning: This machine has failed hardware validation
This BMM Detailed status message indicates that hardware validation for the BMM failed. Hardware validation typically occurs during initial cluster provisioning or during a BMM Replace action.
For more information about troubleshooting hardware validation failures, see Troubleshoot Hardware Validation Failure.