Summary
This article explains how to troubleshoot and resolve Node Not Ready problems in Azure Kubernetes Service (AKS) clusters. When a node enters a NotReady state, it can disrupt the applications that run on it and cause them to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring problems and maintain a stable environment, you need to understand the underlying causes so that you can implement effective resolutions.
Cause
Several scenarios can cause a NotReady state:
- API server unavailability. This condition causes the readiness probe to fail. The failure prevents the pod from being added to the service's endpoints, so traffic is no longer forwarded to the pod instance.
- Virtual machine (VM) host faults. To determine whether VM host faults occurred, check the following information sources:
  - AKS diagnostics
  - Azure status
  - Azure notifications (for any recent outages or maintenance periods)
Resolution
To resolve this issue, follow these steps:
- Run `kubectl describe node <node-name>` to review detailed information about the node's status. Look for any error messages or warnings that might indicate the root cause of the problem.
- Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
- Verify the node's network configuration to make sure that there are no connectivity problems.
- Check the node's resource usage, like CPU, memory, and disk, to identify potential constraints. For more information, see Monitor your Kubernetes cluster performance with Container insights.
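Part of the checks above can be scripted. The following Python sketch assumes that you saved the JSON output of `kubectl get nodes -o json` and `kubectl get apiservices -o json`. The sketch reports three things:

- Nodes whose `Ready` condition isn't `True`.
- Nodes with any of the `DiskPressure`, `MemoryPressure`, or `PIDPressure` conditions set to `True`.
- APIService objects whose `Available` condition isn't `True`.

The function names and the sample node names are hypothetical. The sketch doesn't replace reading the full `kubectl describe node` output.

```python
import json
from typing import Any

# Node conditions that signal resource pressure when their status is "True".
PRESSURE_CONDITIONS = ("DiskPressure", "MemoryPressure", "PIDPressure")


def _conditions(obj: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Index an object's status.conditions list by condition type."""
    return {c["type"]: c for c in obj.get("status", {}).get("conditions", [])}


def node_problems(nodes_json: str) -> dict[str, list[str]]:
    """Map node name -> problems, from `kubectl get nodes -o json` output."""
    problems: dict[str, list[str]] = {}
    for node in json.loads(nodes_json).get("items", []):
        conds = _conditions(node)
        found: list[str] = []
        ready = conds.get("Ready")
        if ready is None:
            found.append("NotReady (no Ready condition reported)")
        elif ready.get("status") != "True":
            # "False" or "Unknown" both mean the node is NotReady.
            found.append(f"NotReady (status={ready.get('status')}, "
                         f"reason={ready.get('reason', 'n/a')})")
        for ctype in PRESSURE_CONDITIONS:
            if conds.get(ctype, {}).get("status") == "True":
                found.append(ctype)
        if found:
            problems[node["metadata"]["name"]] = found
    return problems


def unavailable_apiservices(apiservices_json: str) -> list[str]:
    """Names of APIService objects whose Available condition is not "True"."""
    return [
        svc["metadata"]["name"]
        for svc in json.loads(apiservices_json).get("items", [])
        if _conditions(svc).get("Available", {}).get("status") != "True"
    ]


if __name__ == "__main__":
    # Hypothetical sample data in the shape that `kubectl get nodes -o json` returns.
    sample = json.dumps({"items": [
        {"metadata": {"name": "aks-nodepool1-0"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "aks-nodepool1-1"},
         "status": {"conditions": [
             {"type": "Ready", "status": "Unknown", "reason": "NodeStatusUnknown"},
             {"type": "MemoryPressure", "status": "True"}]}},
    ]})
    print(node_problems(sample))
    # {'aks-nodepool1-1': ['NotReady (status=Unknown, reason=NodeStatusUnknown)', 'MemoryPressure']}
```

To run it against a real cluster, save the output with `kubectl get nodes -o json > nodes.json` and pass the file contents to `node_problems`.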
For further steps, see Basic troubleshooting of Node Not Ready failures.
Prevention
To prevent this issue, take one or more of the following actions:
- Make sure that your service tier is fully paid for.
- Reduce the number of `watch` and `get` requests to the API server.
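Caching reads on the client is one generic way to cut repeated `get` requests when a tool or script keeps polling the API server for the same data. The Python sketch below illustrates the idea only and isn't an AKS or Kubernetes feature. `TTLCache` and `fake_list_pods` are hypothetical names, and `fetch` stands in for whatever API call your code makes.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Reuse the last result of `fetch` for `ttl` seconds before calling it again."""

    def __init__(self, fetch: Callable[[], T], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical stand-in for an API server GET
        self._ttl = ttl
        self._clock = clock          # injectable so the behavior can be tested
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Call the API server only if nothing is cached yet or the entry expired.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value  # type: ignore[return-value]


if __name__ == "__main__":
    calls = [0]

    def fake_list_pods() -> dict:
        calls[0] += 1    # count how many requests would reach the API server
        return {"items": []}

    fake_now = [0.0]
    pods = TTLCache(fake_list_pods, ttl=30.0, clock=lambda: fake_now[0])
    for _ in range(10):
        pods.get()       # ten reads inside the TTL window
    fake_now[0] = 31.0
    pods.get()           # TTL expired, so this read triggers a new request
    print(calls[0])      # 2
```

The trade-off is staleness: callers can see data that's up to `ttl` seconds old. For code that must track changes continuously, client-go's shared informers apply the same idea more robustly. They keep one `watch` per resource type and serve reads from a local cache.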