Summary
This article explains how to troubleshoot an Azure Kubernetes Service (AKS) node that changes to Not Ready after staying healthy for some time. It describes the cause and shows how to restore the node to a healthy state.
Prerequisites
- The Kubernetes kubectl tool. To install kubectl by using Azure CLI, run the az aks install-cli command.
- The kubelet, the Kubernetes node agent that runs on each cluster node.
- The containerd container runtime that runs on each cluster node.
- Standard Linux utilities, such as grep, awk, head, tail, ps, sort, and journalctl, which the commands in this article use.
Connect to the AKS cluster
Before you can troubleshoot the issue, you must connect to the AKS cluster. To do so, run the following commands:
# Set these variables to the names of your existing resource group and cluster.
export RESOURCE_GROUP="my-resource-group"
export AKS_CLUSTER="my-aks-cluster"

# Merge the cluster's credentials into your local kubeconfig.
az aks get-credentials --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER --overwrite-existing
Symptoms
The status of a cluster node that has a healthy state (all services running) unexpectedly changes to Not Ready. To view the status of a node, run the following kubectl describe command:
kubectl describe nodes
Cause
The kubelet stops posting its Ready status.
Check the output of the kubectl describe nodes command to find the Conditions field and the Capacity and Allocatable blocks. Do the contents of these fields appear as expected? For example, in the Conditions field, does the message property contain the "kubelet is posting ready status" string? If you have direct Secure Shell (SSH) access to the node, check the recent events to understand the error. On distributions that don't provide a /var/log/messages file, look in /var/log/syslog instead. Or, generate the kubelet and container daemon log files by running the following shell commands:
# First, identify the NotReady node
export NODE_NAME=$(kubectl get nodes --no-headers | grep NotReady | awk '{print $1}' | head -1)

if [ -z "$NODE_NAME" ]; then
    echo "No NotReady nodes found"
    kubectl get nodes
else
    echo "Found NotReady node: $NODE_NAME"

    # Use kubectl debug to access the node
    kubectl debug node/$NODE_NAME -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0 -- chroot /host bash -c "
        echo '=== Checking syslog ==='
        if [ -f /var/log/syslog ]; then
            tail -100 /var/log/syslog
        else
            echo 'syslog not found'
        fi
        echo '=== Checking kubelet logs ==='
        journalctl -u kubelet --no-pager | tail -100
        echo '=== Checking containerd logs ==='
        journalctl -u containerd --no-pager | tail -100
    "
fi
After you run these commands, examine the syslog and daemon log files for more information about the error.
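Once you have the log output, you can scan it for common failure signatures. The following sketch assumes you saved the previous command's output to a hypothetical file named node-logs.txt; the search strings are illustrative, not an exhaustive list:

```shell
# Scan saved node logs for common failure signatures.
# node-logs.txt is a hypothetical file that holds the output of the
# previous kubectl debug command; the patterns below are illustrative.
LOG_FILE="node-logs.txt"

for pattern in "pthread_create failed" "Resource temporarily unavailable" "OOM" "connection refused"; do
    echo "=== Matches for: $pattern ==="
    grep -i -- "$pattern" "$LOG_FILE" || echo "(no matches)"
done
```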
Solution
Step 1: Check for changes at the network level
If all cluster nodes regressed to a Not Ready status, check whether any changes occurred at the network level. Examples of network-level changes include:
- Domain name system (DNS) changes.
- Firewall rule changes, such as changes to ports or fully qualified domain names (FQDNs).
- Added network security groups (NSGs).
- Applied or changed route table configurations for AKS traffic.
If there were changes at the network level, make any necessary corrections. If you have direct Secure Shell (SSH) access to the node, use the curl or telnet command to check the connectivity to AKS outbound requirements. After you fix the issues, stop and restart the nodes. If the nodes stay in a healthy state after these fixes, you can safely skip the remaining steps.
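The curl connectivity check can be sketched as follows. The endpoints listed are a small, illustrative subset of the AKS outbound network requirements; substitute the full list that applies to your cluster configuration:

```shell
# Check connectivity to a few required AKS outbound endpoints.
# The endpoint list is illustrative, not exhaustive.
for endpoint in mcr.microsoft.com:443 management.azure.com:443 login.microsoftonline.com:443; do
    host="${endpoint%%:*}"
    port="${endpoint##*:}"
    if curl --silent --connect-timeout 5 --output /dev/null "https://${host}:${port}"; then
        echo "OK:   ${endpoint}"
    else
        echo "FAIL: ${endpoint}"
    fi
done
```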
Step 2: Stop and restart the nodes
If only a few nodes show a Not Ready status, stop and restart them. This action might return the nodes to a healthy state. Then, check the Azure Kubernetes Service diagnostics overview for problems such as the following issues:
- Node faults.
- Source network address translation (SNAT) failures.
- Node input/output operations per second (IOPS) performance problems.
- Other problems.
If the diagnostics don't find any underlying problems and the nodes return to Ready status, you can safely skip the remaining steps.
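The stop-and-restart action can be sketched with the Azure CLI, assuming Virtual Machine Scale Set-based node pools. The variable names follow the earlier commands, and the scale set chosen here is simply the first one in the cluster's node resource group:

```shell
# Restart the VM scale set instances that back a node pool (a sketch).
# Assumes the RESOURCE_GROUP and AKS_CLUSTER variables from earlier steps.
NODE_RESOURCE_GROUP=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$AKS_CLUSTER" \
    --query nodeResourceGroup --output tsv)
VMSS_NAME=$(az vmss list --resource-group "$NODE_RESOURCE_GROUP" \
    --query "[0].name" --output tsv)

# Restart every instance in the scale set.
az vmss restart --resource-group "$NODE_RESOURCE_GROUP" --name "$VMSS_NAME" --instance-ids "*"
```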
Step 3: Fix SNAT issues for public AKS API clusters
Did AKS diagnostics find any SNAT problems? If so, take some of the following actions, as appropriate:
- Check whether your connections stay idle for a long time and rely on the default idle timeout to release their port. If so, you might need to reduce the default timeout of 30 minutes.
- Find out how your application creates outbound connectivity (for example, through a code review or a packet capture).
- Determine whether this activity is expected behavior or whether it shows that the application is misbehaving. Use metrics and logs in Azure Monitor to substantiate your findings. For example, use the SNAT Connections metric with the Failed category.
- Check whether appropriate patterns are followed.
- Check whether you should mitigate SNAT port exhaustion by using extra outbound IP addresses and more allocated outbound ports. For more information, see Scale the number of managed outbound public IPs and Configure the allocated outbound ports.
For more information about how to troubleshoot SNAT port exhaustion, see Troubleshoot SNAT port exhaustion on AKS nodes.
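The outbound scaling options can be sketched with the Azure CLI. The counts shown are illustrative starting points rather than recommendations, and the idle timeout is specified in minutes:

```shell
# Scale outbound SNAT capacity for a cluster that uses a Standard load
# balancer (a sketch; the values shown are illustrative).
az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$AKS_CLUSTER" \
    --load-balancer-managed-outbound-ip-count 3 \
    --load-balancer-outbound-ports 4000 \
    --load-balancer-idle-timeout 4
```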
Step 4: Fix IOPS performance problems
If AKS diagnostics uncovers problems that reduce IOPS performance, take some of the following actions, as appropriate:
- To increase IOPS on virtual machine (VM) scale sets, deploy a new node pool that uses a larger disk size with better IOPS performance. Directly resizing the scale set isn't supported. For more information about resizing node pools, see Resize node pools in Azure Kubernetes Service (AKS).
- Increase the node SKU size for more memory and CPU processing capability.
- Consider using Ephemeral OS disks.
- Limit the CPU and memory usage for pods. These limits help prevent node CPU exhaustion and out-of-memory situations.
- Use scheduling topology methods to add more nodes and distribute the load among them. For more information, see Pod topology spread constraints.
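The pod-level CPU and memory limits mentioned above can be sketched in a pod spec. The pod name, image, and resource values here are illustrative:

```shell
# Create a pod with CPU and memory requests and limits (a sketch;
# the name, image, and values are illustrative).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
spec:
  containers:
  - name: app
    image: mcr.microsoft.com/dotnet/runtime-deps:6.0
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
EOF
```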
Step 5: Fix threading problems
Kubernetes components like kubelets and containerd runtimes rely heavily on threading, and they spawn new threads regularly. If the allocation of new threads is unsuccessful, this failure can affect service readiness, as follows:
- The node status changes to Not Ready, but a remediator restarts the node and recovers it.
- The /var/log/messages and /var/log/syslog log files contain repeated occurrences of the following error entry, written by various processes, including containerd and possibly the kubelet:

  pthread_create failed: Resource temporarily unavailable

- The node status changes to Not Ready soon after the pthread_create failure entries are written to the log files.
Process IDs (PIDs) represent threads. The default number of PIDs that a pod can use might depend on the operating system, but it's at least 32,768, which is more than enough for most situations. Are there any known application requirements for higher PID resources? If there are, even an eight-fold increase to 262,144 PIDs might not be enough to accommodate a high-resource application.
Instead, identify the offending application, and then take the appropriate action. Other options, like increasing the VM size or upgrading AKS, can mitigate the issue temporarily, but they don't guarantee that the issue won't reappear.
To monitor the thread count for each control group (cgroup) and print the top eight cgroups, run the following shell command:
# Show current thread count for each cgroup (top 8)
ps -e -w -o "thcount,cgname" --no-headers | awk '{a[$2] += $1} END{for (i in a) print a[i], i}' | sort --numeric-sort --reverse | head --lines=8
For more information, see Process ID limits and reservations.
Kubernetes offers two methods to manage PID exhaustion at the node level:
- Configure the maximum number of PIDs that a pod can use by setting the kubelet's --pod-max-pids parameter. This configuration sets the pids.max setting within the cgroup of each pod. You can also use the --system-reserved and --kube-reserved parameters to configure the system and kubelet limits, respectively.
- Configure PID-based eviction.
Note
By default, neither of these methods is set up. Additionally, you can't currently configure either method by using Node configuration for AKS node pools.
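To see the effective PID limit on a node, you can read pids.max from the kubepods cgroup. The following sketch reuses the kubectl debug approach shown earlier; because the cgroup layout differs between cgroup v1 and v2, both common paths are tried:

```shell
# Read the effective PID limit for the kubepods cgroup on a node (a sketch).
# NODE_NAME is the node identified earlier; cgroup paths vary by OS version.
kubectl debug node/$NODE_NAME -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0 -- chroot /host sh -c '
    for f in /sys/fs/cgroup/pids/kubepods/pids.max /sys/fs/cgroup/kubepods.slice/pids.max; do
        if [ -f "$f" ]; then
            echo "$f: $(cat "$f")"
        fi
    done
'
```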
Step 6: Use a higher service tier
To ensure the AKS API server is highly available, use a higher service tier. For more information, see the Azure Kubernetes Service (AKS) Uptime SLA.
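Moving an existing cluster to a higher tier can be sketched with the Azure CLI. This assumes a current Azure CLI version in which the --tier parameter is available on az aks update:

```shell
# Move an existing cluster to the Standard tier for an uptime SLA (a sketch).
az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$AKS_CLUSTER" \
    --tier standard
```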
More information
To view the health and performance of the AKS API server and kubelets, see Managed AKS components.
For general troubleshooting steps, see Basic troubleshooting of node not ready failures.