High CPU Runbook
System should run at a reasonable:
- CPU utilization level (e.g. 20-30%)
- and load average of 2-3.
CPU Analytics
Section titled “CPU Analytics”Since k3s runs everything on the host (not in VMs), the problematic process will show up directly in top.
k3s is the parent process for all containers, you need to drill down to see which pod/container is actually consuming resources
Get access to the node
- Ssh
- Start a container
kubectl debug node/<node-name> -it --image=ubuntu# Then inside the container:chroot /hostThen start top
top# ortop -c -o %CPUor
htopLoad average
Section titled “Load average”The load of 3.01 per core is quite high - typically you want it below 1.0 per core. Since it’s sustained over 15 minutes, this suggests an ongoing issue rather than a temporary spike. A load of 3.01 per core suggests either CPU-bound processes or significant I/O blocking.
Example:
- normal (usually sit between about 1.0 and 3.0.)
load average: 0.87, 1.56, 1.35- high (around 8-9, getting worse over time)
load average: 5.40, 10.09, 11.80A high load average can be caused by:
- Memory thrashing: System is constantly swapping because you have no swap space and RAM is 95%+ full
- Processes are blocking waiting for memory/I/O
Swapping ? kswapd0, wa
Section titled “Swapping ? kswapd0, wa”- If
kswapd0is using CPU, this kernel process swaps memory to disk when RAM is full - High I/O wait (
wa): 9.7% - the system is waiting on disk I/O (memory swapping)
Memory
Section titled “Memory”MiB Mem : 7750.7 total, 403.6 free, 5580.2 used, 2156.9 buff/cacheMemory exhausted?, no : Only 132 MB free out of 7.7 GB total
Other CPU analysis
Section titled “Other CPU analysis”# Check resource usage of pods on that nodekubectl top pods --all-namespaces --sort-by=cpu --field-selector spec.nodeName=kube-server-01.eraldy.com# Script failed use the global debug flag ie (kubee --debug kubectl) to get more information
ps aux --sort=-%cpu | head -20K3s-server as Top process
Section titled “K3s-server as Top process”k3s as highest CPU user per top
k3s-server using 232% CPU and 2.1 GB RAM - this is abnormally high
Check the log for any error
journalctl -u k3s -f --since "15 minutes ago"# or ?journalctl -e -u k3sCheck the database size for vaccum / compaction error