Fix CrashLoopBackOff when using kubeadm to init a cluster

Solution

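Asking AI assistants never got me past this problem, and searching the Simplified-Chinese web turned up nothing either. I finally found the answer in GitHub issues, so here is a quick note on the fix and why it works.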

The issues that contain the fix:

https://github.com/kubernetes/kubeadm/issues/2833

https://github.com/etcd-io/etcd/issues/13670

The symptom: after bootstrapping the cluster with kubeadm init, control-plane pods end up in CrashLoopBackOff, as reported in the issues:

root@student-VMware-Virtual-Platform:/home/student/Desktop# kubectl get pods -n kube-system
NAME                                                      READY   STATUS             RESTARTS        AGE
coredns-668d6bf9bc-mc5mn                                  0/1     Pending            0               119s
coredns-668d6bf9bc-sd84z                                  0/1     Pending            0               119s
etcd-student-vmware-virtual-platform                      1/1     Running            2 (2m53s ago)   2m57s
kube-apiserver-student-vmware-virtual-platform            1/1     Running            2 (2m23s ago)   2m57s
kube-controller-manager-student-vmware-virtual-platform   1/1     Running            4 (2m53s ago)   2m57s
kube-proxy-7k6hq                                          1/1     Running            2 (41s ago)     2m
kube-scheduler-student-vmware-virtual-platform            0/1     CrashLoopBackOff   4 (22s ago)     2m57s
root@student-VMware-Virtual-Platform:/home/student/Desktop#

After that the whole cluster gradually falls apart, and kubectl can no longer connect:

root@student-VMware-Virtual-Platform:/home/student/Desktop/lk8s# kubectl get pods -n kube-system  
The connection to the server 192.168.220.128:6443 was refused - did you specify the right host or port?

The kube-apiserver container logs show only that the process received a signal and was shut down; no error is logged.

The fix is to edit /etc/containerd/config.toml:

/etc/containerd/
vim /etc/containerd/config.toml

Add the following configuration:

version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
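Instead of editing the file by hand in vim, the same change can be scripted. The following is a minimal sketch, assuming the file was generated with containerd config default (so it already contains a SystemdCgroup = false line under the runc options, as in the containerd version used here). The function name enable_systemd_cgroup is a hypothetical helper, not a containerd tool. It deliberately leaves the separate, lowercase systemd_cgroup key untouched.

```shell
#!/usr/bin/env bash
# Sketch: switch containerd's runc runtime to the systemd cgroup driver.
# Assumption: the target file already contains "SystemdCgroup = false",
# e.g. because it was produced by `containerd config default`.
set -euo pipefail

# enable_systemd_cgroup is a hypothetical helper name, not part of containerd.
enable_systemd_cgroup() {
  local cfg="$1"
  # Keep a backup of the original file next to it.
  cp "$cfg" "$cfg.bak"
  # Flip only the runc option (GNU sed in-place syntax); leading indentation
  # is preserved, and the lowercase systemd_cgroup key does not match.
  sed -i 's/\(SystemdCgroup[[:space:]]*=[[:space:]]*\)false/\1true/' "$cfg"
  # Return non-zero if no "SystemdCgroup = true" line exists afterwards,
  # e.g. because the key was missing from the file entirely.
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"
}
```

On the node this would be run as root: containerd config default > /etc/containerd/config.toml (which overwrites any existing file), then enable_systemd_cgroup /etc/containerd/config.toml, then systemctl restart containerd so the setting takes effect. Depending on how far the broken cluster got, the kubelet may also need a restart, or the node a kubeadm reset before running kubeadm init again.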

Root cause

Following the pointers in the issues leads to the Kubernetes documentation, which explains that Linux has two cgroup drivers: cgroupfs and systemd.

The kubelet and the container runtime must use the same cgroup driver:

It’s critical that the kubelet and the container runtime use the same cgroup driver and are configured the same.

According to the documentation, the kubelet uses cgroupfs by default.

However, I am using kubeadm v1.32, which configures the kubelet to use systemd by default. This can be checked with:

cat /var/lib/kubelet/config.yaml | grep cgroupDriver

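The containerd configuration, on the other hand, defaults to cgroupfs rather than systemd. Dump the default configuration and look for the cgroup settings: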

containerd config default > /tmp/1.txt

systemd_cgroup = false

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = false

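As the documentation explains, running two cgroup managers at once gives two different views of the system's resources and can make the node unstable. The fix is therefore to change runc's options so that it also uses the systemd cgroup driver.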
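The two checks in this section (grepping the kubelet config and the containerd config) can be combined into a single comparison. This is a minimal sketch, assuming the default file locations used in this post and that only the runc runtime sets SystemdCgroup; the function names are hypothetical helpers, not part of kubeadm or containerd. A missing SystemdCgroup key is treated as cgroupfs, matching the containerd default shown above.

```shell
#!/usr/bin/env bash
# Sketch: report whether kubelet and containerd agree on the cgroup driver.
# Assumed default paths (from this post): /var/lib/kubelet/config.yaml and
# /etc/containerd/config.toml. Both can be overridden via arguments.
set -euo pipefail

# kubelet_cgroup_driver prints the value of the top-level cgroupDriver key.
kubelet_cgroup_driver() {
  local cfg="${1:-/var/lib/kubelet/config.yaml}"
  awk -F': *' '$1 == "cgroupDriver" { print $2; exit }' "$cfg"
}

# containerd_cgroup_driver maps the SystemdCgroup option to a driver name.
# Simplification: any "SystemdCgroup = true" line in the file counts, so a
# config with several runtimes is not distinguished per runtime.
containerd_cgroup_driver() {
  local cfg="${1:-/etc/containerd/config.toml}"
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"; then
    echo systemd
  else
    echo cgroupfs
  fi
}

# check_cgroup_drivers prints both drivers and returns non-zero on mismatch.
check_cgroup_drivers() {
  local kubelet containerd
  kubelet="$(kubelet_cgroup_driver "${1:-/var/lib/kubelet/config.yaml}")"
  containerd="$(containerd_cgroup_driver "${2:-/etc/containerd/config.toml}")"
  echo "kubelet=${kubelet} containerd=${containerd}"
  [ "$kubelet" = "$containerd" ]
}
```

Run as root with no arguments, check_cgroup_drivers reads the real files; on the broken setup described here it would be expected to report a mismatch (systemd for the kubelet, cgroupfs for containerd) until the config.toml change is applied.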

Author

李三(cl0und)

Posted on

2025-01-12

Updated on

2025-01-12

Licensed under