kubelet process failure

In the previous post, I went through some trial and error while creating a cluster by hand with kubeadm.

The error occurred at kubeadm init, and the cause was a step I missed while setting up Docker as the container runtime.

Error log

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

Tracing the cause

The kubelet is not running properly: it keeps exiting and restarting, and docker shows no Kubernetes containers at all…

root@master-node1:~# systemctl status kubelet
	 kubelet.service - kubelet: The Kubernetes Node Agent
	   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enable
	  Drop-In: /etc/systemd/system/kubelet.service.d
	           └─10-kubeadm.conf
	   Active: activating (auto-restart) (Result: exit-code) since Tue 2022-05-03 00:31:01
	     Docs: https://kubernetes.io/docs/home/
	  Process: 8735 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_AR
	 Main PID: 8735 (code=exited, status=1/FAILURE)
root@master-node1:~# docker ps -a | grep kube | grep -v pause
root@master-node1:~# docker logs CONTAINERID
	Error: No such container: CONTAINERID
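
The status output only shows that the kubelet keeps exiting, not why. One way to narrow it down is to compare the cgroup driver Docker reports with the one the kubelet is configured to use. The following is a minimal sketch, not something from the original troubleshooting session. It assumes the kubelet configuration written by kubeadm is at /var/lib/kubelet/config.yaml and contains a cgroupDriver key; the helper names kubelet_cgroup_driver and compare_drivers are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare Docker's cgroup driver with the kubelet's.
# Assumption: kubeadm wrote the kubelet config to /var/lib/kubelet/config.yaml.

# Print the cgroupDriver value from a kubelet config file.
# $1: path to the config file
kubelet_cgroup_driver() {
  awk -F': *' '$1 == "cgroupDriver" { gsub(/"/, "", $2); print $2 }' "$1"
}

# Report whether two driver names agree; returns non-zero on a mismatch.
# $1: Docker's driver, $2: the kubelet's driver
compare_drivers() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "mismatch: docker=$1 kubelet=$2"
    return 1
  fi
}

# Query the live system only when Docker and the kubelet config are both present.
if command -v docker >/dev/null 2>&1 && [ -r /var/lib/kubelet/config.yaml ]; then
  compare_drivers "$(docker info --format '{{.CgroupDriver}}')" \
                  "$(kubelet_cgroup_driver /var/lib/kubelet/config.yaml)" \
    || echo "hint: the two cgroup drivers need to be the same"
fi
```

If the config file has no cgroupDriver key, the kubelet side comes back empty and the comparison reports a mismatch, so an empty value means the assumption above does not hold on that node, not that a mismatch was found.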

Fixing the error

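  • The error occurred because Docker, the container runtime, was not set up correctly.
    • On a systemd-based host, systemd already acts as the cgroup manager and runs the kubelet as a service, so Kubernetes recommends setting the Docker daemon's cgroup driver to systemd as well.
    • The kubelet and Docker must therefore use the same cgroup driver (systemd here); if they differ, the kubelet process fails to start.
  • Create daemon.json → kubeadm reset → kubeadm init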
root@master-node1:~# cat > /etc/docker/daemon.json <<EOF
	> {
	>   "exec-opts": ["native.cgroupdriver=systemd"],
	>   "log-driver": "json-file",
	>   "log-opts": {
	>     "max-size": "100m"
	>   },
	>   "storage-driver": "overlay2"
	> }
	> EOF
root@master-node1:~# mkdir -p /etc/systemd/system/docker.service.d
root@master-node1:~# systemctl daemon-reload
root@master-node1:~# systemctl restart docker
root@master-node1:~# kubeadm reset
	[reset] Reading configuration from the cluster...
	[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
	W0503 00:56:43.361756   26579 reset.go:101] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: configmaps "kubeadm-config" not found
	[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
	[reset] Are you sure you want to proceed? [y/N]: y
	[preflight] Running pre-flight checks
	W0503 00:57:41.933212   26579 removeetcdmember.go:80] [reset] No kubeadm config, using etcd pod spec to get data directory
	[reset] Stopping the kubelet service
	[reset] Unmounting mounted directories in "/var/lib/kubelet"
	[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
	[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
	[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

	The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
	
	The reset process does not reset or clean up iptables rules or IPVS tables.
	If you wish to reset iptables, you must do so manually by using the "iptables" command.
	
	If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
	to reset your system's IPVS tables.
	
	The reset process does not clean your kubeconfig files and you must remove them manually.
	Please, check the contents of the $HOME/.kube/config file.

root@master-node1:~# kubeadm init
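
Before re-running kubeadm init, it can help to confirm that the daemon.json change actually took effect. The sketch below is an illustration under assumptions, not part of the original session. It reads the driver requested in /etc/docker/daemon.json, asks the running Docker daemon which driver it is using, and probes the kubelet health endpoint from the error log. The helper name daemon_json_cgroup_driver is hypothetical, and the sed pattern only handles the simple exec-opts form used above.

```shell
#!/usr/bin/env bash
# Sketch: check that Docker picked up the systemd cgroup driver.

# Print the cgroup driver requested via exec-opts in a daemon.json file.
# $1: path to daemon.json
daemon_json_cgroup_driver() {
  sed -n 's/.*native\.cgroupdriver=\([A-Za-z]*\).*/\1/p' "$1"
}

# What the config file asks for.
if [ -r /etc/docker/daemon.json ]; then
  echo "configured: $(daemon_json_cgroup_driver /etc/docker/daemon.json)"
fi

# What the running daemon reports (needs a restart after editing daemon.json).
if command -v docker >/dev/null 2>&1; then
  echo "running: $(docker info --format '{{.CgroupDriver}}' 2>/dev/null)"
fi

# The endpoint kubeadm's kubelet-check polls; reachable only once the kubelet is up.
if command -v curl >/dev/null 2>&1; then
  curl -sS --max-time 2 http://localhost:10248/healthz 2>/dev/null \
    || echo "kubelet healthz not reachable yet"
fi
```

Both the configured and the running value should read systemd. The health endpoint is expected to stay unreachable until kubeadm init has started the kubelet again.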