Adding nodes with kubeadm join

Use the kubeadm join command to add nodes to a Kubernetes cluster

Reference: the official Kubernetes documentation.

Preparation

When k8s is installed with the kubeadm init command, the output ends with a hint like this:

Then you can join any number of worker nodes by running the following on each as root:

sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
        --discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae

The token used here can be looked up with the kubeadm token list command:

$ kubeadm token list
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
5ezixq.itmxvdgey8uduysr   12h         2021-12-28T04:22:54Z   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token

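A token's time to live (TTL) is fairly short (24 hours by default; the listing above shows 12h because that is the time remaining), so there may be no usable token left when you want to join a node. In that case, create a new token on the cluster. Note that the commands must be run on the cluster's control-plane node, because a later step reads a local file there: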

$ kubeadm token create
omkq4t.v6nnkj4erms2ipyf
$ kubeadm token list  
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
omkq4t.v6nnkj4erms2ipyf   23h         2021-12-29T09:19:23Z   authentication,signing   <none>                                                     system:bootstrappers:kubeadm:default-node-token
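
When a token is copied by hand from one terminal to another, a quick shape check can catch a truncated or mistyped value before kubeadm join is run. The sketch below assumes the usual bootstrap token format (six characters, a dot, sixteen characters, all lowercase letters or digits). is_bootstrap_token is a hypothetical helper name, not part of kubeadm.

```shell
# Hypothetical helper (not part of kubeadm): succeed only if "$1" has the
# bootstrap-token shape [a-z0-9]{6}.[a-z0-9]{16}.
is_bootstrap_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}
```

For example, is_bootstrap_token "omkq4t.v6nnkj4erms2ipyf" succeeds. As an alternative to assembling the pieces by hand, kubeadm token create --print-join-command prints a complete join command, including the CA certificate hash, in one step.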

The discovery-token-ca-cert-hash value can be generated with the following command:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
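
The same pipeline can be wrapped in a small function that fails early on a missing certificate and prints the value with the sha256: prefix that kubeadm join expects. This is a sketch under assumptions: ca_cert_hash is a hypothetical name, openssl must be installed, and the CA key is assumed to be RSA, as in the command above.

```shell
# Sketch: print the value for --discovery-token-ca-cert-hash.
# "$1" is the CA certificate path; it defaults to the kubeadm location.
ca_cert_hash() {
  crt="${1:-/etc/kubernetes/pki/ca.crt}"
  # Extract the public key first so a missing or invalid certificate fails early.
  pub=$(openssl x509 -pubkey -noout -in "$crt" 2>/dev/null) || return 1
  # Hash the DER-encoded public key and keep only the hex digest.
  hash=$(printf '%s\n' "$pub" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //')
  printf 'sha256:%s\n' "$hash"
}
```

Run on the control-plane node, ca_cert_hash with no argument reads /etc/kubernetes/pki/ca.crt, and its output can be passed directly to --discovery-token-ca-cert-hash.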

Running kubeadm join

Run the join command on the node that is being added. The output looks like this:

$ sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
        --discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1228 00:04:48.056252   78445 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

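On the machine that just joined, running kubectl (k here is presumably a shell alias for kubectl) fails. A worker node has no kubeconfig, so kubectl falls back to localhost:8080, where no API server is listening: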

$ k get nodes   
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Running the same command on another machine shows that the node was added successfully:

$ k get nodes
NAME         STATUS   ROLES                  AGE    VERSION
skyserver    Ready    control-plane,master   11h    v1.23.1
skyserver2   Ready    <none>                 4m1s   v1.23.1
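
To check a single node from a script instead of reading the table by eye, the STATUS column can be extracted from the kubectl get nodes output. node_status below is a hypothetical helper. It only parses text on stdin and assumes the default column layout shown above.

```shell
# Read `kubectl get nodes` output on stdin and print the STATUS column
# of the node named "$1". Prints nothing if the node is not listed.
node_status() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}
```

Usage: kubectl get nodes | node_status skyserver2. To block until the node is ready, kubectl wait --for=condition=Ready node/skyserver2 --timeout=120s is another option.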

Troubleshooting

Pod fails to start

A pod scheduled to one of the nodes cannot start and stays stuck in ContainerCreating:

$ k get pods -A
NAMESPACE              NAME                                         READY   STATUS              RESTARTS      AGE
kubernetes-dashboard   dashboard-metrics-scraper-799d786dbf-6wksz   0/1     ContainerCreating   0             8h

Describing the pod shows that it was scheduled to node skywork2 and then failed with the error "cni0" already has an IP address different from 10.244.2.1/24:

$ k describe pods dashboard-metrics-scraper-799d786dbf-hqlg6 -n kubernetes-dashboard
Name:           dashboard-metrics-scraper-799d786dbf-hqlg6
Namespace:      kubernetes-dashboard
Priority:       0
Node:           skywork2/192.168.0.20
......
  Warning  FailedCreatePodSandBox  17s (x4 over 20s)   kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "41479d55f5428ec9a36267170dd1516f996bcf9d49f772d98c2fc79230f64830" network for pod "dashboard-metrics-scraper-799d786dbf-hqlg6": networkPlugin cni failed to set up pod "dashboard-metrics-scraper-799d786dbf-hqlg6_kubernetes-dashboard" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
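
The error says that the address already configured on the cni0 bridge differs from the one the CNI plugin wants to assign. A minimal sketch for confirming this on the affected node follows. bridge_addr and bridge_matches are hypothetical helper names, and they only parse the output of ip -o -4 addr show dev cni0. With flannel, the expected address is usually the FLANNEL_SUBNET value in /run/flannel/subnet.env, which is an assumption about this setup.

```shell
# Hypothetical helper: read `ip -o -4 addr show dev cni0` output on stdin and
# print the first IPv4 address with its prefix length (e.g. 10.244.0.1/24).
bridge_addr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { print $(i + 1); exit } }'
}

# Succeed if the bridge address read from stdin equals the expected one in "$1".
bridge_matches() {
  [ "$(bridge_addr)" = "$1" ]
}
```

Usage: ip -o -4 addr show dev cni0 | bridge_matches 10.244.2.1/24 || echo "stale cni0 address".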

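This happens because the node had run kubeadm init before it ran kubeadm join, and kubeadm reset left part of the network configuration behind.

The fix is to reset the network completely and then join again: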

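sudo -i
# Undo what kubeadm init/join set up on this node
kubeadm reset -f
systemctl stop kubelet
systemctl stop docker
# Remove leftover CNI, kubelet, and Kubernetes state
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
rm -rf /etc/kubernetes/
# Bring the stale interfaces down, then delete the CNI bridge and the flannel device
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker
systemctl start kubelet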
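
On a node where some of these interfaces do not exist, the ip link delete commands report errors. The sketch below is one way to make the interface cleanup tolerant of that and to preview it first. run and remove_link are hypothetical helper names, and DRY_RUN is an illustrative variable, not something kubeadm reads.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Bring a network interface down and delete it, skipping it if it does not exist.
remove_link() {
  if ip link show "$1" >/dev/null 2>&1; then
    run ip link set "$1" down
    run ip link delete "$1"
  else
    echo "skip $1 (not present)"
  fi
}
```

Usage, as root: for l in cni0 flannel.1; do remove_link "$l"; done. Setting DRY_RUN=1 beforehand only prints the commands that would be executed.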

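Once everything has been cleaned up, run kubeadm join again.

Note: after kubeadm reset was run on the node, kubectl get nodes on the master node kept listing the node for a long time. To be safe, delete it manually with kubectl delete nodes skywork2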
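
A node that has been reset stops reporting to the control plane and typically shows up as NotReady before it is removed. A small sketch for listing such nodes from the kubectl get nodes output follows. not_ready_nodes is a hypothetical helper and assumes the default column layout.

```shell
# Read `kubectl get nodes` output on stdin and print the names of nodes whose
# STATUS column is not exactly "Ready" (this also matches values such as
# NotReady or Ready,SchedulingDisabled).
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

Usage on the master node: kubectl get nodes | not_ready_nodes. Review the list before passing any name to kubectl delete nodes.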

References: