Adding nodes with kubeadm join
Use the kubeadm join command to add nodes to a Kubernetes cluster.
See the official Kubernetes documentation:
- https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
- https://kubernetes.io/zh/docs/reference/setup-tools/kubeadm/kubeadm-join/ : the kubeadm join reference (Chinese version)
Preparation
When a cluster is installed with kubeadm init, the output ends with a hint like this:
Then you can join any number of worker nodes by running the following on each as root:
sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
--discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
The token used here can be listed with the kubeadm token list command:
$ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
5ezixq.itmxvdgey8uduysr 12h 2021-12-28T04:22:54Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
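A token's TTL is fairly short (24 hours by default), so there may be no usable token left by the time you want to add a node. In that case, create a new token on the cluster. Note that the commands must be run on the cluster's control-plane node, because a later step reads local files there: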
$ kubeadm token create
omkq4t.v6nnkj4erms2ipyf
$ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
omkq4t.v6nnkj4erms2ipyf 23h 2021-12-29T09:19:23Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
The value for discovery-token-ca-cert-hash can be generated with the following command:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
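The token and the hash can then be assembled into a full join command. The following is a minimal sketch rather than part of the original walkthrough: ca_cert_hash and join_command are hypothetical helper names, the default certificate path is the one used above, and an RSA CA key is assumed because the pipeline uses openssl rsa. Alternatively, kubeadm token create --print-join-command prints a ready-made join command directly.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of kubeadm.

# Print the sha256 hash of the CA public key, using the same openssl pipeline
# as above. Assumes an RSA key; run on the control-plane node that holds ca.crt.
ca_cert_hash() {
    cert="${1:-/etc/kubernetes/pki/ca.crt}"
    openssl x509 -pubkey -in "$cert" \
        | openssl rsa -pubin -outform der 2>/dev/null \
        | openssl dgst -sha256 -hex \
        | sed 's/^.* //'
}

# Assemble the join command from an API server endpoint, a token, and a CA hash.
join_command() {
    endpoint="$1"
    token="$2"
    hash="$3"
    printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
        "$endpoint" "$token" "$hash"
}

# Example, on the control-plane node:
#   join_command 192.168.0.41:6443 "$(kubeadm token create)" "$(ca_cert_hash)"
```

The printed command is meant to be copied to the new node and run there as root.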
Running kubeadm join
Run the join command on the new node. The output looks like this:
$ sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
--discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1228 00:04:48.056252 78445 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
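Running kubectl on the current machine (the node that just joined) fails, because there is no local API server to connect to. A worker node has no kubeconfig, so kubectl falls back to localhost:8080 (k is used here as an alias for kubectl):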
$ k get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Running the same command on another machine shows that the node was added successfully:
$ k get nodes
NAME STATUS ROLES AGE VERSION
skyserver Ready control-plane,master 11h v1.23.1
skyserver2 Ready <none> 4m1s v1.23.1
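This check can be scripted when several nodes are being added. The sketch below assumes the column layout shown above (NAME first, STATUS second); node_is_ready is a hypothetical helper name, not a kubectl feature.

```shell
#!/bin/sh
# node_is_ready NAME: reads `kubectl get nodes` output on stdin and succeeds
# only if the row for NAME has a STATUS of exactly "Ready".
node_is_ready() {
    awk -v name="$1" '
        $1 == name && $2 == "Ready" { found = 1 }
        END { exit !found }
    '
}

# Example, on a machine with a working kubeconfig:
#   kubectl get nodes | node_is_ready skyserver2 && echo "skyserver2 has joined"
```

A status such as Ready,SchedulingDisabled is deliberately treated as not ready here; relax the comparison if that is too strict.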
Troubleshooting
Pod fails to start
A pod scheduled to one of the nodes fails to start and stays stuck in ContainerCreating:
$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kubernetes-dashboard dashboard-metrics-scraper-799d786dbf-6wksz 0/1 ContainerCreating 0 8h
Describing the pod shows that it was scheduled to node skywork2 and then failed with the error "cni0" already has an IP address different from 10.244.2.1/24:
$ k describe pods dashboard-metrics-scraper-799d786dbf-hqlg6 -n kubernetes-dashboard
Name: dashboard-metrics-scraper-799d786dbf-hqlg6
Namespace: kubernetes-dashboard
Priority: 0
Node: skywork2/192.168.0.20
......
Warning FailedCreatePodSandBox 17s (x4 over 20s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "41479d55f5428ec9a36267170dd1516f996bcf9d49f772d98c2fc79230f64830" network for pod "dashboard-metrics-scraper-799d786dbf-hqlg6": networkPlugin cni failed to set up pod "dashboard-metrics-scraper-799d786dbf-hqlg6_kubernetes-dashboard" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
This happens because the node had run kubeadm init at some point before kubeadm join, and kubeadm reset left part of the network configuration behind.
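To confirm this diagnosis on the affected node, the address currently assigned to cni0 can be compared with the one named in the error message. This is a minimal sketch: bridge_addr and stale_bridge are hypothetical helper names, and the parsing assumes the one-line-per-address format printed by ip -o -4 addr show.

```shell
#!/bin/sh
# bridge_addr DEV: reads `ip -o -4 addr show` output on stdin and prints the
# IPv4 address/prefix assigned to DEV (prints nothing if DEV has none).
bridge_addr() {
    awk -v dev="$1" '$2 == dev && $3 == "inet" { print $4 }'
}

# stale_bridge DEV EXPECTED: reads the same input and succeeds when DEV has an
# address that differs from EXPECTED, which is the situation in the error above.
stale_bridge() {
    actual=$(bridge_addr "$1")
    [ -n "$actual" ] && [ "$actual" != "$2" ]
}

# Example, on the affected node:
#   ip -o -4 addr show | stale_bridge cni0 10.244.2.1/24 && echo "cni0 is stale"
```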
The fix is to reset the network completely and then join again:
sudo -i
kubeadm reset -f
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
rm -rf /etc/kubernetes/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker
systemctl start kubelet
Once everything has been cleaned up, run kubeadm join again.
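If the cleanup has to be repeated on several nodes, the steps above can be wrapped in a function. This is a sketch under stated assumptions, not a drop-in tool: run and reset_node_network are hypothetical names, DRY_RUN is an illustrative variable that defaults to printing the commands instead of executing them, and ip link set ... down is used in place of ifconfig ... down.

```shell
#!/bin/sh
# DRY_RUN defaults to 1, so commands are only printed. Set DRY_RUN=0 and run
# as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_node_network() {
    run kubeadm reset -f
    run systemctl stop kubelet
    run systemctl stop docker
    run rm -rf /var/lib/cni/ /etc/cni/ /etc/kubernetes/
    # The glob is quoted so that the inner shell expands it at execution time.
    run sh -c 'rm -rf /var/lib/kubelet/*'
    for dev in cni0 flannel.1 docker0; do
        # Skip interfaces that do not exist, so the function can be re-run.
        if [ "$DRY_RUN" = "1" ] || ip link show "$dev" >/dev/null 2>&1; then
            run ip link set "$dev" down
            # As in the steps above, docker0 is brought down but not deleted.
            [ "$dev" = "docker0" ] || run ip link delete "$dev"
        fi
    done
    run systemctl start docker
    run systemctl start kubelet
}

# Example: review the printed commands first, then execute for real.
#   reset_node_network
#   DRY_RUN=0 reset_node_network
```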
Note: after kubeadm reset is run on the node, its entry can linger for a long time in the output of kubectl get nodes on the master node. To be safe, remove it manually with kubectl delete nodes skywork2.
References: