Installing Kubernetes on Ubuntu
- 1: Installing Kubernetes with kubeadm on Ubuntu 22.04
- 2: Post-installation configuration for Kubernetes
- 3: Adding nodes with kubeadm join
- 4: Deploying and accessing the Dashboard
- 5: Deploying metrics-server
- 6: Installing Kubernetes with kubeadm on Ubuntu 20.04
1 - Installing Kubernetes with kubeadm on Ubuntu 22.04
Using Ubuntu Server 22.04 as an example; see the official Kubernetes documentation:
Prerequisites
Check the Docker version
Note
For now, pin this verified combination of Docker and Kubernetes versions:
- docker: 20.10.21
- k8s: 1.23.14
See the explanation at the very bottom for the reason.
Check the containerd configuration
sudo vi /etc/containerd/config.toml
Make sure this file does not exist, or that the following line in it is commented out:
# disabled_plugins = ["cri"]
After making the change, restart containerd:
sudo systemctl restart containerd.service
Note: without this change, the Kubernetes install fails with "CRI v1 runtime API is not implemented".
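If the line is present, commenting it out can be scripted; a minimal sketch, assuming the stock config layout where the line starts at column one:
# comment out the disabled_plugins line, then restart containerd
sudo sed -i 's/^disabled_plugins/#disabled_plugins/' /etc/containerd/config.toml
sudo systemctl restart containerd.service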
Disable swap
Run the free -m command to check:
$ free -m
total used free shared buff/cache available
Mem: 15896 1665 11376 20 2854 13819
Swap: 0 0 0
If the Swap line is not all zeros, swap is enabled and needs to be turned off.
Two things need to be done:
- Do not create a swap partition when installing the OS; if one exists, delete it.
- Even without a swap partition, a swap file may still be enabled. Open the fstab:
sudo vi /etc/fstab
Find the swap line and comment it out with a leading # to disable it:
# /swapfile none swap sw 0 0
After rebooting, check again with free -m.
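To turn swap off immediately without rebooting, a minimal sketch (assumes the fstab entry matches the whitespace-delimited pattern below):
sudo swapoff -a
# comment out any swap entries in /etc/fstab so the change survives reboots
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
free -m   # the Swap line should now show 0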
Install kubeadm
Important
Arrange a working global proxy (unrestricted access to Google and GitHub) first; otherwise installing kubeadm is very painful. Run the following commands:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
Install the latest version
sudo apt-get install -y kubelet kubeadm kubectl
After the installation completes, check the kubectl version:
kubectl version --output=yaml
clientVersion:
buildDate: "2023-06-14T09:53:42Z"
compiler: gc
gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
gitTreeState: clean
gitVersion: v1.27.3
goVersion: go1.20.5
major: "1"
minor: "27"
platform: linux/amd64
kustomizeVersion: v5.0.1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Check the kubeadm version:
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:52:26Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Check the kubelet version:
kubelet --version
Kubernetes v1.27.3
Install a specific version
To install a specific version instead:
sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
The available versions are listed here:
https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages
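You can also query which versions the configured apt repository actually offers, using plain apt tooling (assumes the Kubernetes apt source added above):
apt-cache madison kubeadm
apt-cache madison kubelet
apt-cache madison kubectl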
Install Kubernetes
Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Again, remember
Arrange a working global proxy first. Then run:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 -v=9
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.100.40 -v=9
Note: to use a CNI network with Flannel later on, we set --pod-network-cidr=10.244.0.0/16 here; without this setting, Flannel keeps reporting errors. If the machine has multiple network interfaces, use --apiserver-advertise-address to specify which IP address to use.
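The same flags can instead be kept in a kubeadm configuration file, which is easier to version-control; a hedged sketch using the v1beta3 schema (the file name and advertise address are example values):
# kubeadm-init.yaml (example file name)
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.100.40  # the NIC the API server should listen on
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16  # must match the CIDR Flannel expects
Then: sudo kubeadm init --config kubeadm-init.yaml -v=9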
The kubeadm init output looks like:
......
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.57:6443 --token gwr923.gctdq2sr423mrwp7 \
--discovery-token-ca-cert-hash sha256:ad86f4eb0d430fc1bdf784ae655dccdcb14881cd4ca8d03d84cd2135082c4892
To operate the cluster as a regular user, follow the prompt above and run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Right after installation the node is in the NotReady state:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver NotReady control-plane,master 3m7s v1.23.5
kubectl describe shows this is because no network plugin is installed:
$ kubectl describe node ubuntu2204
Name: ubuntu2204
Roles: control-plane
......
Ready False Wed, 28 Jun 2023 16:53:27 +0000 Wed, 28 Jun 2023 16:52:41 +0000 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Install flannel as the pod network add-on:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Note: sometimes the domain raw.githubusercontent.com is DNS-poisoned and resolves to 127.0.0.1, making it unreachable. The workaround is to visit https://ipaddress.com/website/raw.githubusercontent.com, pick the fastest of the listed IP addresses, and add a record to the /etc/hosts file, e.g.:
185.199.111.133 raw.githubusercontent.com
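Applying the workaround is a one-liner (the IP below is an example; verify it is currently reachable before using it):
echo "185.199.111.133 raw.githubusercontent.com" | sudo tee -a /etc/hosts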
After a short wait the node's status becomes Ready:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver Ready control-plane,master 4m52s v1.23.5
Finally, on a single-node test cluster, remove the master/control-plane taint so workloads can be scheduled onto the control-plane node:
# the taint used to be named master
# kubectl taint nodes --all node-role.kubernetes.io/master-
# newer versions renamed the taint to control-plane (the term master was retired)
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
You can compare the Taints section of the node info before and after with kubectl describe node skyserver. Before removing the taints:
Taints: node.kubernetes.io/not-ready:NoExecute
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
After removing the taints:
Taints: <none>
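To list every node's taints at a glance, a convenience one-liner (not part of the original walkthrough):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'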
Common problems
CRI v1 runtime API is not implemented
If you see an error like this (newer versions):
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-06-28T16:12:49Z" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
or like this (somewhat older versions):
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E1125 11:16:01.799551 14661 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-11-25T11:16:01+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
Both are caused by containerd's default configuration disabling CRI. Open /etc/containerd/config.toml and you will find this line:
disabled_plugins = ["cri"]
Comment out that line, then restart containerd:
sudo systemctl restart containerd.service
Then retry kubeadm init.
Reference:
The control plane does not start, or keeps restarting
Installing the latest versions (1.27 / 1.25) completes and reports success, but the control plane never comes up and port 6443 refuses connections:
k get node
E0628 16:34:50.966940 6581 memcache.go:265] couldn't get current server API group list: Get "https://192.168.0.57:6443/api?timeout=32s": read tcp 192.168.0.57:41288->192.168.0.1:7890: read: connection reset by peer - error from a previous attempt: read tcp 192.168.0.57:41276->192.168.0.1:7890: read: connection reset by peer
In use, the control plane turned out to be quite unstable: large numbers of pods restarted over and over, with "pod sandbox changed" messages in the logs.
Versions confirmed problematic in testing:
- kubeadm: 1.27.3 / 1.25.6
- kubelet: 1.27.3 / 1.25.6
- docker: 24.0.2 / 20.10.21
Trying a Docker downgrade: the k8s 1.27 changelog at https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md references Docker v20.10.21 (the +incompatible suffix is a Go modules versioning marker, not a compatibility warning):
github.com/docker/docker: v20.10.18+incompatible → v20.10.21+incompatible
Checking my earlier install notes, by coincidence I had used exactly this Docker v20.10.21 before, and it was stable without issues. So switch to this version:
VERSION_STRING=5:20.10.21~3-0~ubuntu-jammy
sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io docker-buildx-plugin docker-compose-plugin
Pin Kubernetes to the verified 1.23.14 release for now:
sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
Note: both k8s 1.27.3 and 1.25.6 were verified to have this problem. The cause is unclear for now, so stay pinned to 1.23.14 and investigate later.
Starting over
If the installation fails and you need to start from scratch, or you want to tear down an existing install:
- run kubeadm reset
- delete the .kube directory
- run kubeadm init again
If the network configuration has changed, the network must be reset completely as well; see the chapter on kubeadm join below.
Joining nodes to the cluster
If there are multiple Kubernetes nodes (i.e. multiple machines), the other nodes need to be joined to the cluster; see the chapter on kubeadm join below.
2 - Post-installation configuration for Kubernetes
Configure kubectl auto-completion
zsh setup
macOS uses zsh by default. To enable kubectl auto-completion, add the following to ~/.zshrc:
# note: this line must go at the very top of the file
autoload -Uz compinit && compinit -i
......
# k8s auto complete
source <(kubectl completion zsh)
alias k=kubectl
complete -F __start_kubectl k
For convenience this also adds a k alias for kubectl, with auto-completion wired up for the alias as well.
Using oh-my-zsh
With oh-my-zsh things get even simpler (oh-my-zsh is strongly recommended): just add kubectl to the plugins list in oh-my-zsh.
Then add the following to ~/.zshrc:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
source ~/.zshrc
After that, auto-completion also works when running kubectl commands through the k alias, which is very convenient.
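If you work in bash rather than zsh, the equivalent setup is (a sketch; assumes the bash-completion package is installed):
# add to ~/.bashrc
source <(kubectl completion bash)
alias k=kubectl
complete -o default -F __start_kubectl k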
Show the kubectl context in use
https://github.com/ohmyzsh/ohmyzsh/tree/master/plugins/kubectx
This plugin adds a kubectx_prompt_info() function that shows the name of the kubectl context currently in use (from kubectl config current-context).
You can use it to customize your prompt and know whether you are on the prod cluster.
To use it, edit ~/.zshrc:
- add "kubectx" to the plugins list
- add this line:
RPS1='$(kubectx_prompt_info)'
source ~/.zshrc
After that it takes effect: the kubectl context name is shown at the right edge of the prompt. By default, kubectl config current-context prints "kubernetes-admin@kubernetes".
For a friendlier display, map the name to a more readable label such as dev, stage, or prod:
kubectx_mapping[kubernetes-admin@kubernetes]="dev"
Note: this should come in handy when switching between multiple k8s environments; worth revisiting when the need arises.
Operating the cluster from another machine
If Kubernetes was installed locally, kubectl and the other command-line tools were set up locally during installation, and kubeadm init ends with this prompt:
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
After following the prompt above, you can operate the freshly installed cluster locally with the kubectl command-line tool.
If you would rather operate the cluster conveniently from other machines, instead of first having to ssh into the machine hosting the control plane, simply install kubectl on that machine and set up the kubeconfig file.
Steps:
- install kubectl: similar to the steps above
- set up the kubeconfig:
mkdir -p $HOME/.kube
# copy the cluster's config file to this machine
cp -i /path/to/cluster/config $HOME/.kube/config
If there are several clusters to operate, pass the --kubeconfig flag to kubectl to pick the kubeconfig file to use:
kubectl --kubeconfig /home/sky/.kube/skyserver get nodes
Typing "--kubeconfig /home/sky/.kube/skyserver" every time gets tiresome; you can set a temporary environment variable to select the kubeconfig for the current terminal, e.g.:
export KUBECONFIG=$HOME/.kube/skyserver
k get nodes
# when done, close the terminal or unset
unset KUBECONFIG
If you need to operate several clusters at the same time and switch back and forth between them, use contexts instead; see:
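A minimal context workflow looks like this (a sketch; the context name shown is the default that kubeadm generates):
# list the contexts known to the current kubeconfig
kubectl config get-contexts
# switch the active context
kubectl config use-context kubernetes-admin@kubernetes
# confirm which context is active
kubectl config current-context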
Disable docker and k8s updates
Docker and Kubernetes installed via apt are upgraded to the latest version on apt upgrade; this is not necessarily safe and usually unnecessary.
Consider disabling apt updates for docker and k8s: cd /etc/apt/sources.list.d and comment out the contents of the docker and k8s source files with "#". They can be re-enabled when needed.
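Alternatively, holding the packages pins them without touching the apt sources; a sketch using apt-mark (the docker package names assume the docker-ce repository):
# pin the currently installed versions
sudo apt-mark hold kubelet kubeadm kubectl docker-ce docker-ce-cli
# release the hold when you actually want to upgrade
sudo apt-mark unhold kubelet kubeadm kubectl docker-ce docker-ce-cli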
3 - Adding nodes with kubeadm join
See the official Kubernetes documentation:
- https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
- https://kubernetes.io/zh/docs/reference/setup-tools/kubeadm/kubeadm-join/ : the Chinese version of the page above
Preparation
When installing k8s with the kubeadm init command, the output ends with a prompt like this:
Then you can join any number of worker nodes by running the following on each as root:
sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
--discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
The token used here can be obtained with the kubeadm token list command:
$ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
5ezixq.itmxvdgey8uduysr 12h 2021-12-28T04:22:54Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
Since a token's time to live (TTL) is usually short (12 hours by default), you may find there is no usable token left. In that case, create a new token on the cluster (note: run this on the node hosting the control plane, because the command reads local files):
$ kubeadm token create
omkq4t.v6nnkj4erms2ipyf
$ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
omkq4t.v6nnkj4erms2ipyf 23h 2021-12-29T09:19:23Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
The discovery-token-ca-cert-hash can be generated with the following command:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
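In practice the token and the hash can be produced in one step: kubeadm can mint a fresh token and print the complete join command (a standard kubeadm flag):
kubeadm token create --print-join-command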
Run kubeadm join
The output looks like this:
$ sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
--discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1228 00:04:48.056252 78445 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
On this machine itself, kubectl cannot reach a local api server:
$ k get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
On another machine (one with cluster credentials), the new node shows up successfully:
$ k get nodes
NAME STATUS ROLES AGE VERSION
skyserver Ready control-plane,master 11h v1.23.1
skyserver2 Ready <none> 4m1s v1.23.1
Troubleshooting
Pods cannot start
Pods scheduled onto one particular node would not start, stuck in ContainerCreating:
$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kubernetes-dashboard dashboard-metrics-scraper-799d786dbf-6wksz 0/1 ContainerCreating 0 8h
Describing the pod shows it was scheduled to node skywork2 and failed with "cni0" already has an IP address different from 10.244.2.1/24:
k describe pods dashboard-metrics-scraper-799d786dbf-hqlg6 -n kubernetes-dashboard
Name: dashboard-metrics-scraper-799d786dbf-hqlg6
Namespace: kubernetes-dashboard
Priority: 0
Node: skywork2/192.168.0.20
......
Warning FailedCreatePodSandBox 17s (x4 over 20s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "41479d55f5428ec9a36267170dd1516f996bcf9d49f772d98c2fc79230f64830" network for pod "dashboard-metrics-scraper-799d786dbf-hqlg6": networkPlugin cni failed to set up pod "dashboard-metrics-scraper-799d786dbf-hqlg6_kubernetes-dashboard" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
This happens because the node had previously run kubeadm init before this kubeadm join, and the kubeadm reset in between left residual network configuration behind.
The fix is to reset the network completely and then join again:
sudo -i
kubeadm reset -f
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
rm -rf /etc/kubernetes/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker
systemctl start kubelet
Once everything is cleaned up, run kubeadm join again and it will succeed.
Note: after running kubeadm reset on a node, the node's entry lingers for a long time when running kubectl get nodes on the master. To be safe, delete it manually with kubectl delete node skywork2.
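The tidy removal sequence, run from a machine with cluster credentials, looks like this (a sketch; skywork2 is the node from the example above):
# evict the workloads first, then remove the node object
kubectl drain skywork2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node skywork2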
References:
4 - Deploying and accessing the Dashboard
References:
Deploy the dashboard
Check the current dashboard releases at:
https://github.com/kubernetes/dashboard/releases
Pick the dashboard version that matches your Kubernetes version:
- dashboard 2.7: fully compatible with k8s 1.25
- dashboard 2.6.1: fully compatible with k8s 1.24
- dashboard 2.5.1: fully compatible with k8s 1.23
Deploy with the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.1/aio/deploy/recommended.yaml
Adjust the version number in the URL as needed; see https://github.com/kubernetes/dashboard/releases
Once the deployment succeeds, two kubernetes-dashboard pods appear:
$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kubernetes-dashboard dashboard-metrics-scraper-799d786dbf-krhln 1/1 Running 0 11m
kubernetes-dashboard kubernetes-dashboard-6b6b86c4c5-ptstx 1/1 Running 0 8h
along with two kubernetes-dashboard services:
$ k get services -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.103.242.118 <none> 8000/TCP 8h
kubernetes-dashboard kubernetes-dashboard ClusterIP 10.106.3.227 <none> 443/TCP 8h
Access the dashboard
See the official article: https://github.com/kubernetes/dashboard/blob/master/docs/user/accessing-dashboard/README.md
The dashboard above was deployed with the recommended configuration, matching what that article assumes.
Current cluster info:
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.0.41:6443
CoreDNS is running at https://192.168.0.41:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubectl proxy
Running kubectl proxy starts a local proxy server that is reachable only via localhost, which suits local single-cluster use only:
$ k proxy
Starting to serve on 127.0.0.1:8001
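With the proxy running, the dashboard is served under the API proxy path (the standard URL documented by the dashboard project):
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/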
kubectl port-forward
$ kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8080:443
Forwarding from 127.0.0.1:8080 -> 8443
Forwarding from [::1]:8080 -> 8443
Similarly, this is only reachable locally, at https://localhost:8080.
NodePort
Run:
kubectl -n kubernetes-dashboard edit service kubernetes-dashboard
and change type: ClusterIP to type: NodePort:
apiVersion: v1
...
name: kubernetes-dashboard
namespace: kubernetes-dashboard
resourceVersion: "343478"
selfLink: /api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard
uid: 8e48f478-993d-11e7-87e0-901b0e532516
spec:
clusterIP: 10.100.124.90
externalTrafficPolicy: Cluster
ports:
- port: 443
protocol: TCP
targetPort: 8443
selector:
k8s-app: kubernetes-dashboard
sessionAffinity: None
type: ClusterIP
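The same type change can be made non-interactively; a one-liner alternative to kubectl edit:
kubectl -n kubernetes-dashboard patch service kubernetes-dashboard -p '{"spec": {"type": "NodePort"}}'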
Check which node port was actually allocated:
$ kubectl -n kubernetes-dashboard get service kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard NodePort 10.106.3.227 <none> 443:32212/TCP 9h
Here the allocated port is 32212.
Next you need the node's IP address. On a single-node cluster the node IP is simply the master node's IP, available from kubectl cluster-info. On a multi-node cluster, find out which node the kubernetes-dashboard service was deployed to.
$ k get pods -A -o wide | grep kubernetes-dashboard
kubernetes-dashboard dashboard-metrics-scraper-799d786dbf-krhln 1/1 Running 0 32m 10.244.1.3 skyserver2 <none> <none>
kubernetes-dashboard kubernetes-dashboard-6b6b86c4c5-ptstx 1/1 Running 0 9h 10.244.1.2 skyserver2 <none> <none>
As shown, the kubernetes-dashboard service was deployed to node skyserver2, whose IP is 192.168.0.50, so the combined address is
https://192.168.0.50:32212
Alternatively, for convenience, bind each node's name to its IP address by editing the hosts file with sudo vi /etc/hosts and adding:
# node IP
192.168.0.10 skywork
192.168.0.20 skywork2
192.168.0.40 skyserver
192.168.0.50 skyserver2
After that the dashboard can be reached at https://skyserver2:32212.
A special note: how browsers treat sites with self-signed certificates
When you open the address in a browser, the connection succeeds, but because the site uses a self-signed certificate the browser reports "this connection is not secure" and refuses access.
Per-browser behavior:
- Edge: refuses access; use the magic phrase thisisunsafe (there is no input box; just click the page so it has focus and type it on the keyboard)
- Firefox: refuses by default; choosing "Accept the risk and continue" lets you through
- Chrome: untested; the magic phrase thisisunsafe should work
- Safari: refuses by default; clicking "Show details" -> "visit this website" -> "visit website" bypasses the restriction
References:
Log in to the Dashboard
Log in with a token
A token can be obtained easily with the command below:
kubectl -n kube-system describe $(kubectl -n kube-system get secret -n kube-system -o name | grep namespace) | grep token
The output looks like:
$ kubectl -n kube-system describe $(kubectl -n kube-system get secret -n kube-system -o name | grep namespace) | grep token
Name: namespace-controller-token-r87br
Type: kubernetes.io/service-account-token
token: eyJhbGciOiJSUzI1NiIsImtpZCI6ImNuYUVPT3FRR0dVOFBmN3pFeW81Y1p5R004RVh6VGtJUUpfSHo1ZVFMUVEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlci10b2tlbi1yODdiciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImU2NjU3ODI3LTc4NTUtNDAzOC04MmJjLTlmMjI0OWM3NzYyZiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTpuYW1lc3BhY2UtY29udHJvbGxlciJ9.sVRT_x5NB4sqYwyyqn2Mm3hKg1jhvCsCDMbm_JY-3a19tknzwv_ZPpGOHWrPxmCG45_-tHExi7BbbGK1ZAky2UjtEpxmtVNR6yqHRMYvXtqifqHI4yS6ig-t5WiZ0a4h1q6xZfWsM9nlINSTGQbguCCN2kXUYyAZ0HPdPhdFtmyH9_fjI-FXQOPeK9t9GfWn9Nm52T85spzriwOMY96fFXZ3YaiuzfY5aBtGoxLwDu7O2GOazBmeFaRzEEGR0RjgdM7WPFmtDvbaidIJDPkLznqftqwUFeWHjz6-toO8iaKW_QKHFBvZTQ6uXSc__tbcSYyThu3Ty97-Ml8TArhacw
Copy the token value here and submit it to log in.
References:
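The command above borrows the namespace-controller's token. A cleaner approach, taken from the dashboard project's docs, is a dedicated admin service account (a sketch; the name admin-user is just the conventional choice, and cluster-admin grants full access, so treat this as test-cluster-only):
# dashboard-admin.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
Apply it with kubectl apply -f dashboard-admin.yaml; on kubectl 1.24 and later a login token can then be minted with kubectl -n kubernetes-dashboard create token admin-user.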
Log in with a kubeconfig file
Add the token to the kubeconfig file (located at ~/.kube/config):
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: XXXXXX==
server: https://192.168.0.41:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: XXXXX==
client-key-data: XXXX=
token: eyJhbGciOiJSUzI1NiIsImtpZCI6ImNuYUVPT3FRR0dVOFBmN3pFeW81Y1p5R004RVh6VGtJUUpfSHo1ZVFMUVEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlci10b2tlbi1yODdiciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImU2NjU3ODI3LTc4NTUtNDAzOC04MmJjLTlmMjI0OWM3NzYyZiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTpuYW1lc3BhY2UtY29udHJvbGxlciJ9.sVRT_x5NB4sqYwyyqn2Mm3hKg1jhvCsCDMbm_JY-3a19tknzwv_ZPpGOHWrPxmCG45_-tHExi7BbbGK1ZAky2UjtEpxmtVNR6yqHRMYvXtqifqHI4yS6ig-t5WiZ0a4h1q6xZfWsM9nlINSTGQbguCCN2kXUYyAZ0HPdPhdFtmyH9_fjI-FXQOPeK9t9GfWn9Nm52T85spzriwOMY96fFXZ3YaiuzfY5aBtGoxLwDu7O2GOazBmeFaRzEEGR0RjgdM7WPFmtDvbaidIJDPkLznqftqwUFeWHjz6-toO8iaKW_QKHFBvZTQ6uXSc__tbcSYyThu3Ty97-Ml8TArhacw
The generated kubeconfig file does not contain the token field by default; just add it.
Then submit this kubeconfig file on the login page. Compared with the token method, you don't have to fetch the token content every time; save it once and it is much more convenient afterwards.
5 - Deploying metrics-server
Install metrics-server
A Kubernetes cluster installed with kubeadm does not include metrics-server by default, so it has to be installed manually.
Note: do not install the manifest exactly as the official docs describe; without the tweaks below it will not work.
Modify the api server
First check whether the cluster's api server has aggregator routing enabled:
ps -ef | grep apiserver
compare with:
ps -ef | grep apiserver | grep enable-aggregator-routing
It is not enabled by default, so the k8s apiserver manifest needs to be modified:
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
Add --enable-aggregator-routing=true:
apiVersion: v1
kind: Pod
......
spec:
containers:
- command:
- kube-apiserver
......
- --enable-bootstrap-token-auth=true
- --enable-aggregator-routing=true # add this line
The api server restarts automatically; verify with this command after a moment:
ps -ef | grep apiserver | grep enable-aggregator-routing
Download and modify the manifest
Download the manifest; the latest release works:
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
To install a specific version, check the https://github.com/kubernetes-sigs/metrics-server/releases/ page.
Edit the downloaded components.yaml: add --kubelet-insecure-tls and change --kubelet-preferred-address-types:
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP # change this line; the default is InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls # add this line
Then install it:
$ k apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Wait a moment and check whether it has started:
$ kubectl get pod -n kube-system | grep metrics-server
metrics-server-5979f785c8-lmtq5 1/1 Running 0 46s
Verify by inspecting the service information:
$ kubectl describe svc metrics-server -n kube-system
Name: metrics-server
Namespace: kube-system
Labels: k8s-app=metrics-server
Annotations: <none>
Selector: k8s-app=metrics-server
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.98.127.10
IPs: 10.98.127.10
Port: https 443/TCP
TargetPort: https/TCP
Endpoints: 10.244.0.37:4443 # try pinging this IP address
Session Affinity: None
Events: <none>
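Another useful check is whether the aggregated metrics API is registered and answering (plain kubectl commands):
# the aggregated API should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io
# query the metrics API directly
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes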
Usage
A quick sanity check of basic usage:
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
skyserver 384m 1% 1687Mi 1%
$ kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-64897985d-9z82d 2m 19Mi
coredns-64897985d-wkzc7 2m 20Mi
etcd-skyserver 23m 77Mi
kube-apiserver-skyserver 74m 282Mi
kube-controller-manager-skyserver 24m 58Mi
kube-flannel-ds-lnl72 4m 39Mi
kube-proxy-8g26s 1m 37Mi
kube-scheduler-skyserver 5m 23Mi
metrics-server-5979f785c8-lmtq5 4m 21Mi
6 - Installing Kubernetes with kubeadm on Ubuntu 20.04
See the official Kubernetes documentation:
Prerequisites
Turn off the firewall
systemctl disable firewalld && systemctl stop firewalld
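Note that stock Ubuntu ships ufw rather than firewalld, so the command above assumes firewalld was installed separately; if your machine uses ufw, the equivalent is:
sudo ufw disable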
Install docker and bridge-utils
Each node must have docker (or another container runtime) and bridge-utils (used to manage Linux bridges) installed.
Check the docker version:
$ docker --version
Docker version 20.10.21, build baeda1f
bridge-utils can be installed via apt:
sudo apt-get install bridge-utils
Configure iptables
Make sure the br_netfilter module is loaded, which you can check by running lsmod | grep br_netfilter:
$ lsmod | grep br_netfilter
br_netfilter 32768 0
bridge 307200 1 br_netfilter
To load it explicitly, run sudo modprobe br_netfilter.
So that iptables on the Linux node can see bridged traffic, make sure net.bridge.bridge-nf-call-iptables is set to 1 in the sysctl configuration by running:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
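A quick verification that the setting took effect (a plain sysctl read):
sysctl net.bridge.bridge-nf-call-iptables
# expected output: net.bridge.bridge-nf-call-iptables = 1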
Disable swap
Run the free -m command to check:
$ free -m
total used free shared buff/cache available
Mem: 15896 1665 11376 20 2854 13819
Swap: 0 0 0
If the Swap line is not all zeros, swap is enabled and needs to be turned off.
Two things need to be done:
- Do not create a swap partition when installing the OS; if one exists, delete it.
- Even without a swap partition, a swap file may still be enabled. Open the fstab:
sudo vi /etc/fstab
Find the swap line and comment it out with a leading # to disable it:
# /swapfile none swap sw 0 0
After rebooting, check again with free -m.
Set docker's cgroup driver
Docker's default cgroup driver is cgroupfs, as the docker info command shows:
$ docker info | grep "Cgroup Driver"
Cgroup Driver: cgroupfs
Since Kubernetes v1.22, if the user does not set the cgroupDriver field under KubeletConfiguration, kubeadm defaults it to systemd.
So docker's cgroup driver needs to be changed to systemd. To do that, open docker's configuration file (create it if it does not exist):
sudo vi /etc/docker/daemon.json
and add:
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
After the change, restart docker:
systemctl restart docker
# check again after the restart
docker info | grep "Cgroup Driver"
Otherwise, due to the cgroup driver mismatch, the kubeadm init command will time out and fail because the kubelet cannot start, reporting:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Running systemctl status kubelet shows the kubelet exiting with an error, and journalctl -xeu kubelet contains error messages like:
Dec 26 22:31:21 skyserver2 kubelet[132861]: I1226 22:31:21.438523 132861 docker_service.go:264] "Docker Info" dockerInfo=&{ID:AEON:SBVF:43UK:WASV:YIQK:QGGA:7RU3:IIDK:DV7M:6QLH:5ICJ:KT6R Containers:2 ContainersRunning:0 ContainersPaused:>
Dec 26 22:31:21 skyserver2 kubelet[132861]: E1226 22:31:21.438616 132861 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"c>
Dec 26 22:31:21 skyserver2 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit kubelet.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
References:
- https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/
- https://blog.51cto.com/riverxyz/2537914
Install kubeadm
Important
Arrange a working global proxy first; otherwise installing kubeadm is very painful. Following the official documentation, run these commands:
sudo -i
apt-get update
apt-get install -y apt-transport-https ca-certificates curl
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
This installs the latest version of Kubernetes:
......
Setting up conntrack (1:1.4.6-2build2) ...
Setting up kubectl (1.25.4-00) ...
Setting up ebtables (2.0.11-4build2) ...
Setting up socat (1.7.4.1-3ubuntu4) ...
Setting up cri-tools (1.25.0-00) ...
Setting up kubernetes-cni (1.1.1-00) ...
Setting up kubelet (1.25.4-00) ...
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /lib/systemd/system/kubelet.service.
Setting up kubeadm (1.25.4-00) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for doc-base (0.11.1) ...
Processing 1 added doc-base file...
# check the versions
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:35:06Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.25.4
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
The connection to the server localhost:8080 was refused - did you specify the right host or port?
To install a specific version instead:
apt-get install kubelet=1.23.5-00 kubeadm=1.23.5-00 kubectl=1.23.5-00
apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
apt-get install kubelet=1.24.8-00 kubeadm=1.24.8-00 kubectl=1.24.8-00
The available versions are listed here:
https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages
Since Kubernetes 1.25 and later defaults to using
Install Kubernetes
Again, remember
Arrange a working global proxy first. Then run:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 -v=9
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.100.40 -v=9
Note: to use a CNI network with Flannel later on, we set --pod-network-cidr=10.244.0.0/16 here; without this setting, Flannel keeps reporting errors. If the machine has multiple network interfaces, use --apiserver-advertise-address to specify which IP address to use.
If you hit this error:
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E1125 11:16:01.799551 14661 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-11-25T11:16:01+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
run the following commands and then retry kubeadm init:
$ rm -rf /etc/containerd/config.toml
$ systemctl restart containerd.service
The kubeadm init output looks like:
......
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.100.40:6443 --token uq5nqn.bppygpcqty6icec4 \
--discovery-token-ca-cert-hash sha256:51c13871cd25b122f3a743040327b98b1c19466d01e1804aa2547c047b83632b
To operate the cluster as a regular user, follow the prompt above and run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Right after installation the node is in the NotReady state:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver NotReady control-plane,master 3m7s v1.23.5
kubectl describe shows this is because no network plugin is installed:
$ kubectl describe node skyserver
Name: skyserver
Roles: control-plane,master
......
Ready False Thu, 24 Mar 2022 13:57:21 +0000 Thu, 24 Mar 2022 13:57:06 +0000 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Install flannel:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Note: sometimes the domain raw.githubusercontent.com is DNS-poisoned and resolves to 127.0.0.1, making it unreachable. The workaround is to visit https://ipaddress.com/website/raw.githubusercontent.com, pick the fastest of the listed IP addresses, and add a record to the /etc/hosts file, e.g.:
185.199.111.133 raw.githubusercontent.com
After a short wait the node's status becomes Ready:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver Ready control-plane,master 4m52s v1.23.5
Finally, on a single-node test cluster, remove the master taint so workloads can run on the k8s master node:
kubectl taint nodes --all node-role.kubernetes.io/master-
You can compare the Taints section of the node info before and after with kubectl describe node skyserver. Before removing the taints:
Taints: node.kubernetes.io/not-ready:NoExecute
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
After removing the taints:
Taints: <none>
Common problems
Sometimes the coredns pods cannot be created:
$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-9z82d 0/1 ContainerCreating 0 82s
kube-system coredns-64897985d-wkzc7 0/1 ContainerCreating 0 82s
The problem lies with flannel:
$ k describe pods -n kube-system coredns-64897985d-9z82d
......
Warning FailedCreatePodSandBox 100s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "675b91ac9d25f0385d3794847f47c94deac2cb712399c21da59cf90e7cccb246" network for pod "coredns-64897985d-9z82d": networkPlugin cni failed to set up pod "coredns-64897985d-9z82d_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 97s (x12 over 108s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 96s (x4 over 99s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b46dcd8abb9ab0787fdb2ab9f33ebf052c2dd1ad091c006974a3db7716904196" network for pod "coredns-64897985d-9z82d": networkPlugin cni failed to set up pod "coredns-64897985d-9z82d_kube-system" network: open /run/flannel/subnet.env: no such file or directory
The fix is simply to re-run:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Note: this problem has only been seen once.
Starting over
If the installation fails and you need to start from scratch, or you want to tear down an existing install:
- run kubeadm reset
- delete the .kube directory
- run kubeadm init again
If the network configuration has changed, the network must be reset completely as well; see the kubeadm join chapter above.
Joining nodes to the cluster
If there are multiple Kubernetes nodes (i.e. multiple machines), the other nodes need to be joined to the cluster; see the kubeadm join chapter above.