介绍Kubernetes学习笔记的基本资料和访问方式
Kubernetes学习笔记
- 1: 介绍
- 1.1: Kubernetes概述
- 1.2: Kubernetes资料收集
- 2: Ingress
- 2.1: Nginx Ingress
- 3: 安装
- 3.1: 安装kubectl
- 3.1.1: 在 ubuntu 上安装 kubectl
- 3.2: 通过 kubeadm 安装 kubernetes
- 3.2.1: 在ubuntu上安装kubernetes
- 3.2.1.1: ubuntu22.04上用kubeadm安装kubernetes
- 3.2.1.2: Kubernetes安装后配置
- 3.2.1.3: 通过kubeadm join增加节点
- 3.2.1.4: 部署并访问Dashboard
- 3.2.1.5: 部署 metrics-server
- 3.2.1.6: ubuntu 20.04下用 kubeadm 安装 kubernetes
- 3.2.2: 在debian12上安装kubernetes
- 3.2.2.1: 在debian12上安装kubernetes
- 3.3: 通过 minikube 安装 kubernetes
- 3.3.1: minikube概述
- 3.3.2: ubuntu下用minikube安装
- 3.3.3: MacOS下用minikube安装
- 3.4: 在 Docker Desktop 中安装 kubernetes
- 4: Sidecar Container
- 4.1: Sidecar Container概述
- 4.2: KEP753: Sidecar Container
- 4.3: 推翻KEP753的讨论
1 - 介绍
1.1 - Kubernetes概述
kubernetes是什么?
Kubernetes是一个可移植,可扩展的开源平台,用于管理容器化工作负载和服务,有助于声明性配置和自动化。 它拥有庞大而快速发展的生态系统。 Kubernetes服务,支持和工具广泛可用。
谷歌在2014年开源了Kubernetes项目。Kubernetes建立在谷歌十五年来大规模运行生产负载经验的基础上,结合了社区中最佳的创意和实践。
为什么需要Kubernetes,它可以做什么?
Kubernetes拥有许多功能。 它可以被认为是:
- 容器平台
- 微服务平台
- 可移植云平台,还有更多
Kubernetes提供以容器为中心的管理环境,代表用户的工作负载来编排计算、网络和存储基础设施。这在提供 PaaS 大部分简便性的同时,保留了 IaaS 的灵活性,并支持跨基础设施提供商的可移植性。
Kubernetes如何成为一个平台?
Kubernetes提供了许多功能,但总会有新的场景可以从新功能中受益。它可以简化特定于应用程序的工作流程,以加快开发速度。最初可以接受的临时编排方式,在规模变大后往往需要健壮的自动化能力。这就是为什么Kubernetes还可以作为构建组件和工具生态系统的平台,以便更轻松地部署、扩展和管理应用程序。
Label 允许用户按照自己的方式组织管理对应的资源。 Annotations 使用户能够以自定义的描述信息来修饰资源,以适用于自己的工作流,并为管理工具提供检查点状态的简单方法。
此外,Kubernetes 控制面构建在开发人员和用户同样可以使用的那组 API 之上。用户可以针对自定义 API 编写自己的控制器(例如调度器),并通过通用的命令行工具来访问这些 API。
这种设计使得能够在Kubernetes上面构建许多其他系统。
Kubernetes不是什么?
Kubernetes 不是一个传统的,包罗万象的 PaaS(Platform as a Service)系统。由于Kubernetes在容器级而非硬件级运行,因此它提供了PaaS产品常用的一些通用功能,例如部署,扩展,负载平衡,日志和监控。 但是,Kubernetes不是单体,而且这些默认解决方案是可选的和可插拔的。 Kubernetes提供了构建开发人员平台的构建块,但在重要的地方保留了用户选择和灵活性。
Kubernetes:
- 不限制支持的应用程序类型。 Kubernetes旨在支持各种各样的工作负载,包括无状态,有状态和数据处理工作负载。如果一个应用程序可以在一个容器中运行,它应该在Kubernetes上运行得很好。
- 不部署源代码并且不构建您的应用程序。持续集成,交付和部署(CI / CD)工作流程由组织文化和偏好以及技术要求决定。
- 不提供应用程序级服务,例如中间件(例如,消息总线),数据处理框架(例如,Spark),数据库(例如,mysql),高速缓存,也不提供集群存储系统(例如,Ceph)作为内建服务。这些组件可以在Kubernetes上运行,可以被在Kubernetes上运行的应用程序访问,通过可移植机制(例如Open Service Broker)。
- 不指定日志、监控或告警解决方案。它提供了一些集成作为概念验证,以及收集和导出指标的机制。
- 不提供或授权配置语言/系统(例如,jsonnet)。它提供了一个声明性API,可以通过任意形式的声明性规范来实现。
- 不提供或采用任何全面的机器配置,维护,管理或自我修复系统。
此外,Kubernetes不仅仅是编排系统。实际上,它消除了编排的需要。编排的技术定义是执行既定的工作流:先执行A,然后执行B,再执行C。相反,Kubernetes由一组独立的、可组合的控制流程组成,这些流程将当前状态持续推向所期望的目标状态。如何从A到C并不重要,也不需要集中控制。这使得系统更易于使用,同时更强大、更具弹性且可扩展。
为什么用容器?
来看看应该使用容器的理由。
部署应用程序的旧方法是使用操作系统软件包管理器在主机上安装应用程序。这样做的缺点是将应用程序的可执行文件,配置,类库和生命周期混在一起,并与主机操作系统纠缠在一起。 可以构建不可变的虚拟机映像以实现可预测的部署和回滚,但虚拟机是重量级且不可移植的。
新方法是基于操作系统级虚拟化而不是硬件虚拟化来部署容器。这些容器彼此隔离并与主机隔离:它们具有自己的文件系统,无法看到彼此的进程,并且它们的计算资源使用可以被限制。它们比虚拟机更容易构建,并且因为与底层基础设施和主机文件系统解耦,所以可以跨云和操作系统发行版移植。
由于容器小而快,因此可以在每个容器映像中打包一个应用程序。 这种一对一的应用程序到映像关系解锁了容器的全部优势。 使用容器,可以在构建/发布时而不是部署时创建不可变容器映像,因为每个应用程序不需要与应用程序堆栈的其余部分组合,也不需要与生产基础设施环境结合。 在构建/发布时生成容器映像可以实现从开发到生产的一致环境。 同样,容器比VM更加透明,这有利于监控和管理。当容器的进程生命周期由基础设施管理而不是由容器内的进程管理器隐藏时,尤其如此。 最后,每个容器使用一个应用程序,管理容器就等于管理应用程序的部署。
容器好处总结如下:
- 应用程序创建和部署更敏捷:与使用 VM 镜像相比,容器镜像的创建更加简便和高效。
- 持续开发,集成和部署:通过快速简便的回滚(源于镜像不变性)提供可靠且频繁的容器镜像构建和部署。
- Dev和Ops关注点分离:在构建/发布时而不是部署时创建应用程序容器映像,从而将应用程序与基础设施解耦。
- 可观察性:不仅可以显示操作系统级别的信息和指标,还可以显示应用程序运行状况和其他信号。
- 开发,测试和生产的环境一致性:在笔记本电脑上运行与在云中运行相同。
- 云和OS分发可移植性:在Ubuntu,RHEL,CoreOS,本地,Google Kubernetes引擎以及其他任何地方运行。
- 以应用程序为中心的管理:提升抽象级别,从在虚拟硬件上运行OS到使用逻辑资源在OS上运行应用程序。
- 松散耦合,分布式,弹性,解放的微服务:应用程序被分解为更小,独立的部分,可以动态部署和管理 - 而不是在一台大型单一用途机器上运行的单体堆栈。
- 资源隔离:可预测的应用程序性能。
- 资源利用:高效率和高密度。
参考资料
- What is Kubernetes?: 官方文档的介绍篇,还是官方文档写的到位
1.2 - Kubernetes资料收集
官方资料
- Kubernetes官网
- kubernetes@github
- 官方文档:英文版 ,还有 中文翻译版本,不过目前完成度还比较低
- https://k8smeetup.github.io/docs/home/ : 这里有另一份中文翻译版本(官方中文版本的前身),完成度较高
社区资料
学习资料
- Kubernetes指南: 这是目前最新最好的Kubernetes中文资料,强烈推荐!
2 - Ingress
2.1 - Nginx Ingress
3 - 安装
3.1 - 安装kubectl
kubectl 是 Kubernetes 的命令行工具,允许对Kubernetes集群运行命令。
可以使用kubectl来部署应用程序,检查和管理集群资源,并查看日志。
3.1.1 - 在 ubuntu 上安装 kubectl
参考 Kubernetes 官方文档:
分步骤安装
和后面安装 kubeadm 方式一样,只是这里只需要安装 kubectl 一个工具,不需要安装 kubeadm 和 kubelet
执行如下命令:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
k8s 暂时固定使用 1.23.14 版本:
sudo apt-get install kubectl=1.23.14-00
# sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
直接安装
不推荐这样安装,会安装最新版本,而且安装目录是 /usr/local/bin/
。
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
rm kubectl
如果 /usr/local/bin/
不在 path 路径下,则需要修改一下 path:
export PATH=/usr/local/bin:$PATH
验证一下:
kubectl version --output=yaml
输出为:
clientVersion:
buildDate: "2023-06-14T09:53:42Z"
compiler: gc
gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
gitTreeState: clean
gitVersion: v1.27.3
goVersion: go1.20.5
major: "1"
minor: "27"
platform: linux/amd64
kustomizeVersion: v5.0.1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
配置
oh-my-zsh自动完成
在使用 oh-my-zsh 之后,会更加的简单(强烈推荐使用 oh-my-zsh ),只要在 oh-my-zsh 的 plugins 列表中增加 kubectl 即可。
然后,在 ~/.zshrc
中增加以下内容:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
source ~/.zshrc
之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
3.2 - 通过 kubeadm 安装 kubernetes
3.2.1 - 在ubuntu上安装kubernetes
3.2.1.1 - ubuntu22.04上用kubeadm安装kubernetes
以 ubuntu server 22.04 为例,参考 Kubernetes 官方文档:
前期准备
检查 docker 版本
注意
暂时固定使用 docker 和 k8s 的特定版本搭配:
- docker: 20.10.21
- k8s: 1.23.14
具体原因请见最下面的解释。
检查 container 配置
sudo vi /etc/containerd/config.toml
确保该文件不存在,或者以下这行内容已被注释:
# disabled_plugins = ["cri"]
修改之后需要重启 containerd:
sudo systemctl restart containerd.service
备注:如果不做这个修改,k8s 安装时会报错 “CRI v1 runtime API is not implemented”。
禁用虚拟内存swap
执行 free -m
命令检测:
$ free -m
total used free shared buff/cache available
Mem: 15896 1665 11376 20 2854 13819
Swap: 0 0 0
如果Swap这一行不是0,则说明虚拟内存swap被开启了,需要关闭。
需要做两个事情:
- 操作系统安装时就不要设置 swap 分区,如果有,删除该 swap 分区
- 即使没有 swap 分区,也可能开启了 swap(如 swapfile),需要通过 sudo vi /etc/fstab 找到 swap 这一行,在行前加 # 禁用掉 swap:
# /swapfile none swap sw 0 0
重启之后再用
free -m
命令检测。
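下面是一个关闭 swap 的命令组合示意(假设 swap 条目都写在 /etc/fstab 中,sed 会先把原文件备份为 /etc/fstab.bak):
# 立即关闭当前的 swap(仅本次启动生效)
sudo swapoff -a
# 注释掉 /etc/fstab 中包含 swap 的行,实现永久禁用
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
# 确认 Swap 一行已经全部为 0
free -m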
安装kubeadm
切记
想办法搞定全局翻墙,不然 kubeadm 安装是非常麻烦的。执行如下命令:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
安装最新版本
sudo apt-get install -y kubelet kubeadm kubectl
安装完成后
kubectl version --output=yaml
查看 kubectl 版本:
clientVersion:
buildDate: "2023-06-14T09:53:42Z"
compiler: gc
gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
gitTreeState: clean
gitVersion: v1.27.3
goVersion: go1.20.5
major: "1"
minor: "27"
platform: linux/amd64
kustomizeVersion: v5.0.1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
查看 kubeadm 版本:
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:52:26Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
查看 kubelet 版本:
kubelet --version
Kubernetes v1.27.3
安装特定版本
如果希望安装特定版本:
sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
具体有哪些可用的版本,可以看这里:
https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages
安装k8s
参考:https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
同样切记
想办法搞定全局翻墙。
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 -v=9
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.100.40 -v=9
注意后面为了使用 CNI network 和 Flannel,我们在这里设置了 --pod-network-cidr=10.244.0.0/16
,如果不加这个设置,Flannel 会一直报错。如果机器上有多个网卡,可以用 --apiserver-advertise-address
指定要使用的IP地址。
kubeadm init 输出如下:
......
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.57:6443 --token gwr923.gctdq2sr423mrwp7 \
--discovery-token-ca-cert-hash sha256:ad86f4eb0d430fc1bdf784ae655dccdcb14881cd4ca8d03d84cd2135082c4892
为了使用普通用户,按照上面的提示执行:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
安装完成后,node处于NotReady状态:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver NotReady control-plane,master 3m7s v1.23.5
kubectl describe 可以看到是因为没有安装 network plugin
$ kubectl describe node ubuntu2204
Name: ubuntu2204
Roles: control-plane
......
Ready False Wed, 28 Jun 2023 16:53:27 +0000 Wed, 28 Jun 2023 16:52:41 +0000 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
安装 flannel 作为 pod network add-on:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
备注:有时会遇到 raw.githubusercontent.com 这个域名被污染,解析为 127.0.0.1,导致无法访问。解决方法是访问 https://ipaddress.com/website/raw.githubusercontent.com 然后查看可用的IP地址,找一个速度最快的,在
/etc/hosts
文件中加入一行记录即可,如185.199.111.133 raw.githubusercontent.com
。
稍等就可以看到 node 的状态变为 Ready了:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver Ready control-plane,master 4m52s v1.23.5
最后,如果是测试用的单节点,为了让负载可以跑在 k8s 的 master 节点上,执行下列命令去除 master/control-plane 的污点:
# 以前的污点名为 master
# kubectl taint nodes --all node-role.kubernetes.io/master-
# 新版本污点名改为 control-plane (master政治不正确)
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
可以通过 kubectl describe node skyserver
对比去除污点前后 node 信息中的 Taints 部分,去除污点前:
Taints: node.kubernetes.io/not-ready:NoExecute
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
去除污点后:
Taints: <none>
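如果想快速查看所有节点当前的污点,也可以用下面这条命令(jsonpath 输出格式仅作示意):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'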
常见问题
CRI v1 runtime API is not implemented
如果类似的报错(新版本):
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-06-28T16:12:49Z" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
或者报错(老一些的版本):
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E1125 11:16:01.799551 14661 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-11-25T11:16:01+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
这都是因为 containerd 的默认配置文件中 disable 了 CRI 的原因,可以打开文件 /etc/containerd/config.toml
看到这行
disabled_plugins = ["cri"]
将这行注释之后,重启 containerd :
sudo systemctl restart containerd.service
之后重新尝试 kubeadm init。
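不想手工编辑的话,也可以用 sed 直接注释掉这一行(示意,假设该行写法与上面一致):
sudo sed -i 's/^disabled_plugins/#disabled_plugins/' /etc/containerd/config.toml
sudo systemctl restart containerd.service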
参考:
控制平面不启动或者异常重启
安装最新版本(1.27 / 1.25)完成显示成功,但是控制平面没有启动,6443 端口无法连接:
k get node
E0628 16:34:50.966940 6581 memcache.go:265] couldn't get current server API group list: Get "https://192.168.0.57:6443/api?timeout=32s": read tcp 192.168.0.57:41288->192.168.0.1:7890: read: connection reset by peer - error from a previous attempt: read tcp 192.168.0.57:41276->192.168.0.1:7890: read: connection reset by peer
使用中发现控制平面经常不稳定, 大量的 pod 在反复重启,日志中有提示:pod sandbox changed。
记录测试验证有问题的版本:
- kubeadm: 1.27.3 / 1.25.6
- kubelet:1.27.3 / 1.25.6
- docker: 24.0.2 / 20.10.21
尝试回退 docker 版本,k8s 1.27 的 changelog 中,
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md 提到的 docker 版本是 v20.10.21 (incompatible 是什么鬼?) :
github.com/docker/docker: v20.10.18+incompatible → v20.10.21+incompatible
这个 v20.10.21 版本我翻了一下我之前的安装记录,非常凑巧之前是有使用这个 docker 版本的,而且稳定没出问题。因此考虑换到这个版本:
VERSION_STRING=5:20.10.21~3-0~ubuntu-jammy
sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io docker-buildx-plugin docker-compose-plugin
k8s 暂时固定选择 1.23.14 这个经过验证的版本:
sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
备注: 1.27.3 / 1.25.6 这两个 k8s 的版本都验证过会有问题,暂时不清楚原因,先固定用 1.23.14。
后续再排查。
失败重来
如果遇到安装失败,需要重新开始,或者想铲掉现有的安装,则可以:
- 运行 kubeadm reset
- 删除 .kube 目录
- 再次执行 kubeadm init
如果网络设置有改动,则需要彻底的重置网络。具体见下一章。
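下面是一个最简重置流程的示意(假设不涉及网络配置的改动,init 参数沿用前文):
sudo kubeadm reset -f
rm -rf $HOME/.kube
sudo kubeadm init --pod-network-cidr=10.244.0.0/16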
将节点加入到集群
如果有多个kubernetes节点(即多台机器),则需要将其他节点加入到集群中。具体见下一章。
3.2.1.2 - Kubernetes安装后配置
配置 kubectl 自动完成
zsh配置
mac默认使用zsh,为了实现 kubectl 的自动完成功能,需要在 ~/.zshrc
中增加以下内容:
# 注意这一行要加在文件的最前面
autoload -Uz compinit && compinit -i
......
# k8s auto complete
source <(kubectl completion zsh)
alias k=kubectl
complete -F __start_kubectl k
同时为了使用方便,为 kubectl 增加了 k 的别名,同样也为 k 增加了 自动完成功能。
使用oh-my-zsh
在使用 oh-my-zsh 之后,会更加的简单(强烈推荐使用 oh-my-zsh ),只要在 oh-my-zsh 的 plugins 列表中增加 kubectl 即可。
然后,在 ~/.zshrc
中增加以下内容:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
source ~/.zshrc
之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
显示正在使用的kubectl上下文
https://github.com/ohmyzsh/ohmyzsh/tree/master/plugins/kubectx
这个插件增加了 kubectx_prompt_info()函数。它显示正在使用的 kubectl context 的名称(kubectl config current-context
)。
你可以用它来定制提示,并知道你是否在prod集群上.
使用方式为修改 ~/.zshrc
:
- 在 plugins 中增加 “kubectx”
- 增加一行
RPS1='$(kubectx_prompt_info)'
source ~/.zshrc
之后即可生效,会在命令行的最右侧显示出kubectl context 的名称,默认情况下 kubectl config current-context
的输出是 “kubernetes-admin@kubernetes”。
如果需要更友好的显示,则可以将名字映射为可读性更强的标记,如 dev, stage, prod:
kubectx_mapping[kubernetes-admin@kubernetes]="dev"
备注: 在多个k8s环境下切换时应该很有用,后续有需要再研究。
从其他机器上操作k8s集群
如果k8s安装在本机,则相应的 kubectl
等命令行工具在安装过程中都在本地准备就绪,而且 kubeadm init
命令在安装完毕之后会提示:
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
按照上面的提示操作之后,就可以在本地通过 kubectl
命令行工具操作安装的k8s集群。
如果我们希望从其他机器上方便的操作k8s集群,而不是限制要先ssh登录到安装k8s控制平面的机器上,则可以简单的在这台机器上安装kubectl并配置好kubeconf文件。
步骤如下:
-
安装 kubectl:和前面的步骤类似
-
配置 kubeconf
mkdir -p $HOME/.kube
# 复制集群的config文件到这台机器
cp -i /path/to/cluster/config $HOME/.kube/config
如果有多个k8s集群需要操作,则可以在执行 kubectl
命令时通过 --kubeconfig
参数指定要使用的 kubeconf 文件:
kubectl --kubeconfig /home/sky/.kube/skyserver get nodes
每次都输入 “--kubeconfig /home/sky/.kube/skyserver” 会很累,可以通过设置临时的环境变量来在当前终端下选择kubeconf文件,如:
export KUBECONFIG=$HOME/.kube/skyserver
k get nodes
# 不需要用时,关闭终端或者unset
unset KUBECONFIG
如果需要同时操作多个集群,需要在多个集群之间反复切换,则应该使用context来灵活切换,参考:
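下面是 context 方式的基本用法示意(context 名称以实际 kubeconfig 中的内容为准,这里沿用前文出现过的 kubernetes-admin@kubernetes):
# 查看当前 kubeconfig 中所有的 context
kubectl config get-contexts
# 切换到指定的 context
kubectl config use-context kubernetes-admin@kubernetes
# 查看当前正在使用的 context
kubectl config current-context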
取消docker和k8s的更新
通过 apt 方式安装的 docker 和 k8s,会在 apt upgrade 时自动升级到最新版本,这未必安全,通常也没有必要。
可以考虑取消 docker 和 k8s 的 apt 更新,cd /etc/apt/sources.list.d
,将 docker 和 k8s 的ppa配置文件内容用 “#” 注释掉就可以了。需要时可以重新打开。
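除了注释 ppa 配置,也可以用 apt-mark hold 把相关的包锁定在当前版本(示意,包名以实际安装为准):
sudo apt-mark hold kubelet kubeadm kubectl docker-ce docker-ce-cli containerd.io
# 需要升级时再解除锁定
sudo apt-mark unhold kubelet kubeadm kubectl docker-ce docker-ce-cli containerd.io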
3.2.1.3 - 通过kubeadm join增加节点
参考 Kubernetes 官方文档:
- https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
- https://kubernetes.io/zh/docs/reference/setup-tools/kubeadm/kubeadm-join/ : 上文的中文版本
准备工作
通过 kubeadm init
命令安装k8s时,会有如下提示:
Then you can join any number of worker nodes by running the following on each as root:
sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
--discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
这里用到的 token 可以通过 kubeadm token list
命令获取:
$ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
5ezixq.itmxvdgey8uduysr 12h 2021-12-28T04:22:54Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
由于 token 的有效期(TTL)通常不是很久(默认12小时),因此可能会出现没有可用的token的情况。此时需要在该集群上创建新的token(注意需要登录到集群的控制平面所在的节点上执行命令,因为后面会读取本地文件):
$ kubeadm token create
omkq4t.v6nnkj4erms2ipyf
$ kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
omkq4t.v6nnkj4erms2ipyf 23h 2021-12-29T09:19:23Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
discovery-token-ca-cert-hash 可以通过下面的命令生成:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
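也可以直接在控制平面节点上用下面这条命令生成完整的 join 命令,token 和 ca-cert-hash 会一并打印出来,省去手工拼接:
kubeadm token create --print-join-command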
执行kubeadm join
输出如下:
$ sudo kubeadm join 192.168.0.41:6443 --token 5ezixq.itmxvdgey8uduysr \
--discovery-token-ca-cert-hash sha256:d641cec650bdee479a3e7479b558ab68886f7c41ef89f2857099776ed72bcaae
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1228 00:04:48.056252 78445 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
在当前机器上,执行命令,会发现无法连接本地 api server:
$ k get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
在另一台机器上执行命令,可以看到这个节点添加成功:
$ k get nodes
NAME STATUS ROLES AGE VERSION
skyserver Ready control-plane,master 11h v1.23.1
skyserver2 Ready <none> 4m1s v1.23.1
错误处理
pod无法启动
发现有调度到某个节点的pod无法启动,一直卡在 ContainerCreating 上:
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kubernetes-dashboard dashboard-metrics-scraper-799d786dbf-6wksz 0/1 ContainerCreating 0 8h
查看该pod信息发现调度到node skywork2,然后报错 "cni0" already has an IP address different from 10.244.2.1/24
:
k describe pods dashboard-metrics-scraper-799d786dbf-hqlg6 -n kubernetes-dashboard
Name: dashboard-metrics-scraper-799d786dbf-hqlg6
Namespace: kubernetes-dashboard
Priority: 0
Node: skywork2/192.168.0.20
......
Warning FailedCreatePodSandBox 17s (x4 over 20s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "41479d55f5428ec9a36267170dd1516f996bcf9d49f772d98c2fc79230f64830" network for pod "dashboard-metrics-scraper-799d786dbf-hqlg6": networkPlugin cni failed to set up pod "dashboard-metrics-scraper-799d786dbf-hqlg6_kubernetes-dashboard" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
这是因为之前这个节点在 kubeadm join
之前,做过 kubeadm init
,在 kubeadm reset
之后残余了部分网络配置。
解决的方法是彻底的重置网络再join, 操作如下:
sudo -i
kubeadm reset -f
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
rm -rf /etc/kubernetes/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker
systemctl start kubelet
在清理干净之后再次执行 kubeadm join
即可。
备注: 发现在节点执行
kubeadm reset
之后,在master节点上执行 kubectl get nodes
时这个节点信息迟迟不能剔除。安全起见可以手工执行一次 kubectl delete node skywork2
参考资料:
3.2.1.4 - 部署并访问Dashboard
参考资料:
部署dashboard
在下面地址上查看当前dashboard的版本:
https://github.com/kubernetes/dashboard/releases
根据对kubernetes版本的兼容情况选择对应的dashboard的版本:
- dashboard 2.7 : 全面兼容 k8s 1.25
- dashboard 2.6.1 : 全面兼容 k8s 1.24
- dashboard 2.5.1: 全面兼容 k8s 1.23
通过如下命令部署:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.1/aio/deploy/recommended.yaml
其中版本号可以查看 https://github.com/kubernetes/dashboard/releases
部署成功之后,可以看到 kubernetes-dashboard 相关的两个pod:
$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kubernetes-dashboard dashboard-metrics-scraper-799d786dbf-krhln 1/1 Running 0 11m
kubernetes-dashboard kubernetes-dashboard-6b6b86c4c5-ptstx 1/1 Running 0 8h
和 kubernetes-dashboard 相关的两个service:
$ k get services -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.103.242.118 <none> 8000/TCP 8h
kubernetes-dashboard kubernetes-dashboard ClusterIP 10.106.3.227 <none> 443/TCP 8h
访问dashboard
参考官方文章: https://github.com/kubernetes/dashboard/blob/master/docs/user/accessing-dashboard/README.md
前面部署 dashboard 时使用的是 recommended 配置,和文章要求一致。
当前集群信息如下:
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.0.41:6443
CoreDNS is running at https://192.168.0.41:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubectl proxy
直接 kubectl proxy
启动的是本地代理服务器,只能通过 localhost 访问,这个只适合本地单集群使用:
$ k proxy
Starting to serve on 127.0.0.1:8001
kubectl port-forward
$ kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8080:443
Forwarding from 127.0.0.1:8080 -> 8443
Forwarding from [::1]:8080 -> 8443
类似的,也只能本地访问 https://localhost:8080 。
NodePort
执行:
kubectl -n kubernetes-dashboard edit service kubernetes-dashboard
修改 type: ClusterIP
为 type: NodePort
:
apiVersion: v1
...
name: kubernetes-dashboard
namespace: kubernetes-dashboard
resourceVersion: "343478"
selfLink: /api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard
uid: 8e48f478-993d-11e7-87e0-901b0e532516
spec:
clusterIP: 10.100.124.90
externalTrafficPolicy: Cluster
ports:
- port: 443
protocol: TCP
targetPort: 8443
selector:
k8s-app: kubernetes-dashboard
sessionAffinity: None
type: ClusterIP
看一下具体分配的 node port 是哪个:
$ kubectl -n kubernetes-dashboard get service kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard NodePort 10.106.3.227 <none> 443:32212/TCP 9h
可以看到这里分配的是 32212 端口。
然后就是 node 的 ip 地址了,如果是单节点的集群,那么 node ip 就固定为 master node 的IP,可以通过 kubectl cluster-info
获取。如果是多节点的集群,则需要找到 kubernetes-dashboard 服务被部署到了哪个节点。
$ k get pods -A -o wide | grep kubernetes-dashboard
kubernetes-dashboard dashboard-metrics-scraper-799d786dbf-krhln 1/1 Running 0 32m 10.244.1.3 skyserver2 <none> <none>
kubernetes-dashboard kubernetes-dashboard-6b6b86c4c5-ptstx 1/1 Running 0 9h 10.244.1.2 skyserver2 <none> <none>
如上所示,kubernetes-dashboard 服务被部署到了 skyserver2 节点,skyserver2 的 IP 是 192.168.0.50,则拼合起来的地址是
https://192.168.0.50:32212
或者为了方便起见,将每台node的名字和IP地址绑定,通过 sudo vi /etc/hosts
修改hosts文件,增加以下内容:
# node IP
192.168.0.10 skywork
192.168.0.20 skywork2
192.168.0.40 skyserver
192.168.0.50 skyserver2
之后就可以通过 https://skyserver2:32212 访问了。
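补充:如果不想交互式地执行 kubectl edit,也可以用 kubectl patch 一次性把 service 改为 NodePort 并固定 nodePort(示意,这里沿用上文的 32212 端口,可按需调整):
kubectl -n kubernetes-dashboard patch service kubernetes-dashboard \
  -p '{"spec": {"type": "NodePort", "ports": [{"port": 443, "nodePort": 32212}]}}'
注意 nodePort 需要落在集群的 NodePort 端口范围内(默认 30000-32767)。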
特别事项:浏览器对自签名证书网站的访问处理
使用浏览器访问该地址时,可以连接上,但是浏览器会因为网站使用的是自签名证书而报错 “此网站连接不安全” 拒绝访问。
各浏览器的处理:
- Edge:拒绝访问,可以使用魔术短语:
thisisunsafe
(没有输入框,只要单击该页面以确保它具有焦点,然后键盘输入即可) - firefox:默认拒绝,选择"接受风险并继续"后可以正常访问
- Chrome:待测试,应该可以使用魔术短语:
thisisunsafe
- Safari: 默认拒绝,点击 “Show details” -> “visit this website” -> “visit website” 可以绕开限制继续访问
参考:
登录Dashboard
通过token登录
token可以通过下面的命令简单获取到:
kubectl -n kube-system describe $(kubectl -n kube-system get secret -n kube-system -o name | grep namespace) | grep token
输出为:
$ kubectl -n kube-system describe $(kubectl -n kube-system get secret -n kube-system -o name | grep namespace) | grep token
Name: namespace-controller-token-r87br
Type: kubernetes.io/service-account-token
token: eyJhbGciOiJSUzI1NiIsImtpZCI6ImNuYUVPT3FRR0dVOFBmN3pFeW81Y1p5R004RVh6VGtJUUpfSHo1ZVFMUVEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlci10b2tlbi1yODdiciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImU2NjU3ODI3LTc4NTUtNDAzOC04MmJjLTlmMjI0OWM3NzYyZiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTpuYW1lc3BhY2UtY29udHJvbGxlciJ9.sVRT_x5NB4sqYwyyqn2Mm3hKg1jhvCsCDMbm_JY-3a19tknzwv_ZPpGOHWrPxmCG45_-tHExi7BbbGK1ZAky2UjtEpxmtVNR6yqHRMYvXtqifqHI4yS6ig-t5WiZ0a4h1q6xZfWsM9nlINSTGQbguCCN2kXUYyAZ0HPdPhdFtmyH9_fjI-FXQOPeK9t9GfWn9Nm52T85spzriwOMY96fFXZ3YaiuzfY5aBtGoxLwDu7O2GOazBmeFaRzEEGR0RjgdM7WPFmtDvbaidIJDPkLznqftqwUFeWHjz6-toO8iaKW_QKHFBvZTQ6uXSc__tbcSYyThu3Ty97-Ml8TArhacw
复制这里的 token 提交就可以登录。
参考:
通过kubeconf文件登录
在 kubeconfig 文件(路径为 ~/.kube/config
)中加入 token 信息:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: XXXXXX==
server: https://192.168.0.41:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: XXXXX==
client-key-data: XXXX=
token: eyJhbGciOiJSUzI1NiIsImtpZCI6ImNuYUVPT3FRR0dVOFBmN3pFeW81Y1p5R004RVh6VGtJUUpfSHo1ZVFMUVEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlci10b2tlbi1yODdiciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJuYW1lc3BhY2UtY29udHJvbGxlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImU2NjU3ODI3LTc4NTUtNDAzOC04MmJjLTlmMjI0OWM3NzYyZiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTpuYW1lc3BhY2UtY29udHJvbGxlciJ9.sVRT_x5NB4sqYwyyqn2Mm3hKg1jhvCsCDMbm_JY-3a19tknzwv_ZPpGOHWrPxmCG45_-tHExi7BbbGK1ZAky2UjtEpxmtVNR6yqHRMYvXtqifqHI4yS6ig-t5WiZ0a4h1q6xZfWsM9nlINSTGQbguCCN2kXUYyAZ0HPdPhdFtmyH9_fjI-FXQOPeK9t9GfWn9Nm52T85spzriwOMY96fFXZ3YaiuzfY5aBtGoxLwDu7O2GOazBmeFaRzEEGR0RjgdM7WPFmtDvbaidIJDPkLznqftqwUFeWHjz6-toO8iaKW_QKHFBvZTQ6uXSc__tbcSYyThu3Ty97-Ml8TArhacw
默认生成的kubeconfig文件是不带 token 字段的,加上即可。
然后在页面上提交这个 kubeconfig 文件即可登录。相比token登录方式,不需要每次去获取token内容,一次保存之后以后方便很多。
3.2.1.5 - 部署 metrics-server
安装 metrics-server
通过 kubeadm 安装的 k8s 集群默认是没有安装 metrics-server,因此需要手工安装。
注意:不要按照官方文档所说的那样直接安装,默认配置在 kubeadm 自建集群上无法正常工作,需要按下文修改后再部署。
修改 api server
先检查 k8s 集群的 api server 是否有启用API Aggregator:
ps -ef | grep apiserver
对比:
ps -ef | grep apiserver | grep enable-aggregator-routing
默认是没有开启的。因此需要修改 k8s apiserver 的配置文件:
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
增加 --enable-aggregator-routing=true
apiVersion: v1
kind: Pod
......
spec:
containers:
- command:
- kube-apiserver
......
- --enable-bootstrap-token-auth=true
- --enable-aggregator-routing=true # 增加这行
api server 会自动重启,稍后用命令验证一下:
ps -ef | grep apiserver | grep enable-aggregator-routing
下载并修改安装文件
先下载安装文件,直接用最新版本:
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
如果要安装指定版本,请查看 https://github.com/kubernetes-sigs/metrics-server/releases/ 页面。
修改下载下来的 components.yaml, 增加 --kubelet-insecure-tls
并修改 --kubelet-preferred-address-types
:
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP # 修改这行,默认是InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls # 增加这行
然后安装:
$ k apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
稍等片刻看是否启动:
$ kubectl get pod -n kube-system | grep metrics-server
metrics-server-5979f785c8-lmtq5 1/1 Running 0 46s
验证一下,查看 service 信息
$ kubectl describe svc metrics-server -n kube-system
Name: metrics-server
Namespace: kube-system
Labels: k8s-app=metrics-server
Annotations: <none>
Selector: k8s-app=metrics-server
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.98.127.10
IPs: 10.98.127.10
Port: https 443/TCP
TargetPort: https/TCP
Endpoints: 10.244.0.37:4443 # ping 一下这个 IP 地址
Session Affinity: None
Events: <none>
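另外可以确认 metrics 的 APIService 已经注册并处于 Available 状态(v1beta1.metrics.k8s.io 这个名字来自上面 apply 的 components.yaml):
kubectl get apiservice v1beta1.metrics.k8s.io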
使用
简单验证一下基本使用。
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
skyserver 384m 1% 1687Mi 1%
$ kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-64897985d-9z82d 2m 19Mi
coredns-64897985d-wkzc7 2m 20Mi
etcd-skyserver 23m 77Mi
kube-apiserver-skyserver 74m 282Mi
kube-controller-manager-skyserver 24m 58Mi
kube-flannel-ds-lnl72 4m 39Mi
kube-proxy-8g26s 1m 37Mi
kube-scheduler-skyserver 5m 23Mi
metrics-server-5979f785c8-lmtq5 4m 21Mi
3.2.1.6 - ubuntu 20.04下用 kubeadm 安装 kubernetes
参考 Kubernetes 官方文档:
前期准备
关闭防火墙
systemctl disable firewalld && systemctl stop firewalld
安装docker和bridge-utils
要求节点上安装有 docker (或者其他container runtime)和 bridge-utils (用来操作linux bridge).
查看 docker 版本:
$ docker --version
Docker version 20.10.21, build baeda1f
bridge-utils可以通过apt安装:
sudo apt-get install bridge-utils
设置iptables
要确保 br_netfilter
模块已经加载,可以通过运行 lsmod | grep br_netfilter
来完成。
$ lsmod | grep br_netfilter
br_netfilter 32768 0
bridge 307200 1 br_netfilter
如需要明确加载,请调用 sudo modprobe br_netfilter
。
为了让作为Linux节点的iptables能看到桥接流量,应该确保 net.bridge.bridge-nf-call-iptables
在 sysctl 配置中被设置为1,执行命令:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
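配置完成后,可以用下面的命令做个简单验证:确认模块已加载、两个内核参数均为 1:
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables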
禁用虚拟内存swap
执行 free -m
命令检测:
$ free -m
total used free shared buff/cache available
Mem: 15896 1665 11376 20 2854 13819
Swap: 0 0 0
如果Swap这一行不是0,则说明虚拟内存swap被开启了,需要关闭。
需要做两个事情:
- 操作系统安装时就不要设置 swap 分区,如果有,删除该 swap 分区
- 即使没有 swap 分区,也可能开启了 swap(如 swapfile),需要通过 sudo vi /etc/fstab 找到 swap 这一行,在行前加 # 禁用掉 swap:
# /swapfile none swap sw 0 0
重启之后再用
free -m
命令检测。
设置docker的cgroup driver
docker 默认的 cgroup driver 是 cgroupfs,可以通过 docker info 命令查看:
$ docker info | grep "Cgroup Driver"
Cgroup Driver: cgroupfs
而 kubernetes 在v1.22版本之后,如果用户没有在 KubeletConfiguration 下设置 cgroupDriver 字段,则 kubeadm 将默认为 systemd
。
需要修改 docker 的 cgroup driver 为 systemd
, 方式为打开 docker 的配置文件(如果不存在则创建)
sudo vi /etc/docker/daemon.json
增加内容:
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
修改完成后重启 docker:
systemctl restart docker
# 重启后检查一下
docker info | grep "Cgroup Driver"
否则,在安装过程中,由于 cgroup driver 的不一致,kubeadm init
命令会因为 kubelet 无法启动而超时失败,报错为:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
执行 systemctl status kubelet
会发现 kubelet 因为报错而退出,执行 journalctl -xeu kubelet
会发现有如下的错误信息:
Dec 26 22:31:21 skyserver2 kubelet[132861]: I1226 22:31:21.438523 132861 docker_service.go:264] "Docker Info" dockerInfo=&{ID:AEON:SBVF:43UK:WASV:YIQK:QGGA:7RU3:IIDK:DV7M:6QLH:5ICJ:KT6R Containers:2 ContainersRunning:0 ContainersPaused:>
Dec 26 22:31:21 skyserver2 kubelet[132861]: E1226 22:31:21.438616 132861 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"c>
Dec 26 22:31:21 skyserver2 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit kubelet.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
参考:
- https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/
- https://blog.51cto.com/riverxyz/2537914
安装kubeadm
切记
想办法搞定全局翻墙,不然kubeadm安装是非常麻烦的。按照官方文档的指示,执行如下命令:
sudo -i
apt-get update
apt-get install -y apt-transport-https ca-certificates curl
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
这会安装最新版本的kubernetes:
......
Setting up conntrack (1:1.4.6-2build2) ...
Setting up kubectl (1.25.4-00) ...
Setting up ebtables (2.0.11-4build2) ...
Setting up socat (1.7.4.1-3ubuntu4) ...
Setting up cri-tools (1.25.0-00) ...
Setting up kubernetes-cni (1.1.1-00) ...
Setting up kubelet (1.25.4-00) ...
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /lib/systemd/system/kubelet.service.
Setting up kubeadm (1.25.4-00) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for doc-base (0.11.1) ...
Processing 1 added doc-base file...
# 查看版本
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:35:06Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.25.4
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
The connection to the server localhost:8080 was refused - did you specify the right host or port?
如果希望安装特定版本:
apt-get install kubelet=1.23.5-00 kubeadm=1.23.5-00 kubectl=1.23.5-00
apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
apt-get install kubelet=1.24.8-00 kubeadm=1.24.8-00 kubectl=1.24.8-00
具体有哪些可用的版本,可以看这里:
https://packages.cloud.google.com/apt/dists/kubernetes-xenial/main/binary-amd64/Packages
由于 kubernetes 1.25 之后默认使用
安装k8s
同样切记
想办法搞定全局翻墙。
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 -v=9
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.100.40 -v=9
注意后面为了使用 CNI network 和 Flannel,我们在这里设置了 --pod-network-cidr=10.244.0.0/16
,如果不加这个设置,Flannel 会一直报错。如果机器上有多个网卡,可以用 --apiserver-advertise-address
指定要使用的IP地址。
如果遇到报错:
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E1125 11:16:01.799551 14661 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2022-11-25T11:16:01+08:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
则可以执行下列命令之后重新尝试 kubeadm init:
$ rm -rf /etc/containerd/config.toml
$ systemctl restart containerd.service
kubeadm init 输出如下:
......
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.100.40:6443 --token uq5nqn.bppygpcqty6icec4 \
--discovery-token-ca-cert-hash sha256:51c13871cd25b122f3a743040327b98b1c19466d01e1804aa2547c047b83632b
为了使用普通用户,按照上面的提示执行:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
安装完成后,node处于NotReady状态:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver NotReady control-plane,master 3m7s v1.23.5
kubectl describe 可以看到是因为没有安装 network plugin
$ kubectl describe node skyserver
Name: skyserver
Roles: control-plane,master
......
Ready False Thu, 24 Mar 2022 13:57:21 +0000 Thu, 24 Mar 2022 13:57:06 +0000 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
安装flannel:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
备注:有时会遇到 raw.githubusercontent.com 这个域名被污染,解析为 127.0.0.1,导致无法访问。解决方法是访问 https://ipaddress.com/website/raw.githubusercontent.com 然后查看可用的IP地址,找一个速度最快的,在
/etc/hosts
文件中加入一行记录即可,如185.199.111.133 raw.githubusercontent.com
。
稍等就可以看到 node 的状态变为 Ready了:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
skyserver Ready control-plane,master 4m52s v1.23.5
最后,如果是测试用的单节点,为了让负载可以跑在k8s的master节点上,执行下列命令去除master的污点:
kubectl taint nodes --all node-role.kubernetes.io/master-
可以通过 kubectl describe node skyserver
对比去除污点前后 node 信息中的 Taints 部分,去除污点前:
Taints: node.kubernetes.io/not-ready:NoExecute
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
去除污点后:
Taints: <none>
常见问题
有时会遇到 coredns pod无法创建的情况:
$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-9z82d 0/1 ContainerCreating 0 82s
kube-system coredns-64897985d-wkzc7 0/1 ContainerCreating 0 82s
问题发生在 flannel 上:
$ k describe pods -n kube-system coredns-64897985d-9z82d
......
Warning FailedCreatePodSandBox 100s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "675b91ac9d25f0385d3794847f47c94deac2cb712399c21da59cf90e7cccb246" network for pod "coredns-64897985d-9z82d": networkPlugin cni failed to set up pod "coredns-64897985d-9z82d_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 97s (x12 over 108s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 96s (x4 over 99s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b46dcd8abb9ab0787fdb2ab9f33ebf052c2dd1ad091c006974a3db7716904196" network for pod "coredns-64897985d-9z82d": networkPlugin cni failed to set up pod "coredns-64897985d-9z82d_kube-system" network: open /run/flannel/subnet.env: no such file or directory
解决的方式就是重新执行:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
备注:这个问题只遇到过一次。
失败重来
如果遇到安装失败,需要重新开始,或者想铲掉现有的安装,则可以:
- 运行 kubeadm reset
- 删除 .kube 目录
- 再次执行 kubeadm init
如果网络设置有改动,则需要彻底的重置网络。具体见下一章。
将节点加入到集群
如果有多个kubernetes节点(即多台机器),则需要将其他节点加入到集群中。具体见下一章。
3.2.2 - 在debian12上安装kubernetes
3.2.2.1 - 在debian12上安装kubernetes
准备
系统更新
确保更新debian系统到最新,移除不再需要的软件,清理无用的安装包:
sudo apt update && sudo apt full-upgrade -y
sudo apt autoremove
sudo apt autoclean
如果更新了内核,最好重启一下。
swap分区
安装 Kubernetes 要求机器不能有 swap 分区。
开启模块
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
安装 docker
卸载非官方版本 Docker:
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do sudo apt-get remove $pkg; done
安装:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
启动 docker 并设置开机自动运行:
sudo systemctl enable docker --now
安装 golang 1.20 或者更高版本
这是为了给下面的手工 build cri-dockerd 做准备。
下载最新版本的 golang。
mkdir -p ~/temp
mkdir -p ~/work/soft/gopath
cd ~/temp
wget https://go.dev/dl/go1.22.4.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.22.4.linux-amd64.tar.gz
修改
vi ~/.zshrc
加入以下内容:
export GOPATH=/home/sky/work/soft/gopath
export PATH=/usr/local/go/bin:$GOPATH/bin:$PATH
执行:
source ~/.zshrc
go version
go env
安装 cri-dockerd
注意需要先安装 golang 1.20 或者更高版本。
mkdir -p ~/temp
cd ~/temp
git clone https://github.com/Mirantis/cri-dockerd.git
cd cri-dockerd
make cri-dockerd
sudo mkdir -p /usr/local/bin
sudo install -o root -g root -m 0755 cri-dockerd /usr/local/bin/cri-dockerd
sudo install packaging/systemd/* /etc/systemd/system
sudo sed -i -e 's,/usr/bin/cri-dockerd,/usr/local/bin/cri-dockerd,' /etc/systemd/system/cri-docker.service
sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket
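安装完成后,可以简单验证 cri-dockerd 的服务和 socket 是否正常(socket 路径后面 kubeadm init 时会用到):
cri-dockerd --version
systemctl status cri-docker.service cri-docker.socket --no-pager
ls -l /var/run/cri-dockerd.sock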
安装 helm
为后面安装 dashboard 做准备:
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
安装 kubernetes
安装 kubeadm / kubelet / kubectl
sudo apt-get update
sudo apt-get install -y apt-transport-https
假定要安装的 kubernetes 版本为 1.29:
export K8S_VERSION=1.29
curl -fsSL https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
开始安装 kubelet kubeadm kubectl:
sudo apt update
sudo apt install kubelet kubeadm kubectl -y
禁止这三个程序的自动更新:
sudo apt-mark hold kubelet kubeadm kubectl
验证安装:
kubectl version --client && echo && kubeadm version
Client Version: v1.29.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.6", GitCommit:"062798d53d83265b9e05f14d85198f74362adaca", GitTreeState:"clean", BuildDate:"2024-06-11T20:22:13Z", GoVersion:"go1.21.11", Compiler:"gc", Platform:"linux/amd64"}
优化zsh
在 ~/.zshrc
中增加以下内容:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
source ~/.zshrc
之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
取消自动更新
docker / helm / kubernetes 这些软件的版本没有必要升级到最新,因此可以取消它们的自动更新。
cd /etc/apt/sources.list.d
ls
docker.list helm-stable-debian.list kubernetes.list
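可以参照前面 kubernetes 组件的做法用 apt-mark hold 锁定版本,或者把这些 .list 文件中的 deb 行注释掉(示意,包名以实际安装为准):
sudo apt-mark hold docker-ce docker-ce-cli containerd.io helm
# 或者注释掉对应源文件中的 deb 行,需要时再恢复
sudo sed -i 's/^deb /# deb /' /etc/apt/sources.list.d/docker.list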
初始化集群
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.6.224
输出为:
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.6.224
I0621 05:51:22.665581 20837 version.go:256] remote version is much newer: v1.30.2; falling back to: stable-1.29
[init] Using Kubernetes version: v1.29.6
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [debian12 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.6.224]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.6.224 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.6.224 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 3.500697 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node debian12 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node debian12 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: 1x7fi1.kjxn00med7dd3xwx
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.6.224:6443 --token 1x7fi1.kjxn00med7dd3xwx \
--discovery-token-ca-cert-hash sha256:51037fa4e37f485e10cb8ddfe8ec23e57d0dcd6698e5982f01449b6b6ca843e5
根据提示操作:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
此时节点的状态会是 NotReady:
kubectl get node
NAME STATUS ROLES AGE VERSION
debian12 NotReady control-plane 4m7s v1.29.6
需要继续安装网络插件,可以选择 flannel 或者 Calico。
对于测试用的单节点,去除 master/control-plane 的污点:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
(可选)安装 flannel
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
(可选)安装 Calico
查看最新版本,当前最新版本是 v3.28:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml
安装 dashboard
在下面地址上查看当前dashboard的版本:
https://github.com/kubernetes/dashboard/releases
根据对kubernetes版本的兼容情况选择对应的dashboard的版本:
- dashboard 7.5 : 全面兼容 k8s 1.29
最新版本需要用 helm 进行安装:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
输出为:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
"kubernetes-dashboard" has been added to your repositories
Release "kubernetes-dashboard" does not exist. Installing it now.
NAME: kubernetes-dashboard
LAST DEPLOYED: Fri Jun 21 06:23:53 2024
NAMESPACE: kubernetes-dashboard
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
*************************************************************************************************
*** PLEASE BE PATIENT: Kubernetes Dashboard may need a few minutes to get up and become ready ***
*************************************************************************************************
Congratulations! You have just installed Kubernetes Dashboard in your cluster.
To access Dashboard run:
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
NOTE: In case port-forward command does not work, make sure that kong service name is correct.
Check the services in Kubernetes Dashboard namespace using:
kubectl -n kubernetes-dashboard get svc
Dashboard will be available at:
https://localhost:8443
此时 dashboard 的 service 和 pod 情况:
kubectl -n kubernetes-dashboard get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-api ClusterIP 10.107.22.93 <none> 8000/TCP 17m
kubernetes-dashboard-auth ClusterIP 10.102.201.198 <none> 8000/TCP 17m
kubernetes-dashboard-kong-manager NodePort 10.103.64.84 <none> 8002:30161/TCP,8445:31811/TCP 17m
kubernetes-dashboard-kong-proxy ClusterIP 10.97.134.204 <none> 443/TCP 17m
kubernetes-dashboard-metrics-scraper ClusterIP 10.98.177.211 <none> 8000/TCP 17m
kubernetes-dashboard-web ClusterIP 10.109.72.203 <none> 8000/TCP 17m
可以访问 http://ip:31811 来访问 dashboard 中的 kong manager。
以前的版本是要访问 kubernetes-dashboard service,现在新版本修改为要访问 kubernetes-dashboard-kong-proxy。
为了方便,使用 node port 来访问 dashboard,需要执行
kubectl -n kubernetes-dashboard edit service kubernetes-dashboard-kong-proxy
然后修改 type: ClusterIP
为 type: NodePort
。然后看一下具体分配的 node port 是哪个:
kubectl -n kubernetes-dashboard get service kubernetes-dashboard-kong-proxy
输出为:
$ kubectl -n kubernetes-dashboard get service kubernetes-dashboard-kong-proxy
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-kong-proxy NodePort 10.97.134.204 <none> 443:30730/TCP 24m
之后就可以用浏览器直接访问:
https://192.168.0.101:30730/
创建用户并登录 dashboard
创建 admin-user 用户:
vi admin-user-ServiceAccount.yaml
内容为:
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kubernetes-dashboard
执行:
k create -f admin-user-ServiceAccount.yaml
然后绑定角色:
vi admin-user-ClusterRoleBinding.yaml
内容为:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: admin-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kubernetes-dashboard
执行:
k create -f admin-user-ClusterRoleBinding.yaml
然后创建 token :
kubectl -n kubernetes-dashboard create token admin-user
输出为:
$ kubectl -n kubernetes-dashboard create token admin-user
eyJhbGciOiJSUzI1NiIsImtpZCI6IjdGczc3STI1VVA1OFpKdF9zektMVVFtZjd1NXRDRU8xTTZpZ1VYbDdKWFEifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzE5MDM2NDM0LCJpYXQiOjE3MTkwMzI4MzQsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiNGY4YmQ3YjAtZjM2OS00MjgzLWJlNmItMThjNjUyMzE0YjQ0In19LCJuYmYiOjE3MTkwMzI4MzQsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbi11c2VyIn0.GOYLXoCCeaZPQ-kuJgx0d4KzRnLkHDHJArAjOwRqg49WIhAl3Hb8O2oD6at2jFgItO-xihFm3D3Ru2jXnPnMhvir0BJ5LBnumH0xDakZ4PrwvCAQADv8KR1ZuzMHlN5yktJ14eSo_UN1rZarq5P1DnbAIHRmgtIlRL2Hfl_Bamkuoxpwr06v50nJHskW7K3A2LjUlgv5rdS7FckIPaD5apmag7NyUi7FP1XEItUX20tF7jy5E5Gv9mI_HDGMTVMxawY4IAvipRcKVQ3tAypVOOMhrqGsfBprtWUkwmyWW8p0jHcAmqq-WX-x-vN70qI4Y2RipKGd4d6z39zPEPCsow
这个 token 就可以用在 kubernetes-dashboard 的登录页面上了。
为了方便,将这个 token 存储在 Secret :
vi admin-user-Secret.yaml
内容为:
apiVersion: v1
kind: Secret
metadata:
name: admin-user
namespace: kubernetes-dashboard
annotations:
kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token
执行:
k create -f admin-user-Secret.yaml
之后就可以用命令随时获取这个 token 了:
kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath={".data.token"} | base64 -d
安装 metrics server
下载:
cd ~/work/soft/k8s
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
修改下载下来的 components.yaml, 增加 --kubelet-insecure-tls
并修改 --kubelet-preferred-address-types
:
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP # 修改这行,默认是InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls # 增加这行
然后安装:
k apply -f components.yaml
稍等片刻看是否启动:
$ kubectl get pod -n kube-system | grep metrics-server
验证一下,查看 service 信息
kubectl describe svc metrics-server -n kube-system
简单验证一下基本使用:
kubectl top nodes
kubectl top pods -n kube-system
参考资料
3.3 - 通过 minikube 安装 kubernetes
3.3.1 - minikube概述
3.3.2 - ubuntu下用minikube安装
参考官方资料:
https://kubernetes.io/docs/tasks/tools/install-minikube/
准备
- VT-x 或 AMD-v 虚拟化支持必须在电脑的bios中开启
- 安装虚拟机: 对于 Linux, 可以安装 VirtualBox 或 KVM
安装VirtualBox
具体操作参考这里:
https://skyao.io/learning-linux-mint/daily/system/virtualbox.html
安装kubectl
参考:
https://kubernetes.io/docs/tasks/tools/install-kubectl/
执行命令如下:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
kubectl version
注意:如果更新了Minikube,务必重新再执行上述步骤以便更新kubectl到最新版本,否则可能出现问题,比如minikube dashboard
打不开浏览器。
如果需要重新安装,需要先执行清理:
rm -rf ~/.kube/
sudo rm -rf /usr/bin/kubectl
安装Minikube
安装命令:
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
&& sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube version
如果有版本更新,重新执行上面命令再次安装即可。
通过minikube运行k8s
参考: https://kubernetes.io/docs/getting-started-guides/minikube/
警告:一定要设置代理
切记要设置代理,否则会因为网络被墙导致无法获取镜像,
kubectl get pod
会发现 status 一直阻塞在 ContainerCreating。
minikube start --docker-env http_proxy=http://192.168.31.152:8123 --docker-env https_proxy=http://192.168.31.152:8123 --docker-env no_proxy=localhost,127.0.0.1,::1,192.168.31.0/24,192.168.99.0/24
如果有全局翻墙,就可以简单的minikube start
启动。
注意:这里的代理一定要是http代理,因此不能直接输入shadowsocks的地址,要用polipo提供的http代理,而且要设置polipo的proxyAddress,不能只监听127.0.0.1。
如果没有正确设置代理就执行了 minikube start
,则只能删除虚拟机然后重新来过,后面再设置代理是没有效果的:
minikube stop
minikube delete
按照上面的文章可以测试minikube是否可以正常工作.如果想查看k8s的控制台,输入下面命令:
minikube dashboard
备忘
用到的一些命令,备用。其中 hello-minikube 示例应用的创建方式见列表后面的示意命令。
kubectl:
- kubectl get pod
- kubectl get pods --all-namespaces
- kubectl get service
- kubectl describe po hello-minikube-180744149-lj0rd
minikube:
- minikube dashboard
- minikube status
- minikube service hello-minikube --url
- curl $(minikube service hello-minikube --url)
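上面备忘中引用的 hello-minikube 示例应用,可以参考官方教程用类似下面的命令创建(示意:镜像和端口取自当时的官方教程,以实际文档为准):
# 创建示例 deployment(镜像仅为示意)
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.4
# 以 NodePort 方式暴露服务,之后即可用 minikube service hello-minikube --url 访问
kubectl expose deployment hello-minikube --type=NodePort --port=8080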
3.3.3 - MacOS下用minikube安装
安装VirtualBox
https://www.virtualbox.org/wiki/Downloads
下载安装 for mac 的版本,如 “VirtualBox 6.0.4 platform packages” 下的 OSX hosts,然后安装下载的 dmg 文件。
之后再下载 VirtualBox 6.0.4 Oracle VM VirtualBox Extension Pack,双击安装。
安装kubectl
brew install kubernetes-cli
安装minikube
brew cask install minikube
完成后测试一下:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-13T23:15:13Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:28:14Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
启动minikube
如果是在可以翻墙的环境中,安装最新版本的kubernetes,可以简单执行命令:
minikube start
如果不支持全局翻墙,可以指定镜像地址,也可以指定要安装的kubernetes版本:
minikube start --memory=8192 --cpus=4 --disk-size=20g --registry-mirror=https://docker.mirrors.ustc.edu.cn --kubernetes-version=v1.12.5 --docker-env http_proxy=http://192.168.0.40:8123 --docker-env https_proxy=http://192.168.0.40:8123 --docker-env no_proxy=localhost,127.0.0.1,::1,192.168.0.0/24,192.168.99.0/24
minikube start --memory=8192 --cpus=4 --disk-size=20g --kubernetes-version=v1.12.5
实测下载还是出问题,怀疑是不是要在 minikube start 前面再加 http_proxy=http://192.168.0.40:8123 https_proxy=http://192.168.0.40:8123 这两个环境变量。
稍后验证。
启动dashboard
标准方式:执行命令 minikube dashboard,然后就会自动打开浏览器访问地址 http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login
备注:之前各个版本都正常,最近用 v0.33.1 的 minikube 安装 kubernetes 1.13.2 版本时遇到问题,报错如下:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-13T23:15:13Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:28:14Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
$ minikube version
minikube version: v0.33.1
$ minikube dashboard
Enabling dashboard ...
Verifying dashboard health ...
Launching proxy ...
Verifying proxy health ...
http://127.0.0.1:51695/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/ is not responding properly: Temporary Error: unexpected response code: 503
Temporary Error: unexpected response code: 503
Temporary Error: unexpected response code: 503
Temporary Error: unexpected response code: 503
导致无法打开 dashboard,只好改用 kubectl proxy 的方式:
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
然后手动打开地址: http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login
参考:https://stackoverflow.com/questions/52916548/minikube-dashboard-returns-503-error-on-macos
然后又是遇到 dashboard 登录的问题。
3.4 - 在 Docker Desktop 中安装 kubernetes
3.4.1 - MacOS下用Docker Desktop安装kubernetes
这是mac下获得一个可用的kubernetes的最简单的方法
安装
在 docker desktop 安装完成之后,在 “Preferences” 中,左边选择 “Kubernetes”,勾选 “Enable Kubernetes” 并应用,docker desktop 会自行安装好最新版本的 kubernetes。
这个方式的最大优点是足够简单,只要网络OK基本只要点击下鼠标即可。
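安装完成后,可以简单验证一下(示意:context 名称以实际版本为准,较新版本的 Docker Desktop 一般是 docker-desktop):
# 切换到 Docker Desktop 自带的 kubernetes context
kubectl config use-context docker-desktop
# 查看节点是否 Ready
kubectl get nodes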
4 - Sidecar Container
4.1 - Sidecar Container概述
From Kubernetes 1.18 containers can be marked as sidecars
Unfortunately, that feature has been removed from 1.18, then removed from 1.19, and currently has no specific date for landing.
reference: kubernetes/enhancements#753
资料
官方正式资料
- Sidecar Containers(kubernetes/enhancements#753): 最权威的资料了,准备细读
- Support startup dependencies between containers on the same Pod
社区介绍资料
- Sidecar Containers improvement in Kubernetes 1.18: 重点阅读
- Kubernetes — Learn Sidecar Container Pattern
- Sidecar container lifecycle changes in Kubernetes 1.18
- Tutorial: Apply the Sidecar Pattern to Deploy Redis in Kubernetes
- Sidecar Containers:by 陈鹏,特别鸣谢
相关项目的处理
Istio
信息1
https://github.com/kubernetes/enhancements/issues/753#issuecomment-684176649
We use a custom daemon image like a supervisor to wrap the user’s program. The daemon will also listen to a particular port to convey the health status of users’ programs (exited or not).
我们使用一个类似 supervisor 的自定义守护进程镜像来包装用户的程序。守护进程也会监听特定的端口来传达用户程序的健康状态(是否退出)。
Here is the workaround:
- Using the daemon image as initContainers to copy the binary to a shared volume.
- Our CD will hijack users’ command, let the daemon start first. Then, the daemon runs the users’ program until Envoy is ready.
- Also, we add preStop, a script that keeps checking the daemon’s health status, for Envoy.
下面是变通的方法:
- 以 initContainers 的方式用守护进程的镜像把二进制文件复制到共享卷。
- 我们的 CD 会劫持用户的命令,让守护进程先启动,守护进程等 Envoy 准备好之后再运行用户的程序。
- 同时,我们还为 Envoy 添加 preStop,一个不断检查守护进程健康状态的脚本。
As a result, the users’ process will start if Envoy is ready, and Envoy will stop after the process of users is exited.
结果,如果Envoy准备好了,用户的程序就会启动,而Envoy会在用户的程序退出后停止。
It’s a complicated workaround, but it works fine in our production environment.
这是一个复杂的变通方法,但在我们的生产环境中运行良好。
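按照上面的描述,这种变通方法对应的 Pod 结构大致如下(纯属示意:镜像名、路径、端口和命令都是假设的,并非该评论作者公开的实现):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-supervisor
spec:
  volumes:
  - name: supervisor-bin
    emptyDir: {}
  initContainers:
  - name: copy-supervisor              # 用守护进程镜像把二进制复制到共享卷
    image: example.com/supervisor:latest
    command: ["cp", "/supervisor", "/shared/supervisor"]
    volumeMounts:
    - name: supervisor-bin
      mountPath: /shared
  containers:
  - name: app
    image: example.com/user-app:latest
    # CD 把用户命令改写为先启动守护进程,守护进程等 Envoy 就绪后再拉起用户程序(命令为假设)
    command: ["/shared/supervisor", "--", "/app/run.sh"]
    volumeMounts:
    - name: supervisor-bin
      mountPath: /shared
  - name: istio-proxy
    image: docker.io/istio/proxyv2:1.9.0   # 版本仅为示意
    lifecycle:
      preStop:
        exec:
          # 不断检查守护进程暴露的健康端口(假设为 9999),等用户程序退出后 Envoy 才继续停止
          command: ["/bin/sh", "-c", "while curl -fsS http://localhost:9999/healthz; do sleep 1; done"]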
信息2
还找到一个答复: https://github.com/kubernetes/enhancements/issues/753#issuecomment-687184232
Allow users to delay application start until proxy is ready
for startup issues, the istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and wait for envoy to be ready. This is blocking and the other containers are not started making sure envoy is there and ready before starting the app container.
对于启动问题,istio 社区想出了一个相当聪明的变通方法:把 envoy 作为容器列表中的第一个容器注入,并添加一个 postStart 钩子,检查并等待 envoy 准备好。这个钩子是阻塞的,后面的容器不会被启动,从而确保 envoy 就绪之后才启动应用程序容器。
We had to port this to the version we’re running but is quite straightforward and are happy with the results so far.
我们已经将其移植到我们正在运行的版本中,很直接,目前对结果很满意。
For shutdown we are also ‘solving’ with preStop hook but adding an arbitrary sleep which we hope the application would have gracefully shutdown before continue with SIGTERM.
对于关机,我们也用 preStop 钩子来 “解决”,但增加了一个任意的 sleep,我们希望应用程序在继续 SIGTERM 之前能优雅地关机。
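这两段描述的做法,大致对应下面这样的容器定义(示意:镜像、就绪检查地址和 sleep 时长均为假设值):
spec:
  containers:
  - name: istio-proxy                      # 注入为容器列表中的第一个容器
    image: docker.io/istio/proxyv2:1.9.0   # 版本仅为示意
    lifecycle:
      postStart:
        exec:
          # postStart 阻塞到 Envoy 就绪,kubelet 才会继续启动后面的容器
          command: ["/bin/sh", "-c", "until curl -fsS http://localhost:15021/healthz/ready; do sleep 1; done"]
      preStop:
        exec:
          # 关闭时先任意 sleep 一段时间,期望应用在此期间已优雅退出
          command: ["/bin/sh", "-c", "sleep 15"]
  - name: app
    image: example.com/user-app:latest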
相关issue: Enable holdApplicationUntilProxyStarts at pod level
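这个 issue 对应的 holdApplicationUntilProxyStarts 配置,大致用法如下(示意:以较新的 Istio 版本为例,字段和层级以官方文档为准):
# 全局开启:写在 IstioOperator 的 meshConfig 中
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      holdApplicationUntilProxyStarts: true
# 也可以按 Pod 开启:在 Pod 模板上加注解(示意)
# metadata:
#   annotations:
#     proxy.istio.io/config: |
#       holdApplicationUntilProxyStarts: true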
Knative
dapr
- Clarify lifecycle of Dapr process and app process:dapr 项目在这个 issue 中等待 sidecar container 特性的结果。在此之前,dapr 做了一个简单的调整,将 daprd 这个 sidecar 的启动顺序放在最前面(详见 https://github.com/dapr/dapr/pull/2341)
4.2 - KEP753: Sidecar Container
相关issue
https://github.com/kubernetes/enhancements/issues/753
这个issue 开启于 2019年1月。
One-line enhancement description: Containers can now be marked as sidecars so that they startup before normal containers and shutdown after all other containers have terminated.
一句话改进描述:容器现在可以被标记为 sidecar,使其在正常容器之前启动,并在所有其他容器终止后关闭。
设计提案链接:https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers
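按照 KEP-753 设计提案中的思路,sidecar 是通过在容器的 lifecycle 中增加 type 字段来标记的,大致形如下面的样子(示意:该字段始终停留在提案阶段,从未进入正式的 Kubernetes 版本):
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-sidecar
spec:
  containers:
  - name: my-app
    image: example.com/my-app:latest
  - name: log-shipper
    image: example.com/log-shipper:latest
    lifecycle:
      type: Sidecar   # 提案中的标记方式:先于普通容器启动,晚于普通容器停止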
4.3 - 推翻KEP753的讨论
https://github.com/kubernetes/enhancements/pull/1980
这是一个关于 sidecar 的讨论汇总,最后得出的结论是推翻 KEP753。
起于derekwaynecarr的发言
I want to capture my latest thoughts on sidecar concepts, and get a path forward.
Here is my latest thinking:
我想归纳我对 sidecar 概念的最新思考,并得到一条前进的道路。
这是我的最新思考。
I think it’s important to ask if the introduction of sidecar containers will actually address an end-user requirement or just shift a problem and further constrain adoption of sidecars themselves by pod authors. To help frame this exercise, I will look at the proposed use of sidecar containers in the service mesh community.
我认为重要的是要问一下 sidecar容器的引入是否会真正解决最终用户的需求,或者只是转移一个问题,并进一步限制pod作者对sidecars本身的采用。为了帮助构架这项工作,我将看看服务网格社区中拟议的 sidecar 容器的使用情况。
User story
I want to enable mTLS for all traffic in my mesh because my auditor demands it.
我想为我的 mesh 中的所有流量启用 mTLS,因为我的审计员要求这样做。
The proposed solution is the introduction of sidecar containers that change the pod lifecycle:
提出的解决方案是引入sidecar container,改变 pod 的生命周期:
- Init containers start/stop
- Sidecar containers start
- Primary containers start/stop
- Sidecar containers stop
The issue with the proposed solution meeting the user story is as follows:
建议的解决方案在满足这个用户故事时存在的问题如下:
- Init containers are not subject to service mesh because the proxy is not running. This is because init containers run to completion before starting the next container. Many users do network interaction that should be subject to the mesh in their init container.
  Init container 不受服务网格管控,因为此时代理还没有运行。这是因为 init container 会在下一个容器启动之前运行到结束。很多用户会在 init container 中做本应经过网格的网络交互。
- Sidecar containers (once introduced) will be used by users for use cases unrelated to the mesh, but subject to the mesh. The proposal makes no semantic guarantees on ordering among sidecars. Similar to init containers, this means sidecars are not guaranteed to participate in the mesh.
  Sidecar 容器(一旦引入)会被用户用于与网格无关、但仍需经过网格的用例。该提案没有对 sidecar 之间的启动顺序做语义保证。与 init 容器类似,这意味着 sidecar 不能保证参与网格。
The real requirement is that the proxy container MUST stop last even among sidecars if those sidecars require network.
真正的需求是:如果其他 sidecar 也需要网络,那么即使在 sidecar 之间,代理容器也必须最后一个停止。
Similar to the behavior observed with init containers (users externalize run-once setup from their main application container), the introduction of sidecar containers will result in more elements of the application getting externalized into sidecars, but those elements will still desire to be part of the mesh when they require a network. Hence, we are just shifting, and not solving the problem.
与观察到的init容器的行为类似(用户从他们的主应用容器中外部化一次性设置),引入sidecar容器将导致更多的应用元素被外部化到sidecar中,但是当这些元素需要网络时,它们仍然会渴望成为网格结构的一部分。因此,我们只是在转移,而不是解决问题。
Given the above gaps, I feel we are not actually solving a primary requirement that would drive improved adoption of a service mesh (ensure all traffic is mTLS from my pod) to meet auditing.
鉴于上述差距,我觉得我们实际上并没有解决那个能推动服务网格被更好采用的主要需求,即确保来自我的 pod 的所有流量都走 mTLS,以满足审计要求。
Alternative proposal:
- Support an ordered graph among containers in the pod (it’s inevitable), possibly with N rings of runlevels?
- Identify which containers in that graph must run to completion before initiating termination (Job use case).
- Move init containers into the graph (collapse the concept)
- Have some way to express if a network is required by the container to act as a hint for the mesh community on where to inject a proxy in the graph.
替代建议:
- 支持在pod中的容器之间建立一个有序图(这是不可避免的),可能有N个运行级别的环?
- 识别该图中的哪些容器必须在启动终止之前运行至完成状态(Job用例)。
- 将 init 容器移入图中(折叠概念)。
- 有某种方式来标记容器是否需要网络,用来作为网格社区的提示,在图中某处注入代理。
A few other notes based on Red Hat’s experience with service mesh:
Red Hat does not support injection of privileged sidecar containers and will always require CNI approach. In this flow, the CNI runs, multus runs, iptables are setup, and then init containers start. The iptables rules are setup, but no proxy is running, so init containers lose connectivity. Users are unhappy that init containers are not participating in the mesh. Users should not have to sacrifice usage of an init container (or any aspect of the pod lifecycle) to fulfill auditor requirements. The API should be flexible enough to support graceful introduction in the right level of a intra pod life-cycle graph transparent to the user.
根据红帽在服务网格方面的经验,还有一些其他说明:
红帽不支持注入特权sidecar容器,总是需要CNI方式。在这个流程中,CNI运行,multus运行,设置iptables,然后 init 容器启动。iptables规则设置好了,但是没有代理运行,所以 init容器 失去了连接。用户对init容器不参与网格感到不满。用户不应该为了满足审计师的要求而牺牲init容器的使用(或pod生命周期的任何方面)。API应该足够灵活,以支持在正确的层次上优雅地引入对用户透明的 pod 生命周期图。
Proposed next steps:
- Get a dedicated set of working meetings to ensure that across the mesh and kubernetes community, we can meet a users auditing requirement without limiting usage or adoption of init containers and/or sidecar containers themselves by pod authors.
- I will send a doodle.
拟议的下一步措施:
- 召开一组专门的工作会议,以确保在整个 mesh 和 kubernetes 社区范围内,我们可以在不限制 pod 作者使用 init 容器和/或 sidecar 容器本身的前提下,满足用户的审计要求。
- 我会发一个 Doodle(用来约定会议时间)。
其他人的意见
mrunalp:
Agree! We might as well tackle this general problem vs. doing it step by step with baggage added along the way.
同意! 我们不妨解决这个普遍性的问题,而不是按部就班地做,在做的过程中增加包袱。
sjenning :
I agree @derekwaynecarr
I think that in order to satisfy fully the use cases mentioned, we are gravitating toward systemd level semantics where there is just an ordered graph of services containers in the pod spec.
You could basically collapse init containers into the normal containers map and add two fields to Container: oneshot bool, that expresses if the container terminates and dependent containers should wait for it to terminate (handles init containers w/ ordering), and requires map[string], a list of container names upon which the current container depends.
This is flexible enough to accommodate a oneshot: true container (init container) depending on a oneshot: false container (a proxy container on which the init container depends).
Admittedly this would be quite the undertaking and there is API compatibility to consider.
我同意 @derekwaynecarr
我认为,为了充分满足上述用例,我们正在倾向于 systemd 级别的语义:在 pod 规范中只有一个有序的容器图。你基本上可以把 init 容器折叠进普通容器列表,并在 Container 中添加两个字段:oneshot bool,表示该容器是否会运行结束、依赖它的容器是否应该等它结束(以此处理 init 容器的顺序);以及 requires map[string],即当前容器所依赖的容器名称列表。这足够灵活,可以支持一个 oneshot: true 的容器(init 容器)依赖于一个 oneshot: false 的容器(init 容器所依赖的代理容器)。诚然,这将是一个相当大的工程,而且还要考虑 API 的兼容性。
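把 sjenning 设想的 oneshot / requires 字段写出来,大致会是下面这个样子(纯属示意:这两个字段只是讨论中的设想,并不存在于任何 Kubernetes 版本):
apiVersion: v1
kind: Pod
metadata:
  name: ordered-pod
spec:
  containers:
  - name: proxy                    # 长驻容器,oneshot: false
    image: example.com/proxy:latest
    oneshot: false
  - name: init-db                  # 原来的 init 容器折叠进普通容器列表,运行结束即退出
    image: example.com/init-db:latest
    oneshot: true
    requires: ["proxy"]            # init 容器也可以依赖代理,从而参与网格
  - name: app
    image: example.com/app:latest
    requires: ["init-db", "proxy"]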
thockin:
I have also been thinking about this. There are a number of open issues, feature-requests, etc that all circle around the topic of pod and container lifecycle. I’ve been a vocal opponent of complex API here, but it’s clear that what we have is inadequate.
When we consider init-container x sidecar-container, it is clear we will inevitably eventually need an init-sidecar.
我也一直在思考这个问题。有一些开放的问题、功能需求等,都是围绕着pod和容器生命周期这个话题展开的。我在这里一直是复杂API的强烈反对者,但很明显,我们所拥有的是不够的。
当我们考虑 init-container x sidecar-container 时,很明显我们最终将不可避免地需要一个init-sidecar。
Some (non-exhaustive) of the other related topics:
- Node shutdown -> Pod shutdown (in progress?)
- Voluntary pod restart (“Something bad happened, please burn me down to the ground and start over”)
- Voluntary pod failure (“I know something you don’t, and I can’t run here - please terminate me and do not retry”)
- “Critical” or “Keystone” containers (“when this container exits, the rest should be stopped”)
- Startup/shutdown phases with well-defined semantics (e.g. “phase 0 has no network”)
- Mixed restart policies in a pod (e.g. helper container which runs and terminates)
- Clearer interaction between pod, network, and device plugins
其他的一些(非详尽的)相关主题:
- 节点关闭 -> Pod 关闭(正在进行中?)
- 自愿重启pod(“发生了不好的事情,请把我摧毁,然后重新开始”)。
- 自愿 pod 失败(“我知道一些你不知道的事情,我无法在这里运行,请终止我,不要重试”)
- “关键”或“基石”容器(“当这个容器退出时,其他容器应停止”)。
- 具有明确语义的启动/关闭阶段(如 “phase 0 没有网络”)。
- 在一个pod中混合重启策略(例如,帮助容器,它会运行并终止)。
- 更清晰的 pod、网络和设备插件之间的交互。
thockin:
This is a big enough topic that we almost certainly need to explore multiple avenues before we can have confidence in any one.
这是一个足够大的话题,我们几乎肯定需要探索多种途径,才能对任何一种途径有信心。
kfox1111:
the dependency idea also would allow for doing an init container, then a sidecar network plugin, then more init containers, etc, which has some nice features.
Also the readyness checks and oneshot could all play together with the dependencies so the next steps aren’t started before ready.
So, as a user experience, I think that api might be very nice.
Implementation wise there are probably lots of edge cases to carefully consider there.
依赖的想法还可以做一个init容器,然后做一个sidecar网络插件,然后做更多的init容器等等,这有一些不错的功能。
另外 readyness 检查和 oneshot 都可以和依赖一起考虑,这样就不会在准备好之前就开始下一步。
所以,作为用户体验来说,我觉得这个api可能是非常不错的。
从实现上来说,可能有很多边缘情况需要仔细考虑。
SergeyKanzhelev:
this is great idea to set up a working group to move it forward in bigger scope. One topic I suggest we cover early on in the discussions is whether we need to address the existing pain point of injecting sidecars in jobs in 1.20. This KEP intentionally limited the scope to just this - formalizing what people are already trying to do today with workarounds.
From Google side we also would love the bigger scope of a problem be addressed, but hope to address some immediate pain points early if possible. Either in current scope or slightly bigger.
这是一个很好的想法,成立一个工作组,在更大范围内推进它。我建议我们在讨论中尽早涉及的一个话题是:我们是否需要在 1.20 中解决现有的在 Job 中注入 sidecar 的痛点。这个 KEP 有意把范围限制在这一点上:把人们今天已经在用各种变通方法做的事情正式化。
从Google方面来说,我们也希望更大范围的问题能够得到解决,但如果可能的话,希望能够尽早解决一些直接的痛点。要么在目前的范围内,要么稍微大一点。
derekwaynecarr:
I would speculate that the dominant consumer of the job scenario is a job that required participation in a mesh to complete its task, and since I don’t see much point in solving for the mesh use case (which I view as the primary motivator for defining side car semantics) for only one workload type, I would rather ensure a pattern that solves the problem in light of our common experience across mesh and k8s communities.
我推测 Job 场景的主要使用者是需要参与网格来完成任务的 Job。由于我认为只为一种工作负载类型解决 mesh 用例(我认为这是定义 sidecar 语义的主要动机)没有太大意义,所以我宁愿基于我们在 mesh 和 k8s 社区的共同经验,确保有一个能真正解决问题的模式。