介绍Kubernetes学习笔记的基本资料和访问方式
Kubernetes学习笔记
- 1: 介绍
- 1.1: Kubernetes概述
- 1.2: Kubernetes资料收集
- 2: 安装
- 2.1: 通过 kubeadm 安装 kubernetes
- 2.1.1: 在 debian12 上安装 kubernetes
- 2.1.1.1: 准备工作
- 2.1.1.2: 安装命令行
- 2.1.1.3: 初始化集群
- 2.1.1.4: 安装 dashboard
- 2.1.1.5: 安装 metrics server
- 2.1.1.6: 安装监控
- 2.2: 安装kubectl
- 2.2.1: 在 ubuntu 上安装 kubectl
- 3: Sidecar Container
- 3.1: Sidecar Container概述
- 3.2: KEP753: Sidecar Container
- 3.3: 推翻KEP753的讨论
1 - 介绍
1.1 - Kubernetes概述
kubernetes是什么?
Kubernetes 是一个可移植、可扩展的开源平台,用于管理容器化的工作负载和服务,支持声明式配置和自动化。它拥有庞大且快速发展的生态系统,Kubernetes 的服务、支持和工具随处可见。
谷歌在2014年开源了Kubernetes项目。Kubernetes建立在谷歌十五年来大规模运行生产负载经验的基础上,结合了社区中最佳的创意和实践。
为什么需要 Kubernetes?它可以做什么?
Kubernetes拥有许多功能。 它可以被认为是:
- 容器平台
- 微服务平台
- 可移植云平台,还有更多
Kubernetes 提供以容器为中心的管理环境,代表用户的工作负载来编排计算、网络和存储基础设施。这兼具了 PaaS 的大部分简便性和 IaaS 的灵活性,并支持跨基础设施提供商的可移植性。
Kubernetes如何成为一个平台?
Kubernetes提供了许多功能,但总会有新的场景可以从新功能中受益。它可以简化特定于应用程序的工作流程,以加快开发速度。最初可以接受的临时编排方案,在规模变大之后往往需要健壮的自动化能力。这就是为什么Kubernetes还被设计为一个平台,用于构建组件和工具的生态系统,以便更轻松地部署、扩展和管理应用程序。
Label 允许用户按照自己的方式组织管理对应的资源。 Annotations 使用户能够以自定义的描述信息来修饰资源,以适用于自己的工作流,并为管理工具提供检查点状态的简单方法。
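举一个简单的例子(资源名和键值只是示意,按自己的约定填写即可):
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  labels:                        # Label:用于选择和分组资源
    app: demo
    tier: frontend
  annotations:                   # Annotation:附加的描述性信息,不参与选择
    owner: "team-a"
spec:
  containers:
  - name: demo
    image: nginx:1.25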
此外,Kubernetes 控制面构建在同一套 API 之上,这套 API 对开发人员和用户同样可用。用户可以基于自己的 API 编写自己的控制器(例如调度器),并通过通用的 CLI 命令行工具来操作这些自定义 API。
这种设计使得能够在Kubernetes上面构建许多其他系统。
Kubernetes不是什么?
Kubernetes 不是一个传统的,包罗万象的 PaaS(Platform as a Service)系统。由于Kubernetes在容器级而非硬件级运行,因此它提供了PaaS产品常用的一些通用功能,例如部署,扩展,负载平衡,日志和监控。 但是,Kubernetes不是单体,而且这些默认解决方案是可选的和可插拔的。 Kubernetes提供了构建开发人员平台的构建块,但在重要的地方保留了用户选择和灵活性。
Kubernetes:
- 不限制支持的应用程序类型。 Kubernetes旨在支持各种各样的工作负载,包括无状态,有状态和数据处理工作负载。如果一个应用程序可以在一个容器中运行,它应该在Kubernetes上运行得很好。
- 不部署源代码并且不构建您的应用程序。持续集成,交付和部署(CI / CD)工作流程由组织文化和偏好以及技术要求决定。
- 不提供应用程序级服务,例如中间件(例如,消息总线),数据处理框架(例如,Spark),数据库(例如,mysql),高速缓存,也不提供集群存储系统(例如,Ceph)作为内建服务。这些组件可以在 Kubernetes 上运行,也可以通过可移植机制(例如 Open Service Broker)被运行在 Kubernetes 上的应用程序访问。
- 不指定记录,监控或告警解决方案。它提供了一些集成作为概念证明,以及收集和导出指标的机制。
- 不提供或授权配置语言/系统(例如,jsonnet)。它提供了一个声明性API,可以通过任意形式的声明性规范来实现。
- 不提供或采用任何全面的机器配置,维护,管理或自我修复系统。
此外,Kubernetes不仅仅是编排系统。实际上,它消除了编排的需要。编排(orchestration)的技术定义是执行预先定义好的工作流:先执行A,然后执行B,再执行C。相反,Kubernetes由一组独立的、可组合的控制流程组成,这些流程将当前状态持续推向所期望的目标状态。如何从A到C无关紧要,也不需要集中控制。这使得系统更易于使用,功能更强大、更健壮、更有弹性、更可扩展。
为什么用容器?
先来看看应该使用容器的原因。
部署应用程序的旧方法是使用操作系统软件包管理器在主机上安装应用程序。这样做的缺点是将应用程序的可执行文件,配置,类库和生命周期混在一起,并与主机操作系统纠缠在一起。 可以构建不可变的虚拟机映像以实现可预测的部署和回滚,但虚拟机是重量级且不可移植的。
新方法是基于操作系统级虚拟化而不是硬件虚拟化来部署容器。这些容器彼此隔离并与主机隔离:它们具有自己的文件系统,它们无法看到彼此的进程,并且它们的计算资源使用可以被限制。它们比虚拟机更容易构建,并且因为它们与底层基础设施和主机文件系统解耦,所以它们可以跨云和操作系统分发进行移植。
由于容器小而快,因此可以在每个容器映像中打包一个应用程序。 这种一对一的应用程序到映像关系解锁了容器的全部优势。 使用容器,可以在构建/发布时而不是部署时创建不可变容器映像,因为每个应用程序不需要与应用程序堆栈的其余部分组合,也不需要与生产基础设施环境结合。 在构建/发布时生成容器映像可以实现从开发到生产的一致环境。 同样,容器比VM更加透明,这有利于监控和管理。当容器的进程生命周期由基础设施管理而不是由容器内的进程管理器隐藏时,尤其如此。 最后,每个容器使用一个应用程序,管理容器就等于管理应用程序的部署。
容器好处总结如下:
- 应用程序创建和部署更敏捷:与VM映像使用相比,增加了容器映像创建的简便性和效率。
- 持续开发,集成和部署:通过快速简便的回滚(源于镜像不变性)提供可靠且频繁的容器镜像构建和部署。
- Dev和Ops关注点分离:在构建/发布时而不是部署时创建应用程序容器映像,从而将应用程序与基础设施解耦。
- 可观察性:不仅可以显示操作系统级别的信息和指标,还可以显示应用程序运行状况和其他信号。
- 开发,测试和生产的环境一致性:在笔记本电脑上运行与在云中运行相同。
- 云和OS分发可移植性:在Ubuntu,RHEL,CoreOS,本地,Google Kubernetes引擎以及其他任何地方运行。
- 以应用程序为中心的管理:提升抽象级别,从在虚拟硬件上运行OS到使用逻辑资源在OS上运行应用程序。
- 松散耦合,分布式,弹性,解放的微服务:应用程序被分解为更小,独立的部分,可以动态部署和管理 - 而不是在一台大型单一用途机器上运行的单体堆栈。
- 资源隔离:可预测的应用程序性能。
- 资源利用:高效率和高密度。
参考资料
- What is Kubernetes?: 官方文档的介绍篇,还是官方文档写的到位
1.2 - Kubernetes资料收集
官方资料
- Kubernetes官网
- kubernetes@github
- 官方文档:英文版 ,还有 中文翻译版本,不过目前完成度还比较低
- https://k8smeetup.github.io/docs/home/ : 这里有另一份中文翻译版本(官方中文版本的前身),完成度较高
社区资料
学习资料
- Kubernetes指南: 这是目前最新最好的Kubernetes中文资料,强烈推荐!
2 - 安装
2.1 - 通过 kubeadm 安装 kubernetes
2.1.1 - 在 debian12 上安装 kubernetes
参考官方文档:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
2.1.1.1 - 准备工作
系统更新
确保更新debian系统到最新,移除不再需要的软件,清理无用的安装包:
sudo apt update && sudo apt full-upgrade -y
sudo apt autoremove
sudo apt autoclean
如果更新了内核,最好重启一下。
swap 分区
安装 Kubernetes 要求禁用机器上的 swap 分区(kubelet 默认在检测到 swap 时会拒绝启动)。
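通常的做法(示意,具体以官方文档为准)是先临时关闭 swap,再注释掉 /etc/fstab 中的 swap 条目,避免重启后重新挂载:
# 临时关闭 swap
sudo swapoff -a
# 永久关闭:把 /etc/fstab 中包含 swap 的行注释掉
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab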
开启模块
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
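可以用下面的命令确认模块和内核参数已经生效(可选):
# 确认内核模块已经加载
lsmod | grep -e overlay -e br_netfilter
# 确认 sysctl 参数已经生效
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward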
container runtime
Kubernetes 支持多种 container runtime,这里暂时继续使用 docker engine + cri-dockerd。
参考:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/
安装 docker + cri-dockerd
docker 的安装参考:
https://skyao.io/learning-docker/docs/installation/debian12/
cri-dockerd 的安装参考:
https://mirantis.github.io/cri-dockerd/usage/install/
从 release 页面下载:
https://github.com/Mirantis/cri-dockerd/releases
debian 12 选择下载 debian-bookworm 对应的 deb 包,下载后安装:
sudo dpkg -i ./cri-dockerd_0.3.16.3-0.debian-bookworm_amd64.deb
安装后会提示:
Selecting previously unselected package cri-dockerd.
(Reading database ... 48498 files and directories currently installed.)
Preparing to unpack .../cri-dockerd_0.3.16.3-0.debian-bookworm_amd64.deb ...
Unpacking cri-dockerd (0.3.16~3-0~debian-bookworm) ...
Setting up cri-dockerd (0.3.16~3-0~debian-bookworm) ...
Created symlink /etc/systemd/system/multi-user.target.wants/cri-docker.service → /lib/systemd/system/cri-docker.service.
Created symlink /etc/systemd/system/sockets.target.wants/cri-docker.socket → /lib/systemd/system/cri-docker.socket.
安装后查看状态:
sudo systemctl status cri-docker.service
如果成功则状态为:
● cri-docker.service - CRI Interface for Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/cri-docker.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-03-04 19:18:50 CST; 3min 25s ago
TriggeredBy: ● cri-docker.socket
Docs: https://docs.mirantis.com
Main PID: 2665 (cri-dockerd)
Tasks: 9
Memory: 15.0M
CPU: 21ms
CGroup: /system.slice/cri-docker.service
└─2665 /usr/bin/cri-dockerd --container-runtime-endpoint fd://
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Hairpin mode is set to none"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="The binary conntrack is not installed, this can cause failures in network conn>
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="The binary conntrack is not installed, this can cause failures in network conn>
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Loaded network plugin cni"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Docker cri networking managed by network plugin cni"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Setting cgroupDriver systemd"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig>
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Starting the GRPC backend for the Docker CRI interface."
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Start cri-dockerd grpc backend"
Mar 04 19:18:50 debian12 systemd[1]: Started cri-docker.service - CRI Interface for Docker Application Container Engine.
安装 containerd
TODO:后面考虑换 containerd
安装 helm
参考:
https://helm.sh/docs/intro/install/#from-apt-debianubuntu
安装:
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
安装后取消 helm 的自动更新:
sudo vi /etc/apt/sources.list.d/helm-stable-debian.list
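在文件中把 deb 开头的那一行注释掉即可;或者,和后文处理 kubeadm/kubelet/kubectl 的方式类似,直接用 apt-mark 锁定版本(两种方式任选其一):
# 锁定 helm 版本,阻止 apt 自动升级(需要升级时再 unhold)
sudo apt-mark hold helm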
查看安装的版本:
$ helm version
version.BuildInfo{Version:"v3.17.1", GitCommit:"980d8ac1939e39138101364400756af2bdee1da5", GitTreeState:"clean", GoVersion:"go1.23.5"}
2.1.1.2 - 安装命令行
参考: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
安装 kubeadm / kubelet / kubectl
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
假定要安装的 kubernetes 版本为 1.32:
export K8S_VERSION=1.32
# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
开始安装 kubelet kubeadm kubectl:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
禁止这三个程序的自动更新:
sudo apt-mark hold kubelet kubeadm kubectl
验证安装:
kubectl version --client && echo && kubeadm version
输出为:
Client Version: v1.32.2
Kustomize Version: v5.5.0
kubeadm version: &version.Info{Major:"1", Minor:"32", GitVersion:"v1.32.2", GitCommit:"67a30c0adcf52bd3f56ff0893ce19966be12991f", GitTreeState:"clean", BuildDate:"2025-02-12T21:24:52Z", GoVersion:"go1.23.6", Compiler:"gc", Platform:"linux/amd64"}
在运行 kubeadm 之前,先启动 kubelet 服务:
sudo systemctl enable --now kubelet
安装后配置
优化 zsh
vi ~/.zshrc
增加以下内容:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
执行:
source ~/.zshrc
之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
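如果补全没有生效,多半是还没有加载 kubectl 的补全脚本,可以在 ~/.zshrc 中再加上一行(kubectl 自带了生成 zsh 补全脚本的命令):
# 加载 kubectl 的 zsh 补全脚本,上面的 complete/__start_kubectl 依赖它
source <(kubectl completion zsh)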
取消更新
kubeadm / kubelet / kubectl 的版本没有必要总是升级到最新,因此可以取消它们的自动更新。
sudo vi /etc/apt/sources.list.d/kubernetes.list
2.1.1.3 - 初始化集群
参考官方文档:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
初始化集群
pod-network-cidr 尽量用 10.244.0.0/16 这个范围,不然有些网络插件会需要额外的配置。
cri-socket 的配置参考:
因为前面用的 Docker Engine 和 cri-dockerd ,因此这里的 cri-socket 需要指定为 “unix:///var/run/cri-dockerd.sock”。
apiserver-advertise-address 需要指定为当前节点的 IP 地址,因为当前节点是单节点,因此这里指定为 192.168.3.215。
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.3.215
输出为:
[init] Using Kubernetes version: v1.32.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0304 20:23:50.183712 5058 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.9" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [debian12 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.3.215]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.3.215 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.3.215 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 500.939992ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 3.00043501s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node debian12 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node debian12 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: 8e5a3n.rqbqfbnvhf4uyjft
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.3.215:6443 --token 8e5a3n.rqbqfbnvhf4uyjft \
--discovery-token-ca-cert-hash sha256:183b3e9965d298e67689baddeff2ff88c32b3f18aa9dd9a15be1881d26025a22
根据提示操作:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
对于测试用的单节点,去除 master/control-plane 的污点:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
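可以确认一下污点确实已经去掉(节点名以实际环境为准):
kubectl describe node debian12 | grep Taints
# 期望输出为:Taints: <none>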
执行:
kubectl get node
能看到此时节点的状态会是 NotReady:
NAME STATUS ROLES AGE VERSION
debian12 NotReady control-plane 3m49s v1.32.2
执行:
kubectl describe node debian12
能看到节点的错误信息:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
需要继续安装网络插件。
安装网络插件
安装 flannel
参考官方文档: https://github.com/flannel-io/flannel#deploying-flannel-with-kubectl
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
如果一切正常,就能看到 k8s 集群内的 pod 都启动完成状态为 Running:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-ts6n8 1/1 Running 7 (9m27s ago) 15m
kube-system coredns-668d6bf9bc-rbkzb 1/1 Running 0 3h55m
kube-system coredns-668d6bf9bc-vbltg 1/1 Running 0 3h55m
kube-system etcd-debian12 1/1 Running 0 3h55m
kube-system kube-apiserver-debian12 1/1 Running 1 (5h57m ago) 3h55m
kube-system kube-controller-manager-debian12 1/1 Running 0 3h55m
kube-system kube-proxy-95ccr 1/1 Running 0 3h55m
kube-system kube-scheduler-debian12 1/1 Running 1 (6h15m ago) 3h55m
如果发现 kube-flannel-ds pod 的状态总是 CrashLoopBackOff:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-ts6n8 0/1 CrashLoopBackOff 2 (22s ago) 42s
继续查看 pod 的具体错误信息:
k describe pods -n kube-flannel kube-flannel-ds-ts6n8
发现报错 “Back-off restarting failed container kube-flannel in pod kube-flannel”:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 117s default-scheduler Successfully assigned kube-flannel/kube-flannel-ds-ts6n8 to debian12
Normal Pulled 116s kubelet Container image "ghcr.io/flannel-io/flannel-cni-plugin:v1.6.2-flannel1" already present on machine
Normal Created 116s kubelet Created container: install-cni-plugin
Normal Started 116s kubelet Started container install-cni-plugin
Normal Pulled 115s kubelet Container image "ghcr.io/flannel-io/flannel:v0.26.4" already present on machine
Normal Created 115s kubelet Created container: install-cni
Normal Started 115s kubelet Started container install-cni
Normal Pulled 28s (x5 over 114s) kubelet Container image "ghcr.io/flannel-io/flannel:v0.26.4" already present on machine
Normal Created 28s (x5 over 114s) kubelet Created container: kube-flannel
Normal Started 28s (x5 over 114s) kubelet Started container kube-flannel
Warning BackOff 2s (x10 over 110s) kubelet Back-off restarting failed container kube-flannel in pod kube-flannel-ds-ts6n8_kube-flannel(1e03c200-2062-4838
此时应该去检查准备工作中 “开启模块” 一节的内容是不是有疏漏。
补救之后,就能看到 kube-flannel-ds 这个 pod 正常运行了:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-ts6n8 1/1 Running 7 (9m27s ago) 15m
安装 Calico
查看最新版本,当前最新版本是 v3.29.2:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/tigera-operator.yaml
TODO:用了 flannel, Calico 后面再验证。
2.1.1.4 - 安装 dashboard
安装 dashboard
参考:https://github.com/kubernetes/dashboard/#installation
在下面地址上查看当前 dashboard 的版本:
https://github.com/kubernetes/dashboard/releases
根据对 kubernetes 版本的兼容情况选择对应的 dashboard 的版本:
- kubernetes-dashboard-7.11.0 ,兼容 k8s 1.32
最新版本需要用 helm 进行安装:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
输出为:
"kubernetes-dashboard" already exists with the same configuration, skipping
Release "kubernetes-dashboard" does not exist. Installing it now.
NAME: kubernetes-dashboard
LAST DEPLOYED: Wed Mar 5 00:53:17 2025
NAMESPACE: kubernetes-dashboard
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
*************************************************************************************************
*** PLEASE BE PATIENT: Kubernetes Dashboard may need a few minutes to get up and become ready ***
*************************************************************************************************
Congratulations! You have just installed Kubernetes Dashboard in your cluster.
To access Dashboard run:
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
NOTE: In case port-forward command does not work, make sure that kong service name is correct.
Check the services in Kubernetes Dashboard namespace using:
kubectl -n kubernetes-dashboard get svc
Dashboard will be available at:
https://localhost:8443
此时 dashboard 的 service 和 pod 情况:
kubectl -n kubernetes-dashboard get services
输出为:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-api ClusterIP 10.108.225.190 <none> 8000/TCP 2m5s
kubernetes-dashboard-auth ClusterIP 10.99.205.102 <none> 8000/TCP 2m5s
kubernetes-dashboard-kong-proxy ClusterIP 10.96.247.162 <none> 443/TCP 2m5s
kubernetes-dashboard-metrics-scraper ClusterIP 10.103.222.22 <none> 8000/TCP 2m5s
kubernetes-dashboard-web ClusterIP 10.108.219.9 <none> 8000/TCP 2m5s
查看 pod 的情况:
kubectl -n kubernetes-dashboard get pods
等待两三分钟之后,pod 启动完成,输出为:
NAME READY STATUS RESTARTS AGE
kubernetes-dashboard-api-7d8567b8f-9ksk2 1/1 Running 0 3m8s
kubernetes-dashboard-auth-6877bf44b9-9qfmg 1/1 Running 0 3m8s
kubernetes-dashboard-kong-79867c9c48-rzlhp 1/1 Running 0 3m8s
kubernetes-dashboard-metrics-scraper-794c587449-6phjv 1/1 Running 0 3m8s
kubernetes-dashboard-web-75576c76b-sm2wj 1/1 Running 0 3m8s
为了方便,使用 node port 来访问 dashboard,需要执行:
kubectl -n kubernetes-dashboard edit service kubernetes-dashboard-kong-proxy
然后将 type: ClusterIP 修改为 type: NodePort。之后看一下具体分配的 node port 是哪个:
kubectl -n kubernetes-dashboard get service kubernetes-dashboard-kong-proxy
输出为:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-kong-proxy NodePort 10.96.247.162 <none> 443:32616/TCP 17m
现在可以用浏览器直接访问:
https://192.168.3.215:32616/
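如果不想交互式编辑 service,也可以用 kubectl patch 一条命令完成同样的修改(service 名称沿用前文,效果等同):
kubectl -n kubernetes-dashboard patch service kubernetes-dashboard-kong-proxy \
  -p '{"spec": {"type": "NodePort"}}'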
创建用户并登录 dashboard
创建 admin-user 用户:
vi dashboard-adminuser.yaml
内容为:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
执行:
k create -f dashboard-adminuser.yaml
然后绑定角色:
vi dashboard-adminuser-binding.yaml
内容为:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
执行:
k create -f dashboard-adminuser-binding.yaml
然后创建 token :
kubectl -n kubernetes-dashboard create token admin-user
输出为:
eyJhbGciOiJSUzI1NiIsImtpZCI6Ik9sWnJsTk5UNE9JVlVmRFMxMUpwNC1tUlVndTl5Zi1WQWtmMjIzd2hDNmcifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzQxMTEyNDg4LCJpYXQiOjE3NDExMDg4ODgsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiNDU5ZGQxNjctNWI5OS00MWIzLTgzZWEtNGIxMGY3MTc5ZjEyIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiZjMxN2VhZTItNTNiNi00MGZhLWI3MWYtMzZiNDI1YmY4YWQ0In19LCJuYmYiOjE3NDExMDg4ODgsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbi11c2VyIn0.TYzOdrMFXcSEeVMbc1ewIA13JVi4FUYoRN7rSH5OstbVfKIF48X_o1RWxOGM_AurhgLxuKZHzmns3K_pX_OR3u1URfK6-gGos4iAQY-H1yntfRmzzsip_FbZh95EYFGTN43gw21jTyfem3OKBXXLgzsnVT_29uMnJzSnCDnrAciVKMoCEUP6x2RSHQhp6PrxrIrx_NMB3vojEZYq3AysQoNqYYjRDd4MnDRClm03dNvW5lvKSgNCVmZFje_EEa2EhI2X6d3X8zx6tHwT5M4-T3hMmyIpzHUwf3ixeZR85rhorMbskNVvRpH6VLH6BXP31c3NMeSgYk3BG8d7UjCYxQ
这个 token 就可以用在 kubernetes-dashboard 的登录页面上了。
为了方便,将这个 token 存储在 Secret :
vi dashboard-adminuser-secret.yaml
内容为:
apiVersion: v1
kind: Secret
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token
执行:
k create -f dashboard-adminuser-secret.yaml
之后就可以用命令随时获取这个 token 了:
kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath="{.data.token}" | base64 -d
2.1.1.5 - 安装 metrics server
参考:https://github.com/kubernetes-sigs/metrics-server/#installation
安装 metrics server
下载:
mkdir -p ~/work/soft/k8s
cd ~/work/soft/k8s
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
修改下载下来的 components.yaml,增加 --kubelet-insecure-tls 并修改 --kubelet-preferred-address-types:
template:
  metadata:
    labels:
      k8s-app: metrics-server
  spec:
    containers:
    - args:
      - --cert-dir=/tmp
      - --secure-port=4443
      - --kubelet-preferred-address-types=InternalIP # 修改这行,默认是 InternalIP,ExternalIP,Hostname
      - --kubelet-use-node-status-port
      - --metric-resolution=15s
      - --kubelet-insecure-tls # 增加这行
然后安装:
k apply -f components.yaml
稍等片刻看是否启动:
kubectl get pod -n kube-system | grep metrics-server
验证一下,查看 service 信息
kubectl describe svc metrics-server -n kube-system
简单验证一下基本使用:
kubectl top nodes
kubectl top pods -n kube-system
参考资料
2.1.1.6 - 安装监控
参考:https://github.com/prometheus-operator/prometheus-operator
https://computingforgeeks.com/setup-prometheus-and-grafana-on-kubernetes/
2.2 - 安装kubectl
kubectl 是 Kubernetes 的命令行工具,允许对Kubernetes集群运行命令。
单独安装 kubectl 命令行工具,可以方便的在本地远程操作集群。
2.2.1 - 在 ubuntu 上安装 kubectl
参考 Kubernetes 官方文档:
分步骤安装
和用 kubeadm 安装时的方式一样,只是这里只需要安装 kubectl 一个工具,不需要安装 kubeadm 和 kubelet。
执行如下命令:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
k8s 暂时固定使用 1.23.14 版本:
sudo apt-get install kubectl=1.23.14-00
# sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
直接安装
不推荐这样安装,会安装最新版本,而且安装目录是 /usr/local/bin/
。
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
rm kubectl
如果 /usr/local/bin/ 不在 PATH 路径下,则需要修改一下 PATH:
export PATH=/usr/local/bin:$PATH
验证一下:
kubectl version --output=yaml
输出为:
clientVersion:
buildDate: "2023-06-14T09:53:42Z"
compiler: gc
gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
gitTreeState: clean
gitVersion: v1.27.3
goVersion: go1.20.5
major: "1"
minor: "27"
platform: linux/amd64
kustomizeVersion: v5.0.1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
配置
oh-my-zsh自动完成
在使用 oh-my-zsh 之后,会更加的简单(强烈推荐使用 oh-my-zsh ),只要在 oh-my-zsh 的 plugins 列表中增加 kubectl 即可。
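plugins 的修改示例如下(这里假设原有插件只有 git,请按自己的实际配置增减):
# ~/.zshrc 中的 plugins 配置,增加 kubectl 插件
plugins=(git kubectl)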
然后,在 ~/.zshrc 中增加以下内容:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
source ~/.zshrc
之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
3 - Sidecar Container
3.1 - Sidecar Container概述
From Kubernetes 1.18 containers can be marked as sidecars
Unfortunately, that feature has been removed from 1.18, then removed from 1.19, and currently has no specific date for landing.
reference: kubernetes/enhancements#753
资料
官方正式资料
- Sidecar Containers(kubernetes/enhancements#753): 最权威的资料了,准备细读
- Support startup dependencies between containers on the same Pod
社区介绍资料
- Sidecar Containers improvement in Kubernetes 1.18: 重点阅读
- Kubernetes — Learn Sidecar Container Pattern
- Sidecar container lifecycle changes in Kubernetes 1.18
- Tutorial: Apply the Sidecar Pattern to Deploy Redis in Kubernetes
- Sidecar Containers:by 陈鹏,特别鸣谢
相关项目的处理
Istio
信息1
https://github.com/kubernetes/enhancements/issues/753#issuecomment-684176649
We use a custom daemon image like a supervisor to wrap the user’s program. The daemon will also listen to a particular port to convey the health status of users’ programs (exited or not).
我们使用一个类似 supervisor 的自定义守护进程镜像来包装用户的程序。守护进程也会监听特定的端口来传达用户程序的健康状态(是否退出)。
Here is the workaround:
- Using the daemon image as initContainers to copy the binary to a shared volume.
- Our CD will hijack users’ command, let the daemon start first. Then, the daemon runs the users’ program until Envoy is ready.
- Also, we add preStop, a script that keeps checking the daemon’s health status, for Envoy.
下面是变通的方法:
- 以 initContainers 的方式用守护进程的镜像来复制二进制文件到共享卷。
- 我们的 CD 会劫持用户的命令,让守护进程先启动,然后守护进程运行用户的程序,直到 Envoy 准备好。
- 同时,我们还为 Envoy 添加 preStop,一个不断检查守护进程健康状态的脚本。
As a result, the users’ process will start if Envoy is ready, and Envoy will stop after the process of users is exited.
结果,如果Envoy准备好了,用户的程序就会启动,而Envoy会在用户的程序退出后停止。
It’s a complicated workaround, but it works fine in our production environment.
这是一个复杂的变通方法,但在我们的生产环境中运行良好。
信息2
还找到一个答复: https://github.com/kubernetes/enhancements/issues/753#issuecomment-687184232
Allow users to delay application start until proxy is ready
for startup issues, the istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and wait for envoy to be ready. This is blocking and the other containers are not started making sure envoy is there and ready before starting the app container.
对于启动问题,istio 社区想出了一个相当聪明的变通方法:将 envoy 作为容器列表中的第一个容器注入,并给它添加一个 postStart 钩子,检查并等待 envoy 就绪。这个钩子是阻塞的,在它返回之前其他容器不会启动,从而确保 envoy 启动并就绪之后才会启动应用程序容器。
We had to port this to the version we’re running but is quite straightforward and are happy with the results so far.
我们已经将其移植到我们正在运行的版本中,很直接,目前对结果很满意。
For shutdown we are also ‘solving’ with preStop hook but adding an arbitrary sleep which we hope the application would have gracefully shutdown before continue with SIGTERM.
对于关机,我们也用 preStop 钩子来 “解决”,但增加了一个任意的 sleep,我们希望应用程序在继续 SIGTERM 之前能优雅地关机。
相关issue: Enable holdApplicationUntilProxyStarts at pod level
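下面是一个示意性的 pod 片段,用来说明这种 postStart 阻塞等待的思路(并非 Istio 实际注入的内容,镜像名、端口和探测命令都是假设,仅供理解;kubelet 会等 postStart 钩子返回后才启动列表中的下一个容器):
apiVersion: v1
kind: Pod
metadata:
  name: poststart-wait-demo
spec:
  containers:
  - name: proxy                     # 代理被注入为容器列表中的第一个容器
    image: example/envoy-proxy      # 假设的镜像名
    lifecycle:
      postStart:
        exec:
          # 轮询代理的就绪端点,postStart 未返回之前,后面的容器不会启动
          command: ["sh", "-c", "until wget -qO- http://127.0.0.1:15021/healthz/ready >/dev/null 2>&1; do sleep 1; done"]
  - name: app                       # 应用容器在代理就绪之后才会被启动
    image: example/app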
Knative
dapr
- Clarify lifecycle of Dapr process and app process : dapr项目中在等待 sidecar container的结果。在此之前,dapr做了一个简单的调整,将daprd这个sidecar的启动顺序放在最前面(详见 https://github.com/dapr/dapr/pull/2341)
3.2 - KEP753: Sidecar Container
相关issue
https://github.com/kubernetes/enhancements/issues/753
这个issue 开启于 2019年1月。
One-line enhancement description: Containers can now be marked as sidecars so that they startup before normal containers and shutdown after all other containers have terminated.
一句话改进描述:容器现在可以被标记为 sidecar,使其在正常容器之前启动,并在所有其他容器终止后关闭。
设计提案链接:https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers
3.3 - 推翻KEP753的讨论
https://github.com/kubernetes/enhancements/pull/1980
这是一个关于 sidecar 的讨论汇总,最后得出的结论是推翻 KEP753。
起于derekwaynecarr的发言
I want to capture my latest thoughts on sidecar concepts, and get a path forward.
Here is my latest thinking:
我想归纳我对 sidecar 概念的最新思考,并得到一条前进的道路。
这是我的最新思考。
I think it’s important to ask if the introduction of sidecar containers will actually address an end-user requirement or just shift a problem and further constrain adoption of sidecars themselves by pod authors. To help frame this exercise, I will look at the proposed use of sidecar containers in the service mesh community.
我认为重要的是要问一下 sidecar容器的引入是否会真正解决最终用户的需求,或者只是转移一个问题,并进一步限制pod作者对sidecars本身的采用。为了帮助构架这项工作,我将看看服务网格社区中拟议的 sidecar 容器的使用情况。
User story
I want to enable mTLS for all traffic in my mesh because my auditor demands it.
我想在我的 mesh 中为所有流量启用 mTLS,因为审计员(auditor)要求这样做。
The proposed solution is the introduction of sidecar containers that change the pod lifecycle:
提出的解决方案是引入sidecar container,改变 pod 的生命周期:
- Init containers start/stop
- Sidecar containers start
- Primary containers start/stop
- Sidecar containers stop
The issue with the proposed solution meeting the user story is as follows:
建议的解决方案在满足这个用户故事上存在的问题如下:
- Init containers are not subject to service mesh because the proxy is not running. This is because init containers run to completion before starting the next container. Many users do network interaction that should be subject to the mesh in their init container.
Init 容器不受服务网格管控,因为此时代理还没有运行。这是因为 init 容器要运行到完成状态之后才会启动下一个容器。很多用户会在 init 容器中进行本应受网格管控的网络交互。
- Sidecar containers (once introduced) will be used by users for use cases unrelated to the mesh, but subject to the mesh. The proposal makes no semantic guarantees on ordering among sidecars. Similar to init containers, this means sidecars are not guaranteed to participate in the mesh.
Sidecar 容器(一旦引入)将被用户用于与网格无关、但同样受网格管控的用例。该提案没有对 sidecar 之间的顺序做语义保证。与 init 容器类似,这意味着 sidecar 不能保证参与网格。
The real requirement is that the proxy container MUST stop last even among sidecars if those sidecars require network.
真正的需求是:如果这些 sidecar 需要网络,那么即使在 sidecar 之间,代理容器也必须最后一个停止。
Similar to the behavior observed with init containers (users externalize run-once setup from their main application container), the introduction of sidecar containers will result in more elements of the application getting externalized into sidecars, but those elements will still desire to be part of the mesh when they require a network. Hence, we are just shifting, and not solving the problem.
与观察到的init容器的行为类似(用户从他们的主应用容器中外部化一次性设置),引入sidecar容器将导致更多的应用元素被外部化到sidecar中,但是当这些元素需要网络时,它们仍然会渴望成为网格结构的一部分。因此,我们只是在转移,而不是解决问题。
Given the above gaps, I feel we are not actually solving a primary requirement that would drive improved adoption of a service mesh (ensure all traffic is mTLS from my pod) to meet auditing.
鉴于上述差距,我觉得我们并没有实际上解决主要需求,这个需求将推动服务网格的改进采用(确保所有来自我的pod的流量都是mTLS),以满足审计。
Alternative proposal:
- Support an ordered graph among containers in the pod (it’s inevitable), possibly with N rings of runlevels?
- Identify which containers in that graph must run to completion before initiating termination (Job use case).
- Move init containers into the graph (collapse the concept)
- Have some way to express if a network is required by the container to act as a hint for the mesh community on where to inject a proxy in the graph.
替代建议:
- 支持在pod中的容器之间建立一个有序图(这是不可避免的),可能有N个运行级别的环?
- 识别该图中的哪些容器必须在启动终止之前运行至完成状态(Job用例)。
- 将 init 容器移入图中(折叠概念)。
- 有某种方式来标记容器是否需要网络,用来作为网格社区的提示,在图中某处注入代理。
A few other notes based on Red Hat’s experience with service mesh:
Red Hat does not support injection of privileged sidecar containers and will always require CNI approach. In this flow, the CNI runs, multus runs, iptables are setup, and then init containers start. The iptables rules are setup, but no proxy is running, so init containers lose connectivity. Users are unhappy that init containers are not participating in the mesh. Users should not have to sacrifice usage of an init container (or any aspect of the pod lifecycle) to fulfill auditor requirements. The API should be flexible enough to support graceful introduction in the right level of a intra pod life-cycle graph transparent to the user.
根据红帽在服务网格方面的经验,还有一些其他说明:
红帽不支持注入特权sidecar容器,总是需要CNI方式。在这个流程中,CNI运行,multus运行,设置iptables,然后 init 容器启动。iptables规则设置好了,但是没有代理运行,所以 init容器 失去了连接。用户对init容器不参与网格感到不满。用户不应该为了满足审计师的要求而牺牲init容器的使用(或pod生命周期的任何方面)。API应该足够灵活,以支持在正确的层次上优雅地引入对用户透明的 pod 生命周期图。
Proposed next steps:
- Get a dedicated set of working meetings to ensure that across the mesh and kubernetes community, we can meet a users auditing requirement without limiting usage or adoption of init containers and/or sidecar containers themselves by pod authors.
- I will send a doodle.
拟议的下一步措施:
- 召开一组专门的工作会议,以确保在整个 mesh 和 kubernetes 社区,我们可以在不限制 pod 作者使用 init 容器和/或 sidecar 容器本身的前提下,满足用户的审计需求。
- 我会发一个 Doodle(用于约会议时间)。
其他人的意见
mrunalp:
Agree! We might as well tackle this general problem vs. doing it step by step with baggage added along the way.
同意! 我们不妨解决这个普遍性的问题,而不是按部就班地做,在做的过程中增加包袱。
sjenning :
I agree @derekwaynecarr
I think that in order to satisfy fully the use cases mentioned, we are gravitating toward systemd level semantics where there is just an ordered graph of services containers in the pod spec.
You could basically collapse init containers into the normal containers map and add two fields to Container; oneshot bool that expresses if the container terminates and dependent containers should wait for it to terminate (handles init containers w/ ordering), and requires map[string], a list of container names upon which the current container depends.
This is flexible enough to accommodate a oneshot: true container (init container) depending on a oneshot: false container (a proxy container on which the init container depends).
Admittedly this would be quite the undertaking and there is API compatibility to consider.
我同意 @derekwaynecarr。
我认为,为了充分满足上述用例,我们正在倾向于 systemd 级别的语义:在 pod 规范中,只需要一个有序的服务容器图。你基本上可以把 init 容器折叠到普通容器列表中,并在 Container 中添加两个字段:oneshot bool,表示容器是否会终止、依赖它的容器是否应该等待它终止(以此处理 init 容器的排序);以及 requires map[string],即当前容器所依赖的容器名称列表。
这足够灵活,可以支持一个 oneshot: true 的容器(init 容器)依赖一个 oneshot: false 的容器(init 容器所依赖的代理容器)。
诚然,这将是一个相当大的工程,而且还要考虑 API 的兼容性。
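按这个思路,一个纯属假设的 pod spec 片段可能长成下面这样(这个 API 从未实现,oneshot / requires 字段名只是沿用上面讨论中的提法,仅作示意):
apiVersion: v1
kind: Pod
metadata:
  name: ordered-graph-demo
spec:
  containers:
  - name: proxy                 # 长驻的代理 sidecar
    image: example/proxy
    oneshot: false              # 假设字段:不会自行终止
  - name: db-migrate            # 原来的 init 容器,折叠进普通容器列表
    image: example/migrate
    oneshot: true               # 假设字段:运行到结束即退出
    requires: ["proxy"]         # 假设字段:依赖代理先就绪
  - name: app
    image: example/app
    requires: ["db-migrate", "proxy"]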
thockin:
I have also been thinking about this. There are a number of open issues, feature-requests, etc that all circle around the topic of pod and container lifecycle. I’ve been a vocal opponent of complex API here, but it’s clear that what we have is inadequate.
When we consider init-container x sidecar-container, it is clear we will inevitably eventually need an init-sidecar.
我也一直在思考这个问题。有一些开放的问题、功能需求等,都是围绕着pod和容器生命周期这个话题展开的。我在这里一直是复杂API的强烈反对者,但很明显,我们所拥有的是不够的。
当我们考虑 init-container x sidecar-container 时,很明显我们最终将不可避免地需要一个init-sidecar。
Some (non-exhaustive) of the other related topics:
- Node shutdown -> Pod shutdown (in progress?)
- Voluntary pod restart (“Something bad happened, please burn me down to the ground and start over”)
- Voluntary pod failure (“I know something you don’t, and I can’t run here - please terminate me and do not retry”)
- “Critical” or “Keystone” containers (“when this container exits, the rest should be stopped”)
- Startup/shutdown phases with well-defined semantics (e.g. “phase 0 has no network”)
- Mixed restart policies in a pod (e.g. helper container which runs and terminates)
- Clearer interaction between pod, network, and device plugins
其他的一些(非详尽的)相关主题:
- 节点关闭 -> Pod 关闭(正在进行中?)
- 自愿的 pod 重启(“发生了不好的事情,请把我摧毁,然后重新开始”)
- 自愿的 pod 失败(“我知道一些你不知道的事情,我无法在这里运行,请终止我,不要重试”)
- “关键”或“基石”容器(“当这个容器退出时,其余容器都应停止”)
- 具有明确语义的启动/关闭阶段(如 “phase 0 没有网络”)
- 在一个 pod 中混合重启策略(例如,运行一次就终止的辅助容器)
- 更清晰的 pod、网络和设备插件之间的交互
thockin:
This is a big enough topic that we almost certainly need to explore multiple avenues before we can have confidence in any one.
这是一个足够大的话题,我们几乎肯定需要探索多种途径,才能对任何一种途径有信心。
kfox1111:
the dependency idea also would allow for doing an init container, then a sidecar network plugin, then more init containers, etc, which has some nice features.
Also the readiness checks and oneshot could all play together with the dependencies so the next steps aren’t started before ready.
So, as a user experience, I think that api might be very nice.
Implementation wise there are probably lots of edge cases to carefully consider there.
依赖的想法还可以做一个init容器,然后做一个sidecar网络插件,然后做更多的init容器等等,这有一些不错的功能。
另外 readiness 检查和 oneshot 都可以和依赖关系一起配合,这样就不会在准备好之前就开始下一步。
所以,作为用户体验来说,我觉得这个api可能是非常不错的。
从实现上来说,可能有很多边缘情况需要仔细考虑。
SergeyKanzhelev:
this is great idea to set up a working group to move it forward in bigger scope. One topic I suggest we cover early on in the discussions is whether we need to address the existing pain point of injecting sidecars in jobs in 1.20. This KEP intentionally limited the scope to just this - formalizing what people are already trying to do today with workarounds.
From Google side we also would love the bigger scope of a problem be addressed, but hope to address some immediate pain points early if possible. Either in current scope or slightly bigger.
这是一个很好的想法:成立一个工作组,在更大的范围内推进它。我建议我们在讨论中尽早涉及的一个话题是,我们是否需要在 1.20 中解决现有的在 Job 中注入 sidecar 的痛点。这个 KEP 有意将范围限制在这一点上:将人们今天已经在用变通方法尝试做的事情正式化。
从Google方面来说,我们也希望更大范围的问题能够得到解决,但如果可能的话,希望能够尽早解决一些直接的痛点。要么在目前的范围内,要么稍微大一点。
derekwaynecarr:
I would speculate that the dominant consumer of the job scenario is a job that required participation in a mesh to complete its task, and since I don’t see much point in solving for the mesh use case (which I view as the primary motivator for defining side car semantics) for only one workload type, I would rather ensure a pattern that solves the problem in light of our common experience across mesh and k8s communities.
我推测工作场景的主要消费者是需要参与网格来完成任务的Job,由于我认为只为一种工作负载类型解决mesh用例(我认为这是定义 sidecar 语义的主要动机)没有太大意义,所以我宁愿根据我们在 mesh 和k8s社区中的共同经验,确保一个能解决问题的模式。