Kubernetes学习笔记

介绍Kubernetes学习笔记的基本资料和访问方式

1 - 介绍

Kubernetes的介绍,以及Kubernetes的资料收集

1.1 - Kubernetes概述

Kubernetes概述

Kubernetes 是什么?

Kubernetes是一个可移植,可扩展的开源平台,用于管理容器化的工作负载和服务,同时支持声明式配置和自动化。它拥有庞大而快速发展的生态系统,相关的服务,支持和工具也随处可得。

谷歌在2014年开源了Kubernetes项目。Kubernetes建立在谷歌十五年来大规模运行生产负载经验的基础上,结合了社区中最佳的创意和实践。

为什么需要 Kubernetes?它能做什么?

Kubernetes拥有许多功能。 它可以被认为是:

  • 容器平台
  • 微服务平台
  • 可移植云平台,还有更多

Kubernetes提供以容器为中心的管理环境。它代表用户工作负载来编排计算,网络和存储基础设施。这提供了 PaaS 的大部分简便性,同时具备 IaaS 的灵活性,并支持跨基础设施提供商的可移植性。

Kubernetes如何成为一个平台?

Kubernetes提供了许多功能,但总会有新的场景可以从新功能中受益。它可以简化特定于应用程序的工作流程,以加快开发速度。最初尚可接受的临时编排,随着规模扩大往往需要健壮的自动化。这就是为什么Kubernetes还可以作为构建组件和工具生态系统的平台,以便更轻松地部署,扩展和管理应用程序。

Label 允许用户按照自己的方式组织管理对应的资源。 Annotations 使用户能够以自定义的描述信息来修饰资源,以适用于自己的工作流,并为管理工具提供检查点状态的简单方法。
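下面用 kubectl 简单示意 Label 和 Annotation 的用法(my-pod,app=web 等名称都是示例):

# 用 Label 组织资源,之后可以按 Label 筛选
kubectl label pod my-pod app=web tier=frontend
kubectl get pods -l app=web

# 用 Annotation 为资源附加自定义的描述信息
kubectl annotate pod my-pod example.com/build-id="20250304"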

此外,Kubernetes 控制平面构建在开发人员和用户同样可以使用的 API 之上。用户可以编写自己的控制器(例如调度器),这些控制器使用自己的 API,并且可以被通用的命令行工具所操作。

这种设计使得能够在Kubernetes上面构建许多其他系统。

Kubernetes不是什么?

Kubernetes 不是一个传统的,包罗万象的 PaaS(Platform as a Service)系统。由于Kubernetes在容器级而非硬件级运行,因此它提供了PaaS产品常用的一些通用功能,例如部署,扩展,负载平衡,日志和监控。 但是,Kubernetes不是单体,而且这些默认解决方案是可选的和可插拔的。 Kubernetes提供了构建开发人员平台的构建块,但在重要的地方保留了用户选择和灵活性。

Kubernetes:

  • 不限制支持的应用程序类型。 Kubernetes旨在支持各种各样的工作负载,包括无状态,有状态和数据处理工作负载。如果一个应用程序可以在一个容器中运行,它应该在Kubernetes上运行得很好。
  • 不部署源代码并且不构建您的应用程序。持续集成,交付和部署(CI / CD)工作流程由组织文化和偏好以及技术要求决定。
  • 不提供应用程序级服务,例如中间件(例如,消息总线),数据处理框架(例如,Spark),数据库(例如,mysql),高速缓存,也不提供集群存储系统(例如,Ceph)作为内建服务。这些组件可以在Kubernetes上运行,也可以被运行在Kubernetes上的应用程序通过可移植机制(例如Open Service Broker)访问。
  • 不指定记录,监控或告警解决方案。它提供了一些集成作为概念证明,以及收集和导出指标的机制。
  • 不提供或授权配置语言/系统(例如,jsonnet)。它提供了一个声明性API,可以通过任意形式的声明性规范来实现。
  • 不提供或采用任何全面的机器配置,维护,管理或自我修复系统。

此外,Kubernetes不仅仅是编排系统。实际上,它消除了编排的需要。编排的技术定义是执行既定的工作流程:先执行A,然后执行B,再执行C。相反,Kubernetes由一组独立的,可组合的控制流程组成,这些流程将当前状态持续地推向所期望的状态。如何从A到C并不重要,也不需要集中控制。这使得系统更易于使用,同时更强大,更具弹性,也更可扩展。
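用一个小例子可以直观感受这种“持续趋近期望状态”的工作方式(deployment 名称等均为示例):

# 声明期望状态:运行 3 个副本的 nginx
kubectl create deployment demo --image=nginx --replicas=3

# 手动删除其中的 pod,控制器会自动补齐到 3 个副本
kubectl delete pod -l app=demo --wait=false
kubectl get pods -l app=demo -w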

为什么用容器?

想知道为什么应该使用容器?

部署应用程序的旧方法是使用操作系统软件包管理器在主机上安装应用程序。这样做的缺点是将应用程序的可执行文件,配置,类库和生命周期混在一起,并与主机操作系统纠缠在一起。 可以构建不可变的虚拟机映像以实现可预测的部署和回滚,但虚拟机是重量级且不可移植的。

新方法是基于操作系统级虚拟化而不是硬件虚拟化来部署容器。这些容器彼此隔离并与主机隔离:它们具有自己的文件系统,它们无法看到彼此的进程,并且它们的计算资源使用可以被限制。它们比虚拟机更容易构建,并且因为它们与底层基础设施和主机文件系统解耦,所以它们可以跨云和操作系统分发进行移植。

由于容器小而快,因此可以在每个容器映像中打包一个应用程序。 这种一对一的应用程序到映像关系解锁了容器的全部优势。 使用容器,可以在构建/发布时而不是部署时创建不可变容器映像,因为每个应用程序不需要与应用程序堆栈的其余部分组合,也不需要与生产基础设施环境结合。 在构建/发布时生成容器映像可以实现从开发到生产的一致环境。 同样,容器比VM更加透明,这有利于监控和管理。当容器的进程生命周期由基础设施管理而不是由容器内的进程管理器隐藏时,尤其如此。 最后,每个容器使用一个应用程序,管理容器就等于管理应用程序的部署。

容器好处总结如下:

  • 应用程序创建和部署更敏捷:与VM映像使用相比,增加了容器映像创建的简便性和效率。
  • 持续开发,集成和部署:通过快速简便的回滚(源于镜像不变性)提供可靠且频繁的容器镜像构建和部署。
  • Dev和Ops关注点分离:在构建/发布时而不是部署时创建应用程序容器映像,从而将应用程序与基础设施解耦。
  • 可观察性:不仅可以显示操作系统级别的信息和指标,还可以显示应用程序运行状况和其他信号。
  • 开发,测试和生产的环境一致性:在笔记本电脑上运行与在云中运行相同。
  • 云和OS分发可移植性:在Ubuntu,RHEL,CoreOS,本地,Google Kubernetes引擎以及其他任何地方运行。
  • 以应用程序为中心的管理:提升抽象级别,从在虚拟硬件上运行OS到使用逻辑资源在OS上运行应用程序。
  • 松散耦合,分布式,弹性,解放的微服务:应用程序被分解为更小,独立的部分,可以动态部署和管理 - 而不是在一台大型单一用途机器上运行的单体堆栈。
  • 资源隔离:可预测的应用程序性能。
  • 资源利用:高效率和高密度。

参考资料

1.2 - Kubernetes资料收集

收集Kubernetes的各种资料

官方资料

社区资料

学习资料

  • Kubernetes指南: 这是目前最新最好的Kubernetes中文资料,强烈推荐!

2 - 安装

Kubernetes的安装

2.1 - 通过 kubeadm 安装 kubernetes

通过 kubeadm 安装 kubernetes 集群

2.1.1 - 在 debian12 上安装 kubernetes

在 debian12 上用 kubeadm 安装 kubernetes

参考官方文档:

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

2.1.1.1 - 准备工作

在 debian12 上安装 kubernetes 之前的准备工作

系统更新

确保更新debian系统到最新,移除不再需要的软件,清理无用的安装包:

sudo apt update && sudo apt full-upgrade -y
sudo apt autoremove
sudo apt autoclean

如果更新了内核,最好重启一下。

swap 分区

默认情况下,安装 Kubernetes 要求节点关闭 swap 分区,否则 kubelet 无法正常启动(新版本对 swap 的支持情况参见下面的官方文档)。

参考:

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#swap-configuration
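常见的关闭 swap 做法如下(示意;修改 /etc/fstab 前建议先确认或备份):

# 立即关闭 swap
sudo swapoff -a

# 注释掉 /etc/fstab 中的 swap 条目,防止重启后再次启用
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab

# 确认 swap 已经为 0
free -h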

开启模块

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

container runtime

Kubernetes 支持多种 container runtime,这里暂时继续使用 docker engine + cri-dockerd。

参考:

https://kubernetes.io/docs/setup/production-environment/container-runtimes/

安装 docker + cri-dockerd

docker 的安装参考:

https://skyao.io/learning-docker/docs/installation/debian12/

cri-dockerd 的安装参考:

https://mirantis.github.io/cri-dockerd/usage/install/

从 release 页面下载:

https://github.com/Mirantis/cri-dockerd/releases

Debian 12 (bookworm) 选择下载以下文件:

https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.16/cri-dockerd_0.3.16.3-0.debian-bookworm_amd64.deb

下载后安装:

sudo dpkg -i ./cri-dockerd_0.3.16.3-0.debian-bookworm_amd64.deb

安装后会提示:

Selecting previously unselected package cri-dockerd.
(Reading database ... 48498 files and directories currently installed.)
Preparing to unpack .../cri-dockerd_0.3.16.3-0.debian-bookworm_amd64.deb ...
Unpacking cri-dockerd (0.3.16~3-0~debian-bookworm) ...
Setting up cri-dockerd (0.3.16~3-0~debian-bookworm) ...
Created symlink /etc/systemd/system/multi-user.target.wants/cri-docker.service → /lib/systemd/system/cri-docker.service.
Created symlink /etc/systemd/system/sockets.target.wants/cri-docker.socket → /lib/systemd/system/cri-docker.socket.

安装后查看状态:

sudo systemctl status cri-docker.service

如果成功则状态为:

● cri-docker.service - CRI Interface for Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/cri-docker.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-03-04 19:18:50 CST; 3min 25s ago
TriggeredBy: ● cri-docker.socket
       Docs: https://docs.mirantis.com
   Main PID: 2665 (cri-dockerd)
      Tasks: 9
     Memory: 15.0M
        CPU: 21ms
     CGroup: /system.slice/cri-docker.service
             └─2665 /usr/bin/cri-dockerd --container-runtime-endpoint fd://

Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Hairpin mode is set to none"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="The binary conntrack is not installed, this can cause failures in network conn>
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="The binary conntrack is not installed, this can cause failures in network conn>
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Loaded network plugin cni"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Docker cri networking managed by network plugin cni"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Setting cgroupDriver systemd"
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig>
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Starting the GRPC backend for the Docker CRI interface."
Mar 04 19:18:50 debian12 cri-dockerd[2665]: time="2025-03-04T19:18:50+08:00" level=info msg="Start cri-dockerd grpc backend"
Mar 04 19:18:50 debian12 systemd[1]: Started cri-docker.service - CRI Interface for Docker Application Container Engine.

安装 containerd

TODO:后面考虑换 containerd

安装 helm

参考:

https://helm.sh/docs/intro/install/#from-apt-debianubuntu

安装:

curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

安装后如果不希望 helm 随 apt 自动升级,可以编辑源文件,注释掉其中的源条目:

sudo vi /etc/apt/sources.list.d/helm-stable-debian.list
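另一种做法是用 apt-mark 把 helm 固定在当前版本(与后文固定 kubeadm / kubelet / kubectl 的做法一致,属于可选步骤):

sudo apt-mark hold helm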

查看安装的版本:

$ helm version
version.BuildInfo{Version:"v3.17.1", GitCommit:"980d8ac1939e39138101364400756af2bdee1da5", GitTreeState:"clean", GoVersion:"go1.23.5"}

2.1.1.2 - 安装命令行

在 debian12 上安装 kubeadm / kubelet / kubectl

参考: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

安装 kubeadm / kubelet / kubectl

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

假定要安装的 kubernetes 版本为 1.32:

export K8S_VERSION=1.32

# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list

开始安装 kubelet kubeadm kubectl:

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl

禁止这三个程序的自动更新:

sudo apt-mark hold kubelet kubeadm kubectl

验证安装:

kubectl version --client && echo && kubeadm version

输出为:

Client Version: v1.32.2
Kustomize Version: v5.5.0

kubeadm version: &version.Info{Major:"1", Minor:"32", GitVersion:"v1.32.2", GitCommit:"67a30c0adcf52bd3f56ff0893ce19966be12991f", GitTreeState:"clean", BuildDate:"2025-02-12T21:24:52Z", GoVersion:"go1.23.6", Compiler:"gc", Platform:"linux/amd64"}

在运行 kubeadm 之前,先启动 kubelet 服务:

sudo systemctl enable --now kubelet

安装后配置

优化 zsh

vi ~/.zshrc

增加以下内容:

# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k

执行:

source ~/.zshrc

之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
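如果发现补全没有生效,多半是因为 kubectl 的补全脚本尚未加载,可以先在 ~/.zshrc 中加载它(示意,zsh 环境):

# 加载 kubectl 的 zsh 补全脚本
source <(kubectl completion zsh)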

取消更新

kubeadm / kubelet / kubectl 的版本没有必要总是升级到最新。前面已经用 apt-mark hold 固定了它们的版本;如果还想彻底停用这个软件源,可以编辑源文件,注释掉其中的条目:

sudo vi /etc/apt/sources.list.d/kubernetes.list
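如果前面已经执行过 apt-mark hold,可以顺便确认一下哪些包被固定住了(可选):

apt-mark showhold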

2.1.1.3 - 初始化集群

在 debian12 上初始化 kubernetes 集群

参考官方文档:

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

初始化集群

pod-network-cidr 尽量用 10.244.0.0/16 这个网段(这是 flannel 等网络插件的默认网段),否则有些网络插件会需要额外的配置。

cri-socket 的配置参考:

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-runtime

因为前面用的 Docker Engine 和 cri-dockerd ,因此这里的 cri-socket 需要指定为 “unix:///var/run/cri-dockerd.sock”。

apiserver-advertise-address 需要指定为当前节点的 IP 地址;本文是单节点测试环境,因此这里指定为 192.168.3.215。

sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.3.215

输出为:

[init] Using Kubernetes version: v1.32.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0304 20:23:50.183712    5058 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.9" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [debian12 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.3.215]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.3.215 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.3.215 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 500.939992ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 3.00043501s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node debian12 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node debian12 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: 8e5a3n.rqbqfbnvhf4uyjft
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.3.215:6443 --token 8e5a3n.rqbqfbnvhf4uyjft \
        --discovery-token-ca-cert-hash sha256:183b3e9965d298e67689baddeff2ff88c32b3f18aa9dd9a15be1881d26025a22

根据提示操作:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

对于测试用的单节点,去除 master/control-plane 的污点:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

执行:

kubectl get node  

能看到此时节点的状态会是 NotReady:

NAME       STATUS     ROLES           AGE     VERSION
debian12   NotReady   control-plane   3m49s   v1.32.2

执行:

kubectl describe node debian12

能看到节点的错误信息:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 04 Mar 2025 20:28:00 +0800   Tue, 04 Mar 2025 20:23:53 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 04 Mar 2025 20:28:00 +0800   Tue, 04 Mar 2025 20:23:53 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 04 Mar 2025 20:28:00 +0800   Tue, 04 Mar 2025 20:23:53 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Tue, 04 Mar 2025 20:28:00 +0800   Tue, 04 Mar 2025 20:23:53 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

需要继续安装网络插件。

安装网络插件

安装 flannel

参考官方文档: https://github.com/flannel-io/flannel#deploying-flannel-with-kubectl

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

如果一切正常,就能看到 k8s 集群内的 pod 都启动完成,状态为 Running:

k get pods -A
NAMESPACE      NAME                               READY   STATUS    RESTARTS        AGE
kube-flannel   kube-flannel-ds-ts6n8              1/1     Running   7 (9m27s ago)   15m
kube-system    coredns-668d6bf9bc-rbkzb           1/1     Running   0               3h55m
kube-system    coredns-668d6bf9bc-vbltg           1/1     Running   0               3h55m
kube-system    etcd-debian12                      1/1     Running   0               3h55m
kube-system    kube-apiserver-debian12            1/1     Running   1 (5h57m ago)   3h55m
kube-system    kube-controller-manager-debian12   1/1     Running   0               3h55m
kube-system    kube-proxy-95ccr                   1/1     Running   0               3h55m
kube-system    kube-scheduler-debian12            1/1     Running   1 (6h15m ago)   3h55m

如果发现 kube-flannel-ds pod 的状态总是 CrashLoopBackOff:

 k get pods -A
NAMESPACE      NAME                               READY   STATUS              RESTARTS        AGE
kube-flannel   kube-flannel-ds-ts6n8              0/1     CrashLoopBackOff    2 (22s ago)     42s

继续查看 pod 的具体错误信息:

k describe pods -n kube-flannel kube-flannel-ds-ts6n8

发现报错 “Back-off restarting failed container kube-flannel in pod kube-flannel”:

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  117s                default-scheduler  Successfully assigned kube-flannel/kube-flannel-ds-ts6n8 to debian12
  Normal   Pulled     116s                kubelet            Container image "ghcr.io/flannel-io/flannel-cni-plugin:v1.6.2-flannel1" already present on machine
  Normal   Created    116s                kubelet            Created container: install-cni-plugin
  Normal   Started    116s                kubelet            Started container install-cni-plugin
  Normal   Pulled     115s                kubelet            Container image "ghcr.io/flannel-io/flannel:v0.26.4" already present on machine
  Normal   Created    115s                kubelet            Created container: install-cni
  Normal   Started    115s                kubelet            Started container install-cni
  Normal   Pulled     28s (x5 over 114s)  kubelet            Container image "ghcr.io/flannel-io/flannel:v0.26.4" already present on machine
  Normal   Created    28s (x5 over 114s)  kubelet            Created container: kube-flannel
  Normal   Started    28s (x5 over 114s)  kubelet            Started container kube-flannel
  Warning  BackOff    2s (x10 over 110s)  kubelet            Back-off restarting failed container kube-flannel in pod kube-flannel-ds-ts6n8_kube-flannel(1e03c200-2062-4838

此时应该去检查准备工作中 “开启模块” 一节的内容是不是有疏漏。
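可以用下面的命令快速确认内核模块与 sysctl 参数是否都已生效(示意):

# 确认 overlay 和 br_netfilter 模块已加载
lsmod | grep -E 'overlay|br_netfilter'

# 确认以下 sysctl 参数均为 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward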

补救之后,就能看到 kube-flannel-ds 这个 pod 正常运行了:

k get pods -A
NAMESPACE      NAME                               READY   STATUS    RESTARTS        AGE
kube-flannel   kube-flannel-ds-ts6n8              1/1     Running   7 (9m27s ago)   15m

安装 Calico

https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico

查看最新版本,当前最新版本是 v3.29.2:

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/tigera-operator.yaml

TODO:用了 flannel, Calico 后面再验证。

2.1.1.4 - 安装 dashboard

安装 kubernetes 的 dashboard

安装 dashboard

参考:https://github.com/kubernetes/dashboard/#installation

在下面地址上查看当前 dashboard 的版本:

https://github.com/kubernetes/dashboard/releases

根据对 kubernetes 版本的兼容情况选择对应的 dashboard 的版本:

  • kubernetes-dashboard-7.11.0 ,兼容 k8s 1.32

最新版本需要用 helm 进行安装:

helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard

输出为:

"kubernetes-dashboard" already exists with the same configuration, skipping
Release "kubernetes-dashboard" does not exist. Installing it now.
NAME: kubernetes-dashboard
LAST DEPLOYED: Wed Mar  5 00:53:17 2025
NAMESPACE: kubernetes-dashboard
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
*************************************************************************************************
*** PLEASE BE PATIENT: Kubernetes Dashboard may need a few minutes to get up and become ready ***
*************************************************************************************************

Congratulations! You have just installed Kubernetes Dashboard in your cluster.

To access Dashboard run:
  kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443

NOTE: In case port-forward command does not work, make sure that kong service name is correct.
      Check the services in Kubernetes Dashboard namespace using:
        kubectl -n kubernetes-dashboard get svc

Dashboard will be available at:
  https://localhost:8443

此时 dashboard 的 service 和 pod 情况:

kubectl -n kubernetes-dashboard get services

输出为:

NAME                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes-dashboard-api               ClusterIP   10.108.225.190   <none>        8000/TCP   2m5s
kubernetes-dashboard-auth              ClusterIP   10.99.205.102    <none>        8000/TCP   2m5s
kubernetes-dashboard-kong-proxy        ClusterIP   10.96.247.162    <none>        443/TCP    2m5s
kubernetes-dashboard-metrics-scraper   ClusterIP   10.103.222.22    <none>        8000/TCP   2m5s
kubernetes-dashboard-web               ClusterIP   10.108.219.9     <none>        8000/TCP   2m5s

查看 pod 的情况:

kubectl -n kubernetes-dashboard get pods

等待两三分钟之后,pod 启动完成,输出为:

NAME                                                    READY   STATUS    RESTARTS   AGE
kubernetes-dashboard-api-7d8567b8f-9ksk2                1/1     Running   0          3m8s
kubernetes-dashboard-auth-6877bf44b9-9qfmg              1/1     Running   0          3m8s
kubernetes-dashboard-kong-79867c9c48-rzlhp              1/1     Running   0          3m8s
kubernetes-dashboard-metrics-scraper-794c587449-6phjv   1/1     Running   0          3m8s
kubernetes-dashboard-web-75576c76b-sm2wj                1/1     Running   0          3m8s

为了方便,使用 node port 来访问 dashboard,需要执行:

kubectl -n kubernetes-dashboard edit service kubernetes-dashboard-kong-proxy

然后将 type: ClusterIP 修改为 type: NodePort。保存退出后,看一下具体分配的 node port 是哪个:

kubectl -n kubernetes-dashboard get service kubernetes-dashboard-kong-proxy

输出为:

NAME                              TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
kubernetes-dashboard-kong-proxy   NodePort   10.96.247.162   <none>        443:32616/TCP   17m
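顺带一提,除了交互式 edit,也可以用 kubectl patch 一次性完成同样的修改(service 名称同上,效果相同):

kubectl -n kubernetes-dashboard patch service kubernetes-dashboard-kong-proxy -p '{"spec":{"type":"NodePort"}}'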

现在可以用浏览器直接访问:

https://192.168.3.215:32616/

创建用户并登录 dashboard

参考:Creating sample user

创建 admin-user 用户:

vi dashboard-adminuser.yaml

内容为:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

执行:

k create -f dashboard-adminuser.yaml

然后绑定角色:

vi dashboard-adminuser-binding.yaml

内容为:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

执行:

k create -f dashboard-adminuser-binding.yaml

然后创建 token :

kubectl -n kubernetes-dashboard create token admin-user

输出为:

eyJhbGciOiJSUzI1NiIsImtpZCI6Ik9sWnJsTk5UNE9JVlVmRFMxMUpwNC1tUlVndTl5Zi1WQWtmMjIzd2hDNmcifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzQxMTEyNDg4LCJpYXQiOjE3NDExMDg4ODgsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiNDU5ZGQxNjctNWI5OS00MWIzLTgzZWEtNGIxMGY3MTc5ZjEyIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiZjMxN2VhZTItNTNiNi00MGZhLWI3MWYtMzZiNDI1YmY4YWQ0In19LCJuYmYiOjE3NDExMDg4ODgsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbi11c2VyIn0.TYzOdrMFXcSEeVMbc1ewIA13JVi4FUYoRN7rSH5OstbVfKIF48X_o1RWxOGM_AurhgLxuKZHzmns3K_pX_OR3u1URfK6-gGos4iAQY-H1yntfRmzzsip_FbZh95EYFGTN43gw21jTyfem3OKBXXLgzsnVT_29uMnJzSnCDnrAciVKMoCEUP6x2RSHQhp6PrxrIrx_NMB3vojEZYq3AysQoNqYYjRDd4MnDRClm03dNvW5lvKSgNCVmZFje_EEa2EhI2X6d3X8zx6tHwT5M4-T3hMmyIpzHUwf3ixeZR85rhorMbskNVvRpH6VLH6BXP31c3NMeSgYk3BG8d7UjCYxQ

这个 token 就可以用在 kubernetes-dashboard 的登录页面上了。

为了方便,可以创建一个与 admin-user 绑定的 service-account-token 类型的 Secret,这样就有了一个可以随时获取的长期 token:

vi dashboard-adminuser-secret.yaml

内容为:

apiVersion: v1
kind: Secret
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin-user"   
type: kubernetes.io/service-account-token

执行:

k create -f dashboard-adminuser-secret.yaml

之后就可以用命令随时获取这个 token 了:

kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath="{.data.token}" | base64 -d

2.1.1.5 - 安装 metrics server

安装 kubernetes 的 metrics server

参考:https://github.com/kubernetes-sigs/metrics-server/#installation

安装 metrics server

下载:

mkdir -p ~/work/soft/k8s
cd ~/work/soft/k8s
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

修改下载下来的 components.yaml,增加 --kubelet-insecure-tls 并修改 --kubelet-preferred-address-types:

  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP   # 修改这行,默认是InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls  # 增加这行

然后安装:

k apply -f components.yaml

稍等片刻看是否启动:

kubectl get pod -n kube-system | grep metrics-server

验证一下,查看 service 信息:

kubectl describe svc metrics-server -n kube-system

简单验证一下基本使用:

kubectl top nodes
kubectl top pods -n kube-system 

参考资料

2.1.1.6 - 安装监控

安装 prometheus 和 grafana 以监控 kubernetes 集群

参考:https://github.com/prometheus-operator/prometheus-operator

https://computingforgeeks.com/setup-prometheus-and-grafana-on-kubernetes/

2.2 - 安装kubectl

单独安装 kubectl 命令行工具

kubectl 是 Kubernetes 的命令行工具,允许对Kubernetes集群运行命令。

单独安装 kubectl 命令行工具,可以方便的在本地远程操作集群。

2.2.1 - 在 ubuntu 上安装 kubectl

安装配置 kubectl

参考 Kubernetes 官方文档:

分步骤安装

和前面通过 kubeadm 安装集群时的方式一样,只是这里只需要安装 kubectl 一个工具,不需要安装 kubeadm 和 kubelet。

执行如下命令(注意:下面用到的 apt.kubernetes.io 是旧的软件源,官方已将其弃用并迁移到 pkgs.k8s.io;新环境建议改用前面 2.1.1.2 节中基于 pkgs.k8s.io 的安装方式):

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update

k8s 暂时固定使用 1.23.14 版本:

sudo apt-get install kubectl=1.23.14-00
# sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00

直接安装

不推荐这样安装:这种方式会安装最新版本,而且安装目录是 /usr/local/bin/。

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
rm kubectl

如果 /usr/local/bin/ 不在 path 路径下,则需要修改一下 path:

export PATH=/usr/local/bin:$PATH

验证一下:

kubectl version --output=yaml

输出为(最后一行的连接错误是正常的,因为此时本机还没有配置可用的 kubeconfig,不影响 kubectl 客户端本身):

clientVersion:
  buildDate: "2023-06-14T09:53:42Z"
  compiler: gc
  gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
  gitTreeState: clean
  gitVersion: v1.27.3
  goVersion: go1.20.5
  major: "1"
  minor: "27"
  platform: linux/amd64
kustomizeVersion: v5.0.1

The connection to the server localhost:8080 was refused - did you specify the right host or port?

配置

oh-my-zsh自动完成

在使用 oh-my-zsh 之后,会更加的简单(强烈推荐使用 oh-my-zsh ),只要在 oh-my-zsh 的 plugins 列表中增加 kubectl 即可。

然后,在 ~/.zshrc 中增加以下内容:

# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k

source ~/.zshrc 之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。

3 - Sidecar Container

Kubernetes Sidecar Container

3.1 - Sidecar Container概述

Kubernetes Sidecar Container概述

From Kubernetes 1.18 containers can be marked as sidecars

Unfortunately, that feature was removed from 1.18, then again from 1.19, and currently has no specific date for landing.

reference: kubernetes/enhancements#753

资料

官方正式资料

社区介绍资料

相关项目的处理

Istio

信息1

https://github.com/kubernetes/enhancements/issues/753#issuecomment-684176649

We use a custom daemon image like a supervisor to wrap the user’s program. The daemon will also listen to a particular port to convey the health status of users’ programs (exited or not).

我们使用一个类似supervisor的自定义守护进程镜像来包装用户的程序。守护进程也会监听特定的端口来传达用户程序的健康状态(是否退出)。

Here is the workaround:

  • Using the daemon image as initContainers to copy the binary to a shared volume.
  • Our CD will hijack users’ command, let the daemon start first. Then, the daemon runs the users’ program until Envoy is ready.
  • Also, we add preStop, a script that keeps checking the daemon’s health status, for Envoy.

下面是变通的方法:

  • 以 “initContainers” 的方式用守护进程的镜像来复制二进制文件到共享卷。
  • 我们的 CD 会劫持用户的命令,让守护进程先启动,然后,守护进程运行用户的程序,直到 Envoy 准备好。
  • 同时,我们还为Envoy添加 preStop,一个不断检查守护进程健康状态的脚本。

As a result, the users’ process will start if Envoy is ready, and Envoy will stop after the process of users is exited.

结果,如果Envoy准备好了,用户的程序就会启动,而Envoy会在用户的程序退出后停止。

It’s a complicated workaround, but it works fine in our production environment.

这是一个复杂的变通方法,但在我们的生产环境中运行良好。

信息2

还找到一个答复: https://github.com/kubernetes/enhancements/issues/753#issuecomment-687184232

Allow users to delay application start until proxy is ready

for startup issues, the istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and wait for envoy to be ready. This is blocking and the other containers are not started making sure envoy is there and ready before starting the app container.

对于启动问题,istio社区想出了一个相当聪明的变通方法,基本上是将envoy作为容器列表中的第一个容器注入,并添加一个postStart钩子,检查并等待envoy准备好。这个钩子是阻塞的,其他容器不会被启动,从而确保envoy启动并且准备好之后,再启动应用程序容器。

We had to port this to the version we’re running but is quite straightforward and are happy with the results so far.

我们已经将其移植到我们正在运行的版本中,很直接,目前对结果很满意。

For shutdown we are also ‘solving’ with preStop hook but adding an arbitrary sleep which we hope the application would have gracefully shutdown before continue with SIGTERM.

对于关机,我们也用 preStop 钩子来 “解决”,但增加了一个任意的 sleep,我们希望应用程序在继续 SIGTERM 之前能优雅地关机。
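把“信息2”里提到的两个做法放在一起,大致是下面这个样子(纯属示意:镜像名,端口和探活地址都是假设的,并不是 Istio 注入的真实配置):

apiVersion: v1
kind: Pod
metadata:
  name: demo-with-proxy
spec:
  containers:
  # 代理容器放在容器列表的第一位:
  # - postStart 钩子阻塞到代理自身就绪,在此之前后面的应用容器不会被启动(对应上文的启动做法)
  # - preStop 里加一个简单的 sleep,推迟代理收到 SIGTERM 的时间,期望应用先完成优雅退出
  - name: proxy
    image: example.com/envoy-proxy:latest     # 假设的代理镜像
    lifecycle:
      postStart:
        exec:
          command: ["sh", "-c", "until wget -qO- http://127.0.0.1:15021/healthz/ready; do sleep 1; done"]
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]
  # 普通的应用容器
  - name: app
    image: example.com/my-app:latest          # 假设的应用镜像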

相关issue: Enable holdApplicationUntilProxyStarts at pod level

Knative

dapr

3.2 - KEP753: Sidecar Container

Kubernetes KEP753: Sidecar Container

相关issue

https://github.com/kubernetes/enhancements/issues/753

这个issue 开启于 2019年1月。

One-line enhancement description: Containers can now be a marked as sidecars so that they startup before normal containers and shutdown after all other containers have terminated.

一句话改进描述:容器现在可以被标记为 sidecar,使其在正常容器之前启动,并在所有其他容器终止后关闭。

设计提案链接:https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers

3.3 - 推翻KEP753的讨论

推翻 Kubernetes KEP753 Sidecar Container的讨论

https://github.com/kubernetes/enhancements/pull/1980

这是一个关于 sidecar 的讨论汇总,最后得出的结论是推翻 KEP753。

起于derekwaynecarr的发言

I want to capture my latest thoughts on sidecar concepts, and get a path forward.

Here is my latest thinking:

我想归纳我对 sidecar 概念的最新思考,并得到一条前进的道路。

这是我的最新思考。

I think it’s important to ask if the introduction of sidecar containers will actually address an end-user requirement or just shift a problem and further constrain adoption of sidecars themselves by pod authors. To help frame this exercise, I will look at the proposed use of sidecar containers in the service mesh community.

我认为重要的是要问一下 sidecar容器的引入是否会真正解决最终用户的需求,或者只是转移一个问题,并进一步限制pod作者对sidecars本身的采用。为了帮助构架这项工作,我将看看服务网格社区中拟议的 sidecar 容器的使用情况。

User story

I want to enable mTLS for all traffic in my mesh because my auditor demands it.

我想在我的 Mesh 中为所有流量启用 mTLS,因为审计人员有此要求。

The proposed solution is the introduction of sidecar containers that change the pod lifecycle:

提出的解决方案是引入sidecar container,改变 pod 的生命周期:

  1. Init containers start/stop
  2. Sidecar containers start
  3. Primary containers start/stop
  4. Sidecar containers stop

The issue with the proposed solution meeting the user story is as follows:

建议的解决方案可以满足用户故事的问题如下:

  • Init containers are not subject to service mesh because the proxy is not running. This is because init containers run to completion before starting the next container. Many users do network interaction that should be subject to the mesh in their init container.

    Init container 不在服务网格的管控范围内,因为此时代理还没有运行。这是因为 init container 会在下一个容器启动之前运行到完成状态。很多用户在 init container 中进行的网络交互,本应受到网格的管控。

  • Sidecar containers (once introduced) will be used by users for use cases unrelated to the mesh, but subject to the mesh. The proposal makes no semantic guarantees on ordering among sidecars. Similar to init containers, this means sidecars are not guaranteed to participate in the mesh.

    Sidecar 容器(一旦引入)将被用户用于与网格无关但受网格制约的用例。该提案没有对sidecars之间的顺序进行语义保证。与 init 容器类似,这意味着 sidecar 不能保证参与 mesh。

The real requirement is that the proxy container MUST stop last even among sidecars if those sidecars require network.

真正的需求是:如果这些 sidecar 需要网络,那么即使在 sidecar 之间,代理容器也必须最后一个停止。

Similar to the behavior observed with init containers (users externalize run-once setup from their main application container), the introduction of sidecar containers will result in more elements of the application getting externalized into sidecars, but those elements will still desire to be part of the mesh when they require a network. Hence, we are just shifting, and not solving the problem.

与观察到的init容器的行为类似(用户把一次性的初始化工作从主应用容器中拆出来),引入sidecar容器将导致更多的应用元素被外部化到sidecar中,但是当这些元素需要网络时,它们仍然会希望成为网格的一部分。因此,我们只是在转移问题,而不是在解决问题。

Given the above gaps, I feel we are not actually solving a primary requirement that would drive improved adoption of a service mesh (ensure all traffic is mTLS from my pod) to meet auditing.

鉴于上述差距,我觉得我们并没有实际上解决主要需求,这个需求将推动服务网格的改进采用(确保所有来自我的pod的流量都是mTLS),以满足审计。

Alternative proposal:

  • Support an ordered graph among containers in the pod (it’s inevitable), possibly with N rings of runlevels?
  • Identify which containers in that graph must run to completion before initiating termination (Job use case).
  • Move init containers into the graph (collapse the concept)
  • Have some way to express if a network is required by the container to act as a hint for the mesh community on where to inject a proxy in the graph.

替代建议:

  • 支持在pod中的容器之间建立一个有序图(这是不可避免的),可能有N个运行级别的环?
  • 识别该图中的哪些容器必须在启动终止之前运行至完成状态(Job用例)。
  • 将 init 容器移入图中(折叠概念)。
  • 有某种方式来标记容器是否需要网络,用来作为网格社区的提示,在图中某处注入代理。

A few other notes based on Red Hat’s experience with service mesh:

Red Hat does not support injection of privileged sidecar containers and will always require CNI approach. In this flow, the CNI runs, multus runs, iptables are setup, and then init containers start. The iptables rules are setup, but no proxy is running, so init containers lose connectivity. Users are unhappy that init containers are not participating in the mesh. Users should not have to sacrifice usage of an init container (or any aspect of the pod lifecycle) to fulfill auditor requirements. The API should be flexible enough to support graceful introduction in the right level of a intra pod life-cycle graph transparent to the user.

根据红帽在服务网格方面的经验,还有一些其他说明:

红帽不支持注入特权sidecar容器,总是需要CNI方式。在这个流程中,CNI运行,multus运行,设置iptables,然后 init 容器启动。iptables规则设置好了,但是没有代理运行,所以 init容器 失去了连接。用户对init容器不参与网格感到不满。用户不应该为了满足审计师的要求而牺牲init容器的使用(或pod生命周期的任何方面)。API应该足够灵活,以支持在正确的层次上优雅地引入对用户透明的 pod 生命周期图。

Proposed next steps:

  • Get a dedicated set of working meetings to ensure that across the mesh and kubernetes community, we can meet a users auditing requirement without limiting usage or adoption of init containers and/or sidecar containers themselves by pod authors.
  • I will send a doodle.

拟议的下一步措施:

  • 召开一组专门的工作会议,以确保在整个 mesh 和 kubernetes 社区范围内,我们可以在不限制 pod 作者使用 init 容器和/或 sidecar 容器的前提下,满足用户的审计要求。
  • 我会发一个 Doodle(用来约定会议时间)。

其他人的意见

mrunalp

Agree! We might as well tackle this general problem vs. doing it step by step with baggage added along the way.

同意! 我们不妨解决这个普遍性的问题,而不是按部就班地做,在做的过程中增加包袱。

sjenning

I agree @derekwaynecarr

I think that in order to satisfy fully the use cases mentioned, we are gravitating toward systemd level semantics where there is just an ordered graph of services containers in the pod spec.

You could basically collapse init containers into the normal containers map and add two fields to Container; oneshot bool that expresses if the container terminates and dependent containers should wait for it to terminate (handles init containers w/ ordering), and requires map[string] a list of container names upon which the current container depends.

This is flexible enough to accommodate a oneshot: true container (init container) depending on a oneshot: false container (a proxy container on which the init container depends).

Admittedly this would be quite the undertaking and there is API compatibility to consider.

我同意 @derekwaynecarr

我认为,为了充分满足上述用例,我们正在倾向于systemd级别的语义,在pod规范中,需要有一个有序的服务容器图。

你基本上可以把 init 容器折叠进普通的容器列表,并在 Container 中添加两个字段:oneshot bool,表示该容器会运行至终止,并且依赖它的容器应该等待它终止后再启动(以此覆盖 init 容器及其顺序);requires,一个当前容器所依赖的容器名称列表。

这足够灵活,可以容纳一个 oneshot: true 容器(init 容器)依赖于一个 oneshot: false 容器(init 容器依赖的代理容器)。

诚然,这将是一个相当大的工程,而且还要考虑API的兼容性。
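按照这个思路,假想中的 pod 规范大致会是下面这种形态(纯属示意:oneshot 和 requires 并不是任何 Kubernetes 版本中的真实字段):

apiVersion: v1
kind: Pod
metadata:
  name: ordered-pod-sketch
spec:
  containers:
  # 长驻的代理容器,没有依赖,最先启动
  - name: proxy
    image: example.com/proxy:latest
    oneshot: false
  # 原来的 init 容器被“折叠”进普通容器列表:运行一次即结束,并依赖代理就绪
  - name: setup
    image: example.com/setup:latest
    oneshot: true
    requires: ["proxy"]
  # 应用容器,等 setup 运行完成后再启动
  - name: app
    image: example.com/app:latest
    oneshot: false
    requires: ["setup"]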

thockin:

I have also been thinking about this. There are a number of open issues, feature-requests, etc that all circle around the topic of pod and container lifecycle. I’ve been a vocal opponent of complex API here, but it’s clear that what we have is inadequate.

When we consider init-container x sidecar-container, it is clear we will inevitably eventually need an init-sidecar.

我也一直在思考这个问题。有一些开放的问题、功能需求等,都是围绕着pod和容器生命周期这个话题展开的。我在这里一直是复杂API的强烈反对者,但很明显,我们所拥有的是不够的。

当我们考虑 init-container x sidecar-container 时,很明显我们最终将不可避免地需要一个init-sidecar。

Some (non-exhaustive) of the other related topics:

  • Node shutdown -> Pod shutdown (in progress?)
  • Voluntary pod restart (“Something bad happened, please burn me down to the ground and start over”)
  • Voluntary pod failure (“I know something you don’t, and I can’t run here - please terminate me and do not retry”)
  • “Critical” or “Keystone” containers (“when this container exits, the rest should be stopped”)
  • Startup/shutdown phases with well-defined semantics (e.g. “phase 0 has no network”)
  • Mixed restart policies in a pod (e.g. helper container which runs and terminates)
  • Clearer interaction between pod, network, and device plugins

其他的一些(非详尽的)相关主题:

  • 节点关闭 -> Pod 关闭(进行中?)
  • 自愿的 pod 重启(“发生了不好的事情,请把我彻底销毁,然后重新开始”)
  • 自愿的 pod 失败(“我知道一些你不知道的事情,我无法在这里运行;请终止我,不要重试”)
  • “关键”或“基石”容器(“当这个容器退出时,其余容器都应停止”)
  • 具有明确语义的启动/关闭阶段(例如“phase 0 没有网络”)
  • 在一个 pod 中混合重启策略(例如运行完即终止的辅助容器)
  • 更清晰的 pod,网络和设备插件之间的交互

thockin:

This is a big enough topic that we almost certainly need to explore multiple avenues before we can have confidence in any one.

这是一个足够大的话题,我们几乎肯定需要探索多种途径,才能对任何一种途径有信心。

kfox1111:

the dependency idea also would allow for doing an init container, then a sidecar network plugin, then more init containers, etc, which has some nice features.

Also the readyness checks and oneshot could all play together with the dependencies so the next steps aren’t started before ready.

So, as a user experience, I think that api might be very nice.

Implementation wise there are probably lots of edge cases to carefully consider there.

依赖的想法还可以做一个init容器,然后做一个sidecar网络插件,然后做更多的init容器等等,这有一些不错的功能。

另外 readyness 检查和 oneshot 都可以和依赖一起考虑,这样就不会在准备好之前就开始下一步。

所以,作为用户体验来说,我觉得这个api可能是非常不错的。

从实现上来说,可能有很多边缘情况需要仔细考虑。

SergeyKanzhelev:

this is great idea to set up a working group to move it forward in bigger scope. One topic I suggest we cover early on in the discussions is whether we need to address the existing pain point of injecting sidecars in jobs in 1.20. This KEP intentionally limited the scope to just this - formalizing what people are already trying to do today with workarounds.

From Google side we also would love the bigger scope of a problem be addressed, but hope to address some immediate pain points early if possible. Either in current scope or slightly bigger.

这是一个很好的想法,成立一个工作组,在更大范围内推进它。我建议我们在讨论中尽早涉及的一个话题是,我们是否需要在1.20中解决现有的在Job中注入 sidecar 的痛点。这个KEP有意将范围限制在这一点上:将人们今天已经在用变通方法做的事情正式化。

从Google方面来说,我们也希望更大范围的问题能够得到解决,但如果可能的话,希望能够尽早解决一些直接的痛点。要么在目前的范围内,要么稍微大一点。

derekwaynecarr:

I would speculate that the dominant consumer of the job scenario is a job that required participation in a mesh to complete its task, and since I don’t see much point in solving for the mesh use case (which I view as the primary motivator for defining side car semantics) for only one workload type, I would rather ensure a pattern that solves the problem in light of our common experience across mesh and k8s communities.

我推测 Job 场景的主要使用者是需要参与网格才能完成任务的 Job。由于我认为只为一种工作负载类型解决 mesh 用例(我认为这是定义 sidecar 语义的主要动机)没有太大意义,所以我宁愿基于我们在 mesh 和 k8s 社区中的共同经验,确立一个能真正解决问题的模式。