介绍Kubernetes学习笔记的基本资料和访问方式
Kubernetes学习笔记
- 1: 介绍
- 1.1: Kubernetes概述
- 1.2: Kubernetes资料收集
- 2: 安装
- 2.1: 通过 kubeadm 安装 kubernetes
- 2.1.1: 在 debian12 上安装 kubernetes
- 2.1.1.1: 在线安装 kubernetes
- 2.1.1.1.1: 准备工作
- 2.1.1.1.2: 安装命令行
- 2.1.1.1.3: 初始化集群
- 2.1.1.1.4: 安装 dashboard
- 2.1.1.1.5: 安装 metrics server
- 2.1.1.1.6: 安装监控
- 2.1.1.2: 预热安装 kubernetes
- 2.1.1.3: 离线安装 kubernetes
- 2.1.1.3.1: 在 debian12 上离线安装 kubeadm
- 2.1.1.3.2: 在 debian12 上离线安装 k8s
- 2.2: 安装kubectl
- 2.2.1: 在 ubuntu 上安装 kubectl
- 3: Sidecar Container
- 3.1: Sidecar Container概述
- 3.2: KEP753: Sidecar Container
- 3.3: 推翻KEP753的讨论
1 - 介绍
1.1 - Kubernetes概述
kubernetes是什么?
Kubernetes是一个可移植,可扩展的开源平台,用于管理容器化的工作负载和服务,支持声明式配置和自动化。它拥有庞大且快速发展的生态系统,Kubernetes 的服务,支持和工具都已广泛可用。
谷歌在2014年开源了Kubernetes项目。Kubernetes建立在谷歌十五年来大规模运行生产负载经验的基础上,结合了社区中最佳的创意和实践。
为什么需要Kubernetes?它可以做什么?
Kubernetes拥有许多功能。 它可以被认为是:
- 容器平台
- 微服务平台
- 可移植云平台,还有更多
Kubernetes提供以容器为中心的管理环境,代为用户的工作负载编排计算,网络和存储基础设施。它提供了PaaS的大部分简便性,同时具有IaaS的灵活性,还支持跨基础设施提供商的可移植性。
Kubernetes如何成为一个平台?
Kubernetes提供了许多功能,但总会有新的场景可以从新特性中受益。它可以简化特定于应用程序的工作流程,以加快开发速度。初期可以接受的临时编排方式,在规模化之后往往需要强大的自动化能力。这就是为什么Kubernetes还可以作为构建组件和工具生态系统的平台,以便更轻松地部署,扩展和管理应用程序。
Label 允许用户按照自己的方式组织管理对应的资源。 Annotations 使用户能够以自定义的描述信息来修饰资源,以适用于自己的工作流,并为管理工具提供检查点状态的简单方法。
此外,Kubernetes 控制平面构建在开发人员和用户同样可以使用的 API 之上。用户可以编写自己的控制器(例如调度器),并使用自己的自定义 API,这些 API 同样可以被通用的命令行工具操作。
这种设计使得能够在Kubernetes上面构建许多其他系统。
Kubernetes不是什么?
Kubernetes 不是一个传统的,包罗万象的 PaaS(Platform as a Service)系统。由于Kubernetes在容器级而非硬件级运行,因此它提供了PaaS产品常见的一些通用功能,例如部署,扩展,负载均衡,日志和监控。但是,Kubernetes不是一个单体系统,这些默认解决方案都是可选的,可插拔的。Kubernetes提供了构建开发人员平台的构建块,但在重要的地方保留了用户的选择权和灵活性。
Kubernetes:
- 不限制支持的应用程序类型。 Kubernetes旨在支持各种各样的工作负载,包括无状态,有状态和数据处理工作负载。如果一个应用程序可以在一个容器中运行,它应该在Kubernetes上运行得很好。
- 不部署源代码并且不构建您的应用程序。持续集成,交付和部署(CI / CD)工作流程由组织文化和偏好以及技术要求决定。
- 不提供应用程序级服务作为内建服务,例如中间件(例如消息总线),数据处理框架(例如Spark),数据库(例如mysql),高速缓存,以及集群存储系统(例如Ceph)。这些组件可以在Kubernetes上运行,也可以通过可移植机制(例如Open Service Broker)被运行在Kubernetes上的应用程序访问。
- 不指定记录,监控或告警解决方案。它提供了一些集成作为概念证明,以及收集和导出指标的机制。
- 不提供也不强制要求某种配置语言/系统(例如jsonnet)。它提供的是声明式API,可以被任意形式的声明式规范所使用。
- 不提供或采用任何全面的机器配置,维护,管理或自我修复系统。
此外,Kubernetes不仅仅是编排系统,实际上它消除了编排的需要。编排的技术定义是执行已定义的工作流程:首先执行A,然后执行B,再执行C。相反,Kubernetes由一组独立的,可组合的控制流程组成,这些流程将当前状态持续推向所期望的状态。如何从A到C并不重要,也不需要集中控制。这使得系统更易于使用,功能更强大,更健壮,更具弹性且可扩展。
为什么用容器?
来看看应该使用容器的原因。
部署应用程序的旧方法是使用操作系统软件包管理器在主机上安装应用程序。这样做的缺点是将应用程序的可执行文件,配置,类库和生命周期混在一起,并与主机操作系统纠缠在一起。 可以构建不可变的虚拟机映像以实现可预测的部署和回滚,但虚拟机是重量级且不可移植的。
新方法是基于操作系统级虚拟化而不是硬件虚拟化来部署容器。这些容器彼此隔离并与主机隔离:它们具有自己的文件系统,无法看到彼此的进程,并且它们的计算资源使用可以被限制。它们比虚拟机更容易构建,并且因为与底层基础设施和主机文件系统解耦,所以可以跨云和操作系统分发进行移植。
由于容器小而快,因此可以在每个容器映像中打包一个应用程序。 这种一对一的应用程序到映像关系解锁了容器的全部优势。 使用容器,可以在构建/发布时而不是部署时创建不可变容器映像,因为每个应用程序不需要与应用程序堆栈的其余部分组合,也不需要与生产基础设施环境结合。 在构建/发布时生成容器映像可以实现从开发到生产的一致环境。 同样,容器比VM更加透明,这有利于监控和管理。当容器的进程生命周期由基础设施管理而不是由容器内的进程管理器隐藏时,尤其如此。 最后,每个容器使用一个应用程序,管理容器就等于管理应用程序的部署。
容器好处总结如下:
- 应用程序创建和部署更敏捷:与VM映像使用相比,增加了容器映像创建的简便性和效率。
- 持续开发,集成和部署:通过快速简便的回滚(源于镜像不变性)提供可靠且频繁的容器镜像构建和部署。
- Dev和Ops关注点分离:在构建/发布时而不是部署时创建应用程序容器映像,从而将应用程序与基础设施解耦。
- 可观察性:不仅可以显示操作系统级别的信息和指标,还可以显示应用程序运行状况和其他信号。
- 开发,测试和生产的环境一致性:在笔记本电脑上运行与在云中运行相同。
- 云和OS分发可移植性:在Ubuntu,RHEL,CoreOS,本地,Google Kubernetes引擎以及其他任何地方运行。
- 以应用程序为中心的管理:提升抽象级别,从在虚拟硬件上运行OS到使用逻辑资源在OS上运行应用程序。
- 松散耦合,分布式,弹性,解放的微服务:应用程序被分解为更小,独立的部分,可以动态部署和管理 - 而不是在一台大型单一用途机器上运行的单体堆栈。
- 资源隔离:可预测的应用程序性能。
- 资源利用:高效率和高密度。
参考资料
- What is Kubernetes?: 官方文档的介绍篇,还是官方文档写的到位
1.2 - Kubernetes资料收集
官方资料
- Kubernetes官网
- kubernetes@github
- 官方文档:英文版 ,还有 中文翻译版本,不过目前完成度还比较低
- https://k8smeetup.github.io/docs/home/ : 这里有另一份中文翻译版本(官方中文版本的前身),完成度较高
社区资料
学习资料
- Kubernetes指南: 这是目前最新最好的Kubernetes中文资料,强烈推荐!
2 - 安装
2.1 - 通过 kubeadm 安装 kubernetes
2.1.1 - 在 debian12 上安装 kubernetes
有三种安装方式:
- 在线安装: 最标准的安装方法,最大的问题就是需要联网+科学上网,速度慢,中途有被墙/被dns污染的风险
- 预热安装: 在在线安装的基础上,提前准备好安装文件和镜像文件,速度快,而且不需要用到镜像仓库。需要充分的提前准备,最好结合 pve 模板一起使用
- 离线安装: 需要提前下载好所有需要的文件到本地或者本地镜像仓库,速度快,但是同样需要充分的提前准备,而且需要用到 harbor 之类的镜像仓库
2.1.1.1 - 在线安装 kubernetes
参考官方文档:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
2.1.1.1.1 - 准备工作
系统更新
确保更新debian系统到最新,移除不再需要的软件,清理无用的安装包:
sudo apt update && sudo apt full-upgrade -y
sudo apt autoremove
sudo apt autoclean
如果更新了内核,最好重启一下。
swap 分区
安装 Kubernetes 要求节点关闭 swap 分区,否则 kubelet 默认无法正常启动。
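常见的做法如下(仅为示意,假定 swap 的挂载配置写在 /etc/fstab 中;sed 表达式请先确认只会匹配到 swap 那一行):
sudo swapoff -a
# 注释掉 /etc/fstab 中的 swap 挂载行,防止重启后 swap 被重新启用
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab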
开启模块
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
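可以用下面的命令确认模块已经加载,内核参数已经生效(示意):
lsmod | grep -e overlay -e br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward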
container runtime
Kubernetes 支持多种 container runtime,这里暂时继续使用 docker engine + cri-dockerd。
参考:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/
安装 docker + cri-dockerd
docker 的安装参考:
https://skyao.io/learning-docker/docs/installation/debian12/
cri-dockerd 的安装参考:
https://mirantis.github.io/cri-dockerd/usage/install/
从 release 页面下载:
https://github.com/Mirantis/cri-dockerd/releases
Debian 12 选择下载 debian-bookworm 对应的 amd64 .deb 文件。
下载后安装:
sudo dpkg -i ./cri-dockerd_0.4.0.3-0.debian-bookworm_amd64.deb
安装后会提示:
Selecting previously unselected package cri-dockerd.
(Reading database ... 68005 files and directories currently installed.)
Preparing to unpack .../cri-dockerd_0.4.0.3-0.debian-bookworm_amd64.deb ...
Unpacking cri-dockerd (0.4.0~3-0~debian-bookworm) ...
Setting up cri-dockerd (0.4.0~3-0~debian-bookworm) ...
Created symlink /etc/systemd/system/multi-user.target.wants/cri-docker.service → /lib/systemd/system/cri-docker.service.
Created symlink /etc/systemd/system/sockets.target.wants/cri-docker.socket → /lib/systemd/system/cri-docker.socket.
安装后查看状态:
sudo systemctl status cri-docker.service
如果成功则状态为:
● cri-docker.service - CRI Interface for Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/cri-docker.service; enabled; preset: enabled)
Active: active (running) since Sat 2025-05-10 20:39:41 CST; 19s ago
TriggeredBy: ● cri-docker.socket
Docs: https://docs.mirantis.com
Main PID: 8294 (cri-dockerd)
Tasks: 9
Memory: 13.1M
CPU: 18ms
CGroup: /system.slice/cri-docker.service
└─8294 /usr/bin/cri-dockerd --container-runtime-endpoint fd://
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="Hairpin mode is set to none"
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="The binary conntrack is not installed, this can cau>
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="The binary conntrack is not installed, this can cau>
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="Loaded network plugin cni"
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="Docker cri networking managed by network plugin cni"
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="Setting cgroupDriver systemd"
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="Docker cri received runtime config &RuntimeConfig{N>
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="Starting the GRPC backend for the Docker CRI interf>
May 10 20:39:41 debian12 cri-dockerd[8294]: time="2025-05-10T20:39:41+08:00" level=info msg="Start cri-dockerd grpc backend"
May 10 20:39:41 debian12 systemd[1]: Started cri-docker.service - CRI Interface for Docker Application Container Engine.
安装 containerd
TODO:后面考虑换 containerd
安装 helm
参考:
https://helm.sh/docs/intro/install/#from-apt-debianubuntu
安装:
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
安装后取消 helm 的自动更新,编辑对应的 apt 源文件并注释掉其中的内容:
sudo vi /etc/apt/sources.list.d/helm-stable-debian.list
查看安装的版本:
$ helm version
version.BuildInfo{Version:"v3.17.3", GitCommit:"e4da49785aa6e6ee2b86efd5dd9e43400318262b", GitTreeState:"clean", GoVersion:"go1.23.7"}
2.1.1.1.2 - 安装命令行
参考: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
安装 kubeadm / kubelet / kubectl
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
假定要安装的 kubernetes 版本为 1.33:
export K8S_VERSION=1.33
# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
开始安装 kubelet kubeadm kubectl:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
禁止这三个程序的自动更新:
sudo apt-mark hold kubelet kubeadm kubectl
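可以用下面的命令确认 hold 是否生效(示意):
apt-mark showhold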
验证安装:
kubectl version --client && echo && kubeadm version
输出为:
Client Version: v1.33.0
Kustomize Version: v5.6.0
kubeadm version: &version.Info{Major:"1", Minor:"33", EmulationMajor:"", EmulationMinor:"", MinCompatibilityMajor:"", MinCompatibilityMinor:"", GitVersion:"v1.33.0", GitCommit:"60a317eadfcb839692a68eab88b2096f4d708f4f", GitTreeState:"clean", BuildDate:"2025-04-23T13:05:48Z", GoVersion:"go1.24.2", Compiler:"gc", Platform:"linux/amd64"}
在运行 kubeadm 之前,先启动 kubelet 服务:
sudo systemctl enable --now kubelet
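备注:此时 kubelet 因为还没有集群配置会不断重启,这是正常现象,等后面执行 kubeadm init 之后就会稳定下来。可以用下面的命令观察(示意):
sudo systemctl status kubelet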
安装后配置
优化 zsh
vi ~/.zshrc
增加以下内容:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
执行:
source ~/.zshrc
之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
取消更新
kubeadm / kubelet / kubectl 的版本没有必要升级到最新,因此可以取消他们的自动更新。
sudo vi /etc/apt/sources.list.d/kubernetes.list
注释掉里面的内容。
备注:前面执行 apt-mark hold 后已经不会再更新了,但依然会拖慢 apt update 的速度,因此还是需要手动注释。
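注释之后的 kubernetes.list 内容大致如下(以 1.33 为例,示意):
# deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /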
常见问题
prod-cdn.packages.k8s.io 无法访问
偶然会遇到 prod-cdn.packages.k8s.io 无法访问的问题,此时的报错如下:
sudo apt-get update
Hit:1 http://mirrors.ustc.edu.cn/debian bookworm InRelease
Hit:2 http://mirrors.ustc.edu.cn/debian bookworm-updates InRelease
Hit:3 http://security.debian.org/debian-security bookworm-security InRelease
Ign:4 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.32/deb InRelease
Ign:4 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.32/deb InRelease
Ign:4 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.32/deb InRelease
Err:4 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.32/deb InRelease
Could not connect to prod-cdn.packages.k8s.io:443 (221.228.32.13), connection timed out
Reading package lists... Done
W: Failed to fetch https://pkgs.k8s.io/core:/stable:/v1.32/deb/InRelease Could not connect to prod-cdn.packages.k8s.io:443 (221.228.32.13), connection timed out
W: Some index files failed to download. They have been ignored, or old ones used instead.
首先排除了网络问题:即使配置好网络代理,依然无法访问。
后来发现,在不同地区的机器上 ping prod-cdn.packages.k8s.io 得到的 ip 地址是不一样的:
$ ping prod-cdn.packages.k8s.io
Pinging dkhzw6k7x6ord.cloudfront.net [108.139.10.84] with 32 bytes of data:
Reply from 108.139.10.84: bytes=32 time=164ms TTL=242
Reply from 108.139.10.84: bytes=32 time=166ms TTL=242
......
# 这个地址无法访问
$ ping prod-cdn.packages.k8s.io
PING dkhzw6k7x6ord.cloudfront.net (221.228.32.13) 56(84) bytes of data.
64 bytes from 221.228.32.13 (221.228.32.13): icmp_seq=1 ttl=57 time=9.90 ms
64 bytes from 221.228.32.13 (221.228.32.13): icmp_seq=2 ttl=57 time=11.4 ms
......
因此考虑通过修改 /etc/hosts 文件来避开 dns 解析的问题:
sudo vi /etc/hosts
添加如下内容:
108.139.10.84 prod-cdn.packages.k8s.io
这样在出现问题的这台机器上,强制将 prod-cdn.packages.k8s.io 解析到 108.139.10.84 这个 ip 地址,这样就可以访问了。
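修改之后可以验证一下解析结果是否已经指向新的 ip(示意):
getent hosts prod-cdn.packages.k8s.io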
2.1.1.1.3 - 初始化集群
参考官方文档:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
初始化集群
pod-network-cidr 尽量用 10.244.0.0/16 这个范围,不然有些网络插件会需要额外的配置。
cri-socket 的配置:因为前面用的是 Docker Engine + cri-dockerd,因此这里的 cri-socket 需要指定为 "unix:///var/run/cri-dockerd.sock"。
apiserver-advertise-address 需要指定为当前节点的 IP 地址,因为当前节点是单节点,因此这里指定为 192.168.3.179。
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.3.179 --image-repository=192.168.3.91:5000/k8s-proxy
输出为:
W0511 22:22:37.653053 1276 version.go:109] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get "https://dl.k8s.io/release/stable-1.txt": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W0511 22:22:37.653104 1276 version.go:110] falling back to the local client version: v1.33.0
[init] Using Kubernetes version: v1.33.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [debian12 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.3.179]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.3.179 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [debian12 localhost] and IPs [192.168.3.179 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.341993ms
[control-plane-check] Waiting for healthy control plane components. This can take up to 4m0s
[control-plane-check] Checking kube-apiserver at https://192.168.3.179:6443/livez
[control-plane-check] Checking kube-controller-manager at https://127.0.0.1:10257/healthz
[control-plane-check] Checking kube-scheduler at https://127.0.0.1:10259/livez
[control-plane-check] kube-controller-manager is healthy after 1.002560331s
[control-plane-check] kube-scheduler is healthy after 1.156287353s
[control-plane-check] kube-apiserver is healthy after 2.500905726s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node debian12 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node debian12 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: 5dlwbv.j26vzkb6uvf9yqv6
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.3.179:6443 --token 5dlwbv.j26vzkb6uvf9yqv6 \
--discovery-token-ca-cert-hash sha256:0d1be37706d728f6c09dbcff86614c6fe04c536d969371400f4d3551f197c6e4
根据提示操作:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
对于测试用的单节点,去除 master/control-plane 的污点:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
执行:
kubectl get node
能看到此时节点的状态会是 NotReady:
NAME STATUS ROLES AGE VERSION
debian12 NotReady control-plane 3m49s v1.32.2
执行:
kubectl describe node debian12
能看到节点的错误信息:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Tue, 04 Mar 2025 20:28:00 +0800 Tue, 04 Mar 2025 20:23:53 +0800 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
需要继续安装网络插件。
安装网络插件
安装 flannel
参考官方文档: https://github.com/flannel-io/flannel#deploying-flannel-with-kubectl
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
如果一切正常,就能看到 k8s 集群内的 pod 都启动完成状态为 Running:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-ts6n8 1/1 Running 7 (9m27s ago) 15m
kube-system coredns-668d6bf9bc-rbkzb 1/1 Running 0 3h55m
kube-system coredns-668d6bf9bc-vbltg 1/1 Running 0 3h55m
kube-system etcd-debian12 1/1 Running 0 3h55m
kube-system kube-apiserver-debian12 1/1 Running 1 (5h57m ago) 3h55m
kube-system kube-controller-manager-debian12 1/1 Running 0 3h55m
kube-system kube-proxy-95ccr 1/1 Running 0 3h55m
kube-system kube-scheduler-debian12 1/1 Running 1 (6h15m ago) 3h55m
如果发现 kube-flannel-ds pod 的状态总是 CrashLoopBackOff:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-ts6n8 0/1 CrashLoopBackOff 2 (22s ago) 42s
继续查看 pod 的具体错误信息:
k describe pods -n kube-flannel kube-flannel-ds-ts6n8
发现报错 “Back-off restarting failed container kube-flannel in pod kube-flannel”:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 117s default-scheduler Successfully assigned kube-flannel/kube-flannel-ds-ts6n8 to debian12
Normal Pulled 116s kubelet Container image "ghcr.io/flannel-io/flannel-cni-plugin:v1.6.2-flannel1" already present on machine
Normal Created 116s kubelet Created container: install-cni-plugin
Normal Started 116s kubelet Started container install-cni-plugin
Normal Pulled 115s kubelet Container image "ghcr.io/flannel-io/flannel:v0.26.4" already present on machine
Normal Created 115s kubelet Created container: install-cni
Normal Started 115s kubelet Started container install-cni
Normal Pulled 28s (x5 over 114s) kubelet Container image "ghcr.io/flannel-io/flannel:v0.26.4" already present on machine
Normal Created 28s (x5 over 114s) kubelet Created container: kube-flannel
Normal Started 28s (x5 over 114s) kubelet Started container kube-flannel
Warning BackOff 2s (x10 over 110s) kubelet Back-off restarting failed container kube-flannel in pod kube-flannel-ds-ts6n8_kube-flannel(1e03c200-2062-4838
此时应该去检查准备工作中 “开启模块” 一节的内容是不是有疏漏。
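可以用下面的命令快速复查(示意):
lsmod | grep br_netfilter
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables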
补救之后,就能看到 kube-flannel-ds 这个 pod 正常运行了:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-ts6n8 1/1 Running 7 (9m27s ago) 15m
安装 Calico
查看最新版本,当前最新版本是 v3.29.2:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/tigera-operator.yaml
TODO:用了 flannel, Calico 后面再验证。
2.1.1.1.4 - 安装 dashboard
安装 dashboard
参考:https://github.com/kubernetes/dashboard/#installation
在下面地址上查看当前 dashboard 的版本:
https://github.com/kubernetes/dashboard/releases
根据对 kubernetes 版本的兼容情况选择对应的 dashboard 的版本:
- kubernetes-dashboard-7.11.0 ,兼容 k8s 1.32
最新版本需要用 helm 进行安装:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
输出为:
"kubernetes-dashboard" already exists with the same configuration, skipping
Release "kubernetes-dashboard" does not exist. Installing it now.
NAME: kubernetes-dashboard
LAST DEPLOYED: Wed Mar 5 00:53:17 2025
NAMESPACE: kubernetes-dashboard
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
*************************************************************************************************
*** PLEASE BE PATIENT: Kubernetes Dashboard may need a few minutes to get up and become ready ***
*************************************************************************************************
Congratulations! You have just installed Kubernetes Dashboard in your cluster.
To access Dashboard run:
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
NOTE: In case port-forward command does not work, make sure that kong service name is correct.
Check the services in Kubernetes Dashboard namespace using:
kubectl -n kubernetes-dashboard get svc
Dashboard will be available at:
https://localhost:8443
此时 dashboard 的 service 和 pod 情况:
kubectl -n kubernetes-dashboard get services
输出为:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-api ClusterIP 10.108.225.190 <none> 8000/TCP 2m5s
kubernetes-dashboard-auth ClusterIP 10.99.205.102 <none> 8000/TCP 2m5s
kubernetes-dashboard-kong-proxy ClusterIP 10.96.247.162 <none> 443/TCP 2m5s
kubernetes-dashboard-metrics-scraper ClusterIP 10.103.222.22 <none> 8000/TCP 2m5s
kubernetes-dashboard-web ClusterIP 10.108.219.9 <none> 8000/TCP 2m5s
查看 pod 的情况:
kubectl -n kubernetes-dashboard get pods
等待两三分钟之后,pod 启动完成,输出为:
NAME READY STATUS RESTARTS AGE
kubernetes-dashboard-api-7d8567b8f-9ksk2 1/1 Running 0 3m8s
kubernetes-dashboard-auth-6877bf44b9-9qfmg 1/1 Running 0 3m8s
kubernetes-dashboard-kong-79867c9c48-rzlhp 1/1 Running 0 3m8s
kubernetes-dashboard-metrics-scraper-794c587449-6phjv 1/1 Running 0 3m8s
kubernetes-dashboard-web-75576c76b-sm2wj 1/1 Running 0 3m8s
为了方便,使用 node port 来访问 dashboard,需要执行:
kubectl -n kubernetes-dashboard edit service kubernetes-dashboard-kong-proxy
然后将 type: ClusterIP 修改为 type: NodePort。再看一下具体分配的 node port 是哪个:
kubectl -n kubernetes-dashboard get service kubernetes-dashboard-kong-proxy
输出为:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-kong-proxy NodePort 10.96.247.162 <none> 443:32616/TCP 17m
现在可以用浏览器直接访问:
https://192.168.3.215:32616/
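如果不想手工 edit service,也可以直接用 patch 命令把 service 类型改成 NodePort,效果相同(后文预热安装脚本中用的就是这种方式):
kubectl -n kubernetes-dashboard patch service kubernetes-dashboard-kong-proxy -p '{"spec":{"type":"NodePort"}}'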
创建用户并登录 dashboard
创建 admin-user 用户:
mkdir -p ~/work/soft/k8s
cd ~/work/soft/k8s
vi dashboard-adminuser.yaml
内容为:
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kubernetes-dashboard
执行:
k create -f dashboard-adminuser.yaml
然后绑定角色:
vi dashboard-adminuser-binding.yaml
内容为:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: admin-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kubernetes-dashboard
执行:
k create -f dashboard-adminuser-binding.yaml
然后创建 token :
kubectl -n kubernetes-dashboard create token admin-user
输出为:
eyJhbGciOiJSUzI1NiIsImtpZCI6Ik9sWnJsTk5UNE9JVlVmRFMxMUpwNC1tUlVndTl5Zi1WQWtmMjIzd2hDNmcifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzQxMTEyNDg4LCJpYXQiOjE3NDExMDg4ODgsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiNDU5ZGQxNjctNWI5OS00MWIzLTgzZWEtNGIxMGY3MTc5ZjEyIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiZjMxN2VhZTItNTNiNi00MGZhLWI3MWYtMzZiNDI1YmY4YWQ0In19LCJuYmYiOjE3NDExMDg4ODgsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbi11c2VyIn0.TYzOdrMFXcSEeVMbc1ewIA13JVi4FUYoRN7rSH5OstbVfKIF48X_o1RWxOGM_AurhgLxuKZHzmns3K_pX_OR3u1URfK6-gGos4iAQY-H1yntfRmzzsip_FbZh95EYFGTN43gw21jTyfem3OKBXXLgzsnVT_29uMnJzSnCDnrAciVKMoCEUP6x2RSHQhp6PrxrIrx_NMB3vojEZYq3AysQoNqYYjRDd4MnDRClm03dNvW5lvKSgNCVmZFje_EEa2EhI2X6d3X8zx6tHwT5M4-T3hMmyIpzHUwf3ixeZR85rhorMbskNVvRpH6VLH6BXP31c3NMeSgYk3BG8d7UjCYxQ
这个 token 就可以用在 kubernetes-dashboard 的登录页面上了。
为了方便,将这个 token 存储在 Secret :
vi dashboard-adminuser-secret.yaml
内容为:
apiVersion: v1
kind: Secret
metadata:
name: admin-user
namespace: kubernetes-dashboard
annotations:
kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token
执行:
k create -f dashboard-adminuser-secret.yaml
之后就可以用命令随时获取这个 token 了:
kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath="{.data.token}" | base64 -d
备注:复制 token 的时候,不要复制末尾的 % 字符(那是 zsh 对没有换行结尾的输出的提示标记,并不是 token 的一部分),否则会报错。
2.1.1.1.5 - 安装 metrics server
参考:https://github.com/kubernetes-sigs/metrics-server/#installation
安装 metrics server
下载:
mkdir -p ~/work/soft/k8s
cd ~/work/soft/k8s
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
修改下载下来的 components.yaml,增加 --kubelet-insecure-tls 参数,并修改 --kubelet-preferred-address-types 参数:
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP # 修改这行,默认是InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls # 增加这行
然后安装:
k apply -f components.yaml
稍等片刻看是否启动:
kubectl get pod -n kube-system | grep metrics-server
验证一下,查看 service 信息
kubectl describe svc metrics-server -n kube-system
简单验证一下基本使用:
kubectl top nodes
kubectl top pods -n kube-system
参考资料
2.1.1.1.6 - 安装监控
参考:https://github.com/prometheus-operator/prometheus-operator
https://computingforgeeks.com/setup-prometheus-and-grafana-on-kubernetes/
2.1.1.2 - 预热安装 kubernetes
原理
所谓预热安装,就是在在线安装的基础上,在执行 kubeadm init 之前,提前准备好所有的安装文件和镜像文件,然后制作成 pve 模板。之后就可以重用该模板,在需要时创建虚拟机,在虚拟机中执行 kubeadm init 即可快速安装 kubernetes。
原则上,在执行 kubeadm init 之前的各种准备工作都可以参考在线安装的方式;而 kubeadm init 之后的安装工作,就只能通过提前准备安装文件,提前下载镜像文件等方式来加速。
准备工作
- 安装 docker: 参考 https://skyao.io/learning-docker/docs/installation/debian12/ ,在线安装和离线安装都可以。
- 安装 kubeadm: 参考前面的在线安装方式,或者直接用后面的离线安装方式,将 cri-dockerd / helm 和 kubeadm / kubelet / kubectl 安装好。
预下载镜像文件
k8s cluster
kubeadm config images pull --cri-socket unix:///var/run/cri-dockerd.sock
这样就可以提前下载好 kubeadm init 时需要的镜像文件:
[config/images] Pulled registry.k8s.io/kube-apiserver:v1.33.0
[config/images] Pulled registry.k8s.io/kube-controller-manager:v1.33.0
[config/images] Pulled registry.k8s.io/kube-scheduler:v1.33.0
[config/images] Pulled registry.k8s.io/kube-proxy:v1.33.0
[config/images] Pulled registry.k8s.io/coredns/coredns:v1.12.0
[config/images] Pulled registry.k8s.io/pause:3.10
[config/images] Pulled registry.k8s.io/etcd:3.5.21-0
flannel
下载 flannel 需要的镜像文件:
docker pull ghcr.io/flannel-io/flannel-cni-plugin:v1.6.2-flannel1
docker pull ghcr.io/flannel-io/flannel:v0.26.7
参考在线安装文档准备以下 yaml 文件:
~/work/soft/k8s/menifests/kube-flannel.yml
dashboard
查看 dashboard 的最新版本:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm repo update
helm search repo kubernetes-dashboard -l
发现 dashboard 的最新版本是 7.12.0,所以下载 dashboard 需要的 charts 文件:
helm pull kubernetes-dashboard/kubernetes-dashboard --version 7.12.0 --untar --untardir ~/work/soft/k8s/charts
下载 dashboard 需要的镜像文件:
docker pull docker.io/kubernetesui/dashboard-api:1.12.0
docker pull docker.io/kubernetesui/dashboard-auth:1.2.4
docker pull docker.io/kubernetesui/dashboard-web:1.6.2
docker pull docker.io/kubernetesui/dashboard-metrics-scraper:1.2.2
参考在线安装文档准备以下 yaml 文件:
~/work/soft/k8s/menifests/dashboard-adminuser-binding.yaml
~/work/soft/k8s/menifests/dashboard-adminuser.yaml
~/work/soft/k8s/menifests/dashboard-adminuser-secret.yaml
metrics-server
下载 metrics-server 需要的镜像文件:
docker pull registry.k8s.io/metrics-server/metrics-server:v0.7.2
docker pull docker.io/kubernetesui/dashboard-metrics-scraper:1.2.2
参考在线安装文档准备以下 yaml 文件:
~/work/soft/k8s/menifests/metrics-server-components.yaml
安装
手工安装
执行 kubeadm init 命令,注意检查并修改 IP 地址为实际 IP 地址:
NODE_IP=192.168.3.175
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=$NODE_IP
配置 kube config:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
配置 flannel 网络:
kubectl apply -f ~/work/soft/k8s/menifests/kube-flannel.yml
去除污点:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
安装 dashboard :
helm upgrade --install kubernetes-dashboard \
~/work/soft/k8s/charts/kubernetes-dashboard \
--create-namespace \
--namespace kubernetes-dashboard
准备用于登录 dashboard 的 admin-user 用户:
kubectl apply -f ~/work/soft/k8s/menifests/dashboard-adminuser.yaml
kubectl apply -f ~/work/soft/k8s/menifests/dashboard-adminuser-binding.yaml
kubectl -n kubernetes-dashboard create token admin-user
kubectl apply -f ~/work/soft/k8s/menifests/dashboard-adminuser-secret.yaml
ADMIN_USER_TOKEN=$(kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath="{.data.token}" | base64 -d)
echo $ADMIN_USER_TOKEN > ~/work/soft/k8s/dashboard-admin-user-token.txt
echo "admin-user token is: $ADMIN_USER_TOKEN"
将 kubernetes-dashboard-kong-proxy 设置为 NodePort 类型:
kubectl -n kubernetes-dashboard patch service kubernetes-dashboard-kong-proxy -p '{"spec":{"type":"NodePort"}}'
获取 NodePort:
NODE_PORT=$(kubectl -n kubernetes-dashboard get service kubernetes-dashboard-kong-proxy \
-o jsonpath='{.spec.ports[0].nodePort}')
echo "url is: https://$NODE_IP:$NODE_PORT"
安装 metrics-server:
kubectl apply -f ~/work/soft/k8s/menifests/metrics-server-components.yaml
kubectl wait --namespace kube-system \
--for=condition=Ready \
--selector=k8s-app=metrics-server \
--timeout=300s pod
echo "metrics-server installed, have a try:"
echo
echo "kubectl top nodes"
echo
kubectl top nodes
echo
echo "kubectl top pods -n kube-system"
echo
kubectl top pods -n kube-system
脚本自动安装
#!/usr/bin/env zsh
# Kubernetes 自动化安装脚本 (Debian 12 + Helm + Dashboard + Metrics Server)
# 使用方法: sudo ./install_k8s_prewarm.zsh <NODE_IP>
# 获取脚本所在绝对路径
K8S_INSTALL_PATH=$(cd "$(dirname "$0")"; pwd)
MANIFESTS_PATH="$K8S_INSTALL_PATH/menifests"
CHARTS_PATH="$K8S_INSTALL_PATH/charts"
echo "🔍 检测到安装文件目录: $K8S_INSTALL_PATH"
# 检查是否以 root 执行
if [[ $EUID -ne 0 ]]; then
echo "❌ 此脚本必须以 root 身份运行"
exit 1
fi
# 获取节点 IP
if [[ -z "$1" ]]; then
echo "ℹ️ 用法: $0 <节点IP>"
exit 1
fi
NODE_IP=$1
# 安装日志
LOG_FILE="$K8S_INSTALL_PATH/k8s_install_$(date +%Y%m%d_%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "📅 开始安装 Kubernetes 集群 - $(date)"
echo "🔧 节点IP: $NODE_IP"
echo "📁 资源目录: $K8S_INSTALL_PATH"
# 步骤1: kubeadm 初始化
echo "🚀 正在初始化 Kubernetes 控制平面..."
kubeadm_init() {
kubeadm init \
--pod-network-cidr 10.244.0.0/16 \
--cri-socket unix:///var/run/cri-dockerd.sock \
--apiserver-advertise-address=$NODE_IP
if [[ $? -ne 0 ]]; then
echo "❌ kubeadm init 失败"
exit 1
fi
}
kubeadm_init
sleep 3
# 步骤2: 配置 kubectl
echo "⚙️ 为 root 用户配置 kubectl..."
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
echo "⚙️ 为当前用户配置 kubectl..."
CURRENT_USER_HOME=$(getent passwd $SUDO_USER | cut -d: -f6)
mkdir -p $CURRENT_USER_HOME/.kube
cp -i /etc/kubernetes/admin.conf $CURRENT_USER_HOME/.kube/config
chown $(id -u $SUDO_USER):$(id -g $SUDO_USER) $CURRENT_USER_HOME/.kube/config
# 步骤3: 安装 Flannel 网络插件
echo "🌐 正在安装 Flannel 网络..."
kubectl apply -f "$MANIFESTS_PATH/kube-flannel.yml" || {
echo "❌ Flannel 安装失败"
exit 1
}
sleep 3
# 步骤4: 去除控制平面污点
echo "✨ 去除控制平面污点..."
kubectl taint nodes --all node-role.kubernetes.io/control-plane- || {
echo "⚠️ 去除污点失败 (可能不影响功能)"
}
# 步骤5: 从本地安装 Dashboard
echo "📊 正在从本地安装 Kubernetes Dashboard..."
helm upgrade --install kubernetes-dashboard \
"$CHARTS_PATH/kubernetes-dashboard" \
--create-namespace \
--namespace kubernetes-dashboard || {
echo "❌ Dashboard 安装失败"
exit 1
}
sleep 3
# 步骤6: 配置 Dashboard 管理员用户
echo "👤 创建 Dashboard 管理员用户..."
kubectl apply -f "$MANIFESTS_PATH/dashboard-adminuser.yaml" || {
echo "❌ 创建 admin-user 失败"
exit 1
}
kubectl apply -f "$MANIFESTS_PATH/dashboard-adminuser-binding.yaml" || {
echo "❌ 创建 RBAC 绑定失败"
exit 1
}
kubectl apply -f "$MANIFESTS_PATH/dashboard-adminuser-secret.yaml" || {
echo "⚠️ 创建 Secret 失败 (可能已存在)"
}
# 获取并保存 Token
echo "🔑 获取管理员 Token..."
ADMIN_TOKEN=$(kubectl -n kubernetes-dashboard create token admin-user 2>/dev/null || \
kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath="{.data.token}" | base64 -d)
if [[ -z "$ADMIN_TOKEN" ]]; then
echo "❌ 获取 Token 失败"
exit 1
fi
echo "$ADMIN_TOKEN" > "$K8S_INSTALL_PATH/dashboard-admin-user-token.txt"
echo "✅ Token 已保存到: $K8S_INSTALL_PATH/dashboard-admin-user-token.txt"
# 步骤7: 修改 Dashboard Service 类型
echo "🔧 修改 Dashboard 服务类型为 NodePort..."
kubectl -n kubernetes-dashboard patch service kubernetes-dashboard-kong-proxy \
-p '{"spec":{"type":"NodePort"}}' || {
echo "❌ 修改服务类型失败"
exit 1
}
sleep 3
# 获取 NodePort
NODE_PORT=$(kubectl -n kubernetes-dashboard get service kubernetes-dashboard-kong-proxy \
-o jsonpath='{.spec.ports[0].nodePort}')
echo "🌍 Dashboard 访问地址: https://$NODE_IP:$NODE_PORT"
echo "🔑 登录 Token: $ADMIN_TOKEN"
# 步骤8: 安装 Metrics Server
echo "📈 正在安装 Metrics Server..."
kubectl apply -f "$MANIFESTS_PATH/metrics-server-components.yaml" || {
echo "❌ Metrics Server 安装失败"
exit 1
}
# 等待 Metrics Server 就绪
echo "⏳ 等待 Metrics Server 就绪 (最多5分钟)..."
kubectl wait --namespace kube-system \
--for=condition=Ready \
--selector=k8s-app=metrics-server \
--timeout=300s pod || {
echo "❌ Metrics Server 启动超时"
exit 1
}
# 验证安装
echo "✅ 安装完成!"
sleep 5
echo ""
echo "🛠️ 验证命令:"
echo "kubectl top nodes"
kubectl top nodes
echo ""
echo "kubectl top pods -n kube-system"
kubectl top pods -n kube-system
echo ""
echo "📌 重要信息:"
echo "Dashboard URL: https://$NODE_IP:$NODE_PORT"
echo "Token 文件: $K8S_INSTALL_PATH/dashboard-admin-user-token.txt"
echo "安装日志: $LOG_FILE"
2.1.1.3 - 离线安装 kubernetes
2.1.1.3.1 - 在 debian12 上离线安装 kubeadm
制作离线安装包
mkdir -p ~/temp/k8s-offline/
cd ~/temp/k8s-offline/
cri-dockerd
下载安装包:
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.4.0/cri-dockerd_0.4.0.3-0.debian-bookworm_amd64.deb
helm
参考在线安装的方式,同样需要先添加 helm 的 apt 仓库:
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
然后找到需要安装的版本, 下载离线安装包。
apt-get download helm
kubeadm
添加 k8s 的 keyrings:
export K8S_VERSION=1.33
# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
下载安装包:
# 下载 k8s 的 .deb 包
apt-get download kubelet kubeadm kubectl
# 下载所有依赖(可能需要运行多次直到无新依赖)
apt-get download $(apt-cache depends kubelet kubeadm kubectl | grep -E 'Depends|Recommends' | cut -d ':' -f 2 | tr -d ' ' | grep -v "^kube" | sort -u)
rm -r iptables*.deb
下载 kubernetes-cni:
apt-get download kubernetes-cni
完成后的离线安装包如下:
ls -lh
total 124M
-rw-r--r-- 1 sky sky 35K Mar 23 2023 conntrack_1%3a1.4.7-1+b2_amd64.deb
-rw-r--r-- 1 sky sky 11M Apr 16 15:43 cri-dockerd_0.4.0.3-0.debian-bookworm_amd64.deb
-rw-r--r-- 1 sky sky 17M Apr 22 16:56 cri-tools_1.33.0-1.1_amd64.deb
-rw-r--r-- 1 sky sky 193K Dec 20 2022 ethtool_1%3a6.1-1_amd64.deb
-rw-r--r-- 1 sky sky 17M Apr 29 21:31 helm_3.17.3-1_amd64.deb
-rw-r--r-- 1 sky sky 1022K May 22 2023 iproute2_6.1.0-3_amd64.deb
-rw-r--r-- 1 sky sky 352K Jan 16 2023 iptables_1.8.9-2_amd64.deb
-rw-r--r-- 1 sky sky 13M Apr 24 02:07 kubeadm_1.33.0-1.1_amd64.deb
-rw-r--r-- 1 sky sky 12M Apr 24 02:07 kubectl_1.33.0-1.1_amd64.deb
-rw-r--r-- 1 sky sky 16M Apr 24 02:08 kubelet_1.33.0-1.1_amd64.deb
-rw-r--r-- 1 sky sky 37M Feb 5 17:03 kubernetes-cni_1.6.0-1.1_amd64.deb
-rw-r--r-- 1 sky sky 2.7M Mar 8 05:26 libc6_2.36-9+deb12u10_amd64.deb
-rw-r--r-- 1 sky sky 131K Dec 9 20:54 mount_2.38.1-5+deb12u3_amd64.deb
-rw-r--r-- 1 sky sky 1.2M Dec 9 20:54 util-linux_2.38.1-5+deb12u3_amd64.deb
将这个离线安装包压缩成一个 tar 包:
cd ~/temp/
tar -czvf k8s-offline-v1.33.tar.gz k8s-offline
离线安装
下载离线安装包到本地:
mkdir -p ~/temp/ && cd ~/temp/
scp sky@192.168.3.179:/home/sky/temp/k8s-offline-v1.33.tar.gz .
解压离线安装包:
tar -xvf k8s-offline-v1.33.tar.gz
cd k8s-offline
手工安装 cri-dockerd
sudo dpkg -i cri-dockerd*.deb
手工安装 helm
sudo dpkg -i helm*.deb
手工安装 kubeadm
安装 kubeadm 的依赖:
sudo dpkg -i util-linux*.deb
sudo dpkg -i conntrack*.deb
sudo dpkg -i cri-tools*.deb
sudo dpkg -i libc6*.deb
sudo dpkg -i ethtool*.deb
sudo dpkg -i mount*.deb
sudo dpkg -i iproute2*.deb
sudo dpkg -i iptables*.deb
sudo dpkg -i kubernetes-cni*.deb
安装 kubectl / kubelet / kubeadm :
sudo dpkg -i kubectl*.deb
sudo dpkg -i kubelet*.deb
sudo dpkg -i kubeadm*.deb
修改 ~/.zshrc 文件,添加 alias:
if ! grep -qF "# k8s auto complete" ~/.zshrc; then
cat >> ~/.zshrc << 'EOF'
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
EOF
fi
source ~/.zshrc
制作离线安装脚本
离线安装避免了在线安装的网络问题,非常方便。考虑写一个离线安装脚本,方便以后使用。
vi install_k8s_offline.zsh
内容为:
#!/usr/bin/env zsh
# ------------------------------------------------------------
# kubeadm 离线安装脚本 (Debian 12)
# 前提条件:
# 1. 所有 .deb 文件已放在 ~/k8s-offline
# 2. 已经
# ------------------------------------------------------------
set -e # 遇到错误立即退出
K8S_OFFLINE_DIR="./k8s-offline"
# 检查是否在 Debian 12 上运行
if ! grep -q "Debian GNU/Linux 12" /etc/os-release; then
echo "❌ 错误:此脚本仅适用于 Debian 12!"
exit 1
fi
# 检查是否已安装 kubeadm
if command -v kubeadm &>/dev/null; then
echo "⚠️ kubeadm 已安装,跳过安装步骤。"
exit 0
fi
# 检查离线目录是否存在
if [[ ! -d "$K8S_OFFLINE_DIR" ]]; then
echo "❌ 错误:离线目录 $K8S_OFFLINE_DIR 不存在!"
exit 1
fi
echo "🔧 开始离线安装 kubeadm..."
# ------------------------------------------------------------
# 1. 开启模块
# ------------------------------------------------------------
echo "🔧 开启模块..."
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
# ------------------------------------------------------------
# 2. 安装 cri-dockerd
# ------------------------------------------------------------
echo "📦 安装 cri-dockerd..."
cd "$K8S_OFFLINE_DIR"
sudo dpkg -i cri-tools*.deb
sudo dpkg -i cri-dockerd*.deb
# ------------------------------------------------------------
# 3. 安装 kubeadm 的依赖
# ------------------------------------------------------------
echo "📦 安装 kubeadm 的依赖包..."
# 按顺序安装依赖(防止 dpkg 报错)
for pkg in util-linux conntrack libc6 ethtool mount iproute2 iptables helm kubernetes-cni; do
if ls "${pkg}"*.deb &>/dev/null; then
echo "➡️ 正在安装: ${pkg}"
sudo dpkg -i "${pkg}"*.deb || true # 忽略部分错误,后续用 apt-get -f 修复
fi
done
# 修复依赖关系
echo "🛠️ 修复依赖关系..."
sudo apt-get -f install -y
# ------------------------------------------------------------
# 4. 安装 kubeadm
# ------------------------------------------------------------
# 按顺序安装 kubeadm 组件(防止 dpkg 报错)
echo "📦 安装 kubeadm 组件..."
for pkg in kubectl kubelet kubeadm; do
if ls "${pkg}"*.deb &>/dev/null; then
echo "➡️ 正在安装: ${pkg}"
sudo dpkg -i "${pkg}"*.deb || true # 忽略部分错误,后续用 apt-get -f 修复
fi
done
# 修复依赖关系
echo "🛠️ 修复依赖关系..."
sudo apt-get -f install -y
# ------------------------------------------------------------
# 5. 配置 kubectl
# ------------------------------------------------------------
echo "⚙️ 配置 kubectl 使用 alias..."
if ! grep -qF "# k8s auto complete" ~/.zshrc; then
cat >> ~/.zshrc << 'EOF'
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
EOF
fi
# ------------------------------------------------------------
# 6. 验证安装
# ------------------------------------------------------------
echo "✅ 安装完成!验证版本:"
kubectl version --client && echo && kubelet --version && echo && kubeadm version && echo
echo "✨ kubeadm 安装完成!"
echo "👥 然后重新登录,或者执行命令以便 k alias 立即生效: source ~/.zshrc"
echo "🟢 之后请运行测试 kubectl 的别名 k: k version --client"
2.1.1.3.2 - 在 debian12 上离线安装 k8s
指定镜像仓库进行离线安装
新建一个 kubeadm-config.yaml 文件:
vi kubeadm-config.yaml
内容设置为:
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: "192.168.3.179"
bindPort: 6443
nodeRegistration:
criSocket: "unix:///var/run/containerd/containerd.sock"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
imageRepository: "192.168.3.91:5000/k8s-proxy"
dns:
imageRepository: "192.168.3.91:5000/k8s-proxy/coredns"
imageTag: "v1.12.0"
etcd:
local:
imageRepository: "192.168.3.91:5000/k8s-proxy"
imageTag: "3.5.21-0"
使用提前准备好的 harbor 代理仓库 192.168.3.91:5000/k8s-proxy 进行 kubeadm 的 init 操作:
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.3.179 --image-repository=192.168.3.91:5000/k8s-proxy
但是报错了:
W0511 21:34:13.908906 60242 version.go:109] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get "https://dl.k8s.io/release/stable-1.txt": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W0511 21:34:13.908935 60242 version.go:110] falling back to the local client version: v1.33.0
[init] Using Kubernetes version: v1.33.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0511 21:34:13.944144 60242 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.10" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "192.168.3.91:5000/k8s-proxy/pause:3.10" as the CRI sandbox image.
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image 192.168.3.91:5000/k8s-proxy/coredns:v1.12.0: failed to pull image 192.168.3.91:5000/k8s-proxy/coredns:v1.12.0: Error response from daemon: unknown: repository k8s-proxy/coredns not found
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
不知道为什么 coredns 的镜像路径和其他的不一样。下面是 k8s 所有镜像的路径,可以看到正常的名称是 "registry.k8s.io/xxxx:version",只有 coredns 是 registry.k8s.io/coredns/coredns:v1.12.0,多了一层路径:
kubeadm config images pull
[config/images] Pulled registry.k8s.io/kube-apiserver:v1.33.0
[config/images] Pulled registry.k8s.io/kube-controller-manager:v1.33.0
[config/images] Pulled registry.k8s.io/kube-scheduler:v1.33.0
[config/images] Pulled registry.k8s.io/kube-proxy:v1.33.0
[config/images] Pulled registry.k8s.io/coredns/coredns:v1.12.0
[config/images] Pulled registry.k8s.io/pause:3.10
[config/images] Pulled registry.k8s.io/etcd:3.5.21-0
指定镜像仓库再做一次镜像下载,验证一下,发现也是同样报错:
kubeadm config images pull --image-repository=192.168.3.91:5000/k8s-proxy
[config/images] Pulled 192.168.3.91:5000/k8s-proxy/kube-apiserver:v1.33.0
[config/images] Pulled 192.168.3.91:5000/k8s-proxy/kube-controller-manager:v1.33.0
[config/images] Pulled 192.168.3.91:5000/k8s-proxy/kube-scheduler:v1.33.0
[config/images] Pulled 192.168.3.91:5000/k8s-proxy/kube-proxy:v1.33.0
failed to pull image "192.168.3.91:5000/k8s-proxy/coredns:v1.12.0": failed to pull image 192.168.3.91:5000/k8s-proxy/coredns:v1.12.0: Error response from daemon: unknown: resource not found: repo k8s-proxy/coredns, tag v1.12.0 not found
To see the stack trace of this error execute with --v=5 or higher
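一个可以尝试的变通思路(仅为示意,假定对 192.168.3.91:5000 这个 harbor 有推送权限,且 k8s-proxy 项目允许直接 push):在一台能联网的机器上手工把 coredns 镜像重新打 tag 后推送到代理仓库,使其路径与 kubeadm 期望的 <imageRepository>/coredns:v1.12.0 保持一致:
docker pull registry.k8s.io/coredns/coredns:v1.12.0
docker tag registry.k8s.io/coredns/coredns:v1.12.0 192.168.3.91:5000/k8s-proxy/coredns:v1.12.0
docker push 192.168.3.91:5000/k8s-proxy/coredns:v1.12.0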
2.2 - 安装kubectl
kubectl 是 Kubernetes 的命令行工具,允许对Kubernetes集群运行命令。
单独安装 kubectl 命令行工具,可以方便的在本地远程操作集群。
2.2.1 - 在 ubuntu 上安装 kubectl
参考 Kubernetes 官方文档:
分步骤安装
和安装 kubeadm 的方式一样,只是这里只需要安装 kubectl 一个工具,不需要安装 kubeadm 和 kubelet。
执行如下命令:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
k8s 暂时固定使用 1.23.14 版本:
sudo apt-get install kubectl=1.23.14-00
# sudo apt-get install kubelet=1.23.14-00 kubeadm=1.23.14-00 kubectl=1.23.14-00
直接安装
不推荐这样安装,会安装最新版本,而且安装目录是 /usr/local/bin/ 。
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
rm kubectl
如果 /usr/local/bin/ 不在 PATH 路径下,则需要修改一下 PATH:
export PATH=/usr/local/bin:$PATH
验证一下:
kubectl version --output=yaml
输出为:
clientVersion:
buildDate: "2023-06-14T09:53:42Z"
compiler: gc
gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
gitTreeState: clean
gitVersion: v1.27.3
goVersion: go1.20.5
major: "1"
minor: "27"
platform: linux/amd64
kustomizeVersion: v5.0.1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
配置
oh-my-zsh自动完成
在使用 oh-my-zsh 之后,会更加的简单(强烈推荐使用 oh-my-zsh ),只要在 oh-my-zsh 的 plugins 列表中增加 kubectl 即可。
然后,在 ~/.zshrc 中增加以下内容:
# k8s auto complete
alias k=kubectl
complete -F __start_kubectl k
source ~/.zshrc
之后即可使用,此时用 k 这个别名来执行 kubectl 命令时也可以实现自动完成,非常的方便。
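kubectl 装好之后,还需要一份 kubeconfig 才能远程操作集群。一个常见的做法(仅为示意,假定控制平面节点为 192.168.3.179,登录用户为 sky,且本地还没有其他 kubeconfig):把集群上的 kubeconfig 拷贝到本地:
mkdir -p ~/.kube
scp sky@192.168.3.179:/home/sky/.kube/config ~/.kube/config
kubectl get nodes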
3 - Sidecar Container
3.1 - Sidecar Container概述
From Kubernetes 1.18 containers can be marked as sidecars
Unfortunately, that feature was removed from 1.18, then again from 1.19, and currently has no specific date for landing.
reference: kubernetes/enhancements#753
资料
官方正式资料
- Sidecar Containers(kubernetes/enhancements#753): 最权威的资料了,准备细读
- Support startup dependencies between containers on the same Pod
社区介绍资料
- Sidecar Containers improvement in Kubernetes 1.18: 重点阅读
- Kubernetes — Learn Sidecar Container Pattern
- Sidecar container lifecycle changes in Kubernetes 1.18
- Tutorial: Apply the Sidecar Pattern to Deploy Redis in Kubernetes
- Sidecar Containers:by 陈鹏,特别鸣谢
相关项目的处理
Istio
信息1
https://github.com/kubernetes/enhancements/issues/753#issuecomment-684176649
We use a custom daemon image like a supervisor to wrap the user's program. The daemon will also listen to a particular port to convey the health status of users' programs (exited or not).
我们使用一个类似 supervisor 的自定义守护进程镜像来包装用户的程序。守护进程也会监听特定的端口来传达用户程序的健康状态(是否退出)。
Here is the workaround:
- Using the daemon image as initContainers to copy the binary to a shared volume.
- Our CD will hijack users' command, let the daemon start first. Then, the daemon runs the users' program until Envoy is ready.
- Also, we add preStop, a script that keeps checking the daemon's health status, for Envoy.
下面是变通的方法:
- 以 initContainers 的方式用守护进程的镜像来复制二进制文件到共享卷。
- 我们的 CD 会劫持用户的命令,让守护进程先启动;然后,守护进程会在 Envoy 准备好之后再运行用户的程序。
- 同时,我们还为 Envoy 添加 preStop,一个不断检查守护进程健康状态的脚本。
As a result, the users’ process will start if Envoy is ready, and Envoy will stop after the process of users is exited.
结果,如果Envoy准备好了,用户的程序就会启动,而Envoy会在用户的程序退出后停止。
It’s a complicated workaround, but it works fine in our production environment.
这是一个复杂的变通方法,但在我们的生产环境中运行良好。
信息2
还找到一个答复: https://github.com/kubernetes/enhancements/issues/753#issuecomment-687184232
Allow users to delay application start until proxy is ready
for startup issues, the istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and wait for envoy to be ready. This is blocking and the other containers are not started making sure envoy is there and ready before starting the app container.
对于启动问题,istio社区想出了一个相当聪明的变通方法,基本上是将envoy作为容器列表中的第一个容器注入,并添加一个postStart钩子,检查并等待envoy准备好。这是阻塞的,而其他容器不会启动,这样确保envoy启动并且准备好之后,然后再启动应用程序容器。
We had to port this to the version we’re running but is quite straightforward and are happy with the results so far.
我们已经将其移植到我们正在运行的版本中,很直接,目前对结果很满意。
For shutdown we are also ‘solving’ with preStop hook but adding an arbitrary sleep which we hope the application would have gracefully shutdown before continue with SIGTERM.
对于关机,我们也用 preStop 钩子来 “解决”,但增加了一个任意的 sleep,我们希望应用程序在继续 SIGTERM 之前能优雅地关机。
相关issue: Enable holdApplicationUntilProxyStarts at pod level
Knative
dapr
- Clarify lifecycle of Dapr process and app process : dapr项目中在等待 sidecar container的结果。在此之前,dapr做了一个简单的调整,将daprd这个sidecar的启动顺序放在最前面(详见 https://github.com/dapr/dapr/pull/2341)
3.2 - KEP753: Sidecar Container
相关issue
https://github.com/kubernetes/enhancements/issues/753
这个issue 开启于 2019年1月。
One-line enhancement description: Containers can now be a marked as sidecars so that they startup before normal containers and shutdown after all other containers have terminated.
一句话改进描述:容器现在可以被标记为 sidecar,使其在正常容器之前启动,并在所有其他容器终止后关闭。
设计提案链接:https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers
3.3 - 推翻KEP753的讨论
https://github.com/kubernetes/enhancements/pull/1980
这是一个关于 sidecar 的讨论汇总,最后得出的结论是推翻 kep753.
起于derekwaynecarr的发言
I want to capture my latest thoughts on sidecar concepts, and get a path forward.
Here is my latest thinking:
我想归纳我对 sidecar 概念的最新思考,并得到一条前进的道路。
这是我的最新思考。
I think it’s important to ask if the introduction of sidecar containers will actually address an end-user requirement or just shift a problem and further constrain adoption of sidecars themselves by pod authors. To help frame this exercise, I will look at the proposed use of sidecar containers in the service mesh community.
我认为重要的是要问一下 sidecar容器的引入是否会真正解决最终用户的需求,或者只是转移一个问题,并进一步限制pod作者对sidecars本身的采用。为了帮助构架这项工作,我将看看服务网格社区中拟议的 sidecar 容器的使用情况。
User story
I want to enable mTLS for all traffic in my mesh because my auditor demands it.
我想在我的Mesh中启用mTLS,因为我的审计员要求这样做。
The proposed solution is the introduction of sidecar containers that change the pod lifecycle:
提出的解决方案是引入sidecar container,改变 pod 的生命周期:
- Init containers start/stop
- Sidecar containers start
- Primary containers start/stop
- Sidecar containers stop
The issue with the proposed solution meeting the user story is as follows:
该解决方案在满足这个用户故事方面存在如下问题:
- Init containers are not subject to service mesh because the proxy is not running. This is because init containers run to completion before starting the next container. Many users do network interaction that should be subject to the mesh in their init container.
Init 容器不受服务网格的管控,因为此时代理还没有运行。这是因为 init 容器在启动下一个容器之前会运行到完成状态。很多用户会在 init 容器中进行本应受网格管控的网络交互。
- Sidecar containers (once introduced) will be used by users for use cases unrelated to the mesh, but subject to the mesh. The proposal makes no semantic guarantees on ordering among sidecars. Similar to init containers, this means sidecars are not guaranteed to participate in the mesh.
Sidecar 容器(一旦引入)将被用户用于与网格无关但仍受网格管控的用例。该提案没有对 sidecar 之间的启动顺序做出语义保证。与 init 容器类似,这意味着 sidecar 不能保证参与到网格中。
The real requirement is that the proxy container MUST stop last even among sidecars if those sidecars require network.
真正的需求是:如果这些 sidecar 需要网络,那么即使在 sidecar 之间,代理容器也必须最后一个停止。
Similar to the behavior observed with init containers (users externalize run-once setup from their main application container), the introduction of sidecar containers will result in more elements of the application getting externalized into sidecars, but those elements will still desire to be part of the mesh when they require a network. Hence, we are just shifting, and not solving the problem.
与观察到的init容器的行为类似(用户从他们的主应用容器中外部化一次性设置),引入sidecar容器将导致更多的应用元素被外部化到sidecar中,但是当这些元素需要网络时,它们仍然会渴望成为网格结构的一部分。因此,我们只是在转移,而不是解决问题。
Given the above gaps, I feel we are not actually solving a primary requirement that would drive improved adoption of a service mesh (ensure all traffic is mTLS from my pod) to meet auditing.
鉴于上述差距,我觉得我们并没有实际上解决主要需求,这个需求将推动服务网格的改进采用(确保所有来自我的pod的流量都是mTLS),以满足审计。
Alternative proposal:
- Support an ordered graph among containers in the pod (it’s inevitable), possibly with N rings of runlevels?
- Identify which containers in that graph must run to completion before initiating termination (Job use case).
- Move init containers into the graph (collapse the concept)
- Have some way to express if a network is required by the container to act as a hint for the mesh community on where to inject a proxy in the graph.
替代建议:
- 支持在pod中的容器之间建立一个有序图(这是不可避免的),可能有N个运行级别的环?
- 识别该图中的哪些容器必须在启动终止之前运行至完成状态(Job用例)。
- 将 init 容器移入图中(折叠概念)。
- 有某种方式来标记容器是否需要网络,用来作为网格社区的提示,在图中某处注入代理。
A few other notes based on Red Hat’s experience with service mesh:
Red Hat does not support injection of privileged sidecar containers and will always require CNI approach. In this flow, the CNI runs, multus runs, iptables are setup, and then init containers start. The iptables rules are setup, but no proxy is running, so init containers lose connectivity. Users are unhappy that init containers are not participating in the mesh. Users should not have to sacrifice usage of an init container (or any aspect of the pod lifecycle) to fulfill auditor requirements. The API should be flexible enough to support graceful introduction in the right level of a intra pod life-cycle graph transparent to the user.
根据红帽在服务网格方面的经验,还有一些其他说明:
红帽不支持注入特权sidecar容器,总是需要CNI方式。在这个流程中,CNI运行,multus运行,设置iptables,然后 init 容器启动。iptables规则设置好了,但是没有代理运行,所以 init容器 失去了连接。用户对init容器不参与网格感到不满。用户不应该为了满足审计师的要求而牺牲init容器的使用(或pod生命周期的任何方面)。API应该足够灵活,以支持在正确的层次上优雅地引入对用户透明的 pod 生命周期图。
Proposed next steps:
- Get a dedicated set of working meetings to ensure that across the mesh and kubernetes community, we can meet a users auditing requirement without limiting usage or adoption of init containers and/or sidecar containers themselves by pod authors.
- I will send a doodle.
拟议的下一步措施:
- 召开一组专门的工作会议,以确保在整个 mesh 和 kubernetes 社区,我们可以满足用户的审计要求,而不限制 pod 作者使用或采用 init 容器和/或 sidecar 容器本身。
- 我会发一个 Doodle(时间投票)来约时间。
其他人的意见
mrunalp:
Agree! We might as well tackle this general problem vs. doing it step by step with baggage added along the way.
同意! 我们不妨解决这个普遍性的问题,而不是按部就班地做,在做的过程中增加包袱。
sjenning :
I agree @derekwaynecarr
I think that in order to satisfy fully the use cases mentioned, we are gravitating toward systemd level semantics where there is just an ordered graph of services containers in the pod spec.
You could basically collapse init containers into the normal containers map and add two fields to Container; oneshot bool that expresses if the container terminates and dependent containers should wait for it to terminate (handles init containers w/ ordering), and requires map[string], a list of container names upon which the current container depends.
This is flexible enough to accommodate a oneshot: true container (init container) depending on a oneshot: false container (a proxy container on which the init container depends).
Admittedly this would be quite the undertaking and there is API compatibility to consider.
我同意 @derekwaynecarr
我认为,为了充分满足上述用例,我们正在倾向于 systemd 级别的语义:在 pod 规范中只有一个有序的服务容器图。你基本上可以把 init 容器折叠到普通的 containers 映射中,并在 Container 中添加两个字段:oneshot bool,表示容器是否会终止,以及依赖它的容器是否应该等待它终止(用于处理带顺序的 init 容器);requires map[string],当前容器所依赖的容器名称列表。这足够灵活,可以支持一个 oneshot: true 的容器(init 容器)依赖于一个 oneshot: false 的容器(init 容器所依赖的代理容器)。诚然,这将是一个相当大的工程,而且还要考虑 API 的兼容性。
thockin:
I have also been thinking about this. There are a number of open issues, feature-requests, etc that all circle around the topic of pod and container lifecycle. I’ve been a vocal opponent of complex API here, but it’s clear that what we have is inadequate.
When we consider init-container x sidecar-container, it is clear we will inevitably eventually need an init-sidecar.
我也一直在思考这个问题。有一些开放的问题、功能需求等,都是围绕着pod和容器生命周期这个话题展开的。我在这里一直是复杂API的强烈反对者,但很明显,我们所拥有的是不够的。
当我们考虑 init-container x sidecar-container 时,很明显我们最终将不可避免地需要一个init-sidecar。
Some (non-exhaustive) of the other related topics:
- Node shutdown -> Pod shutdown (in progress?)
- Voluntary pod restart (“Something bad happened, please burn me down to the ground and start over”)
- Voluntary pod failure (“I know something you don’t, and I can’t run here - please terminate me and do not retry”)
- “Critical” or “Keystone” containers (“when this container exits, the rest should be stopped”)
- Startup/shutdown phases with well-defined semantics (e.g. “phase 0 has no network”)
- Mixed restart policies in a pod (e.g. helper container which runs and terminates)
- Clearer interaction between pod, network, and device plugins
其他的一些(非详尽的)相关主题:
- 节点关闭 -> Pod 关闭(正在进行中?)
- 自愿重启 pod("发生了不好的事情,请把我摧毁,然后重新开始")。
- 自愿 pod 失败("我知道一些你不知道的事情,我无法在这里运行,请终止我,不要重试")。
- "关键"或"基石"容器("当这个容器退出时,其他容器应停止")。
- 具有明确语义的启动/关闭阶段(如 "phase 0 没有网络")。
- 在一个 pod 中混合重启策略(例如,会运行并终止的辅助容器)。
- 更清晰的 pod、网络和设备插件之间的交互。
thockin:
This is a big enough topic that we almost certainly need to explore multiple avenues before we can have confidence in any one.
这是一个足够大的话题,我们几乎肯定需要探索多种途径,才能对任何一种途径有信心。
kfox1111:
the dependency idea also would allow for doing an init container, then a sidecar network plugin, then more init containers, etc, which has some nice features.
Also the readyness checks and oneshot could all play together with the dependencies so the next steps aren’t started before ready.
So, as a user experience, I think that api might be very nice.
Implementation wise there are probably lots of edge cases to carefully consider there.
依赖的想法还可以做一个init容器,然后做一个sidecar网络插件,然后做更多的init容器等等,这有一些不错的功能。
另外 readyness 检查和 oneshot 都可以和依赖一起考虑,这样就不会在准备好之前就开始下一步。
所以,作为用户体验来说,我觉得这个api可能是非常不错的。
从实现上来说,可能有很多边缘情况需要仔细考虑。
SergeyKanzhelev:
this is great idea to set up a working group to move it forward in bigger scope. One topic I suggest we cover early on in the discussions is whether we need to address the existing pain point of injecting sidecars in jobs in 1.20. This KEP intentionally limited the scope to just this - formalizing what people are already trying to do today with workarounds.
From Google side we also would love the bigger scope of a problem be addressed, but hope to address some immediate pain points early if possible. Either in current scope or slightly bigger.
这是一个很好的想法,成立一个工作组,在更大范围内推进它。我建议我们在讨论中尽早涉及的一个话题是,我们是否需要在1.20中解决现有的Job中注入 sidecar 的痛点。这个KEP有意将范围限制在这一点上–将人们今天已经在尝试的工作方法正式化。
从Google方面来说,我们也希望更大范围的问题能够得到解决,但如果可能的话,希望能够尽早解决一些直接的痛点。要么在目前的范围内,要么稍微大一点。
derekwaynecarr:
I would speculate that the dominant consumer of the job scenario is a job that required participation in a mesh to complete its task, and since I don’t see much point in solving for the mesh use case (which I view as the primary motivator for defining side car semantics) for only one workload type, I would rather ensure a pattern that solves the problem in light of our common experience across mesh and k8s communities.
我推测 Job 场景的主要使用者,是需要参与网格才能完成任务的 Job。由于我认为只为一种工作负载类型解决 mesh 用例(我认为这是定义 sidecar 语义的主要动机)没有太大意义,所以我宁愿基于我们在 mesh 和 k8s 社区的共同经验,确保有一个能解决问题的模式。