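From d074ce1e9023b3fdecdfee3c0255115eb80811e5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=87=8C=E6=96=87=E9=BE=99?=
Date: Tue, 25 Jan 2022 11:08:23 +0000
Subject: [PATCH 01/17] =?UTF-8?q?=E6=B7=BB=E5=8A=A0=20'k8s&container/k8s-u?=
 =?UTF-8?q?pgrade.md'?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 k8s&container/k8s-upgrade.md | 457 +++++++++++++++++++++++++++++++++++
 1 file changed, 457 insertions(+)
 create mode 100644 k8s&container/k8s-upgrade.md

diff --git a/k8s&container/k8s-upgrade.md b/k8s&container/k8s-upgrade.md
new file mode 100644
index 0000000..2f1d21f
--- /dev/null
+++ b/k8s&container/k8s-upgrade.md
@@ -0,0 +1,457 @@
+# Upgrading the Cluster
+
+
+
+## k8s version information
+
+A k8s version is written as `x.y.z`, where `x` is the major version, `y` is the minor version, and `z` is the patch version.
+
+### How k8s releases relate to GitHub branches
+
+The code on the master branch is the newest, and a release is cut roughly every 2 weeks; from newest to oldest the stages are, in order, `master` --> `alpha` --> `beta` --> `Final release`. `X.Y.0` is a stable release, and an `X.Y.0` release appears 3 to 4 months after `X.(Y-1).0`. `X.Y.Z` is a patch release that fixes critical security vulnerabilities and problems that affect many users and have no workaround. In general, we mostly care about the features of `X.Y.0` (stable releases) and `X.Y.Z` (patch releases).
+
+`v1.14.0` : `1` is the major version : `14` is the minor version : `0` is the patch version
+
+### Support period for each version
+
+The `k8s` project maintains release branches for the three most recent minor versions. Combined with the statement above that **an `X.Y.0` release appears 3 to 4 months after `X.(Y-1).0`**, this means each minor version is maintained for 9 to 12 months, and a version more than about a year old is no longer maintained: it gets no further patch releases, even for security vulnerabilities.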
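+
+The `k8s` project maintains the three most recent minor release branches (1.23, 1.22, 1.21). `k8s` 1.19 and later receive roughly 1 year of patch support. `k8s` 1.18 and earlier received roughly 9 months of patch support.
+
+## Version compatibility
+
+### kube-apiserver
+
+In a highly available cluster, the minor versions of the `kube-apiserver` instances may differ by at most 1.
+
+- For example, if a `kube-apiserver` instance in our cluster is at **1.18**
+- then the supported `kube-apiserver` versions are **1.18** and **1.19**
+
+### kubelet
+
+The `kubelet` version must not be newer than `kube-apiserver`, and may be at most two minor versions older.
+
+Example:
+
+- If `kube-apiserver` is at **1.20**
+- the supported `kubelet` versions are **1.20**, **1.19**, and **1.18**
+
+> If the `kube-apiserver` instances in an HA cluster are at different versions, the range of allowed `kubelet` versions narrows accordingly.
+
+
+
+### kube-controller-manager, kube-scheduler, and cloud-controller-manager
+
+`kube-controller-manager`, `kube-scheduler`, and `cloud-controller-manager` must not be newer than `kube-apiserver`. Ideally they match the `kube-apiserver` version, but they may be one minor version older (to allow live upgrades).
+
+Example:
+
+- If `kube-apiserver` is at **1.20**
+- `kube-controller-manager`, `kube-scheduler`, and `cloud-controller-manager` are supported at **1.20** and **1.19**
+
+
+
+> **Note:** If the `kube-apiserver` instances in an HA cluster are at different versions and these components can talk to any of the instances (for example, through a load balancer), the range of allowed versions for `kube-controller-manager`, `kube-scheduler`, and `cloud-controller-manager` narrows accordingly.
+
+Example:
+
+- `kube-apiserver` instances at **1.20** and **1.21** exist side by side
+- `kube-controller-manager`, `kube-scheduler`, and `cloud-controller-manager` can reach every `kube-apiserver` through the `load balancer`
+- The only allowed version for `kube-controller-manager`, `kube-scheduler`, and `cloud-controller-manager` is **1.20** (**1.21** is not supported because it is newer than the `kube-apiserver` instance at **1.20**)
+
+### kubectl
+
+`kubectl` may be one minor version newer or one minor version older than `kube-apiserver`.
+
+Example:
+
+- If `kube-apiserver` is currently at **1.22**
+- `kubectl` is supported at **1.23**, **1.22**, and **1.21**
+
+> **Note:** If the `kube-apiserver` instances in an HA cluster are at different versions, the range of allowed `kubectl` versions narrows accordingly.
+
+Example:
+
+- `kube-apiserver` instances at **1.23** and **1.22** exist side by side
+- The allowed `kubectl` versions are **1.23** and **1.22** (any other version would be more than one minor version away from one of the `kube-apiserver` instances)
+
+
+
+## Upgrading the cluster
+
+Our cluster was deployed with `kubeadm` and runs `1.18.x`. At the time of writing, the latest version is 1.23 and the latest stable version is 1.22.
+
+The following walks through upgrading the cluster from `1.18.x` to `1.19.x` in detail, and briefly covers what to watch for when upgrading other versions.
+
+Notes:
+
+- After the upgrade, all containers are restarted, because the hash of the container spec has changed.
+- You can only upgrade from one minor version to the next minor version, or between patch versions of the same minor version. In other words, you cannot skip minor versions when upgrading. For example, you can upgrade from `1.y` to `1.y+1`, but not from `1.y` to `1.y+2`.
+
+**Basic upgrade flow:**
+
+1. Upgrade the primary control plane node first, then the other control plane nodes, and finally the worker nodes.
+2. Upgrade `kube-apiserver` first, then `kube-controller-manager` and `kube-scheduler`, then `kubelet`, and finally `kube-proxy`.
+
+### Upgrade the control plane nodes
+
+#### Upgrade the primary control plane node
+
+```shell
+apt update
+apt-cache policy kubeadm
+# Replace the x in 1.19.x-00 with the latest patch version
+apt-mark unhold kubeadm && \
+apt-get update && apt-get install -y kubeadm=1.19.x-00 && \
+apt-mark hold kubeadm
+
+# Since apt-get version 1.1 you can also use the following method
+apt-get update && \
+apt-get install -y --allow-change-held-packages kubeadm=1.19.x-00
+```
+
+Once the package is upgraded, verify the version:
+
+```shell
+kubeadm version
+```
+
+Drain the control plane node:
+
+```shell
+# Replace <cp-node-name> with the name of your own control plane node
+kubectl drain <cp-node-name> --ignore-daemonsets
+```
+
+Check whether the cluster can be upgraded and fetch the versions you can upgrade to:
+
+```shell
+sudo kubeadm upgrade plan
+```
+
+The output is similar to this:
+
+```
+[upgrade/config] Making sure the configuration is correct:
+[upgrade/config] Reading configuration from the cluster...
+[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
+[preflight] Running pre-flight checks.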
+[upgrade] Running cluster health checks
+[upgrade] Fetching available versions to upgrade to
+[upgrade/versions] Cluster version: v1.18.4
+[upgrade/versions] kubeadm version: v1.19.0
+[upgrade/versions] Latest stable version: v1.19.0
+[upgrade/versions] Latest version in the v1.18 series: v1.18.4
+
+Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
+COMPONENT   CURRENT             AVAILABLE
+Kubelet     1 x v1.18.4         v1.19.0
+
+Upgrade to the latest version in the v1.18 series:
+
+COMPONENT            CURRENT   AVAILABLE
+API Server           v1.18.4   v1.19.0
+Controller Manager   v1.18.4   v1.19.0
+Scheduler            v1.18.4   v1.19.0
+Kube Proxy           v1.18.4   v1.19.0
+CoreDNS              1.6.7     1.7.0
+Etcd                 3.4.3-0   3.4.7-0
+
+You can now apply the upgrade by executing the following command:
+
+    kubeadm upgrade apply v1.19.0
+
+_____________________________________________________________________
+
+    The table below shows the current state of component configs as understood by this version of kubeadm.
+    Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
+    resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
+    upgrade to is denoted in the "PREFERRED VERSION" column.
+
+    API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
+    kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
+    kubelet.config.k8s.io     v1beta1           v1beta1             no
+    _____________________________________________________________________
+```
+
+
+
+> **Note:** If `kubeadm upgrade plan` shows any component configs that require a manual upgrade, you must pass `kubeadm upgrade apply` a config file containing the replacement configs via the `--config` command line flag.
+
+Upgrade to version 1.19:
+
+```shell
+# Replace x with the patch version you picked for this upgrade
+sudo kubeadm upgrade apply v1.19.x
+```
+
+You should see output similar to this:
+
+```
+[upgrade/config] Making sure the configuration is correct:
+[upgrade/config] Reading configuration from the cluster...
+[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
+[preflight] Running pre-flight checks.
+[upgrade] Running cluster health checks
+[upgrade/version] You have chosen to change the cluster version to "v1.19.0"
+[upgrade/versions] Cluster version: v1.18.4
+[upgrade/versions] kubeadm version: v1.19.0
+[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
+[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
+[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
+[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
+[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.19.0"...
+Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
+Static pod: kube-controller-manager-kind-control-plane hash: 9ac092f0ca813f648c61c4d5fcbf39f2
+Static pod: kube-scheduler-kind-control-plane hash: 7da02f2c78da17af7c2bf1533ecf8c9a
+[upgrade/etcd] Upgrading to TLS for etcd
+Static pod: etcd-kind-control-plane hash: 171c56cd0e81c0db85e65d70361ceddf
+[upgrade/staticpods] Preparing for "etcd" upgrade
+[upgrade/staticpods] Renewing etcd-server certificate
+[upgrade/staticpods] Renewing etcd-peer certificate
+[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
+[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/etcd.yaml"
+[upgrade/staticpods] Waiting for the kubelet to restart the component
+[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
+Static pod: etcd-kind-control-plane hash: 171c56cd0e81c0db85e65d70361ceddf
+Static pod: etcd-kind-control-plane hash: 171c56cd0e81c0db85e65d70361ceddf
+Static pod: etcd-kind-control-plane hash: 59e40b2aab1cd7055e64450b5ee438f0
+[apiclient] Found 1 Pods for label selector component=etcd
+[upgrade/staticpods] Component "etcd" upgraded successfully!
+[upgrade/etcd] Waiting for etcd to become available
+[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests999800980"
+[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
+[upgrade/staticpods] Renewing apiserver certificate
+[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
+[upgrade/staticpods] Renewing front-proxy-client certificate
+[upgrade/staticpods] Renewing apiserver-etcd-client certificate
+[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/kube-apiserver.yaml"
+[upgrade/staticpods] Waiting for the kubelet to restart the component
+[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
+Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
+Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
+Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
+Static pod: kube-apiserver-kind-control-plane hash: b4c8effe84b4a70031f9a49a20c8b003
+Static pod: kube-apiserver-kind-control-plane hash: f717874150ba572f020dcd89db8480fc
+[apiclient] Found 1 Pods for label selector component=kube-apiserver
+[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
+[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
+[upgrade/staticpods] Renewing controller-manager.conf certificate
+[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/kube-controller-manager.yaml"
+[upgrade/staticpods] Waiting for the kubelet to restart the component
+[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
+Static pod: kube-controller-manager-kind-control-plane hash: 9ac092f0ca813f648c61c4d5fcbf39f2
+Static pod: kube-controller-manager-kind-control-plane hash: b155b63c70e798b806e64a866e297dd0
+[apiclient] Found 1 Pods for label selector component=kube-controller-manager
+[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
+[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
+[upgrade/staticpods] Renewing scheduler.conf certificate
+[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2020-07-13-16-24-16/kube-scheduler.yaml"
+[upgrade/staticpods] Waiting for the kubelet to restart the component
+[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
+Static pod: kube-scheduler-kind-control-plane hash: 7da02f2c78da17af7c2bf1533ecf8c9a
+Static pod: kube-scheduler-kind-control-plane hash: 260018ac854dbf1c9fe82493e88aec31
+[apiclient] Found 1 Pods for label selector component=kube-scheduler
+[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
+[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
+[kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
+[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
+[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
+[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
+[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
+[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
+W0713 16:26:14.074656 2986 dns.go:282] the CoreDNS Configuration will not be migrated due to unsupported version of CoreDNS. The existing CoreDNS Corefile configuration and deployment has been retained.
+[addons] Applied essential addon: CoreDNS
+[addons] Applied essential addon: kube-proxy
+
+[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.19.0". Enjoy!
+
+[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
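+```
+
+Next, manually upgrade your `CNI` provider plugin.
+
+Uncordon the control plane node:
+
+```shell
+# Replace <cp-node-name> with the name of your control plane node
+kubectl uncordon <cp-node-name>
+```
+
+#### Upgrade the other control plane nodes
+
+The steps are the same as for the first control plane node, except that you use the following command in place of `kubeadm upgrade apply`:
+
+```
+sudo kubeadm upgrade node
+```
+
+You also do not need to run `sudo kubeadm upgrade plan`.
+
+#### Upgrade kubelet and kubectl
+
+```shell
+# Replace the x in 1.19.x-00 with the latest patch version
+apt-mark unhold kubelet kubectl && \
+apt-get update && apt-get install -y kubelet=1.19.x-00 kubectl=1.19.x-00 && \
+apt-mark hold kubelet kubectl
+
+# Since apt-get version 1.1 you can also use the following method:
+
+apt-get update && \
+apt-get install -y --allow-change-held-packages kubelet=1.19.x-00 kubectl=1.19.x-00
+```
+
+Restart the kubelet:
+
+```shell
+sudo systemctl daemon-reload
+sudo systemctl restart kubelet
+```
+
+### Upgrade the worker nodes
+
+Upgrade worker nodes one at a time, or a few at a time, so that the cluster never drops below the minimum capacity needed to run your workloads.
+
+
+
+#### Upgrade kubeadm
+
+Upgrade kubeadm on all worker nodes:
+
+```shell
+# Replace the x in 1.19.x-00 with the latest patch version
+apt-mark unhold kubeadm && \
+apt-get update && apt-get install -y kubeadm=1.19.x-00 && \
+apt-mark hold kubeadm
+
+# Since apt-get version 1.1 you can also use the following method:
+
+apt-get update && \
+apt-get install -y --allow-change-held-packages kubeadm=1.19.x-00
+```
+
+#### Drain the node
+
+Prepare the node for maintenance by marking it unschedulable and evicting its workloads. Run:
+
+```shell
+# Replace <node-to-drain> with the name of the node you are draining
+kubectl drain <node-to-drain> --ignore-daemonsets
+```
+
+You should see output similar to this:
+
+```shell
+node/ip-172-31-85-18 cordoned
+WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-dj7d7, kube-system/weave-net-z65qx
+node/ip-172-31-85-18 drained
+```
+
+#### Upgrade the kubelet configuration
+
+Upgrade the kubelet configuration:
+
+```shell
+sudo kubeadm upgrade node
+```
+
+#### Upgrade kubelet and kubectl
+
+Upgrade kubelet and kubectl on all worker nodes:
+
+```shell
+# Replace the x in 1.19.x-00 with the latest patch version
+apt-mark unhold kubelet kubectl && \
+apt-get update && apt-get install -y kubelet=1.19.x-00 kubectl=1.19.x-00 && \
+apt-mark hold kubelet kubectl
+
+# Since apt-get version 1.1 you can also use the following method:
+
+apt-get update && \
+apt-get install -y --allow-change-held-packages kubelet=1.19.x-00 kubectl=1.19.x-00
+```
+
+Restart the kubelet:
+
+```shell
+sudo systemctl daemon-reload
+sudo systemctl restart kubelet
+```
+
+#### Uncordon the node
+
+Bring the node back online by marking it schedulable:
+
+```shell
+# Replace <node-name> with the name of the current node
+kubectl uncordon <node-name>
+```
+
+
+
+### Verify the status of the cluster
+
+After the kubelet has been upgraded on all nodes, verify that all nodes are available again by running the following command from anywhere kubectl can reach the cluster:
+
+```shell
+kubectl get nodes
+```
+
+
+
+## Recovering from a failure state
+
+If `kubeadm upgrade` fails and does not roll back, for example because of an unexpected shutdown during execution, you can run `kubeadm upgrade` again. The command is idempotent and eventually makes sure the actual state matches the desired state you declared. To recover from a bad state, you can also run `kubeadm upgrade apply --force` without changing the version your cluster is running.
+
+During the upgrade, kubeadm writes the following backup folders under `/etc/kubernetes/tmp`:
+
+- `kubeadm-backup-etcd--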