Prerequisites
Create a registered cluster on the distributed container cloud platform, and connect the self-built Kubernetes cluster to the registered cluster over the private network.
The network of the self-built Kubernetes cluster can communicate with the VPC used by the registered cluster in the cloud.
The container networks of the two clusters can communicate: build a hybrid cloud network so that the cloud-side container network and the on-premises container network are interconnected.
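To sanity-check these prerequisites before building anything, you can probe connectivity from an on-premises node. A minimal sketch; the API server address and cloud-side pod IP below are placeholders for your environment:
# Check VPC connectivity to the cloud-side API server (replace the placeholder)
curl -k --connect-timeout 5 https://<APISERVER_VPC_IP>:6443/healthz; echo
# Check container network interconnection by pinging a cloud-side pod IP (replace the placeholder)
ping -c 3 <CLOUD_POD_IP>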
Create a Node Pool
Creation Workflow
Based on your environment and the decision criteria in the figure above, decide whether to use a custom OS image with the required software pre-installed, or to build a node deployment script from the sample scripts, and then create the node pool.
Build the Deployment Script
Step 1: Create the node deployment script
1. Obtain cluster information.
# Get the Kubernetes version; it will be set in the sample script's KUBE_VERSION environment variable
kubectl get no $(kubectl get nodes -l node-role.kubernetes.io/control-plane -o json | jq -r '.items[0].metadata.name') -o json | jq -r '.status.nodeInfo.kubeletVersion'
# Example output
v1.31.6
# Get the container runtime and its version; it will be set in the sample script's RUNTIME_VERSION environment variable
kubectl get no $(kubectl get nodes -l node-role.kubernetes.io/control-plane -o json | jq -r '.items[0].metadata.name') -o json | jq -r '.status.nodeInfo.containerRuntimeVersion'
# Example output
containerd://1.6.28
# Get the kubeadm join command. The token must be set to never expire, otherwise node pool auto scaling will stop working. Set it in the deployment script's KUBEADM_JOIN_CMD environment variable
kubeadm token create --ttl 0 --print-join-command
# Example output
kubeadm join 192.168.XXX:6443 --token v47flx.o9vqap6*** --discovery-token-ca-cert-hash sha256:069f7f63f13c44ba61dcb10b2127f91cc79b732c***
# Get the CRS private network address; set it in the sample script's REGISTRY_URL environment variable
kubectl get deploy -nkube-system cceone-cluster-agent -o json|jq -r '.spec.template.spec.containers[0].image'| awk -F/ '{print $1}'
# Example output
registry-xxx.crs-internal.ctyun.cn
2. Prerequisites
The script needs public network access to download the required software. If you specify custom versions or download addresses for tools such as containerd, kubelet, kubeadm, and kubectl, make sure those addresses are reachable and downloadable.
The sample scripts only support operating systems based on the yum package manager.
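If public network reachability is in doubt, a quick probe of the download sources used by the sample scripts below can save a failed deployment. A minimal sketch:
# Probe the download endpoints referenced by the sample scripts
for url in https://get.helm.sh https://download.docker.com https://github.com https://mirrors.tuna.tsinghua.edu.cn; do
  curl -sI --connect-timeout 5 -o /dev/null "$url" && echo "$url reachable" || echo "$url unreachable"
done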
containerd runtime example
#!/bin/bash
export KUBEADM_JOIN_CMD=<KUBEADM_JOIN_CMD>
export RUNTIME_VERSION=<RUNTIME_VERSION> # e.g. 1.6.28
export KUBE_VERSION=<KUBE_VERSION> # e.g. v1.31.6
export REGISTRY_URL=<REGISTRY_URL> # private address of the CRS registry in your resource pool, e.g. registry-xxx.crs-internal.ctyun.cn
export CR_URL=<CR_URL> # e.g. 127.0.0.1:5000
node_ip=$(hostname -I | awk '{print $1}')
root_dir="/data" # during node creation, the first data disk is formatted automatically and mounted here as the data directory for kubelet and the container runtime
ARCH=$(uname -m)
case $ARCH in
x86_64)
ARCH_TYPE="amd64"
;;
aarch64|arm64)
ARCH_TYPE="arm64"
;;
*)
ARCH_TYPE="unknown"
;;
esac
# Find the first unused data disk to hold the runtime data directory; fall back to the system disk if none exists
devices=$(lsblk -d -n -o NAME)
# Iterate over each block device
for dev in $devices; do
  # Skip devices that are already mounted (checked via lsblk, since mountpoint(1) only works on directories)
  if [ -z "$(lsblk -n -o MOUNTPOINT /dev/$dev | tr -d '[:space:]')" ]; then
    # Skip devices that are already formatted
    if ! blkid /dev/$dev > /dev/null 2>&1; then
      DATA_DISK="/dev/$dev"
      echo $DATA_DISK
      break
    fi
  fi
done
# Mount the disk
# Only proceed if an unused disk was found
mkdir -p $root_dir $root_dir/containerd
if [ -n "$DATA_DISK" ]; then
  mkfs.xfs -f $DATA_DISK # format the disk
  if ! grep -qF "$DATA_DISK $root_dir xfs defaults 0 1" /etc/fstab;then
    echo "$DATA_DISK $root_dir xfs defaults 0 1" >> /etc/fstab # mount automatically at boot
  fi
  mount -a
  df -hT $root_dir | awk 'FNR == 2 {print $2}' # verify the filesystem is xfs
  xfs_info $(df -hT $root_dir | awk 'FNR == 2 {print $NF}') | grep -o "ftype=.*" | sed 's/ftype=//' # verify ftype=1
fi
# Install helm
wget https://get.helm.sh/helm-v3.18.6-linux-amd64.tar.gz
tar -xzvf helm-v3.18.6-linux-amd64.tar.gz
mv linux-amd64/helm /usr/bin/helm
# Disable the firewall
systemctl disable --now firewalld
systemctl status firewalld
# Disable SELinux
setenforce 0 # temporarily disable SELinux enforcement
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Disable swap
sed -ri 's/.*swap.*/#&/' /etc/fstab
swapoff -a && sysctl -w vm.swappiness=0
# Time synchronization
timedatectl set-ntp true
# Configure ulimit
cat >> /etc/security/limits.conf <<EOF
* soft nofile 655360
* hard nofile 655360
* soft nproc 655350
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
# Stop any existing containerd service first
systemctl stop containerd
systemctl disable containerd
ps -ef|egrep 'docker|containerd|runc|nerdctl'|grep -v 'grep'|awk '{print $2}'|xargs -i kill -9 {}
# Install containerd
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sed -i 's+https://download.docker.com+https://mirrors.tuna.tsinghua.edu.cn/docker-ce+' /etc/yum.repos.d/docker-ce.repo
yum install containerd.io-$RUNTIME_VERSION -y
# Configure containerd
cat > /etc/systemd/system/containerd.service <<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
KillMode=process
Delegate=yes
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitMEMLOCK=infinity
TasksMax=infinity
[Install]
WantedBy=multi-user.target
EOF
# Configure the kernel modules required by containerd
cat <<EOF | sudo tee /etc/modules-load.d/ccse.conf
overlay
br_netfilter
EOF
systemctl restart systemd-modules-load.service
lsmod |egrep "overlay|netfilter" # verify
# Configure the kernel parameters required by containerd
cat <<EOF | sudo tee /etc/sysctl.d/ccse.conf
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.ip_forward=1
EOF
sed -i "s#net.ipv4.ip_forward=0#net.ipv4.ip_forward=1#g" /etc/sysctl.d/99-sysctl.conf
# Load kernel parameters
sysctl --system
mkdir /etc/containerd
# Generate the default /etc/containerd/config.toml
containerd config default | tee /etc/containerd/config.toml
# Adjust the containerd configuration
sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml
cat /etc/containerd/config.toml | grep SystemdCgroup
# The sandbox image address differs per resource pool; point it at your resource pool's registry
sed -i "s#registry.k8s.io#$REGISTRY_URL/library#g" /etc/containerd/config.toml
cat /etc/containerd/config.toml | grep sandbox_image
sed -i "s#config_path\ \=\ \"\"#config_path\ \=\ \"/etc/containerd/certs.d\"#g" /etc/containerd/config.toml
cat /etc/containerd/config.toml | grep certs.d
sed -i "s#root\ \=\ \"/var/lib/containerd\"#root\ \=\ \"$root_dir/containerd\"#g" /etc/containerd/config.toml
# Configure registry mirrors
mkdir /etc/containerd/certs.d/docker.io -pv
cat > /etc/containerd/certs.d/docker.io/hosts.toml << EOF
server = "https://docker.io"
[host."https://registry-1.docker.io"]
capabilities = ["pull", "resolve"]
EOF
mkdir /etc/containerd/certs.d/$CR_URL -pv
cat > /etc/containerd/certs.d/$CR_URL/hosts.toml << EOF
server = "https://$CR_URL"
[host."http://$CR_URL"]
capabilities = ["pull", "resolve"]
skip_verify = true
EOF
mkdir /etc/containerd/certs.d/$REGISTRY_URL -pv
cat > /etc/containerd/certs.d/$REGISTRY_URL/hosts.toml << EOF
server = "https://$REGISTRY_URL"
[host."https://$REGISTRY_URL"]
capabilities = ["pull", "resolve"]
skip_verify = true
EOF
# Start the containerd service
systemctl daemon-reload
systemctl enable --now containerd.service
systemctl start containerd.service
systemctl status containerd.service
# Configure the crictl tool
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.28.0/crictl-v1.28.0-linux-amd64.tar.gz
tar xf crictl-v*-linux-amd64.tar.gz -C /usr/bin/
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF
systemctl restart containerd
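# Optional sanity check (a suggested addition, not part of the original flow): confirm the
# systemd cgroup driver took effect and that the configured sandbox image can be pulled
# through the resource pool's registry
crictl info | grep -i systemdcgroup
crictl pull $(grep sandbox_image /etc/containerd/config.toml | awk -F'"' '{print $2}')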
mkdir /app/ccseone/assets/kubernetes/ -pv
# Install kubelet
oras pull $REGISTRY_URL/library/kubernetes:$KUBE_VERSION-$ARCH_TYPE --output /app/ccseone/assets/kubernetes/
chmod 755 /app/ccseone/assets/kubernetes/$KUBE_VERSION/kube*
cp /app/ccseone/assets/kubernetes/$KUBE_VERSION/kube* /usr/bin
# Configure the kubelet service
cat > /etc/systemd/system/kubelet.service << EOF
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now kubelet.service
systemctl start kubelet.service
systemctl status kubelet.service
# Install conntrack
yum -y install conntrack
mkdir -p /etc/systemd/system/kubelet.service.d
# Generate the kubeadm drop-in (kubelet configuration file)
cat > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf << EOF
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet \$KUBELET_KUBECONFIG_ARGS \$KUBELET_CONFIG_ARGS \$KUBELET_KUBEADM_ARGS \$KUBELET_EXTRA_ARGS --node-ip=$node_ip --root-dir=$root_dir/kubelet
EOF
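# Optional check (a suggested addition): verify systemd merged the drop-in, including the
# --node-ip and --root-dir overrides
systemctl daemon-reload
systemctl cat kubelet | grep -E 'node-ip|root-dir'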
mkdir $root_dir/kubelet/plugins_registry -pv
kubeadm reset --force
systemctl stop kubelet.service
mkdir -p /var/lib/kubelet
cat > /var/lib/kubelet/config.yaml << EOF
apiVersion: kubelet.config.k8s.io/v1beta1
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerPolicy: none
cpuManagerReconcilePeriod: 0s
evictionHard:
  imagefs.available: 5%
  memory.available: 5%
  nodefs.available: 5%
  nodefs.inodesFree: 5%
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageGCHighThresholdPercent: 90
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReserved:
  cpu: 50m
  ephemeral-storage: 1Gi
  memory: 897Mi
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
systemReserved:
  cpu: 50m
  ephemeral-storage: 2Gi
  memory: 897Mi
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_RSA_WITH_AES_128_GCM_SHA256
- TLS_RSA_WITH_AES_256_GCM_SHA384
volumeStatsAggPeriod: 0s
maxPods: 48
podPidsLimit: 16384
EOF
# Run kubeadm join
$KUBEADM_JOIN_CMD --v=6 2>&1 | tee /var/log/kubeadm-join.log
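After the script completes, you can verify from the cluster's control plane that the new node registered and becomes Ready; for example (the node name below is a placeholder):
# Run on the control plane
kubectl get nodes -o wide
kubectl describe node <NEW_NODE_NAME> | grep -A 5 Conditions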
Docker runtime example
#!/bin/bash
export KUBEADM_JOIN_CMD=<KUBEADM_JOIN_CMD>
export RUNTIME_VERSION=<RUNTIME_VERSION> # Docker version to install
export KUBE_VERSION=<KUBE_VERSION> # Kubernetes version without the leading "v", e.g. 1.31.6; this script prepends the "v" itself
export REGISTRY_URL=<REGISTRY_URL> # private address of the CRS registry in your resource pool
export CR_URL=<CR_URL> # e.g. 127.0.0.1:5000
node_ip=$(hostname -I | awk '{print $1}')
root_dir="/var/lib/container" # data directory for kubelet and the container runtime; keep it consistent with your on-premises setup
# Find a data disk for the kubelet/runtime data directory; fall back to the system disk if none exists
# List all block devices
devices=$(lsblk -d -n -o NAME)
# Iterate over each device
for dev in $devices; do
  # Skip devices that are already mounted (checked via lsblk, since mountpoint(1) only works on directories)
  if [ -z "$(lsblk -n -o MOUNTPOINT /dev/$dev | tr -d '[:space:]')" ]; then
    # Skip devices that are already formatted
    if ! blkid /dev/$dev > /dev/null 2>&1; then
      DATA_DISK="/dev/$dev"
      echo $DATA_DISK
      break
    fi
  fi
done
# Mount
mkdir -p $root_dir
if [ -n "$DATA_DISK" ]; then
  mkfs.xfs -f $DATA_DISK
  if ! grep -qF "$DATA_DISK $root_dir xfs defaults 0 1" /etc/fstab;then
    echo "$DATA_DISK $root_dir xfs defaults 0 1" >> /etc/fstab # mount automatically at boot
  fi
  mount -a
  df -hT $root_dir | awk 'FNR == 2 {print $2}' # verify the filesystem is xfs
  xfs_info $(df -hT $root_dir | awk 'FNR == 2 {print $NF}') | grep -o "ftype=.*" | sed 's/ftype=//' # verify ftype=1
fi
ARCH=$(uname -m)
case $ARCH in
x86_64)
ARCH_TYPE="amd64"
;;
aarch64|arm64)
ARCH_TYPE="arm64"
;;
*)
ARCH_TYPE="unknown"
;;
esac
# Install helm
wget -4 https://get.helm.sh/helm-v3.18.6-linux-amd64.tar.gz
tar -xzvf helm-v3.18.6-linux-amd64.tar.gz
mv linux-amd64/helm /usr/bin/helm
# Disable the firewall
systemctl disable --now firewalld
systemctl status firewalld
# Disable SELinux
setenforce 0 # temporarily disable SELinux enforcement
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Disable swap
sed -ri 's/.*swap.*/#&/' /etc/fstab
swapoff -a && sysctl -w vm.swappiness=0
# Time synchronization
timedatectl set-ntp true
# Configure ulimit
cat >> /etc/security/limits.conf <<EOF
* soft nofile 655360
* hard nofile 655360
* soft nproc 655350
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
# Configure kernel parameters
cat <<EOF | sudo tee /etc/sysctl.d/99-sysctl-k8s.conf
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.ip_forward=1
EOF
sed -i "s#net.ipv4.ip_forward=0#net.ipv4.ip_forward=1#g" /etc/sysctl.conf
sed -i "s#net.ipv4.ip_forward=0#net.ipv4.ip_forward=1#g" /etc/sysctl.d/99-sysctl.conf
sysctl -p
sysctl --system
# Stop any existing Docker service first
systemctl stop docker
systemctl disable docker
systemctl stop containerd
systemctl disable containerd
ps -ef|egrep 'docker|containerd|runc|nerdctl'|grep -v 'grep'|awk '{print $2}'|xargs -i kill -9 {}
# Install Docker
if grep -qi "ctyunos" /etc/os-release; then
oras pull $REGISTRY_URL/library/cri-dockerd:0.3.4-$ARCH_TYPE --output /app/ccse/assets/cri/
oras pull $REGISTRY_URL/library/docker:$RUNTIME_VERSION-$ARCH_TYPE --output /app/ccse/assets/cri/
oras pull $REGISTRY_URL/library/containerd:1.6.23-$ARCH_TYPE --output /app/ccse/assets/cri/
oras pull $REGISTRY_URL/library/runc:1.1.12-$ARCH_TYPE --output /app/ccse/assets/cri/containerd-1.6.23/
cp /app/ccse/assets/cri/docker-$RUNTIME_VERSION/* /usr/bin
cp /app/ccse/assets/cri/containerd-1.6.23/* /usr/bin
cp /app/ccse/assets/cri/cri-dockerd-0.3.4/* /usr/bin
elif grep -qi "centos" /etc/os-release; then
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sed -i 's+https://download.docker.com+https://mirrors.tuna.tsinghua.edu.cn/docker-ce+' /etc/yum.repos.d/docker-ce.repo
yum install -y docker-ce-$RUNTIME_VERSION docker-ce-cli-$RUNTIME_VERSION containerd.io
else
echo "unknown"
fi
# Configure the docker service
cat > /etc/systemd/system/docker.service << EOF
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target docker.socket containerd.service
BindsTo=containerd.service
Wants=docker.socket
[Service]
Type=notify
Environment=GOTRACEBACK=crash
ExecReload=/bin/kill -s HUP \$MAINPID
Delegate=yes
KillMode=process
ExecStart=/usr/bin/dockerd \\
          \$DOCKER_OPTS \\
          \$DOCKER_STORAGE_OPTIONS \\
          \$DOCKER_NETWORK_OPTIONS \\
          \$DOCKER_DNS_OPTIONS \\
          \$INSECURE_REGISTRY
TasksMax=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=1min
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
EOF
cat > /etc/systemd/system/containerd.service <<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
KillMode=process
Delegate=yes
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitMEMLOCK=infinity
TasksMax=infinity
[Install]
WantedBy=multi-user.target
EOF
mkdir -p /etc/docker
chmod 0755 /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "insecure-registries": [ "$CR_URL" ],
  "storage-driver": "overlay2",
  "data-root": "$root_dir/docker",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "1g"
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "bridge": "none"
}
EOF
# Start the docker service
systemctl daemon-reload
systemctl enable --now containerd.service
systemctl start containerd.service
systemctl status containerd.service
systemctl daemon-reload
systemctl enable docker
systemctl restart docker
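# Optional sanity check (a suggested addition): confirm Docker applied daemon.json
# (cgroup driver should be systemd and the data root should be on the data disk)
docker info --format '{{.CgroupDriver}} {{.DockerRootDir}}'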
cat > /etc/systemd/system/cri-docker.socket << EOF
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service
[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=root
[Install]
WantedBy=sockets.target
EOF
# Configure and start the cri-docker service
cat > /etc/systemd/system/cri-docker.service << EOF
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket
[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --ipv6-dual-stack --log-level debug --pod-infra-container-image $REGISTRY_URL/library/pause:3.10 --container-runtime-endpoint fd://
#ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd://
ExecReload=/bin/kill -s HUP \$MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitInterval=0s
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
#StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
#StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable cri-docker
systemctl restart cri-docker
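# Optional check (a suggested addition): verify the cri-dockerd socket is up before
# kubeadm join talks to it
systemctl is-active cri-docker.socket cri-docker.service
ls -l /var/run/cri-dockerd.sock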
# Install kubelet
oras pull $REGISTRY_URL/library/kubelet:v$KUBE_VERSION-$ARCH_TYPE --output /app/ccse/assets/kubernetes/v$KUBE_VERSION/
oras pull $REGISTRY_URL/library/kubernetes:v$KUBE_VERSION-$ARCH_TYPE --output /app/ccse/assets/kubernetes/
chmod 755 /app/ccse/assets/kubernetes/v$KUBE_VERSION/kube*
cp /app/ccse/assets/kubernetes/v$KUBE_VERSION/kube* /usr/bin
mkdir -p /etc/systemd/system/kubelet.service.d
mkdir /etc/kubernetes/patches -pv
mkdir -p $root_dir/kubelet/plugins_registry
mkdir -p /etc/kubernetes/manifests
cat > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf << EOF
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet \$KUBELET_KUBECONFIG_ARGS \$KUBELET_CONFIG_ARGS \$KUBELET_KUBEADM_ARGS \$KUBELET_EXTRA_ARGS --node-ip=$node_ip --root-dir=$root_dir/kubelet
EOF
# Configure the kubelet service
cat > /etc/systemd/system/kubelet.service << EOF
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now kubelet.service # enable at boot
systemctl start kubelet.service
systemctl status kubelet.service
# Install conntrack
yum -y install conntrack
kubeadm reset --force --cri-socket unix:///var/run/cri-dockerd.sock
systemctl stop kubelet.service
# Extract the control plane address
API_ADDRESS=$(echo "$KUBEADM_JOIN_CMD" | grep -oP '\d+\.\d+\.\d+\.\d+:\d+')
# Extract the token
TOKEN=$(echo "$KUBEADM_JOIN_CMD" | grep -oP -- '--token \K[^\s]+')
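# For illustration, with a hypothetical join command such as
#   kubeadm join 192.168.0.10:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:...
# the two extractions above yield API_ADDRESS=192.168.0.10:6443 and TOKEN=abcdef.0123456789abcdef
echo "API_ADDRESS=$API_ADDRESS TOKEN=$TOKEN" # sanity check before writing kubeadm.yaml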
cat > /etc/kubernetes/kubeadm.yaml << EOF
---
apiVersion: kubeadm.k8s.io/v1beta3
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: "$API_ADDRESS"
    token: "$TOKEN"
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: "$TOKEN"
kind: JoinConfiguration
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock
  imagePullPolicy: IfNotPresent
  name: "$HOSTNAME"
patches:
  directory: /etc/kubernetes/patches
EOF
# Join the cluster
kubeadm join --config /etc/kubernetes/kubeadm.yaml --v=6 2>&1 | tee /var/log/kubeadm-join.log
Step 2: Create the node pool
Log in to the distributed container cloud platform. In the left navigation pane, choose Cluster Resources > Cluster Management.
In the cluster list, click the target cluster to open the cluster page.
In the left navigation pane, choose Node Management > Node Pools, then click Create Node Pool.
When creating the node pool, select the matching cloud instance specification and related configuration, add the node deployment script above, and set node labels, taints, and so on to complete the creation.
Build a Custom OS Image
To shorten the time from node creation to Ready on the cloud side, you can pre-install the required software packages into a custom OS image, reducing download time and improving efficiency.
This reference uses CentOS 7.9 as the example operating system, joining a Kubernetes 1.31.6 cluster, to build a custom OS image and create a node pool.
Step 1: Configure the base environment
1. Create an ECS cloud host
Log in to the Compute > Elastic Cloud Host console.
Select the target resource pool, then click Create Cloud Host and complete the configuration.
After the cloud host instance is created, log in to the node.
2. Install tool packages and configure the base environment
yum -y install yum-utils conntrack
# Install helm
wget https://get.helm.sh/helm-v3.18.6-linux-amd64.tar.gz
tar -xzvf helm-v3.18.6-linux-amd64.tar.gz
mv linux-amd64/helm /usr/bin/helm
# Disable the firewall
systemctl disable --now firewalld
systemctl status firewalld
# Disable SELinux
setenforce 0 # temporarily disable SELinux enforcement
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Disable swap
sed -ri 's/.*swap.*/#&/' /etc/fstab
swapoff -a && sysctl -w vm.swappiness=0
# Time synchronization
timedatectl set-ntp true
# Configure ulimit
cat >> /etc/security/limits.conf <<EOF
* soft nofile 655360
* hard nofile 655360
* soft nproc 655350
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
3. Container runtime configuration
Download containerd
Configure kernel modules
Configure kernel parameters
Runtime configuration file
Registry configuration
containerd service configuration
crictl configuration
# Download containerd
export RUNTIME_VERSION=<RUNTIME_VERSION> # e.g. 1.6.28; keep consistent with the cluster's containerd version
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sed -i 's+https://download.docker.com+https://mirrors.tuna.tsinghua.edu.cn/docker-ce+' /etc/yum.repos.d/docker-ce.repo
yum install containerd.io-$RUNTIME_VERSION -y
# Configure the containerd service
cat > /etc/systemd/system/containerd.service <<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
KillMode=process
Delegate=yes
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitMEMLOCK=infinity
TasksMax=infinity
[Install]
WantedBy=multi-user.target
EOF
# Configure the kernel modules required by containerd
cat <<EOF | sudo tee /etc/modules-load.d/ccse.conf
overlay
br_netfilter
EOF
systemctl restart systemd-modules-load.service
lsmod |egrep "overlay|netfilter" # verify
# Configure the kernel parameters required by containerd
cat <<EOF | sudo tee /etc/sysctl.d/ccse.conf
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.ip_forward=1
EOF
sed -i "s#net.ipv4.ip_forward=0#net.ipv4.ip_forward=1#g" /etc/sysctl.d/99-sysctl.conf
# Load kernel parameters
sysctl --system
4. kubelet configuration
Download kubelet
Configure the kubelet service
# Fetch the binaries: log in to a master node and copy them to this node ($NODEIP is this node's IP)
scp /usr/bin/kube{let,adm,ctl} $NODEIP:/usr/bin
# Configure the kubelet service
cat > /etc/systemd/system/kubelet.service << EOF
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Start kubelet
systemctl daemon-reload
systemctl enable --now kubelet.service
systemctl start kubelet.service
systemctl status kubelet.service
Step 2: Export the custom image
Log in to the Compute console.
In the left navigation pane, choose Elastic Cloud Host, select the cloud host instance, open its details page, and shut down the cloud host.
Back in the instance list, choose More > Create Image next to the target instance.
On the create private image page, fill in the private image information and complete the creation.
In the Compute console, choose Image Service > Private Images in the left navigation pane and confirm the image status is Normal.
Step 3: Modify the node deployment script
Since the private image already contains the required packages, the node deployment script only needs the information required to join the cluster.
#!/bin/bash
set -x
export KUBEADM_JOIN_CMD="kubeadm join 172.31.XXX:6443 --token utz5te.fxphv4vijih*** --discovery-token-ca-cert-hash sha256:57dab39b29f8d5a2e0cc0e5c9b26425ed1b***"
export KUBE_VERSION="v1.31.6"
export RUNTIME_VERSION="2.1.3"
export REGISTRY_URL="registry-huabei2.crs-internal.ctyun.cn"
export CR_URL="127.0.0.1:5000"
node_ip=$(hostname -I | awk '{print $1}')
# Plan the data disk mount path used as the root data directory for the container runtime and kubelet
root_dir="/data"
devices=$(lsblk -d -n -o NAME | grep -v NAME)
for dev in $devices; do
if [ -z "$(lsblk -n -o MOUNTPOINT /dev/$dev | tr -d '[:space:]')" ]; then # skip already-mounted devices (mountpoint(1) only works on directories)
if ! blkid /dev/$dev > /dev/null 2>&1; then
DATA_DISK="/dev/$dev"
echo $DATA_DISK
break
fi
fi
done
# Format and mount the data disk
[ -n "$DATA_DISK" ] || { echo "no unformatted data disk found" >&2; exit 1; } # guard: mkfs must not run with an empty argument
mkfs.xfs -f $DATA_DISK
mkdir -p $root_dir/container $root_dir/kubelet $root_dir/containerd
if ! grep -qF "$DATA_DISK $root_dir/container xfs defaults 0 1" /etc/fstab;then
echo "$DATA_DISK $root_dir/container xfs defaults 0 1" >> /etc/fstab
fi
mount -a
mkdir -p $root_dir/container/containerd $root_dir/container/kubelet
if ! grep -qF "$root_dir/container/kubelet $root_dir/kubelet none defaults,bind,slave,shared 0 0" /etc/fstab;then
echo "$root_dir/container/kubelet $root_dir/kubelet none defaults,bind,slave,shared 0 0" >> /etc/fstab
fi
if ! grep -qF "$root_dir/container/containerd $root_dir/containerd none defaults,bind 0 0" /etc/fstab;then
echo "$root_dir/container/containerd $root_dir/containerd none defaults,bind 0 0" >> /etc/fstab
fi
mount -a
df -hT $root_dir/container | awk 'FNR == 2 {print $2}' # verify the filesystem is xfs
xfs_info $(df -hT $root_dir/container | awk 'FNR == 2 {print $NF}') | grep -o "ftype=.*" | sed 's/ftype=//' # verify ftype=1
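# Optional sanity check (a suggested addition): confirm the bind mounts resolved to the expected sources
findmnt --target $root_dir/kubelet
findmnt --target $root_dir/containerd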
systemctl stop containerd
systemctl disable containerd
ps -ef|egrep 'docker|containerd|runc|nerdctl'|grep -v 'grep'|awk '{print $2}'|xargs -i kill -9 {}
mkdir /etc/containerd
containerd config default | tee /etc/containerd/config.toml
sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml
cat /etc/containerd/config.toml | grep SystemdCgroup
sed -i "s#registry.k8s.io#$REGISTRY_URL/library#g" /etc/containerd/config.toml
cat /etc/containerd/config.toml | grep sandbox_image
sed -i "s#config_path\ \=\ \"\"#config_path\ \=\ \"/etc/containerd/certs.d\"#g" /etc/containerd/config.toml
cat /etc/containerd/config.toml | grep certs.d
sed -i "s#root\ \=\ \"/var/lib/containerd\"#root\ \=\ \"$root_dir/containerd\"#g" /etc/containerd/config.toml
mkdir /etc/containerd/certs.d/docker.io -pv
cat > /etc/containerd/certs.d/docker.io/hosts.toml << EOF
server = "https://docker.io"
[host."https://registry-1.docker.io"]
capabilities = ["pull", "resolve"]
EOF
mkdir /etc/containerd/certs.d/$CR_URL -pv
cat > /etc/containerd/certs.d/$CR_URL/hosts.toml << EOF
server = "https://$CR_URL"
[host."http://$CR_URL"]
capabilities = ["pull", "resolve"]
skip_verify = true
EOF
mkdir /etc/containerd/certs.d/$REGISTRY_URL -pv
cat > /etc/containerd/certs.d/$REGISTRY_URL/hosts.toml << EOF
server = "https://$REGISTRY_URL"
[host."https://$REGISTRY_URL"]
capabilities = ["pull", "resolve"]
skip_verify = true
EOF
systemctl daemon-reload
systemctl enable --now containerd.service
systemctl start containerd.service
systemctl status containerd.service
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF
systemctl restart containerd
cat > /etc/systemd/system/kubelet.service << EOF
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now kubelet.service
systemctl start kubelet.service
systemctl status kubelet.service
mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf << EOF
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_EXTRA_ARGS=--client-ca-file=/etc/kubernetes/pki/ca.crt --register-with-taints=ctyun:NoSchedule --node-labels=k8s.aliyun.com/ignore-by-terway=true"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet \$KUBELET_KUBECONFIG_ARGS \$KUBELET_CONFIG_ARGS \$KUBELET_KUBEADM_ARGS \$KUBELET_EXTRA_ARGS --node-ip=$node_ip --root-dir=$root_dir/kubelet
EOF
mkdir $root_dir/kubelet/plugins_registry -pv
kubeadm reset --force # clear
systemctl stop kubelet.service
mkdir -p /var/lib/kubelet
cat > /var/lib/kubelet/config.yaml << EOF
apiVersion: kubelet.config.k8s.io/v1beta1
clusterDNS:
- 10.1.0.10
clusterDomain: cluster.local
cgroupDriver: systemd
cpuManagerPolicy: none
cpuManagerReconcilePeriod: 0s
evictionHard:
  imagefs.available: 5%
  memory.available: 5%
  nodefs.available: 5%
  nodefs.inodesFree: 5%
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageGCHighThresholdPercent: 90
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReserved:
  cpu: 50m
  ephemeral-storage: 1Gi
  memory: 897Mi
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
systemReserved:
  cpu: 50m
  ephemeral-storage: 2Gi
  memory: 897Mi
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_RSA_WITH_AES_128_GCM_SHA256
- TLS_RSA_WITH_AES_256_GCM_SHA384
volumeStatsAggPeriod: 0s
maxPods: 48
podPidsLimit: 16384
EOF
$KUBEADM_JOIN_CMD --v=6 2>&1 | tee /etc/kubernetes/kubeadm-join.log
Step 4: Create a node pool with the custom image
Log in to the distributed container cloud platform console. In the left navigation pane, choose Cluster Resources > Cluster Management.
In the cluster list, click the target cluster, then choose Nodes > Node Pools in the left navigation pane.
Click Create Node Pool. On the creation page, select the private image exported above as the operating system in the node configuration, and fill in the node deployment script under the advanced configuration to complete the creation.
View a Node Pool
Click a node pool name to view the following:
Basic information: node pool details and node configuration;
Node management: all nodes in the pool, with support for unsubscribing or removing nodes;
Scaling activities: scale-out and scale-in operations in the pool and their status; when scaling fails, you can check the failure reason;
Edit a Node Pool
After a node pool is created, some of its settings can be modified, including the node pool name, billing mode, storage configuration, network configuration (subnet), and advanced configuration (pre-deployment script, node labels, annotations, taints, resource tags, node unschedulable flag, description, etc.);
Updating a node pool does not change the configuration of existing nodes; it only applies to newly scaled-out nodes, except in special cases (labels, annotations, and taints can be synchronized to existing nodes).
Delete a Node Pool
Before deleting a node pool, make sure all nodes have been scaled in and the pool contains no nodes; only then can the pool be deleted.
Scale Out a Node Pool
After creating a node pool, you can scale it out; the pool status changes to Scaling out. Once the scale-out completes, the new nodes appear in the cluster and in the node list, and the pool status returns to Active.
Scaling out a node pool involves two steps:
Provision ECS instances: scale out according to the configured desired instance count and the node pool configuration; the pool shows Scaling out.
Join the ECS instances to the cluster: after the instances are created, the node deployment script runs automatically to deploy the nodes and join them to the cluster. Once the nodes have joined successfully, the pool shows Active and the scaling activity shows Success.
The possible states of scaled-out nodes are described below:
| Status | Description |
|---|---|
| Normal | The node is running normally in the cluster |
| Abnormal | The node is running abnormally in the cluster |
| Creating | The node instance is being created |
| Creation failed | The ECS node instance failed to be created |
| Deployed only | The node deployment script ran successfully, but the node did not join the cluster; check the deployment script |
| Deployment failed | The node deployment script failed |
| Evicting | Pods on the node are being evicted to other nodes |
| Cordoned | The node has been cordoned; new Pods cannot be scheduled onto it |
| Deleting | The node is being deleted |
| Deletion failed | The node failed to be deleted |
| Removing | The node is being removed from the cluster |
| Removal failed | The node failed to be removed |
Scale In a Node Pool
There are two scale-in operations:
Unsubscribe: the node is removed from the cluster and the node instance is unsubscribed; back up important data beforehand.
Remove: the node is removed from the cluster, but the node instance is not deleted; the instance's operating system is reinstalled, so back up important data before removing.
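Before unsubscribing or removing a node, you can drain it manually so workloads are rescheduled gracefully; a minimal sketch using standard kubectl commands (the node name is a placeholder):
# Cordon the node and evict its Pods before scaling in
kubectl cordon <NODE_NAME>
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data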