Container Monitoring - 天翼云 Developer Community

Deployment monitoring

`StatefulsetReplicaNotReady`

A StatefulSet has replicas that are not ready. Example use case:

Number of not-ready replicas in the alertmanager StatefulSet:

 kube_statefulset_status_replicas{statefulset=~".*alertmanager.*",namespace="sxx"}-kube_statefulset_status_replicas_ready>0

`DeploymentReplicasUnavaliable`

A Deployment has unavailable replicas. Example use case:

Number of unavailable replicas in the event-exporter Deployment:

kube_deployment_status_replicas_unavailable{deployment="event-exporter",namespace="monitoring"}>0

`DeamonsetStatusNumberUnavailable`

Number of nodes on which a DaemonSet's pod is unavailable. Example use case:

Number of nodes on which kube-proxy is not ready:

kube_daemonset_status_number_unavailable{daemonset="kube-proxy",namespace="kube-system"}>0

Status monitoring

`PodStatusNoRunning`

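A pod is in a phase other than Running (this also matches pods in the Succeeded phase). Example use cases:

Any pod in the gostack namespace is not running: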

  sum (kube_pod_status_phase{phase!="Running",namespace="gostack"}) by (pod,phase) >0

The web pod in the gostack namespace is not running:

  sum (kube_pod_status_phase{phase!="Running",namespace="gostack",pod=~"web.*"}) by (pod,phase) >0

`PodRestart`

A pod has restarted. Example use cases:

Any pod in the gostack namespace has restarted:

 sum (increase (kube_pod_container_status_restarts_total{namespace="gostack"}[2m])) by (namespace,pod) >0

The xxxx-nginx pod in the gostack namespace has restarted:

 sum (increase (kube_pod_container_status_restarts_total{namespace="gostack",pod=~".*nginx.*"}[2m])) by (namespace,pod) >0

`JobFailed`

A Job has failed. Example use case:

Find failed Jobs in the vm-az1 namespace:

kube_job_status_failed{namespace="vm-az1"}>0

CPU usage (millicores)

1 core = 1000 millicores. Suppose an application is given a CPU limit of 0.4 CPU (400 millicores). The application then gets 40 ms of run time in every 100 ms period.
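The arithmetic above can be sketched in Python. This is only an illustration that assumes the 100 ms scheduling period mentioned in the text; the function name `cfs_quota_us` is hypothetical and not part of any library.

```python
def cfs_quota_us(limit_millicores: int, period_us: int = 100_000) -> int:
    """CPU time (in microseconds) a container may use in each period.

    Assumes a 100 ms (100,000 us) period unless another one is passed in.
    """
    return limit_millicores * period_us // 1000


limit_millicores = 400  # a CPU limit of 0.4 CPU
quota = cfs_quota_us(limit_millicores)
print(f"{quota // 1000} ms of run time per 100 ms period")
```

A limit of one full core (1000 millicores) yields a quota equal to the whole period.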

`PodCpuUsage`

CPU usage of a pod, in millicores. Example use case:

Query the CPU usage of coredns:

 round(sum(irate(container_cpu_usage_seconds_total{job="kubelet", pod=~"coredns.*", image!=""}[5m])) by(pod),0.001)*1000
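To make the unit conversion in this query concrete, here is a rough Python sketch of what `irate(...) * 1000` amounts to for a single series: the increase of the CPU-seconds counter between the last two samples, divided by the elapsed time, gives cores, and multiplying by 1000 gives millicores. The function name and the sample values are hypothetical, and the sketch ignores counter resets and tie-breaking details that Prometheus handles itself.

```python
def irate_millicores(t1: float, v1: float, t2: float, v2: float) -> int:
    """Approximate per-second CPU usage in millicores from two counter samples.

    (t1, v1) and (t2, v2) are the timestamps (seconds) and values (CPU-seconds)
    of the last two samples. Counter resets are not handled in this sketch.
    """
    cores = (v2 - v1) / (t2 - t1)
    return round(cores * 1000)


# Hypothetical samples 15 s apart: the counter grew by 0.3 CPU-seconds.
print(irate_millicores(0.0, 120.0, 15.0, 120.3))
```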

CPU monitoring

`CpuUsage`

CPU utilization of a pod, which requires a CPU limit to be set. Example use case:

CPU utilization of node-exporter in the monitoring namespace is above 75%:

    100*sum(irate(container_cpu_usage_seconds_total{container="node-exporter",pod=~".*node-exporter.*",namespace="monitoring"}[3m])) by(pod,id,namespace,container,image)
    /
    sum(container_spec_cpu_quota/container_spec_cpu_period) by(pod,id,namespace,container,image)>75
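The ratio in this expression can be illustrated with a short Python sketch: the CPU limit in cores is quota divided by period, and utilization is usage divided by that limit. The function name is hypothetical, and treating a non-positive quota as "no limit" is an assumption of this sketch rather than something the query itself does.

```python
from typing import Optional


def cpu_utilization_percent(
    usage_cores: float, quota_us: int, period_us: int
) -> Optional[float]:
    """CPU utilization as a percentage of the CPU limit.

    Returns None when no usable limit is given (assumption: a non-positive
    quota or period means there is no limit to compare against).
    """
    if quota_us <= 0 or period_us <= 0:
        return None
    limit_cores = quota_us / period_us
    return 100 * usage_cores / limit_cores


# Hypothetical values: 0.32 cores used against a 0.4 CPU limit.
utilization = cpu_utilization_percent(0.32, quota_us=40_000, period_us=100_000)
print(utilization is not None and utilization > 75)
```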

Memory monitoring

`PodMemUsage`

Memory usage of a pod. Example use case:

Query the memory usage of coredns:

 sum(container_memory_working_set_bytes{pod=~"coredns.*",image!="",job="kubelet"}) by (pod)

`PodMemUsage`

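Memory utilization of a pod, which requires a memory limit to be set. Example use case:

Utilization is memory usage divided by the memory limit. The expression ends with `and container_spec_memory_limit_bytes != 0` because some containers have no memory limit configured; for those the limit is reported as 0, so the division would not give a meaningful percentage.

Prefer container_memory_working_set_bytes over container_memory_usage_bytes. container_memory_usage_bytes includes cache, such as the filesystem cache, which can be reclaimed under memory pressure.

Memory utilization of the flannel container in the kube-system namespace is above 75%: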

 (100*container_memory_working_set_bytes{container="kube-flannel",namespace="kube-system"} / container_spec_memory_limit_bytes{container="kube-flannel",namespace="kube-system"} and container_spec_memory_limit_bytes{container="kube-flannel",namespace="kube-system"}!=0)>75
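The same calculation, including the guard for containers without a limit, can be sketched in Python. The function name and the byte values are hypothetical; returning `None` for a zero limit mirrors the way the `and ... != 0` clause drops such series from the result.

```python
from typing import Optional


def memory_utilization_percent(
    working_set_bytes: int, limit_bytes: int
) -> Optional[float]:
    """Working-set memory as a percentage of the memory limit.

    Returns None when the limit is 0, i.e. no memory limit is configured.
    """
    if limit_bytes == 0:
        return None
    return 100 * working_set_bytes / limit_bytes


MIB = 1024 * 1024
print(memory_utilization_percent(96 * MIB, 128 * MIB))  # 75.0
print(memory_utilization_percent(96 * MIB, 0))          # None
```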

Network monitoring

`PodNetReceiveBytes`

Inbound (received) network traffic of a pod. Example use case:

Query the inbound network traffic of apiserver:

sum(irate(container_network_receive_bytes_total{pod=~"kube-apiserver.*"}[5m]))

`PodNetTransmitBytes`

Outbound (transmitted) network traffic of a pod. Example use case:

Query the outbound network traffic of apiserver:

sum(irate(container_network_transmit_bytes_total{pod=~"kube-apiserver.*"}[5m]))

Disk monitoring

`PodReadBytes`

Disk read I/O of a pod. Example use case:

Query the read I/O of etcd:

 sum(irate(container_fs_reads_bytes_total{pod=~"etcd.*"}[5m]))

`PodWriteBytes`

Disk write I/O of a pod. Example use case:

Query the write I/O of etcd:

 sum(irate(container_fs_writes_bytes_total{pod=~"etcd.*"}[5m]))
