Testing the Ceph RBD QoS rate-limiting feature
Background
About a week ago, a user pushed a full-disk scan command to more than a hundred virtual machines at once. The backend Ceph cluster runs on SATA HDDs and is not large, so it started reporting slow requests, which affected reads and writes from other VMs on the same cluster. Before an SSD pool can be added and the workload migrated, the RBD volumes backing the VMs need QoS rate limits.
Test environment
With no physical machines available, a functional test was run on a Ceph cluster built from three CentOS 7 virtual machines.
- OS: CentOS Linux release 7.9.2009 (Core)
- Kernel: 5.4.248-1.el7.elrepo.x86_64
- CPU / memory: 2C / 4G
- Disk: 100G
- Ceph: 14.2.22 nautilus (stable)
Test plan
- Create a test RBD image and map it with rbd-nbd (userspace)
- Measure IOPS with dd, with no rate limit (every step reuses the same dd invocation; see the sketch after this list)
- Enable image-level QoS and measure IOPS
- Enable pool-level QoS and measure IOPS
- Remove the QoS settings and test again to verify performance is restored
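Every measurement below reuses the same dd command: 4096 synchronous 512-byte writes issued directly against the mapped NBD device. oflag=direct bypasses the page cache, so each block becomes a separate IO and the elapsed time converts directly into IOPS (4096 / seconds).
# dd invocation reused in every test step
time dd if=/dev/urandom of=/dev/nbd0 count=4096 bs=512 oflag=direct,nonblock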
Test steps
(The Ceph deployment steps will be covered in a future post.)
- Create a test RBD image and map it with rbd-nbd (userspace)
# The test image lives in the pool named rbd
[root@ceph1 ~]# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 300 GiB 294 GiB 1.2 GiB 6.2 GiB 1.24
TOTAL 300 GiB 294 GiB 1.2 GiB 6.2 GiB 1.24
POOLS:
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
rbd 12 8 324 MiB 101 976 MiB 0.20 92 GiB
cephfs_metadata 13 16 20 KiB 22 1.5 MiB 0 92 GiB
cephfs_data 14 16 0 B 0 0 B 0 92 GiB
# Create the test RBD image
[root@ceph1 ~]# rbd create img --size 10G
[root@ceph1 ~]# rbd ls -p rbd
img
# If rbd-nbd is not installed, run yum install rbd-nbd first to get the nbd module
[root@ceph1 ~]# rbd-nbd map rbd/img
/dev/nbd0
[root@ceph1 ~]# lsblk /dev/nbd0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nbd0 43:0 0 10G 0 disk
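As a side note (not part of the recorded test), rbd-nbd can also show and tear down its mappings, which is handy for cleanup once the tests are done:
# List current rbd-nbd mappings and detach the device when finished
[root@ceph1 ~]# rbd-nbd list-mapped
[root@ceph1 ~]# rbd-nbd unmap /dev/nbd0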
- Measure IOPS with dd, with no rate limit
# Check the current QoS configuration: all limits default to 0, i.e. unlimited
[root@ceph1 ~]# rbd config image list img| grep qos
rbd_qos_bps_burst 0 config
rbd_qos_bps_limit 0 config
rbd_qos_iops_burst 0 config
rbd_qos_iops_limit 0 config
rbd_qos_read_bps_burst 0 config
rbd_qos_read_bps_limit 0 config
rbd_qos_read_iops_burst 0 config
rbd_qos_read_iops_limit 0 config
rbd_qos_schedule_tick_min 50 config
rbd_qos_write_bps_burst 0 config
rbd_qos_write_bps_limit 0 config
rbd_qos_write_iops_burst 0 config
rbd_qos_write_iops_limit 0 config
# With no rate limit, writing 512-byte blocks directly to the raw device takes about 14 s
[root@ceph1 ~]# time dd if=/dev/urandom of=/dev/nbd0 count=4096 bs=512 oflag=direct,nonblock
4096+0 records in
4096+0 records out
2097152 bytes (2.1 MB) copied, 14.227 s, 147 kB/s
real 0m14.228s
user 0m0.063s
sys 0m0.000s
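For reference, 4096 IOs in roughly 14.2 s works out to about 4096 / 14.2 ≈ 288 IOPS; this is the unthrottled baseline that the limited runs below are compared against.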
- Enable image-level QoS and measure IOPS
# Enable QoS throttling at the image level
[root@ceph1 ~]# rbd config image set img rbd_qos_iops_limit 50
[root@ceph1 ~]# rbd config image list img| grep qos
rbd_qos_bps_burst 0 config
rbd_qos_bps_limit 0 config
rbd_qos_iops_burst 0 config
rbd_qos_iops_limit 50 image
rbd_qos_read_bps_burst 0 config
rbd_qos_read_bps_limit 0 config
rbd_qos_read_iops_burst 0 config
rbd_qos_read_iops_limit 0 config
rbd_qos_schedule_tick_min 50 config
rbd_qos_write_bps_burst 0 config
rbd_qos_write_bps_limit 0 config
rbd_qos_write_iops_burst 0 config
rbd_qos_write_iops_limit 0 config
# The same dd command now takes about 1m23s, so the limit is effective; the 25.1 kB/s write bandwidth matches 50 IOPS * 512 bytes
[root@ceph1 ~]# time dd if=/dev/urandom of=/dev/nbd0 count=4096 bs=512 oflag=direct,nonblock
4096+0 records in
4096+0 records out
2097152 bytes (2.1 MB) copied, 83.3984 s, 25.1 kB/s
real 1m23.399s
user 0m0.000s
sys 0m0.077s
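Checking the arithmetic: 4096 IOs in 83.4 s is about 49 IOPS, right at the configured rbd_qos_iops_limit of 50, and 50 IOPS * 512 B ≈ 25.6 kB/s agrees with the reported 25.1 kB/s.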
- Enable pool-level QoS and measure IOPS
# Remove the image-level QoS first, then set a pool-level limit. The same dd command takes about 41 s, and the 51.1 kB/s write bandwidth matches 100 IOPS * 512 bytes
[root@ceph1 ~]# rbd config image remove img rbd_qos_iops_limit
[root@ceph1 ~]# rbd config pool set rbd rbd_qos_iops_limit 100
[root@ceph1 ~]# rbd config image list img| grep qos
rbd_qos_bps_burst 0 config
rbd_qos_bps_limit 0 config
rbd_qos_iops_burst 0 config
rbd_qos_iops_limit 100 pool
rbd_qos_read_bps_burst 0 config
rbd_qos_read_bps_limit 0 config
rbd_qos_read_iops_burst 0 config
rbd_qos_read_iops_limit 0 config
rbd_qos_schedule_tick_min 50 config
rbd_qos_write_bps_burst 0 config
rbd_qos_write_bps_limit 0 config
rbd_qos_write_iops_burst 0 config
rbd_qos_write_iops_limit 0 config
[root@ceph1 ~]# time dd if=/dev/urandom of=/dev/nbd0 count=4096 bs=512 oflag=direct,nonblock
4096+0 records in
4096+0 records out
2097152 bytes (2.1 MB) copied, 41.0735 s, 51.1 kB/s
real 0m41.074s
user 0m0.071s
sys 0m0.000s
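The numbers again line up with the limit: 4096 IOs in 41.1 s is about 100 IOPS, and 100 IOPS * 512 B ≈ 51.2 kB/s agrees with the reported 51.1 kB/s. Note that the source column in the config listing above now shows pool rather than image.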
- Remove the QoS settings and test again to verify performance is restored
# After removing the limit, the same dd command is back to about 14 s
[root@ceph1 ~]# rbd config pool remove rbd rbd_qos_iops_limit
[root@ceph1 ~]# rbd config image list img| grep qos
rbd_qos_bps_burst 0 config
rbd_qos_bps_limit 0 config
rbd_qos_iops_burst 0 config
rbd_qos_iops_limit 0 config
rbd_qos_read_bps_burst 0 config
rbd_qos_read_bps_limit 0 config
rbd_qos_read_iops_burst 0 config
rbd_qos_read_iops_limit 0 config
rbd_qos_schedule_tick_min 50 config
rbd_qos_write_bps_burst 0 config
rbd_qos_write_bps_limit 0 config
rbd_qos_write_iops_burst 0 config
rbd_qos_write_iops_limit 0 config
[root@ceph1 ~]# time dd if=/dev/urandom of=/dev/nbd0 count=4096 bs=512 oflag=direct,nonblock
4096+0 records in
4096+0 records out
2097152 bytes (2.1 MB) copied, 14.567 s, 144 kB/s
real 0m14.568s
user 0m0.000s
sys 0m0.064s
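That is roughly 4096 / 14.6 ≈ 281 IOPS, essentially the unthrottled baseline again, confirming the limit has been fully removed.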
Conclusions
- The RBD QoS feature introduced in Ceph Nautilus can throttle the IOPS of a volume mapped in userspace, and changes take effect immediately.
- A follow-up test with a kernel-mode mapping showed the limits have no effect there: the feature only applies to RBD images mapped through NBD (librbd). A rough way to reproduce that comparison is sketched below.
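A minimal sketch of the kernel-mode comparison, assuming the same pool and image as above and that the mapped device shows up as /dev/rbd0 (output omitted; depending on kernel support, image features such as deep-flatten may need to be disabled with rbd feature disable before krbd can map the image):
# Map through the kernel client (krbd) instead of rbd-nbd; librbd QoS options are not enforced on this path
[root@ceph1 ~]# rbd map rbd/img
[root@ceph1 ~]# time dd if=/dev/urandom of=/dev/rbd0 count=4096 bs=512 oflag=direct,nonblock
[root@ceph1 ~]# rbd unmap /dev/rbd0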
Open question
In repeated tests with the rate limit enabled, dd occasionally failed with write errors. Removing the limit restored normal behaviour, and the issue could not be reproduced reliably. The exact cause is unknown; the guess is that the throttling blocks IO for long enough to trigger a write error in dd.
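If it shows up again, a first step (based on the unverified assumption that the error is timeout-related) would be to look for NBD timeout messages in the kernel log around the time of the failure; raising the device timeout via the rbd-nbd map --timeout option may also be worth trying, though neither was attempted in this test.
# Check the kernel log for nbd timeouts or errors after a failed dd run
[root@ceph1 ~]# dmesg | grep -i nbd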