Background:
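While testing Ceph object storage (ceph v12.2.9), we found that with the default rgw_max_chunk_size=4M and rgw_obj_stripe_size set to 8M, uploading a 19M file with s3cmd loses data. (By default, s3cmd switches to multipart upload for files larger than 15M.)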
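s3cmd sizes its parts with --multipart-chunk-size-mb, which defaults to 15 MB. The sketch below models that as a plain fixed-size split, assuming every part except the last is exactly 15 MiB. split_into_parts is a hypothetical helper, not s3cmd code. It reproduces the two part sizes in the upload output (15728640 and 3686597 bytes):

```python
MIB = 1024 * 1024


def split_into_parts(file_size: int, chunk_mb: int = 15) -> list[int]:
    """Return part sizes for a fixed-size multipart split.

    Assumption: every part is chunk_mb MiB except the last, which carries
    the remainder; files no larger than one chunk go up in a single PUT.
    """
    chunk = chunk_mb * MIB
    if file_size <= chunk:
        return [file_size]
    parts = [chunk] * (file_size // chunk)
    if file_size % chunk:
        parts.append(file_size % chunk)
    return parts


# test.mp4 is 19415237 bytes (obj_size in the decoded manifest)
print(split_into_parts(19415237))  # [15728640, 3686597]
```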
Reproduction:
After changing rgw_obj_stripe_size to 8M and leaving the chunk size at 4M, data written through multipart upload goes missing.
Upload a 19M video file with s3cmd:
[root@99386cd5b819 video]# s3cmd put test.mp4 s3://openapi-hp-test
upload: 'test.mp4' -> 's3://openapi-hp-test/test.mp4' [part 1 of 2, 15MB] [1 of 1]
15728640 of 15728640 100% in 2s 6.05 MB/s done
upload: 'test.mp4' -> 's3://openapi-hp-test/test.mp4' [part 2 of 2, 3MB] [1 of 1]
3686597 of 3686597 100% in 1s 2.17 MB/s done
[root@99386cd5b819 video]#
List the objects created in the RADOS data pool and check the size of each:
[root@99386cd5b819 build]# ./bin/rados -p default.rgw.buckets.data ls
5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__multipart_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.1
5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1_test.mp4
5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__multipart_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.2
5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__shadow_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.1_1
#The data pool now holds 4 RADOS objects: besides the head object for test.mp4, the first 15M part produced 2 RADOS objects and the second part produced 1.
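For comparison, the sketch below lists the objects a correct 8M striping should produce. It mirrors the naming visible in the listing: stripe 0 of part N is <marker>__multipart_<prefix>.N and later stripes are <marker>__shadow_<prefix>.N_<stripe>. It also assumes the head object holds no data, as head_size 0 in the manifest indicates. expected_objects is a hypothetical helper, not RGW code.

```python
MIB = 1024 * 1024
MARKER = "5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1"
PREFIX = "test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS"


def expected_objects(part_sizes, stripe_size):
    """Map each expected RADOS oid to the number of bytes it should hold."""
    objs = {f"{MARKER}_test.mp4": 0}  # head object: metadata only here
    for part_num, size in enumerate(part_sizes, start=1):
        stripe, remaining = 0, size
        while remaining > 0:
            length = min(stripe_size, remaining)
            if stripe == 0:
                # first stripe of a part lives in the multipart namespace
                oid = f"{MARKER}__multipart_{PREFIX}.{part_num}"
            else:
                # later stripes of the same part go to shadow objects
                oid = f"{MARKER}__shadow_{PREFIX}.{part_num}_{stripe}"
            objs[oid] = length
            remaining -= length
            stripe += 1
    return objs


for oid, size in expected_objects([15728640, 3686597], 8 * MIB).items():
    print(size, oid)
```

The four names match the listing exactly. The difference is in the sizes: under correct striping, the shadow object .1_1 should hold 7340032 bytes.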
#Next, check the size of each object:
[root@99386cd5b819 build]# ./bin/rados -p default.rgw.buckets.data stat 5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1_test.mp4
default.rgw.buckets.data/5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1_test.mp4 mtime 2023-07-06 15:54:27.000000, size 0
[root@99386cd5b819 build]#
[root@99386cd5b819 build]# ./bin/rados -p default.rgw.buckets.data stat 5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__multipart_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.1
default.rgw.buckets.data/5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__multipart_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.1 mtime 2023-07-06 15:54:24.000000, size 8388608
[root@99386cd5b819 build]#
[root@99386cd5b819 build]# ./bin/rados -p default.rgw.buckets.data stat 5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__shadow_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.1_1
default.rgw.buckets.data/5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__shadow_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.1_1 mtime 2023-07-06 15:54:23.000000, size 3145728
[root@99386cd5b819 build]#
[root@99386cd5b819 build]# ./bin/rados -p default.rgw.buckets.data stat 5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__multipart_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.2
default.rgw.buckets.data/5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1__multipart_test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.2 mtime 2023-07-06 15:54:25.000000, size 3686597
[root@99386cd5b819 build]#
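The RADOS objects therefore add up to 0 + (8M + 3M) + 3686597 = 15220933 bytes. That is 4194304 bytes (exactly 4M) less than the original file size of 19415237 bytes.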
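The shortfall can be broken down per part using only numbers already shown: the part sizes from the s3cmd output and the object sizes from rados stat. The whole 4M deficit falls on the first part, and it equals one rgw_max_chunk_size.

```python
MIB = 1024 * 1024
rgw_max_chunk_size = 4 * MIB                # default chunk size in this setup
original_size = 19415237                    # obj_size in the manifest
part1_size, part2_size = 15728640, 3686597  # part sizes from s3cmd

# Object sizes reported by `rados stat`.
head = 0
multipart_1 = 8388608
shadow_1_1 = 3145728
multipart_2 = 3686597

stored = head + multipart_1 + shadow_1_1 + multipart_2
missing = original_size - stored
part1_missing = part1_size - (multipart_1 + shadow_1_1)
part2_missing = part2_size - multipart_2

print(stored, missing, part1_missing, part2_missing)
# 15220933 4194304 4194304 0
print(missing == rgw_max_chunk_size)  # True
```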
Now look at the manifest:
[root@99386cd5b819 build]# ./bin/rados -p default.rgw.buckets.data listxattr 5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1_test.mp4
2023-07-06 16:07:12.182 7f5d7fa89880 -1 WARNING: all dangerous and experimental features are enabled.
user.rgw.acl
user.rgw.content_type
user.rgw.etag
user.rgw.idtag
user.rgw.manifest
user.rgw.pg_ver
user.rgw.source_zone
user.rgw.tail_tag
user.rgw.x-amz-meta-s3cmd-attrs
[root@99386cd5b819 build]# ./bin/rados getxattr -p default.rgw.buckets.data 5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1_test.mp4 user.rgw.manifest > ./lhp/test.user.rgw.manifest
2023-07-06 16:07:21.375 7fb28ed7d880 -1 WARNING: all dangerous and experimental features are enabled.
[root@99386cd5b819 build]# ./bin/ceph-dencoder import ./lhp/test.user.rgw.manifest type RGWObjManifest decode dump_json
{
"objs": [],
"obj_size": 19415237,
"explicit_objs": "false",
"head_size": 0,
"max_head_size": 0,
"prefix": "test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS",
"rules": [
{
"key": 0,
"val": {
"start_part_num": 1,
"start_ofs": 0,
"part_size": 15728640,
"stripe_max_size": 8388608,
"override_prefix": ""
}
},
{
"key": 15728640,
"val": {
"start_part_num": 2,
"start_ofs": 15728640,
"part_size": 3686597,
"stripe_max_size": 8388608,
"override_prefix": ""
}
}
],
"tail_instance": "",
"tail_placement": {
"bucket": {
"name": "openapi-hp-test-pool1",
"marker": "5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1",
"bucket_id": "5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"placement_rule": "default-placement"
},
"begin_iter": {
"part_ofs": 0,
"stripe_ofs": 0,
"ofs": 0,
"stripe_size": 8388608,
"cur_part_id": 1,
"cur_stripe": 0,
"cur_override_prefix": "",
"location": {
"placement_rule": "default-placement",
"obj": {
"bucket": {
"name": "openapi-hp-test-pool1",
"marker": "5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1",
"bucket_id": "5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"key": {
"name": "test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.1",
"instance": "",
"ns": "multipart"
}
},
"raw_obj": {
"pool": "",
"oid": "",
"loc": ""
},
"is_raw": false
}
},
"end_iter": {
"part_ofs": 19415237,
"stripe_ofs": 19415237,
"ofs": 19415237,
"stripe_size": 3686597,
"cur_part_id": 3,
"cur_stripe": 0,
"cur_override_prefix": "",
"location": {
"placement_rule": "default-placement",
"obj": {
"bucket": {
"name": "openapi-hp-test-pool1",
"marker": "5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1",
"bucket_id": "5af0c708-a887-4e37-8fb8-83f5d92a9dcb.4167.1",
"tenant": "",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
}
},
"key": {
"name": "test.mp4.2~F7Z_a680T3x6HcL041K7-lwCN6GdWvS.3",
"instance": "",
"ns": "multipart"
}
},
"raw_obj": {
"pool": "",
"oid": "",
"loc": ""
},
"is_raw": false
}
}
}
[root@99386cd5b819 build]#
The manifest shows that the RADOS stripe size in the rules (stripe_max_size) has become 8388608 bytes (8M).
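To see why the manifest no longer matches the data, the sketch below maps a logical file offset to an object using the two rules above. It is a simplified reading of the manifest, not RGW's obj_iterator. It assumes the part index advances every part_size bytes within a rule and a new stripe starts every stripe_max_size bytes within a part. Rule and locate are hypothetical names.

```python
from dataclasses import dataclass

OBJ_SIZE = 19415237  # "obj_size" in the decoded manifest


@dataclass
class Rule:
    # Fields copied from one entry of "rules" in the decoded manifest.
    start_part_num: int
    start_ofs: int
    part_size: int
    stripe_max_size: int


RULES = [
    Rule(start_part_num=1, start_ofs=0, part_size=15728640, stripe_max_size=8388608),
    Rule(start_part_num=2, start_ofs=15728640, part_size=3686597, stripe_max_size=8388608),
]


def locate(ofs: int) -> tuple[str, int]:
    """Return (object, offset inside that object) for a logical offset."""
    if not 0 <= ofs < OBJ_SIZE:
        raise ValueError("offset outside the object")
    # The applicable rule is the last one starting at or before ofs.
    rule = max((r for r in RULES if r.start_ofs <= ofs), key=lambda r: r.start_ofs)
    part_index, part_ofs = divmod(ofs - rule.start_ofs, rule.part_size)
    part = rule.start_part_num + part_index
    stripe, stripe_ofs = divmod(part_ofs, rule.stripe_max_size)
    name = f"multipart .{part}" if stripe == 0 else f"shadow .{part}_{stripe}"
    return name, stripe_ofs


print(locate(11534336))  # ('shadow .1_1', 3145728)
print(locate(15728639))  # ('shadow .1_1', 7340031)
```

Under this reading, the manifest expects shadow .1_1 to cover file offsets 8388608 through 15728639 (7340032 bytes). rados stat reports only 3145728 bytes for that object, so the last 4194304 bytes of that range have no data behind them.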
Next, the key log entries from the upload (captured later while uploading lhp.mp4, but the process is identical to test.mp4):
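The log shows that during the multipart upload, the head data object of the first part was written twice, and the first write was only 4M. Shouldn't it have been 8M?
This raises two questions. Why is the first data object of the part 4M when it should be 8M? And why is that object written twice?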
Root cause:
Finding the root cause means analyzing the logs together with the code.
Question 1: in a multipart upload, why is the first data object still 4M rather than 8M?
Relevant code:
Question 2: in a multipart upload, why is the part's first (head) object written a second time?
The relevant code is as follows:
Note 3 in the code above shows that cur_stripe=0, so the second write targets the multipart part's FirstObject again. That overwrite is the root cause of the data loss.
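The two symptoms, a 4M first write and a second write that still lands on cur_stripe=0, are enough to reproduce the observed sizes. The toy model below is a hypothesis, not RGW's code path. It assumes part 1 was written in three steps: a 4M chunk to stripe 0, then an 8M write that still targeted stripe 0 at offset 0, then the remaining 3M to stripe 1. It tracks data in 1 MiB units. apply_writes is a hypothetical helper.

```python
MIB = 1024 * 1024
PART_SIZE = 15 * MIB


def apply_writes(writes):
    """Replay (object, offset_in_object, logical_start, length) writes.

    Each object maps an in-object 1 MiB slot to the logical 1 MiB unit of
    the part stored there, so a later write to the same slot replaces it.
    """
    objects: dict[str, dict[int, int]] = {}
    for oid, obj_ofs, logical_start, length in writes:
        slots = objects.setdefault(oid, {})
        for i in range(length // MIB):
            slots[obj_ofs // MIB + i] = logical_start // MIB + i
    return objects


# Hypothetical write sequence for part 1 (assumed, not taken from RGW):
buggy = [
    (".1", 0, 0, 4 * MIB),          # first chunk: 4M into stripe 0
    (".1", 0, 4 * MIB, 8 * MIB),    # cur_stripe is still 0: overwrites it
    (".1_1", 0, 12 * MIB, 3 * MIB), # remaining 3M into stripe 1
]

objs = apply_writes(buggy)
sizes = {oid: len(slots) * MIB for oid, slots in objs.items()}
surviving = {unit for slots in objs.values() for unit in slots.values()}
lost = sorted(set(range(PART_SIZE // MIB)) - surviving)
print(sizes)  # {'.1': 8388608, '.1_1': 3145728}
print(lost)   # [0, 1, 2, 3]
```

Under these assumptions, the model ends with exactly the sizes rados stat reported (8388608 and 3145728 bytes). It loses the first 4 MiB of the part, which matches the 4194304-byte shortfall.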
Solution:
Upgrade to a newer release. The issue has been fixed upstream in: rgw: MultipartObjectProcessor supports stripe size > chunk size.
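The fix title says the multipart processor now supports a stripe larger than the chunk. The sketch below shows what that implies for the write pattern, assuming stripe boundaries and chunk boundaries are handled separately. Each 8M stripe becomes one RADOS object, filled by 4M chunk writes at increasing offsets, so no object is written at offset 0 twice. plan_writes is a hypothetical helper, not the upstream patch.

```python
MIB = 1024 * 1024


def plan_writes(part_num, part_size, stripe_size, chunk_size):
    """List (object, offset_in_object, length) writes for one part."""
    writes = []
    stripe_start, stripe = 0, 0
    while stripe_start < part_size:
        stripe_len = min(stripe_size, part_size - stripe_start)
        # stripe 0 -> multipart object ".N", later stripes -> shadow ".N_k"
        oid = f".{part_num}" if stripe == 0 else f".{part_num}_{stripe}"
        # fill the stripe with chunk-sized writes at increasing offsets
        for obj_ofs in range(0, stripe_len, chunk_size):
            writes.append((oid, obj_ofs, min(chunk_size, stripe_len - obj_ofs)))
        stripe_start += stripe_len
        stripe += 1
    return writes


for w in plan_writes(1, 15 * MIB, 8 * MIB, 4 * MIB):
    print(w)
# ('.1', 0, 4194304)
# ('.1', 4194304, 4194304)
# ('.1_1', 0, 4194304)
# ('.1_1', 4194304, 3145728)
```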