Problem description:
- Loop test: during a 1000-cycle reboot test, the error appeared at cycle 137
- The error is intermittent
Hardware information:
HBA card: 9500-16i, Firmware: 26.00.00.00, Driver: 45.00.00.00
Oct 27 17:25:06 localhost systemd[1]: Startup finished in 1min 35.694s (firmware) + 7.715s (loader) + 3.808s (kernel) + 14.977s (initrd) + 10.189s (userspace) = 2min 12.385s.
Oct 27 17:25:09 localhost systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Oct 27 17:25:11 localhost kernel: [ 33.842415] mpt3sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Oct 27 17:25:11 localhost kernel: [ 33.842417] mpt3sas_cm1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Oct 27 17:25:11 localhost kernel: [ 33.842461] mpt3sas_cm1: TEST_UNIT_READY: handle(0x002a), lun(0)
Oct 27 17:25:12 localhost kernel: [ 34.344079] mpt3sas_cm1: log_info(0x31130000): originator(PL), code(0x13), sub_code(0x0000)
Oct 27 17:25:12 localhost kernel: [ 34.344093] mpt3sas_cm1: log_info(0x31130000): originator(PL), code(0x13), sub_code(0x0000)
Oct 27 17:25:12 localhost kernel: [ 34.345973] mpt3sas_cm1: handle(0x002a), ioc_status(0x0022)#012failure at /builddir/build/BUILD/mpt3sas-45.00.00.00/obj/mpt3sas_transport.c:316/_transport_set_identify()!
Oct 27 17:25:13 localhost kernel: [ 35.409122] mpt3sas_cm1: handle(0x002a), ioc_status(0x0022)#012failure at /builddir/build/BUILD/mpt3sas-45.00.00.00/obj/mpt3sas_transport.c:316/_transport_set_identify()!
Oct 27 17:25:13 localhost kernel: [ 35.410649] mpt3sas_cm1: failure at /builddir/build/BUILD/mpt3sas-45.00.00.00/obj/mpt3sas_scsih.c:10531/_scsih_add_device()!
Oct 27 17:25:14 localhost kernel: [ 36.594040] mpt3sas_cm1: detecting: handle(0x002a), sas_address(0x500e004bbbbbbb04), phy(4)
Oct 27 17:25:14 localhost kernel: [ 36.594553] mpt3sas_cm1: REPORT_LUNS: handle(0x002a), retries(0)
Oct 27 17:25:17 localhost kernel: [ 39.842639] mpt3sas_cm1: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
Oct 27 17:25:17 localhost kernel: [ 39.842643] mpt3sas_cm1: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
Oct 27 17:25:17 localhost kernel: [ 39.843761] mpt3sas_cm1: TEST_UNIT_READY: handle(0x002a), lun(0)
Oct 27 17:25:18 localhost NetworkManager[2615]: <info> [1698398718.0208] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Oct 27 17:25:18 localhost systemd-hostnamed[2642]: Hostname set to <localhost.localdomain> (transient)
Oct 27 17:25:18 localhost dbus-daemon[2527]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.4' (uid=0 pid=2615 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Oct 27 17:25:18 localhost systemd[1]: Starting Network Manager Script Dispatcher Service...
Oct 27 17:25:18 localhost dbus-daemon[2527]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 27 17:25:18 localhost systemd[1]: Started Network Manager Script Dispatcher Service.
Oct 27 17:25:18 localhost kernel: [ 40.927368] mpt3sas_cm1: detecting: handle(0x002a), sas_address(0x500e004bbbbbbb04), phy(4)
Oct 27 17:25:18 localhost kernel: [ 40.927885] mpt3sas_cm1: REPORT_LUNS: handle(0x002a), retries(0)
Oct 27 17:25:18 localhost kernel: [ 40.928409] mpt3sas_cm1: TEST_UNIT_READY: handle(0x002a), lun(0)
Oct 27 17:25:18 localhost kernel: [ 40.929053] mpt3sas_cm1: handle(0x2a) sas_address(0x500e004bbbbbbb04) port_type(0x1)
Oct 27 17:25:18 localhost kernel: [ 40.931273] scsi 15:0:12:0: Direct-Access ATA ST16000NM000J-2T SCA4 PQ: 0 ANSI: 6
Oct 27 17:25:18 localhost kernel: [ 40.932292] scsi 15:0:12:0: SATA: handle(0x002a), sas_addr(0x500e004bbbbbbb04), phy(4), device_name(0x5000c500c82dbca4)
Oct 27 17:25:18 localhost kernel: [ 40.933388] scsi 15:0:12:0: enclosure logical id(0x500e004bbebbbb00), slot(4)
Oct 27 17:25:18 localhost kernel: [ 40.934045] scsi 15:0:12:0: enclosure level(0x0000), connector name( C0.1)
Oct 27 17:25:18 localhost kernel: [ 40.934739] scsi 15:0:12:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct 27 17:25:18 localhost kernel: [ 40.935909] scsi 15:0:12:0: serial_number( ZRS007SJ)
Oct 27 17:25:18 localhost kernel: [ 40.936400] scsi 15:0:12:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Oct 27 17:25:18 localhost kernel: [ 40.940168] sd 15:0:12:0: Attached scsi generic sg26 type 0
Oct 27 17:25:18 localhost kernel: [ 40.941858] end_device-15:0:12: mpt3sas_transport_port_add: added: handle(0x002a), sas_addr(0x500e004bbbbbbb04)
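Conclusion: Tracing the driver code shows that this error is reported whenever an I/O request fails. The occurrence rate is 0.08%. The driver has a retry mechanism (in this log, the device at handle(0x002a) is added successfully at 40.94 s after the earlier failures), so the error can be ignored.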
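To make the log_info lines easier to read when triaging this kind of report, the raw 32-bit value can be split into the fields the driver prints. The sketch below is only an illustration: the bit layout (bus type in the top 4 bits, then originator, code, and a 16-bit sub_code) is inferred from the values in this log, and the originator names other than PL are assumptions. The helper name decode_log_info is hypothetical and is not part of the driver.

```python
# Assumed field layout, inferred from the log lines in this report:
#   bits 31-28: bus_type, bits 27-24: originator,
#   bits 23-16: code,     bits 15-0:  sub_code
# Only "PL" (value 1) is confirmed by the log; the other names are assumptions.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(value: int) -> dict:
    """Split a 32-bit log_info value into its assumed fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


def format_log_info(value: int) -> str:
    """Render the decoded fields in the same style as the kernel log line."""
    f = decode_log_info(value)
    return (
        f"log_info(0x{value:08x}): originator({f['originator']}), "
        f"code(0x{f['code']:02x}), sub_code(0x{f['sub_code']:04x})"
    )


if __name__ == "__main__":
    # The three distinct log_info values that appear in this report.
    for raw in (0x31111000, 0x31130000, 0x31110E05):
        print(format_log_info(raw))
```

Running it on the three values from the log reproduces the originator, code, and sub_code fields printed by the kernel, which is a quick check that the assumed layout fits this data.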