[613405.736532] mlx5_core 0000:2a:00.0: poll_health:971:(pid 0): device's health compromised - reached miss count
[613405.737166] mlx5_core 0000:2a:00.0: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613405.738196] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613405.738781] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613405.739334] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613405.739904] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613405.740465] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613405.741018] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613405.741550] mlx5_core 0000:2a:00.0: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613405.742070] mlx5_core 0000:2a:00.0: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613405.742589] mlx5_core 0000:2a:00.0: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613405.743089] mlx5_core 0000:2a:00.0: print_health_info:502:(pid 0): time 0
[613405.743575] mlx5_core 0000:2a:00.0: print_health_info:503:(pid 0): hw_id 0x00000216
[613405.744054] mlx5_core 0000:2a:00.0: print_health_info:504:(pid 0): rfr 0
[613405.744522] mlx5_core 0000:2a:00.0: print_health_info:505:(pid 0): severity 3 (ERROR)
[613405.744989] mlx5_core 0000:2a:00.0: print_health_info:506:(pid 0): irisc_index 7
[613405.745405] mlx5_core 0000:2a:00.0: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613405.745840] mlx5_core 0000:2a:00.0: print_health_info:509:(pid 0): ext_synd 0x8a02
[613405.746281] mlx5_core 0000:2a:00.0: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
[613406.278016] mlx5_core 0000:be:00.1 ens6f1np1: Link up
[613406.285205] 8021q: adding VLAN 0 to HW filter on device ens6f1np1
[613406.325260] IPv6: ADDRCONF(NETDEV_CHANGE): ens6f1np1: link becomes ready
[613406.824530] mlx5_core 0000:2a:00.1: poll_health:971:(pid 0): device's health compromised - reached miss count
[613406.825059] mlx5_core 0000:2a:00.1: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613406.825930] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613406.826328] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613406.826758] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613406.827118] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613406.827503] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613406.827840] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613406.828167] mlx5_core 0000:2a:00.1: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613406.828489] mlx5_core 0000:2a:00.1: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613406.828819] mlx5_core 0000:2a:00.1: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613406.829126] mlx5_core 0000:2a:00.1: print_health_info:502:(pid 0): time 0
[613406.829434] mlx5_core 0000:2a:00.1: print_health_info:503:(pid 0): hw_id 0x00000216
[613406.829781] mlx5_core 0000:2a:00.1: print_health_info:504:(pid 0): rfr 0
[613406.830129] mlx5_core 0000:2a:00.1: print_health_info:505:(pid 0): severity 3 (ERROR)
[613406.830479] mlx5_core 0000:2a:00.1: print_health_info:506:(pid 0): irisc_index 7
[613406.830827] mlx5_core 0000:2a:00.1: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613406.831150] mlx5_core 0000:2a:00.1: print_health_info:509:(pid 0): ext_synd 0x8a02
[613406.831485] mlx5_core 0000:2a:00.1: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
[613406.888534] mlx5_core 0000:be:00.0: poll_health:971:(pid 0): device's health compromised - reached miss count
[613406.888971] mlx5_core 0000:be:00.0: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613406.889684] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613406.890047] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613406.890392] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613406.890720] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613406.891010] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613406.891308] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613406.891605] mlx5_core 0000:be:00.0: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613406.891893] mlx5_core 0000:be:00.0: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613406.892171] mlx5_core 0000:be:00.0: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613406.892438] mlx5_core 0000:be:00.0: print_health_info:502:(pid 0): time 0
[613406.892705] mlx5_core 0000:be:00.0: print_health_info:503:(pid 0): hw_id 0x00000216
[613406.892999] mlx5_core 0000:be:00.0: print_health_info:504:(pid 0): rfr 0
[613406.893296] mlx5_core 0000:be:00.0: print_health_info:505:(pid 0): severity 3 (ERROR)
[613406.893602] mlx5_core 0000:be:00.0: print_health_info:506:(pid 0): irisc_index 7
[613406.893909] mlx5_core 0000:be:00.0: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613406.894213] mlx5_core 0000:be:00.0: print_health_info:509:(pid 0): ext_synd 0x8a02
[613406.894519] mlx5_core 0000:be:00.0: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
[613407.976530] mlx5_core 0000:be:00.1: poll_health:971:(pid 0): device's health compromised - reached miss count
[613407.976887] mlx5_core 0000:be:00.1: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613407.977530] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613407.977875] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613407.978170] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613407.978499] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613407.978816] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613407.979092] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613407.979379] mlx5_core 0000:be:00.1: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613407.979656] mlx5_core 0000:be:00.1: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613407.979932] mlx5_core 0000:be:00.1: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613407.980196] mlx5_core 0000:be:00.1: print_health_info:502:(pid 0): time 0
[613407.980464] mlx5_core 0000:be:00.1: print_health_info:503:(pid 0): hw_id 0x00000216
[613407.980729] mlx5_core 0000:be:00.1: print_health_info:504:(pid 0): rfr 0
[613407.980995] mlx5_core 0000:be:00.1: print_health_info:505:(pid 0): severity 3 (ERROR)
[613407.981273] mlx5_core 0000:be:00.1: print_health_info:506:(pid 0): irisc_index 7
[613407.981559] mlx5_core 0000:be:00.1: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613407.981840] mlx5_core 0000:be:00.1: print_health_info:509:(pid 0): ext_synd 0x8a02
[613407.982121] mlx5_core 0000:be:00.1: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
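All four PCI functions in the log above report the same syndrome, so it can help to reduce the dump to one summary per function. The following is a minimal Python sketch, not part of the original note. The helper names `decode_raw_fw_ver` and `summarize` are hypothetical, and the 8/8/16-bit split of the raw firmware version is inferred only from the two values printed in the log (`0x1a2307d0` next to `26.35.2000`).

```python
import re

# One print_health_info entry: PCI address, then the message body.
# The body ends at the next "[seconds.micros]" timestamp or at end of text,
# so the pattern works on both one-entry-per-line and flattened logs.
LINE_RE = re.compile(
    r"mlx5_core (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"print_health_info:\d+:\(pid \d+\): (?P<body>.+?)(?=\s*\[\d+\.\d+\]|$)"
)


def decode_raw_fw_ver(raw: int) -> str:
    """Split a 32-bit raw fw_ver into major.minor.subminor.

    Assumption: 8 bits major, 8 bits minor, 16 bits subminor, which is
    consistent with the values shown in the log.
    """
    major = (raw >> 24) & 0xFF
    minor = (raw >> 16) & 0xFF
    subminor = raw & 0xFFFF
    return f"{major}.{minor}.{subminor:04d}"


def summarize(log_text: str) -> dict:
    """Collect synd, ext_synd and raw fw_ver for each PCI function."""
    devices = {}
    for match in LINE_RE.finditer(log_text):
        body = match.group("body")
        fields = devices.setdefault(match.group("pci"), {})
        for key in ("ext_synd", "synd", "raw fw_ver"):
            if body.startswith(key + " "):
                fields[key] = body[len(key) + 1:].strip()
                break
    return devices


# Three entries copied from the log above, joined the way a flattened paste looks.
SAMPLE = (
    "[613405.745405] mlx5_core 0000:2a:00.0: print_health_info:507:(pid 0): "
    "synd 0x1: firmware internal error "
    "[613405.745840] mlx5_core 0000:2a:00.0: print_health_info:509:(pid 0): "
    "ext_synd 0x8a02 "
    "[613405.746281] mlx5_core 0000:2a:00.0: print_health_info:510:(pid 0): "
    "raw fw_ver 0x1a2307d0"
)

if __name__ == "__main__":
    for pci, fields in summarize(SAMPLE).items():
        version = decode_raw_fw_ver(int(fields["raw fw_ver"], 16))
        print(pci, fields["synd"], fields["ext_synd"], version)
```

Feeding the full `dmesg` output to `summarize` should yield one entry per affected function, which makes it easy to confirm that every port hit the same `synd` / `ext_synd` pair.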
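QuadEn parameter description:
QuadEn = 1 means the Flash operates in four-wire (quad) mode; QuadEn = 0 means the Flash operates in two-wire (dual) mode.
Four-wire and two-wire mode describe how the Flash communicates with the SPI Flash programmer and with the NIC firmware (FW). Four-wire mode is faster than two-wire mode. In some cases, when the FW reads data from the Flash while the Flash is in two-wire mode, the lower transfer rate can prevent the Flash from answering the FW's requests in time, which causes problems in FW operation.
During NIC power-on, the FW reads data from the Flash. The FW first checks whether the Flash supports four-wire mode; if it does, four-wire mode is used, otherwise two-wire mode is used.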
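The rate difference and the power-on selection described above can be sketched in a few lines of Python. This is an illustration only: it assumes four-wire (quad) mode moves 4 data bits per clock and two-wire (dual) mode moves 2, it ignores command, address and dummy cycles, and the 50 MHz clock and 4096-byte read size are made-up example values, not figures from this NIC. The function names are hypothetical.

```python
def select_flash_mode(quad_supported: bool) -> str:
    """Mirror the power-on check: use quad mode if the Flash supports it."""
    return "quad" if quad_supported else "dual"


def read_time_us(num_bytes: int, clock_hz: float, data_lines: int) -> float:
    """Ideal data-phase time in microseconds for reading num_bytes.

    data_lines is the number of bits transferred per clock cycle
    (assumed 2 for dual mode, 4 for quad mode).
    """
    cycles = num_bytes * 8 / data_lines
    return cycles / clock_hz * 1e6


if __name__ == "__main__":
    # Example values only: 4 KiB read at an assumed 50 MHz SPI clock.
    for mode, lines in (("dual", 2), ("quad", 4)):
        print(mode, round(read_time_us(4096, 50e6, lines), 2), "us")
```

Under these assumptions a read in two-wire mode takes twice as long as the same read in four-wire mode, which is the kind of gap that could make a time-critical FW read miss its deadline.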
Conclusion:
Enable the QuadEn parameter in the firmware.
Solution:
The NICs used during testing had not gone through the production FT stage. QuadEn is enabled during the production FT stage.
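How to change it:
Refer to the document "Link LED stays lit after ip link set down" (《ip link set down关闭后link灯依然点亮》) to install the mft tool, modify the firmware parameter QuadEn, and reboot for the change to take effect.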
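The referenced document is not reproduced here, so the exact procedure is not shown. As a hedged sketch, the Python helper below only builds and prints the command lines. It assumes the parameter is changed with the `flint` utility from MFT through its `hw query` and `hw set QuadEn=<0|1>` subcommands; confirm this against the referenced document before running anything. The function name is hypothetical and the device `0000:2a:00.0` is simply the first PCI address from the log above.

```python
import shlex


def flint_quad_en_commands(device: str, enable: bool = True) -> list[list[str]]:
    """Build (but do not run) the assumed flint commands for QuadEn.

    Assumption: MFT's flint is used with "hw query" / "hw set QuadEn=<0|1>".
    """
    value = "1" if enable else "0"
    return [
        ["flint", "-d", device, "hw", "query"],                    # check current value
        ["flint", "-d", device, "hw", "set", f"QuadEn={value}"],   # change it
        ["flint", "-d", device, "hw", "query"],                    # verify after the change
    ]


if __name__ == "__main__":
    # Print the commands for review; run them manually as root if they apply.
    for argv in flint_quad_en_commands("0000:2a:00.0"):
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run on any machine. As the note states, a reboot is still required before the new setting takes effect.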