Installing and deploying GPU passthrough
1) Enable IOMMU on the KVM host
    vi /etc/default/grub

    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet amd_iommu=on"
    GRUB_DISABLE_RECOVERY="true"
On an AMD CPU, append amd_iommu=on to the end of GRUB_CMDLINE_LINUX; on an Intel CPU, append intel_iommu=on instead.
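The choice between the two parameters can be scripted. The following is a minimal sketch, assuming the vendor_id field in /proc/cpuinfo reports AuthenticAMD or GenuineIntel; iommu_param is a hypothetical helper name, not part of any tool, and the script only prints the parameter without touching /etc/default/grub.

```shell
#!/bin/sh
# Hypothetical helper: map a CPU vendor_id string to the IOMMU kernel
# parameter this guide adds to GRUB_CMDLINE_LINUX.
iommu_param() {
    case "$1" in
        AuthenticAMD) echo "amd_iommu=on" ;;
        GenuineIntel) echo "intel_iommu=on" ;;
        *) echo "unknown CPU vendor: $1" >&2; return 1 ;;
    esac
}

# Read the vendor_id of the first CPU entry (assumes an x86 host).
vendor=$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null || true)

# Print the parameter, or a note if the vendor was not recognised.
iommu_param "$vendor" || echo "could not determine the IOMMU parameter" >&2
```

The printed value still has to be added to GRUB_CMDLINE_LINUX by hand, as described above.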
2) Disable the nouveau driver
    vi /etc/modprobe.d/blacklist-nouveau.conf

    blacklist nouveau
    options nouveau modeset=0
3) Regenerate the GRUB configuration and reboot for the changes to take effect
    grub2-mkconfig -o /boot/grub2/grub.cfg
    reboot

Check whether the IOMMU is enabled:

    dmesg | grep -E "DMAR|IOMMU"

Check whether nouveau is disabled:

    dmesg | grep -i nouveau
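As a complement to the dmesg check, the kernel exposes one directory per IOMMU group under /sys/kernel/iommu_groups once the IOMMU is active. The sketch below counts those directories; count_iommu_groups is a hypothetical helper name, and an empty or missing directory is taken here to mean the IOMMU is not active.

```shell
#!/bin/sh
# Hypothetical helper: print the number of IOMMU group directories under
# the given path (defaults to the sysfs location).
count_iommu_groups() {
    dir="${1:-/sys/kernel/iommu_groups}"
    # A missing directory is treated as zero groups.
    [ -d "$dir" ] || { echo 0; return 0; }
    find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

n_groups=$(count_iommu_groups)
if [ "$n_groups" -gt 0 ]; then
    echo "IOMMU active: $n_groups groups"
else
    echo "IOMMU not active (no groups found)"
fi
```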
4) Load the vfio-pci driver and bind it to the devices
    modprobe vfio-pci

Every device in the GPU's iommu_group has to be added to /etc/modprobe.d/vfio.conf. List the groups and their devices with:

    for iommu_group in $(ls -dv /sys/kernel/iommu_groups/*/); do
        echo "IOMMU group $(basename "$iommu_group")"
        for device in $(ls -1 "$iommu_group"/devices/); do
            echo -n $'\t'
            lspci -nns "$device"
        done
    done

Find the relevant devices in the output and add their Vendor ID and Device ID to /etc/modprobe.d/vfio.conf:

    ...
    IOMMU group 2
        00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
        00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
        07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] [10de:2187] (rev a1)
        07:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
        07:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
        07:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)
    ...

    vi /etc/modprobe.d/vfio.conf

    options vfio-pci ids=10de:2187,10de:1aeb,10de:1aec,10de:1aed,1022:1482,1022:1483

Then rebuild the initramfs, reboot, and look for vfio messages in the kernel log:

    dracut --force
    reboot
    dmesg | grep -i vfio

Check whether the devices are bound to vfio-pci:

    [root@dev /]# lspci -nnk -d 10de:
    07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] [10de:2187] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau
    07:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
    07:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
        Kernel driver in use: vfio-pci
    07:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3852]
        Kernel driver in use: vfio-pci
        Kernel modules: i2c_nvidia_gpu

Some devices may fail to bind automatically and have to be bound by hand. For example, if the USB controller is still held by xhci_hcd, run:

    echo -n "0000:07:00.2" > /sys/bus/pci/drivers/xhci_hcd/unbind
    echo -n "0000:07:00.2" > /sys/bus/pci/drivers/vfio-pci/bind
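The ids= list for vfio.conf can be assembled from lspci -nn output rather than copied by hand. This is a minimal sketch, assuming each input line carries exactly one [vendor:device] pair in lowercase hex, as in the listing in this step; vfio_ids is a hypothetical helper name. It reads lspci -nn style lines on stdin and prints the IDs joined by commas.

```shell
#!/bin/sh
# Hypothetical helper: extract every [vvvv:dddd] pair from lspci -nn
# output on stdin and join them into a comma-separated list.
vfio_ids() {
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | sed 's/^\[//; s/\]$//' \
        | paste -sd, -
}

# Sample input: two lines taken from the listing in this guide.
vfio_ids <<'EOF'
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] [10de:2187] (rev a1)
07:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
EOF
# prints: 10de:2187,10de:1aeb
```

On a real host the input could come from something like lspci -nn -d 10de: instead of the sample. Do not feed it lspci -nnk output, since the Subsystem lines there carry their own [vendor:device] pairs and would be picked up as well.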