# GPU Shared Scheduling

(1) The unit is MiB; here the request is for 2000 MiB of GPU memory.

**Step 3: Verify GPU memory isolation.**

Open a shell in the pod you just created:

```shell
kubectl exec -it <pod-name> -- bash
```

Run `nvidia-smi` to check the visible GPU memory. The expected output is as follows:

```shell
[root@gpu-share-test-77db5c96cd-ghl9b /]# nvidia-smi
Mon Nov 25 08:10:08 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14       Driver Version: 550.54.14     CUDA Version: 12.4 |
|-------------------------------+----------------------+----------------------+
| GPU  Name     Persistence-M   | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:00:06.0 Off |                    0 |
|  0%   33C    P8    21W / 150W |     0MiB /  2000MiB  |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```

The container sees only 2000 MiB of GPU memory, matching the amount requested.

## Scenario 2: GPU memory isolation and compute limiting

**Step 1: Label the node for shared GPU scheduling.**

```shell
kubectl label no <node-name> ccse.node.gpu.schedule=core-mem
```

**Step 2: Submit the workload. The YAML is as follows:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-share-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-share-test
  template:
    metadata:
      labels:
        app: gpu-share-test
    spec:
      containers:
      - name: gpu-share-test
        image: deeplearningexamples:v3
        command: ["sleep", "1h"]
        resources:
          limits:
            ctyun.cn/gpu-core.percentage: "10"
```
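The memory-isolation check above can be automated by parsing the `Memory-Usage` field of the `nvidia-smi` table. A minimal sketch (the helper name and regex are illustrative; the sample row mirrors the expected output above):

```python
import re


def visible_gpu_memory_mib(nvidia_smi_row: str) -> tuple[int, int]:
    """Extract (used, total) MiB from a single nvidia-smi GPU table row."""
    m = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", nvidia_smi_row)
    if m is None:
        raise ValueError("no Memory-Usage field found in row")
    return int(m.group(1)), int(m.group(2))


# Sample row matching the expected output: the container should see only
# the 2000 MiB it requested, not the card's full memory.
row = "|  0%   33C    P8    21W / 150W |     0MiB /  2000MiB  |      0%      Default |"
used, total = visible_gpu_memory_mib(row)
assert used == 0 and total == 2000
```

Running `nvidia-smi` inside the container and feeding its GPU row through a check like this confirms the isolation took effect.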
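For programmatic submission, the Deployment above can be built as a plain manifest dict and handed to a Kubernetes client. A sketch under the assumption that the extended-resource key matches the manifest above; the helper name is illustrative:

```python
def gpu_share_deployment(name: str, image: str, core_percentage: int) -> dict:
    """Build a Deployment manifest that caps GPU compute share via the
    ctyun.cn/gpu-core.percentage extended resource (value is a string)."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": ["sleep", "1h"],
                        "resources": {"limits": {
                            # Cap this container at core_percentage% of the GPU's compute.
                            "ctyun.cn/gpu-core.percentage": str(core_percentage),
                        }},
                    }],
                },
            },
        },
    }


manifest = gpu_share_deployment("gpu-share-test", "deeplearningexamples:v3", 10)
```

Serializing `manifest` to YAML reproduces the workload above with a 10% compute cap.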