Creating an Inference Task
Last updated: 2026-06-10 18:21:09
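This section describes how to create an Inference task.
Inference is a widely used machine learning framework that helps model developers run multi-node, multi-GPU distributed training. In a federated cluster, you can submit Inference jobs to run machine learning tasks under the Inference framework.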
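Prerequisites
1. The Inference machine learning framework is installed on the member clusters.
2. The federated cluster version is v1.14.8 or later.
3. The member clusters have the resources required to run Inference jobs.
4. The member clusters have been added to the federated cluster.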
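The version requirement in prerequisite 2 can be checked with a small shell helper. The following is a minimal sketch, not part of the product: the function name version_ge is hypothetical, it assumes a sort that supports the -V (version sort) option as GNU coreutils does, and it assumes you have already obtained the federated cluster version string by other means.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes a sort implementation with -V (version sort), e.g. GNU coreutils.
version_ge() {
  # Strip an optional leading "v" so that v1.14.8 and 1.14.8 compare the same.
  local have="${1#v}" need="${2#v}"
  # After version-sorting both values, the smaller one comes first.
  # The requirement is met when that smallest value is the required minimum.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example: v1.14.10 satisfies the v1.14.8 minimum.
version_ge "v1.14.10" "v1.14.8" && echo "version OK"
```

Version sort is used instead of a plain string comparison because a lexical comparison would rank 1.14.10 below 1.14.8.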
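Procedure
Step 1: Create the custom resource definition (CRD) for Inference tasks in the federated cluster
1. Download the Inference CRD from the official website, then use the federation access configuration to create it on the federation control plane: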
kubectl --kubeconfig karmada_kubeconfig apply -f inference_crd.yaml
2. View the Inference CRD:
kubectl --kubeconfig karmada_kubeconfig get crd inferences.inference.isuite.ctyun.cn
Expected output:
[root@ccseagent-hxk4joo11x cceone]# kubectl --kubeconfig karmada_kubeconfig get crd inferences.inference.isuite.ctyun.cn
NAME                                   CREATED AT
inferences.inference.isuite.ctyun.cn   2026-04-02T09:26:26Z
Step 2: Create the Inference task on the federation control plane
1. Use the access configuration to create a custom Inference task on the federation control plane:
kubectl --kubeconfig karmada_kubeconfig apply -f inference-sample.yaml
The content of inference-sample.yaml is as follows:
apiVersion: inference.isuite.ctyun.cn/v1
kind: Inference
metadata:
  name: inference-sample
  namespace: default
spec:
  framework:
    jobMode: Single
    type: vLLM
  replicaSpecs:
    Master:
      replicas: 1
      template:
        metadata:
          annotations:
            prometheus.io/app-metrics: "true"
            prometheus.io/app-metrics-path: /metrics
            prometheus.io/app-metrics-port: "8000"
            prometheus.io/scrape: "true"
        spec:
          containers:
          - args:
            - vllm serve /data/models --served-model-name inference-btgkji --host
              0.0.0.0 --trust-remote-code --tensor-parallel-size 1
            command:
            - sh
            - -c
            env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: PYTHONHASHSEED
              value: "42"
            image: isuite-pub-registry-xinan1.crs-internal.ctyun.cn/isuite/nvidia-vllm:v0.10.0-py312-cu128-ubuntu22.04-amd64
            name: master
            resources:
              limits:
                nvidia.com/gpu: "1"
              requests:
                nvidia.com/gpu: "1"
            volumeMounts:
            - mountPath: /data/models
              name: vllm-models
            - mountPath: /dev/shm
              name: shm
          - command:
            - /isuite/isuite-adapter
            env:
            - name: ISUITE_INFERENCE_ENGINE
              value: vLLM
            - name: ISUITE_INFERENCE_ADAPTER_PORT
              value: "9099"
            - name: ISUITE_INFERENCE_ENGINE_ENDPOINT
              value: http://localhost:8000/metrics
            image: registry-vpc-crs-xinan1.cnsp-internal.ctyun.cn/icce/isuite-adapter:20260206
            name: inference-adapter
            resources: {}
          hostPID: true
          nodeSelector:
            isuite.ctyun.cn/model-4uwpcaisqaff72bt-v1: cached
          volumes:
          - hostPath:
              path: /data/isuite/models/model-4uwpcaisqaff72bt/v1
            name: vllm-models
          - emptyDir:
              medium: Memory
            name: shm
  replicas: 1
  serviceConfig:
    elbID: ""
    name: inference-btgkji-svc
    ports:
    - name: inference-btgkji-svc-0
      port: 8000
      protocol: TCP
      targetPort: 8000
    type: ClusterIP
Step 3: Create the propagation policy for the Inference task on the federation control plane
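1. Use the access configuration to create, on the federation control plane, the propagation policy for the Inference task from Step 2:
kubectl --kubeconfig karmada_kubeconfig apply -f inference-sample-pp.yaml
The content of inference-sample-pp.yaml is as follows: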
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: inference-sample
spec:
  resourceSelectors:
  - apiVersion: inference.isuite.ctyun.cn/v1
    kind: Inference
    name: inference-sample
  placement:
    replicaScheduling:
      replicaDivisionPreference: Aggregated
      replicaSchedulingType: Divided
2. View the status of the Inference task:
kubectl --kubeconfig karmada_kubeconfig get Inference inference-sample
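To see both the task and where it was scheduled, the status check can be wrapped in a small function. This is a minimal sketch under stated assumptions: the function name inference_status and the KUBECONFIG_FILE variable are hypothetical, the task is assumed to live in the default namespace as in inference-sample.yaml, and the second command assumes the control plane exposes Karmada's ResourceBinding resources, which record the member clusters chosen by the scheduler.

```shell
#!/usr/bin/env bash
# Assumption: karmada_kubeconfig is the access configuration file used on this page.
KUBECONFIG_FILE="${KUBECONFIG_FILE:-karmada_kubeconfig}"

# Hypothetical helper: show an Inference task and the ResourceBindings in its namespace.
#   $1 = Inference task name, $2 = namespace (defaults to "default")
inference_status() {
  local name="$1" namespace="${2:-default}"
  # Query the task by its fully qualified resource name, taken from the CRD in Step 1.
  kubectl --kubeconfig "$KUBECONFIG_FILE" -n "$namespace" \
    get inferences.inference.isuite.ctyun.cn "$name" -o wide || return 1
  # List the ResourceBindings, which show the scheduling result for propagated resources.
  kubectl --kubeconfig "$KUBECONFIG_FILE" -n "$namespace" get resourcebindings
}

# Usage (run against the federation control plane):
#   inference_status inference-sample
```

The function stops after the first command if the task is not found, so a missing task is not hidden by the output of the second listing.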