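Create Intelligent Routing (1)

Verify the routing rules

1) On the intelligent routing page, obtain the invocation information: either the "Public invocation address" or the "VPC invocation address".

2) Prepare the request command and send it twice. Replace `<invocation-address>` with the address obtained in step 1.

```shell
curl "<invocation-address>" \
  -H 'Host: inference46rifischeduler.default.inference.cn' \
  -H "Content-Type: application/json" \
  -d '{
    "model": "inference46rifi",
    "prompt": "San Francisco is a major commercial, financial, and cultural center in Northern California.San Francisco is a major commercial, financial, and cultural center in Northern California.San Francisco is a major commercial, financial, and cultural center in Northern California.San Francisco is a major commercial, financial, and cultural center in Northern California.San Francisco is a major commercial, financial, and cultural center in Northern California.",
    "max_tokens": 64,
    "temperature": 0
  }'
```

3) After both requests, open the "Workloads" console and view the logs of the intelligent routing service inferencescheduler. For the second request, the logs show the scheduler scoring the Pods and dispatching the request to the same vLLM Pod. This confirms that the routing policy's scoring is in effect: requests that share a prefix are scheduled to the same vLLM Pod.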
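The "send the same request twice" step can be scripted so that both requests are guaranteed to carry a byte-identical prompt prefix. The following is a minimal sketch, not part of the product's tooling: the function names `build_payload` and `send_twice` are hypothetical, the endpoint argument stands for the invocation address from step 1, and the field name `max_tokens` assumes an OpenAI-style completion request body.

```shell
#!/usr/bin/env bash
# Sketch: send an identical completion request twice so that the second
# request shares its full prompt prefix with the first.
# build_payload and send_twice are hypothetical helper names.

HOST_HEADER='inference46rifischeduler.default.inference.cn'

build_payload() {
  # Concatenate the same sentence five times to form a long shared prefix.
  local sentence='San Francisco is a major commercial, financial, and cultural center in Northern California.'
  local prompt=''
  local i
  for i in 1 2 3 4 5; do
    prompt="${prompt}${sentence}"
  done
  # Assumption: the request body uses the OpenAI-style "max_tokens" field.
  printf '{"model": "inference46rifi", "prompt": "%s", "max_tokens": 64, "temperature": 0}' "$prompt"
}

send_twice() {
  # $1: the invocation address obtained from the intelligent routing page.
  local endpoint="$1"
  local payload
  local i
  payload="$(build_payload)"
  for i in 1 2; do
    echo "sending request ${i}" >&2
    curl -sS "$endpoint" \
      -H "Host: ${HOST_HEADER}" \
      -H 'Content-Type: application/json' \
      -d "$payload"
    echo
  done
}
```

After sourcing the script, call `send_twice` with the invocation address as its only argument, then check the inferencescheduler logs as described in step 3.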