Schema and examples for your konduktor serve launch deployment.yaml
min_replicas (required)
max_replicas (optional)
resources: image_id (required): vllm/vllm-openai:v0.7.1, or another version compatible with the OpenAI API
min_replicas (required)
max_replicas (optional)
probe (exclude)
run (required)
python3 -m vllm.entrypoints.openai.api_server (required)
--model (required)
konduktor secret create --kind=env --inline HUGGING_FACE_HUB_TOKEN=hf_ABC123 my-hf-token
--max-model-len (required)
--tensor-parallel-size (required with GPUs > 1; otherwise optional)
vllm.entrypoints.openai.api_server
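The flag rules above (--model and --max-model-len always present, --tensor-parallel-size needed only with more than one GPU) can be sketched as a small helper that assembles the run command. This is only an illustration, not part of Konduktor or vLLM: the function name build_vllm_command, the model name my-org/my-model, and the choice to set --tensor-parallel-size equal to the GPU count are all assumptions.

```python
import shlex


def build_vllm_command(model: str, max_model_len: int, num_gpus: int = 1) -> str:
    """Assemble a run command for the vLLM OpenAI-compatible server.

    Hypothetical helper for illustration only. It assumes that the
    tensor-parallel size should equal the number of GPUs requested.
    """
    if not model:
        raise ValueError("--model is required")
    if max_model_len <= 0:
        raise ValueError("--max-model-len must be a positive integer")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    args = [
        "python3", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--max-model-len", str(max_model_len),
    ]
    # --tensor-parallel-size is required only when more than one GPU is used.
    if num_gpus > 1:
        args += ["--tensor-parallel-size", str(num_gpus)]

    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model name, not a real checkpoint.
    print(build_vllm_command("my-org/my-model", 4096, num_gpus=2))
```

The returned string is what would go under run in deployment.yaml. With a single GPU the helper omits --tensor-parallel-size entirely, matching the "otherwise optional" rule.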