Konduktor Serve Launch Deployment Yaml

Schema

name: <string>                    # required

envs:                             # optional
  key: value
workdir: <string>                 # optional

resources:
  cpus: <float>                   # required
  memory: <float>                 # required
  # must be vllm/vllm-openai:v0.7.1 for VLLM DEPLOYMENTS
  image_id: <string>              # required
  accelerators: <string>          # optional
  labels:                           
    kueue.x-k8s.io/queue-name: <string>  # required

serving:
  # if min_replicas != max_replicas, autoscaling is enabled automatically
  min_replicas: <int>             # required
  max_replicas: <int>             # optional; defaults to min_replicas
  ports: <int>                    # optional; defaults to 8000
  # GENERAL DEPLOYMENTS ONLY
  probe: <string>                 # optional for general deployments; defaults to None (no health probing)
                                  # EXCLUDE COMPLETELY for VLLM DEPLOYMENTS

file_mounts:                      # optional
  /remote/path: ./local/path

run: |                            # required; VLLM DEPLOYMENTS ONLY
  python3 -m vllm.entrypoints.openai.api_server \
    --model <string> \
    --max-model-len <int> \
    --tensor-parallel-size <int>  # required when GPUs > 1; otherwise exclude
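
Putting the schema together, a minimal vLLM deployment might look like the following. The model name, resource sizes, accelerator string, and queue name are all illustrative assumptions, not required values:

```yaml
# Illustrative example -- model, sizes, and queue name are assumptions.
name: llama31-8b-serve

resources:
  cpus: 8
  memory: 32
  image_id: vllm/vllm-openai:v0.7.1   # required image for vLLM deployments
  accelerators: H100:1                # assumed accelerator string
  labels:
    kueue.x-k8s.io/queue-name: user-queue

serving:
  min_replicas: 1
  max_replicas: 3    # != min_replicas, so autoscaling is enabled
  # probe is excluded for vLLM deployments

run: |
  python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --max-model-len 8192
  # --tensor-parallel-size omitted since only 1 GPU is requested
```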

Details

General
  min_replicas (required), max_replicas (optional)
    • if min_replicas != max_replicas, autoscaling is enabled automatically
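
For contrast with the vLLM case below, a minimal general deployment sketch. The name, image, port, probe path, and sizes are illustrative assumptions:

```yaml
# Illustrative general deployment -- names, image, and sizes are assumptions.
name: my-web-app

resources:
  cpus: 2
  memory: 4
  image_id: python:3.11-slim
  labels:
    kueue.x-k8s.io/queue-name: user-queue

serving:
  min_replicas: 2
  max_replicas: 2    # equal to min_replicas, so no autoscaling
  ports: 8080        # overrides the default of 8000
  probe: /healthz    # general deployments may set a health probe path

run: |
  python3 -m http.server 8080
```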
vLLM (Aibrix)
  resources: image_id (required)
    • only vllm/vllm-openai:v0.7.1 (or another version of the vllm/vllm-openai image) is supported by the OpenAI API
  min_replicas (required), max_replicas (optional)
    • if min_replicas != max_replicas, autoscaling is enabled automatically
  probe (exclude)
    • only /health is supported by the OpenAI API, so exclude this field for simplicity; it will default to /health
run (required)
  • python3 -m vllm.entrypoints.openai.api_server (required)
    • --model (required)
      • some models, like Llama 3.1, require authentication through a Hugging Face token, which can be passed into the deployment using konduktor secrets
      • ex. konduktor secret create --kind=env --inline HUGGING_FACE_HUB_TOKEN=hf_ABC123 my-hf-token
    • --max-model-len (required)
    • --tensor-parallel-size (required when GPUs > 1; otherwise optional)
  • See the vLLM documentation for more info on vllm.entrypoints.openai.api_server
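
Once a vLLM deployment is running, it serves the standard OpenAI-compatible HTTP API on the configured port. A sketch of querying it, assuming the service is reachable at localhost:8000 and serves the model shown (both are assumptions about your deployment, not fixed values):

```shell
# Assumes the deployment is reachable at localhost:8000 and serves this model.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Hello",
        "max_tokens": 16
      }'
```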