This example also demonstrates the creation and use of --kind=env secrets using konduktor secret create. This is required for some models such as meta-llama/Meta-Llama-3.1-8B-Instruct, which require Hugging Face tokens for authentication.

Prerequisites

Setup

  1. Create a --kind=env secret for your HF token called my-hf-token
$ konduktor secret create --kind=env --inline HUGGING_FACE_HUB_TOKEN=hf_ABC123 my-hf-token
  1. Check that the secret was properly created with:
$ konduktor secret list
For more details, check out the setup of secrets here.

Current Working Directory

$ ls
deployment.yaml

Launching

$ konduktor serve launch deployment.yaml

Deployment.yaml

# autoscaling + custom port + multi GPU
name: serving-vllm-complex

resources:
  cpus: 4
  memory: 32
  accelerators: A100:2
  image_id: vllm/vllm-openai:v0.7.1
  labels:
    kueue.x-k8s.io/queue-name: user-queue

serving: 
  min_replicas: 1
  max_replicas: 2
  ports: 9000

run: |
  python3 -m vllm.entrypoints.openai.api_server \
    --uvicorn-log-level warning \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct \
    --max-model-len 8192 \
    --tensor-parallel-size 2 \
    --dtype half