Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.trainy.ai/llms.txt

Use this file to discover all available pages before exploring further.

Note that some models may require authentication through Hugging Face tokens, which can be done using konduktor secret (see complex example here). The model deepseek-ai/DeepSeek-R1-Distill-Llama-8B does not require one.

Prerequisites

Current Working Directory

$ ls
deployment.yaml

Launching

$ konduktor serve launch deployment.yaml

Deployment.yaml

# no autoscaling + default port (8000) + single GPU
name: serving-vllm-simple

resources:
  cpus: 4
  memory: 32
  accelerators: A100:1
  image_id: vllm/vllm-openai:v0.7.1
  labels:
    kueue.x-k8s.io/queue-name: user-queue

serving: 
  min_replicas: 1

run: |
  python3 -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
    --max-model-len 4096