Konduktor lets you submit multiple jobs at once; each job waits in a queue until enough capacity is available to run it. This is useful for batch jobs, hyperparameter sweeps, and data processing.

Write a single YAML for one job

Consider a use case where an environment variable selects which shard of data the job works on. For example, our job might be defined as:

# batch.yaml
name: batch-job

num_nodes: 1

resources:
  cpus: 2
  memory: 2
  accelerators: H100:8
  image_id: nvcr.io/nvidia/pytorch:25.02-py3
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    maxRunDurationSeconds: "3200"

run: |
  set -ex
  echo "$SHARD_IDX"
  python batch_inference.py --shard-idx "$SHARD_IDX"

We can launch this job with the environment variable SHARD_IDX set to 0, without attaching to its logs or input stream, via

konduktor launch batch.yaml --yes --detach-run --env SHARD_IDX=0

To enqueue jobs for multiple shards, we can use a bash loop.

for i in {1..3}; do
  konduktor launch batch.yaml --yes --detach-run --env SHARD_IDX=$i
done
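
Before submitting a large sweep, it can help to preview the exact commands the loop would run. The following is a minimal sketch of that idea, not part of Konduktor itself: the function name enqueue_shards and its dry-run argument are hypothetical, and the launch command is the one shown above.

```shell
# Hypothetical helper: submit (or only print) one launch command per shard
# index in the inclusive range [first, last].
enqueue_shards() {
  local first="$1" last="$2" mode="${3:-submit}"
  local i="$first"
  while [ "$i" -le "$last" ]; do
    if [ "$mode" = "dry-run" ]; then
      # Print the command instead of contacting the cluster.
      echo "konduktor launch batch.yaml --yes --detach-run --env SHARD_IDX=$i"
    else
      konduktor launch batch.yaml --yes --detach-run --env "SHARD_IDX=$i"
    fi
    i=$((i + 1))
  done
}

# Preview the submissions for shards 1 through 3.
enqueue_shards 1 3 dry-run
```

Dropping the dry-run argument (enqueue_shards 1 3) submits the jobs for real, equivalent to the loop above.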

To check the status of your jobs, use konduktor status.

(konduktor) Andrews-MacBook-Air:konduktor asai$ konduktor status
User: asai-c41a
Jobs
NAME            STATUS     RESOURCES                      SUBMITTED
batch-job-160a  FAILED     1x(2CPU, memory 2Gi, H100:8)   2 minutes
batch-job-6fc9  PENDING    1x(2CPU, memory 2Gi, H100:8)   2 minutes
batch-job-873f  COMPLETED  1x(2CPU, memory 2Gi, H100:8)   1 minutes
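
With many shards queued, a per-status tally is often easier to read than the full table. The sketch below counts jobs by status with awk. It assumes the column layout shown above (job rows follow the NAME/STATUS header, and the status is the second whitespace-separated field), which may differ between Konduktor versions; count_by_status is a hypothetical helper name.

```shell
# Hypothetical helper: read `konduktor status` output on stdin and print
# one "STATUS count" line per status, sorted by status name.
count_by_status() {
  awk '
    # Count the second field of every row that follows the header line.
    header_seen && NF >= 2 { counts[$2]++ }
    # The header row marks where the job table starts.
    $1 == "NAME" && $2 == "STATUS" { header_seen = 1 }
    END { for (s in counts) print s, counts[s] }
  ' | sort
}

# Demonstration on the sample output above, so no cluster is needed.
count_by_status <<'EOF'
User: asai-c41a
Jobs
NAME            STATUS     RESOURCES                      SUBMITTED
batch-job-160a  FAILED     1x(2CPU, memory 2Gi, H100:8)   2 minutes
batch-job-6fc9  PENDING    1x(2CPU, memory 2Gi, H100:8)   2 minutes
batch-job-873f  COMPLETED  1x(2CPU, memory 2Gi, H100:8)   1 minutes
EOF
```

Against a live cluster, the same helper would be used as konduktor status | count_by_status.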