Users have the following options for submitting their application code as a Trainy workload.

  • Rebuilding a Docker image for every change and pushing it to a registry before launching a workload.
  • Committing changes to a development branch in git and checking it out in the workload.
  • Synchronizing application code via file sync, using file_mounts and workdir definitions.

The first option is often slow given the size of deep learning images, so we focus on the latter two here.

Setup

We first decide which storage provider to use and place our cloud service account credentials into the Trainy cluster:

konduktor check <CLOUD STORAGE ALIAS> # s3, gs, etc.

Afterwards, we configure the storage provider in ~/.konduktor/config.yaml:

# ~/.konduktor/config.yaml
allowed_clouds:
  - gs

Usage

When we run konduktor launch, the following three steps happen in order. If any step fails, the workload will fast-fail.

  1. workdir and file_mounts are synchronized to object storage
  2. workload is submitted
  3. workload, once active, will sync down workdir and file_mounts

In our workload definition, we can define the following:

name: single-file-upload

num_nodes: 1

workdir: .

file_mounts:
  # syntax is <remote_dir>:<local_dir>
  ~/test_dir: ./test_dir
  # syntax is <remote_file>:<local_file>
  ~/static_path.txt: ./static_path.txt

resources:
  cpus: 1
  memory: 1
  image_id: ubuntu
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    maxRunDurationSeconds: "600"

run: |
  ls -lah
  ls -lah ~/
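
Before launching the definition above, it can help to confirm that the local sources named in file_mounts exist on the workstation. The following is a minimal sketch, not part of konduktor itself: the file name example.txt and the definition file name task.yaml are hypothetical, and the exact argument form of konduktor launch is an assumption, so that command is left commented out.

```shell
#!/usr/bin/env bash
set -e

# Create the local directory and file that the file_mounts entries point at.
mkdir -p ./test_dir
echo "hello from test_dir" > ./test_dir/example.txt   # example.txt is a hypothetical file name
echo "hello from static_path" > ./static_path.txt

# Confirm both sources exist and are non-empty before launching.
for path in ./test_dir/example.txt ./static_path.txt; do
  [ -s "$path" ] || { echo "missing or empty: $path" >&2; exit 1; }
done
echo "local sources ready"

# Assumption: the workload definition above is saved as task.yaml.
# konduktor launch task.yaml
```

Checking locally first keeps a missing path from surfacing only during the sync step of the launch.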

Cloning Private GitHub Repositories

Currently, cloning private repositories is only supported via file sync of SSH keys to your object store. In the following example, we sync an SSH key from our workstation onto the workload and configure SSH for pulling from a private repository.

name: private-repo-ssh

num_nodes: 1

resources:
  cpus: 1
  memory: 2
  image_id: ubuntu
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    maxRunDurationSeconds: "3200"


file_mounts:
  ~/.ssh/test-ssh-key: ./tests/secrets/test-ssh-key

run: |
  set -eux
  apt-get update && apt-get install -y git openssh-client

  if [[ -f ~/.ssh/test-ssh-key && -s ~/.ssh/test-ssh-key ]]; then
    echo "SSH key mounted and non-empty"
  else
    echo "SSH key missing or empty"
    exit 1
  fi

  chmod 600 ~/.ssh/test-ssh-key
  echo -e "Host github.com\n\tIdentityFile ~/.ssh/test-ssh-key\n\tStrictHostKeyChecking no\n" > ~/.ssh/config

  git clone git@github.com:mygithubaccount/My-App.git
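
The echo -e line in the run section above packs the whole SSH configuration into one escaped string. As an alternative sketch, the same entry can be written with a heredoc, which is easier to read and extend. The helper name write_ssh_config is hypothetical, and the IdentitiesOnly yes line is an addition not present in the original example (it asks SSH to offer only the listed key). The demo writes to a scratch directory rather than the real ~/.ssh.

```shell
#!/usr/bin/env bash

# write_ssh_config KEY_PATH CONFIG_PATH  (hypothetical helper)
# Writes a github.com Host entry that points at the mounted key.
write_ssh_config() {
  local key_path="$1"
  local config_path="$2"
  mkdir -p "$(dirname "$config_path")"
  cat > "$config_path" <<EOF
Host github.com
  IdentityFile ${key_path}
  IdentitiesOnly yes
  StrictHostKeyChecking no
EOF
  # Restrict the config file to its owner.
  chmod 600 "$config_path"
}

# Demonstrate against a scratch directory instead of the real ~/.ssh.
# The tilde is quoted so it stays literal; ssh expands it when reading the config.
demo_dir="$(mktemp -d)"
write_ssh_config "~/.ssh/test-ssh-key" "$demo_dir/config"
cat "$demo_dir/config"
```

Inside a workload, the call would instead be write_ssh_config "~/.ssh/test-ssh-key" ~/.ssh/config, placed before the git clone step.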