Users have the following options for submitting their application code as a Trainy workload.
- (Re)building a docker image for every change and pushing it to a registry for launching a workload.
- Committing changes to a development branch in git and checking it out in the workload.
- Synchronizing application code through file sync via file_mounts and workdir definitions.
The first option is often slow given the size of deep learning images, so we focus on the latter two here.
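For the second option, the workload's run: block can fetch just the development branch. The following is a minimal sketch, assuming git is installed in the image; the helper name clone_branch, the branch name, and the destination directory are hypothetical and not part of Konduktor.

```shell
# Hypothetical helper: shallow-clone a single development branch.
# --depth 1 keeps the download small; --branch selects the branch to check out.
clone_branch() {
  local repo_url="$1" branch="$2" dest="$3"
  git clone --quiet --depth 1 --branch "$branch" "$repo_url" "$dest"
}

# Example call inside a run: block (branch and destination are placeholders):
# clone_branch git@github.com:mygithubaccount/My-App.git my-dev-branch ~/app
```

A shallow clone is used here only to keep startup time short; drop --depth 1 if the workload needs the full history.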
Setup
Full setup for file sync requires cloud storage configuration which can be found here.
Konduktor mounts your cloud credentials into the job containers and places them in ~/.aws (S3) or ~/.config/gcloud (GS) at startup. If you plan to use command-line tools like aws s3, gsutil, or gcloud, ensure your image includes those CLIs or install them in your run: block.
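Because the credentials and the CLIs arrive separately, it can be useful to check for both at the top of a run: block. This is a minimal sketch rather than anything Konduktor provides; the function name check_cloud_tooling is hypothetical, and the paths are the ones named above.

```shell
# Hypothetical helper (not part of Konduktor): report whether the mounted
# credential directory and the matching CLI are both available.
check_cloud_tooling() {
  local creds_dir="$1" cli="$2" status=0
  if [ ! -d "$creds_dir" ]; then
    echo "missing credentials directory: $creds_dir"
    status=1
  fi
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "missing CLI: $cli (install it in the image or in run:)"
    status=1
  fi
  return "$status"
}

# S3: credentials are placed in ~/.aws; the CLI is aws.
check_cloud_tooling "$HOME/.aws" aws || echo "S3 tooling is not ready"
# GS: credentials are placed in ~/.config/gcloud; the CLI is gcloud.
check_cloud_tooling "$HOME/.config/gcloud" gcloud || echo "GS tooling is not ready"
```

Only the check for the cloud you actually use is needed; the other one can be deleted.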
We check our cloud service account credentials in the Trainy cluster with this:
$ konduktor check <CLOUD STORAGE ALIAS> # s3, gs, etc.
Afterwards, we configure the storage provider by setting allowed_clouds in ~/.konduktor/config.yaml:
# ~/.konduktor/config.yaml
allowed_clouds:
- gs # {s3, gs}
Usage
When we run konduktor launch, the following steps happen in this order. If any step fails, the workload will fast-fail.
- workdir and file_mounts are synchronized to object storage
- workload is submitted
- workload, once active, will sync down workdir and file_mounts
In our workload definition, we can define the following:
name: single-file-upload
num_nodes: 1
workdir: .
file_mounts:
  # syntax is <remote_dir>:<local_dir>
  ~/test_dir: ./test_dir
  # syntax is <remote_file>:<local_file>
  ~/static_path.txt: ./static_path.txt
resources:
  cpus: 1
  memory: 1
  image_id: ubuntu
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    maxRunDurationSeconds: "600"
run: |
  ls -lah
  ls -lah ~/
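Since the upload is the first step and a failure there stops the launch, a quick local check that every source path exists can save a round trip. The sketch below is an assumption about workflow, not a Konduktor feature; check_local_sources is a hypothetical helper, and the two paths are the local sides of the file_mounts entries in the definition above.

```shell
# Hypothetical pre-flight helper: confirm every local source path exists
# before launching, and list the ones that do not.
check_local_sources() {
  local missing=0 path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing local source: $path"
      missing=1
    fi
  done
  return "$missing"
}

# Local sides of the file_mounts entries in the definition above.
check_local_sources ./test_dir ./static_path.txt || echo "fix the paths before launching"
```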
.konduktorignore
Use a .konduktorignore file to exclude files and directories from being synchronized.
It works similarly to .gitignore: patterns are matched relative to the sync root, which is the directory that contains the .konduktorignore file.
Examples
Workdir
For a workdir of ./my_dir, place .konduktorignore at ./my_dir/.konduktorignore.
File mounts
file_mounts:
  /remote: ./my_dir
Place .konduktorignore at ./my_dir/.konduktorignore.
Example .konduktorignore in ./my_dir/:
*.log # ignores *.log at the sync root (my_dir)
secret.txt # ignores secret.txt at the sync root (my_dir)
secret-dir1/** # ignores the entire secret-dir1/ subtree
secret-dir2/*.bin # ignores .bin files under secret-dir2/
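To get a feel for which paths these four patterns cover, the sketch below hard-codes them as shell case patterns. It is an approximation written for illustration, not Konduktor's matcher: the function name is_ignored is hypothetical, and the treatment of nested directories under secret-dir2/ is an assumption based only on the comments in the example.

```shell
# Approximate, hypothetical re-implementation of the four example patterns.
# This is NOT Konduktor's matcher; it only mirrors the comments in the example.
# The first matching case branch wins, so the order below matters.
is_ignored() {
  local path="$1"   # path relative to the sync root, e.g. "secret-dir2/a.bin"
  case "$path" in
    secret-dir1/*) return 0 ;;      # secret-dir1/** : entire subtree
    secret-dir2/*/*) return 1 ;;    # assumption: *.bin is not applied to nested dirs
    secret-dir2/*.bin) return 0 ;;  # secret-dir2/*.bin
    */*) return 1 ;;                # remaining patterns apply at the sync root only
    *.log|secret.txt) return 0 ;;   # *.log and secret.txt
  esac
  return 1
}

# Preview a few sample paths (all hypothetical).
for p in train.log secret.txt main.py secret-dir1/a/key.pem \
         secret-dir2/model.bin secret-dir2/notes.txt; do
  if is_ignored "$p"; then echo "skip  $p"; else echo "sync  $p"; fi
done
```

When in doubt about an edge case, the reliable check is to launch a small workload and list the synced directory.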
Cloning private GitHub Repositories
Cloning private repositories is supported either by file syncing SSH keys through your object store or through secrets. This section demonstrates how to file sync an SSH key from our workstation onto the workload and configure SSH for pulling from a private repository.
name: private-repo-ssh
num_nodes: 1
resources:
  cpus: 1
  memory: 2
  image_id: ubuntu
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    maxRunDurationSeconds: "3200"
file_mounts:
  ~/.ssh/test-ssh-key: ./tests/secrets/test-ssh-key
run: |
  set -eux
  apt-get update && apt-get install -y git openssh-client
  if [[ -f ~/.ssh/test-ssh-key && -s ~/.ssh/test-ssh-key ]]; then
    echo "SSH key mounted and non-empty"
  else
    echo "SSH key missing or empty"
    exit 1
  fi
  chmod 600 ~/.ssh/test-ssh-key
  echo -e "Host github.com\n\tIdentityFile ~/.ssh/test-ssh-key\n\tStrictHostKeyChecking no\n" > ~/.ssh/config
  git clone git@github.com:mygithubaccount/My-App.git
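The SSH configuration step can also be written with a here-document, which some find easier to read than an escaped echo string. The sketch below is a variant, not the method used above: write_github_ssh_config is a hypothetical helper, and it swaps StrictHostKeyChecking no for accept-new (available in OpenSSH 7.6 and later), which still trusts the host key on first contact but refuses a key that later changes. IdentitiesOnly yes is an added assumption that restricts SSH to the mounted key.

```shell
# Hypothetical helper: write an SSH client config that points github.com
# at a specific private key.
write_github_ssh_config() {
  local ssh_dir="$1" key_path="$2"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  # Overwrites any existing config, like the echo redirect in the workload above.
  cat > "$ssh_dir/config" <<EOF
Host github.com
  IdentityFile $key_path
  IdentitiesOnly yes
  StrictHostKeyChecking accept-new
EOF
  chmod 600 "$ssh_dir/config"
}

# In the workload above this would be called as:
# write_github_ssh_config ~/.ssh ~/.ssh/test-ssh-key
```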