Get Started
Overview
Trainy-Konduktor: An ML/AI GPU platform for high-performance batch jobs on Kubernetes (k8s).
At a glance
- Simple resource declarations and `bash` UX
Run with:
- **Out-of-the-box observability:** 📊 Trainy provides telemetry (Prometheus metrics and logs) for tracking cluster performance, utilization, and health.
Getting Started
Examples
Many Parallel Jobs
Schedule multiple jobs in parallel for batch inference
Multi-node Jobs
Scale up the resources for a PyTorch distributed job with multiple machines
Interactive Workloads
SSH into your GPU containers and connect VS Code to them for debugging and development via Tailscale
Observability
Monitor and troubleshoot your workloads with our curated Grafana dashboards