Trainy-Konduktor: An ML/AI GPU platform for high performance batch jobs on k8s.
At a glance
bash
UXRun with:
Out of the box observability: 📊 Trainy provides telemetry (prometheus metrics and logs) for tracking cluster performance, utilization, and health.
Getting Started
Schedule multiple jobs in parallel for batch inference
Scale up the resources for a PyTorch distributed job with multiple machines
SSH and connect VSCode to your GPU containers for debugging and dev via Tailscale
Monitor and troubleshoot your workloads with our curated Grafana dashboards
Trainy-Konduktor: An ML/AI GPU platform for high performance batch jobs on k8s.
At a glance
bash
UXRun with:
Out of the box observability: 📊 Trainy provides telemetry (prometheus metrics and logs) for tracking cluster performance, utilization, and health.
Getting Started
Schedule multiple jobs in parallel for batch inference
Scale up the resources for a PyTorch distributed job with multiple machines
SSH and connect VSCode to your GPU containers for debugging and dev via Tailscale
Monitor and troubleshoot your workloads with our curated Grafana dashboards