Experimental. This feature is new and may change in backwards-incompatible ways before it stabilizes. The KonduktorResult condition type, the $KONDUKTOR_OUTPUT_DIR/result.json contract, and the Job.result() / konduktor result surfaces are the most likely to shift based on early feedback.
Konduktor jobs can report a JSON-serializable value back to the launching process. This is useful for hyperparameter sweeps, Optuna trials, and any workflow where the launcher needs to read an objective / metric / artifact pointer from the run.
Job Handles
Every Konduktor-launched pod has:
- An
emptyDir volume mounted at /konduktor/output.
- An environment variable
KONDUKTOR_OUTPUT_DIR pointing at that directory.
To report a value, write JSON to $KONDUKTOR_OUTPUT_DIR/result.json:
# shell workload — no library install required
echo '{"val_loss": 0.42}' > "$KONDUKTOR_OUTPUT_DIR/result.json"
# python workload — stdlib only
import json, os
with open(f"{os.environ['KONDUKTOR_OUTPUT_DIR']}/result.json", "w") as f:
json.dump({"val_loss": 0.42}, f)
# python workload — with konduktor installed (optional sugar)
import konduktor
konduktor.report({"val_loss": 0.42})
Retrieving the value (Python API)
konduktor.launch() returns a Job handle (a str subclass, back-compatible with existing job_name = konduktor.launch(task) code). New methods:
import konduktor
task = konduktor.Task(
name="sweep-trial-0",
run='echo \'{"val_loss": 0.42}\' > "$KONDUKTOR_OUTPUT_DIR/result.json"',
)
task.set_resources(konduktor.Resources(cpus=1, memory=2, image_id="ubuntu"))
job = konduktor.launch(task)
value = job.result(timeout=600) # blocks until the job terminates
print(value) # {"val_loss": 0.42}
job.wait(timeout=...) returns the terminal state ("succeeded" or "failed") without parsing a value.
Retrieving the value (CLI)
konduktor launch --wait blocks until completion and prints the result JSON:
$ konduktor launch --wait -y tests/test_yamls/return_value_shell.yaml
{"val_loss": 0.123}
konduktor result <job-name> fetches the result of a terminal job:
$ konduktor result sweep-trial-0-abcd
{"val_loss": 0.42}
Both commands exit non-zero on job failure, malformed result, or timeout.
Failure modes
| Error | Raised when |
|---|
JobFailedError | Job exited non-zero or never reported a result |
ResultTooLargeError | Result exceeds 4 KB (kubelet’s terminationMessagePath cap) |
MalformedResultError | result.json wasn’t valid JSON |
NoResultReportedError | Job succeeded but didn’t write result.json |
ResultTimeoutError | timeout elapsed before the job was terminal |
If your result is larger than 4 KB, return a summary (e.g. a checkpoint path), not the full artifact.
Multi-node jobs
In a multi-node JobSet, only worker 0 of the first replicated job reports. Other workers’ termination messages are ignored.
Use case: Optuna sweep
The primitive pairs naturally with an Optuna hyperparameter sweep:
import konduktor
import optuna
study = optuna.create_study(direction="minimize")
for _ in range(50):
trial = study.ask()
lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
task = build_task(lr=lr)
job = konduktor.launch(task, detach_run=True)
try:
val_loss = job.result(timeout=3600)["val_loss"]
study.tell(trial, val_loss)
except konduktor.JobFailedError:
study.tell(trial, state=optuna.trial.TrialState.FAIL