Environment Variables
Konduktor passes the following environment variables to each job to make distributed jobs easy to launch, for example with PyTorch.

Environment variable | Description |
---|---|
MASTER_ADDR | The FQDN of the rank 0 worker. Example: test-1234-workers-0-0.test-1234 |
NODE_HOST_IPS | A comma-separated list of worker FQDNs. Example: test-1234-workers-0-0.test-1234,test-1234-workers-0-1.test-1234 |
RANK | The global rank within a job. |
NUM_NODES | The total number of nodes/workers. |
NUM_GPUS_PER_NODE | The number of GPUs per node. |
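
As a rough illustration of how a job might consume these variables, the sketch below reads them into a small structure and builds a `torchrun` command line. The helper names (`JobEnv`, `read_job_env`, `torchrun_args`), the script name, and the port `29500` are hypothetical and not part of Konduktor. The sketch also assumes that `RANK` can be used as the per-node rank and that one process is started per GPU; adjust if your job is laid out differently.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class JobEnv:
    """Values read from the environment variables listed above."""
    master_addr: str
    node_hosts: list[str]
    rank: int
    num_nodes: int
    gpus_per_node: int

    @property
    def world_size(self) -> int:
        # Assumption: one process per GPU on every node.
        return self.num_nodes * self.gpus_per_node


def read_job_env(environ: Mapping[str, str] = os.environ) -> JobEnv:
    """Parse the job environment variables (hypothetical helper)."""
    # NODE_HOST_IPS is a comma-separated list; drop empty entries.
    hosts = [h for h in environ.get("NODE_HOST_IPS", "").split(",") if h]
    return JobEnv(
        master_addr=environ["MASTER_ADDR"],
        node_hosts=hosts,
        rank=int(environ["RANK"]),
        num_nodes=int(environ["NUM_NODES"]),
        gpus_per_node=int(environ["NUM_GPUS_PER_NODE"]),
    )


def torchrun_args(env: JobEnv, script: str, port: int = 29500) -> list[str]:
    """Build a torchrun command line (assumes RANK is the node rank)."""
    return [
        "torchrun",
        f"--nnodes={env.num_nodes}",
        f"--nproc_per_node={env.gpus_per_node}",
        f"--node_rank={env.rank}",
        f"--master_addr={env.master_addr}",
        f"--master_port={port}",
        script,
    ]
```

Inside a job, something like `subprocess.run(torchrun_args(read_job_env(), "train.py"), check=True)` would then start the training script on each worker, where `train.py` stands in for your own entry point.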