Logging

Audio files can be logged via a file path or a NumPy array containing audio data shaped as frames × channels. To log audio, instantiate the pluto.Audio class:
audio = pluto.Audio(
    data="path/to/audio.ogg",  # str path or np.ndarray shaped frames × channels
    rate=48000,                # int | None; defaults to 48000
    caption=None,              # str | None; defaults to None
)
pluto.log({"audio/0": audio}, step=step)
| Parameter | Type | Description |
| --- | --- | --- |
| data | Union[str, np.ndarray] | The audio data to log. Can be a path to an audio file or a NumPy array. |
| rate | int | The sample rate of the audio data. Defaults to 48000. |
| caption | str | A caption for the audio. |

Examples

Logging from File Paths

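import httpx

# Download a sample clip and save it locally.
r = httpx.get(
    "https://actions.google.com/sounds/v1/alarms/digital_watch_alarm_long.ogg"
)
r.raise_for_status()  # fail early if the download did not succeed
with open("test.ogg", "wb") as f:
    f.write(r.content)

# Log the saved file by its path.
pluto.log({"audio": pluto.Audio(data="test.ogg")}, step=step)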
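
If no network access is available, a local file can be produced instead of downloaded. The following is a minimal sketch using Python's standard `wave` module; the helper name `write_wav` and the file name `tone.wav` are hypothetical, and it assumes that WAV is among the file types `pluto.Audio` accepts by path, which this page does not confirm.

```python
import wave

import numpy as np


def write_wav(path, samples, rate=48000):
    """Write float samples in [-1, 1], shaped (frames, channels), as 16-bit PCM."""
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as mono
    # Scale to the int16 range; row-major bytes are already channel-interleaved.
    pcm = (np.clip(x, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(pcm.shape[1])
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
    return path


# One second of a 440 Hz tone at half amplitude.
t = np.arange(48000) / 48000
path = write_wav("tone.wav", 0.5 * np.sin(2 * np.pi * 440.0 * t))

# Assumed usage, mirroring the call above (not executed here):
# pluto.log({"audio": pluto.Audio(data=path)}, step=step)
```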

Logging from NumPy Arrays

import numpy as np

# Shape (2, 3): two frames × three channels.
data = np.array([[1, 1, 1], [1, 1, 1]], dtype=np.float32)
pluto.log({"audio": pluto.Audio(data=data)}, step=step)
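
For something audible, an array in the frames × channels layout can be synthesized directly. The sketch below is illustrative only: `make_tone` is a hypothetical helper, and it assumes float samples are expected in the range [-1, 1], which this page does not state.

```python
import numpy as np


def make_tone(freq_hz=440.0, seconds=1.0, rate=48000, channels=2, amplitude=0.5):
    """Return a sine tone shaped (frames, channels) as float32."""
    n_frames = int(round(seconds * rate))
    t = np.arange(n_frames, dtype=np.float64) / rate
    mono = amplitude * np.sin(2.0 * np.pi * freq_hz * t)
    # Copy the mono signal into every channel column.
    return np.repeat(mono[:, None], channels, axis=1).astype(np.float32)


tone = make_tone()
print(tone.shape, tone.dtype)  # (48000, 2) float32

# Assumed usage, mirroring the call above (not executed here):
# pluto.log({"audio": pluto.Audio(data=tone, rate=48000, caption="440 Hz tone")}, step=step)
```

Passing `rate` explicitly keeps the logged sample rate in step with the rate used to generate the array.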

Viewing

Logged audio files appear as player widgets with playback controls, volume adjustment, and per-file Analyze and Download buttons. When comparing multiple runs, each audio card shows the run name with its assigned color.

Playback Controls

Each audio player includes:
  • Play / Pause with a progress slider you can drag to seek
  • Skip forward / back buttons (5-second jumps)
  • Volume slider with mute toggle
  • Download — saves the audio file locally
  • Analyze — opens the audio analysis dialog (see below)

Step Navigation

If you log audio at multiple training steps, use the step slider below the players to browse through different steps. This is useful for tracking how generated audio (e.g., text-to-speech) improves over the course of training. When multiple audio groups are displayed in the same section, their step sliders can be linked so that changing the step on one group changes all of them simultaneously. Click the lock icon on the step navigator to toggle sync on or off.
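
Logging audio at several steps might look like the sketch below. It assumes that reusing the same key at each step is what groups the clips under one step slider, which this page implies but does not state. `synth_sample` is a hypothetical stand-in for model output, and `log_samples` takes the logging and wrapping functions as arguments so the loop can be dry-run without the library.

```python
import numpy as np


def synth_sample(step, rate=48000, seconds=0.5):
    """Stand-in for model output: a tone whose added noise shrinks as step grows."""
    rng = np.random.default_rng(step)
    t = np.arange(int(rate * seconds)) / rate
    clean = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    noise = rng.normal(scale=0.3 / (1 + step), size=t.shape)
    # Shape (frames, 1): a single channel.
    return np.clip(clean + noise, -1.0, 1.0).astype(np.float32)[:, None]


def log_samples(log_fn, wrap_fn, steps, key="audio/sample"):
    """Log one clip per step under the same key."""
    for step in steps:
        log_fn({key: wrap_fn(synth_sample(step))}, step=step)


# With the library this would be roughly (assumed, not executed here):
# log_samples(pluto.log, lambda a: pluto.Audio(data=a, rate=48000), range(0, 1000, 100))

# Dry run with a stand-in logger that records each call.
calls = []
log_samples(lambda payload, step: calls.append((step, payload)), lambda a: a, [0, 100, 200])
print([step for step, _ in calls])
```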

Fullscreen View

Click the expand button on any audio card’s toolbar to open it in fullscreen. The fullscreen view displays the full multi-run comparison at viewport size. Use arrow keys to navigate between steps.

Audio Analysis

Click Analyze on any audio player to open a dialog with three tabs:
  • Spectrum — Real-time frequency spectrum visualization that animates during playback
  • Waveform — Time-domain waveform drawn from the audio buffer
  • Statistics — Peak amplitude, RMS level, duration, sample rate, number of channels, and dynamic range
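
To sanity-check a clip before logging it, similar figures can be computed offline with NumPy. This is a rough sketch, not the dialog's implementation: the page does not define how the dialog computes each value, so `audio_stats` is a hypothetical helper, and it reports the peak-to-RMS ratio in dB (crest factor) as a stand-in for dynamic range.

```python
import numpy as np


def audio_stats(data, rate):
    """Summarize an array shaped (frames, channels) sampled at `rate` Hz."""
    x = np.asarray(data, dtype=np.float64)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as mono
    frames, channels = x.shape
    peak = float(np.max(np.abs(x)))
    rms = float(np.sqrt(np.mean(x ** 2)))
    # Crest factor in dB; used here only as a stand-in for dynamic range.
    crest_db = float(20.0 * np.log10(peak / rms)) if rms > 0 else 0.0
    return {
        "peak": peak,
        "rms": rms,
        "duration_s": frames / rate,
        "sample_rate": rate,
        "channels": channels,
        "crest_factor_db": crest_db,
    }


# One second of a stereo 440 Hz tone at half amplitude.
t = np.arange(48000) / 48000
mono = 0.5 * np.sin(2 * np.pi * 440.0 * t)
tone = np.stack([mono, mono], axis=1).astype(np.float32)

stats = audio_stats(tone, 48000)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For a pure sine wave the RMS is the peak divided by √2, so the crest factor should come out near 3 dB, which is a quick way to confirm the helper behaves as intended.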