Stable Audio 3

This first-party Soniqo page documents Stable Audio 3 from the local speech-swift / speech-core implementation. Hugging Face bundles are linked below after the integration notes.


At a Glance

Model: Stable Audio 3
Role: Text-to-music generation
Backend: MLX; Medium DiT, int8 by default, with an int4 variant available
Output: 44.1 kHz stereo, Float PCM
Languages: Prompt language depends on the T5Gemma text encoder
License: Stable Audio model terms apply
Status: Default speech compose engine for Stable Audio 3 Medium
Source: Stability AI Stable Audio 3
Swift product: StableAudio3MusicGen
CLI / runtime: speech compose --engine sa3
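The output format above determines how much raw audio a clip of a given length holds. The sketch below estimates that size. It assumes interleaved stereo stored as 32-bit floats, because the table says only "Float PCM". `pcm_float_bytes` is a hypothetical helper for illustration and is not part of speech-swift.

```shell
#!/bin/sh
# Hypothetical helper: estimate the raw PCM payload size of a generated clip.
# Assumptions: 44.1 kHz, 2 channels (stereo), 4 bytes per sample (32-bit float).
pcm_float_bytes() {
  seconds=$1
  sample_rate=44100
  channels=2
  bytes_per_sample=4   # assumption: "Float PCM" stored as 32-bit float
  echo $(( seconds * sample_rate * channels * bytes_per_sample ))
}

# 30 s clip: 30 * 44100 * 2 * 4 bytes
pcm_float_bytes 30   # prints 10584000
```

A 30-second clip therefore carries about 10.6 MB of sample data. A WAV container adds a few dozen bytes of header on top of that.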

Use

The command below uses the speech compose CLI as it is currently exposed by the speech-swift repo.

# Generate 30 seconds of 44.1 kHz stereo audio.
.build/release/speech compose "lofi house loop" \
  --engine sa3 \
  --sa3-variant medium-int8 \
  --seconds 30 \
  -o music.wav
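After the command finishes, you can quickly check that the file is stereo at 44.1 kHz by reading its WAV header. The sketch below uses a hypothetical `wav_info` helper, which is not part of speech-swift. It assumes a canonical RIFF/WAVE layout in which the "fmt " chunk starts at byte 12. It also assumes a little-endian host, because `od` decodes multi-byte integers in host byte order.

```shell
#!/bin/sh
# Hypothetical helper: print the format tag, channel count and sample rate of a WAV file.
# Assumes the "fmt " chunk begins at byte 12 (canonical header layout), so:
#   byte 20: format tag (2 bytes), byte 22: channels (2 bytes), byte 24: sample rate (4 bytes).
# Assumes a little-endian host, since od reads integers in host byte order.
wav_info() {
  file=$1
  fmt_tag=$(od -An -t u2 -j 20 -N 2 "$file" | tr -d ' ')
  channels=$(od -An -t u2 -j 22 -N 2 "$file" | tr -d ' ')
  rate=$(od -An -t u4 -j 24 -N 4 "$file" | tr -d ' ')
  echo "format=$fmt_tag channels=$channels rate=$rate"
}
```

Running `wav_info music.wav` should report channels=2 rate=44100. The format tag depends on how the CLI encodes the file: 3 means IEEE float, 1 means integer PCM, and 65534 means WAVE_FORMAT_EXTENSIBLE.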

Model Links

Implementation Notes