Stable Audio 3

This Soniqo page documents Stable Audio 3 as currently implemented in speech-swift / speech-core. The Hugging Face links sit below the integration notes.

Internal page first

Cards on the home page and the documentation menu point here first; links to the source model and the bundles are still available on this page.

Overview

Model	Stable Audio 3
Role	Text-to-music generation
Backend	MLX, Medium DiT int8 default with int4 variant available
Output	44.1 kHz stereo Float PCM
Languages	Prompt language depends on the T5Gemma text encoder
License	Stable Audio model terms apply
Status	Default speech compose engine for Stable Audio 3 Medium
Source	Stability AI Stable Audio 3
Swift product	`StableAudio3MusicGen`
CLI / runtime	`speech compose --engine sa3`

Usage

The snippet below matches the current API or command provided by speech-swift.

# Generate 30 seconds of 44.1 kHz stereo audio.
.build/release/speech compose "lofi house loop" \
  --engine sa3 \
  --sa3-variant medium-int8 \
  --seconds 30 \
  -o music.wav

Model links

Implementation notes

The download is already split into separate DiT, SAME encoder/decoder, and T5Gemma directories; switching it to byte-weighted progress would match the faster Fish path.
Medium DiT uses 24 layers, 1536 hidden size, differential attention, T5Gemma conditioning, and SAME-L decode.
Small Music and Small SFX bundle IDs exist, but the current Swift port wires the Medium family first.
Length is variable: latent steps are ceil(seconds * 44100 / 4096), then output is cropped to the requested duration.
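
The length rule in the last note can be checked with a little integer arithmetic. The sketch below is illustrative only and is not taken from the speech-swift source: the function names `sa3_latent_steps` and `sa3_cropped_samples` are hypothetical, it handles whole seconds only, and it assumes that 4096 is the number of output samples covered by one latent step, as the formula suggests.

```shell
#!/bin/sh
# Sketch only: mirrors the formula in the note above, not the actual Swift code.
SAMPLE_RATE=44100      # output sample rate from the overview table
SAMPLES_PER_STEP=4096  # assumed samples per latent step (divisor in the formula)

# Print ceil(seconds * 44100 / 4096) for a whole number of seconds.
sa3_latent_steps() {
  samples=$(( $1 * SAMPLE_RATE ))
  # Integer ceiling division: add (divisor - 1) before dividing.
  echo $(( (samples + SAMPLES_PER_STEP - 1) / SAMPLES_PER_STEP ))
}

# Print how many samples per channel exceed the requested duration
# and would therefore be cropped from the decoded output.
sa3_cropped_samples() {
  steps=$(sa3_latent_steps "$1")
  echo $(( steps * SAMPLES_PER_STEP - $1 * SAMPLE_RATE ))
}

sa3_latent_steps 30     # prints 323
sa3_cropped_samples 30  # prints 8
```

For the 30-second example in the usage section, 30 * 44100 = 1,323,000 samples, which rounds up to 323 latent steps (1,323,008 samples), so under these assumptions 8 samples per channel are cropped.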