OmniVoice

This Soniqo page documents OmniVoice as implemented in the local speech-swift / speech-core stack. The Hugging Face bundle links come after the integration notes.

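Start with the on-site page

Homepage cards and the docs menu point here first; links to the source model and the weight bundles are still provided on this page.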

Overview

Model	OmniVoice
Purpose	Massively multilingual zero-shot voice-cloning TTS
Backend	MLX int8 default bundle; fp16 bundle available
Output	24 kHz mono waveform
Languages	600+ languages
License	Apache-2.0 upstream family
Status	Programmatic speech-swift runtime used by Studio sidecar
Source	k2-fsa OmniVoice
Swift product	`OmniVoiceTTS`
CLI / runtime	Programmatic runtime; not a primary speech speak engine yet

Usage

The snippet below corresponds to the APIs or commands exposed by the current speech-swift repository.

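import Foundation  // provides URL
import OmniVoiceTTS

// Load the model with default arguments (async, may throw).
let model = try await OmniVoiceTTSModel.fromPretrained()

// Synthesize new text in the voice of the reference recording.
let pcm = try model.generate(
    text: "A new sentence in the reference speaker's voice.",
    referenceAudio: URL(fileURLWithPath: "reference.wav"),
    referenceText: "This is the reference voice.",
    language: "en"
)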
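
To listen to the result, the samples usually need to be wrapped in a container format. The following is a minimal sketch, not part of speech-swift: it assumes the generated audio is available as `[Float]` samples in the range -1 to 1 (the actual return type of `generate` may differ) and uses the 24 kHz mono output format listed in the overview. `makeWAV` is a hypothetical helper name.

```swift
import Foundation

/// Encodes mono Float samples in [-1, 1] as a 16-bit PCM WAV file image.
/// Assumption: the samples are `[Float]`; adapt this if the runtime returns another type.
func makeWAV(samples: [Float], sampleRate: UInt32 = 24_000) -> Data {
    var data = Data()

    // Append an integer in little-endian byte order, as RIFF/WAVE requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    let byteCount = UInt32(samples.count * 2)   // 2 bytes per 16-bit sample

    data.append(contentsOf: Array("RIFF".utf8))
    append(36 + byteCount)                      // size of everything after this field
    data.append(contentsOf: Array("WAVE".utf8))

    data.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))                          // fmt chunk size for plain PCM
    append(UInt16(1))                           // audio format: 1 = PCM
    append(UInt16(1))                           // channels: mono
    append(sampleRate)                          // samples per second
    append(sampleRate * 2)                      // byte rate = rate * channels * 2 bytes
    append(UInt16(2))                           // block align = channels * 2 bytes
    append(UInt16(16))                          // bits per sample

    data.append(contentsOf: Array("data".utf8))
    append(byteCount)
    for sample in samples {
        // Clamp before scaling so out-of-range values cannot overflow Int16.
        let clamped = max(-1, min(1, sample))
        append(Int16((clamped * 32767).rounded()))
    }
    return data
}
```

Under the same assumption, `try makeWAV(samples: pcm).write(to: URL(fileURLWithPath: "output.wav"))` would save the generated speech to disk.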

Model links

Implementation notes

Download repairs incomplete caches by checking the backbone, tokenizer files, and audio_tokenizer model before loading.
Generation iteratively unmasks eight acoustic codebooks with classifier-free guidance.
The runtime combines a bidirectional Qwen3 backbone with a Higgs-audio v2 codec encoder/decoder.
Optional instructions cover restricted style controls such as accent, age, gender, pitch, and whisper.
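
The generation note above can be illustrated with a toy sketch. It is not the speech-swift implementation: it shows one common formulation of classifier-free guidance and a simple confidence-ordered unmasking schedule for a single token sequence, whereas the real runtime works over eight acoustic codebooks in a way this page does not specify. The names `guide`, `unmask`, and `predict` are hypothetical.

```swift
/// Classifier-free guidance (one common formulation): move the conditional
/// logits away from the unconditional ones by a guidance scale.
func guide(cond: [Float], uncond: [Float], scale: Float) -> [Float] {
    zip(cond, uncond).map { c, u in u + scale * (c - u) }
}

/// Iteratively fills a fully masked token sequence (nil = still masked).
/// `predict` is a hypothetical stand-in for the model: given the current
/// partial sequence, it proposes a (token, confidence) pair for every position.
func unmask(
    length: Int,
    steps: Int,
    predict: ([Int?]) -> [(token: Int, confidence: Float)]
) -> [Int] {
    precondition(steps > 0, "at least one decoding step is required")
    var tokens = [Int?](repeating: nil, count: length)

    for step in 0..<steps {
        let masked = tokens.indices.filter { tokens[$0] == nil }
        if masked.isEmpty { break }

        let proposals = predict(tokens)

        // Commit an equal share of the remaining masked positions per step;
        // the final step commits everything that is left.
        let remainingSteps = steps - step
        let quota = (masked.count + remainingSteps - 1) / remainingSteps

        // Most confident proposals are committed first.
        let chosen = masked
            .sorted { proposals[$0].confidence > proposals[$1].confidence }
            .prefix(quota)
        for index in chosen {
            tokens[index] = proposals[index].token
        }
    }
    // Every position has been committed by the last step.
    return tokens.compactMap { $0 }
}
```

In a real decoder, `predict` would run the backbone twice (with and without conditioning), combine the two sets of logits with something like `guide`, and derive each position's token and confidence from the guided distribution.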