OmniVoice

This Soniqo page documents the OmniVoice model as implemented in speech-swift / speech-core. The Hugging Face links appear below, after the integration notes.

Internal page first

Home page cards and documentation listings point to this page first; the model and package links stay inside it.

At a glance

Model	OmniVoice
Role	Massively multilingual zero-shot voice-cloning TTS
Backend	MLX int8 default bundle; fp16 bundle available
Output	24 kHz mono waveform
Languages	600+ languages
License	Apache-2.0 upstream family
Status	Programmatic speech-swift runtime used by Studio sidecar
Source	k2-fsa OmniVoice
Swift product	`OmniVoiceTTS`
CLI / runtime	Programmatic runtime; not a primary speech speak engine yet

Usage

The snippet below matches the current speech-swift API or command.

import OmniVoiceTTS

let model = try await OmniVoiceTTSModel.fromPretrained()
let pcm = try model.generate(
    text: "A new sentence in the reference speaker's voice.",
    referenceAudio: URL(fileURLWithPath: "reference.wav"),
    referenceText: "This is the reference voice.",
    language: "en"
)
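
The page does not state the type of `pcm`. As a minimal sketch, assuming it is an array of `Float` samples in the range [-1, 1] at the documented 24 kHz mono rate, the helper below wraps such samples in a 16-bit PCM WAV container using only Foundation. `wavData` and `appendLE` are hypothetical names for this illustration and are not part of speech-swift.

```swift
import Foundation

/// Appends an integer in little-endian byte order, as the RIFF format requires.
func appendLE<T: FixedWidthInteger>(_ value: T, to data: inout Data) {
    withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
}

/// Hypothetical helper (not part of speech-swift): encodes mono Float samples
/// as a 16-bit PCM WAV file. Assumes samples lie in [-1, 1]; values outside
/// that range are clamped.
func wavData(samples: [Float], sampleRate: UInt32 = 24_000) -> Data {
    let channels: UInt16 = 1
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * bitsPerSample / 8          // bytes per frame
    let byteRate = sampleRate * UInt32(blockAlign)
    let dataSize = UInt32(samples.count) * UInt32(blockAlign)

    var data = Data()
    // RIFF header
    data.append(contentsOf: "RIFF".utf8)
    appendLE(UInt32(36) + dataSize, to: &data)
    data.append(contentsOf: "WAVE".utf8)
    // Format chunk: 16 bytes, format tag 1 = integer PCM
    data.append(contentsOf: "fmt ".utf8)
    appendLE(UInt32(16), to: &data)
    appendLE(UInt16(1), to: &data)
    appendLE(channels, to: &data)
    appendLE(sampleRate, to: &data)
    appendLE(byteRate, to: &data)
    appendLE(blockAlign, to: &data)
    appendLE(bitsPerSample, to: &data)
    // Data chunk: clamp each sample and scale it to Int16
    data.append(contentsOf: "data".utf8)
    appendLE(dataSize, to: &data)
    for sample in samples {
        let clamped = max(-1, min(1, sample))
        appendLE(Int16((clamped * 32767).rounded()), to: &data)
    }
    return data
}
```

Under the same assumption about the type of `pcm`, the result could be saved with `try wavData(samples: pcm).write(to: URL(fileURLWithPath: "output.wav"))`.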

Model links

Implementation notes

Download repairs incomplete caches by checking the backbone, tokenizer files, and audio_tokenizer model before loading.
Generation iteratively unmasks eight acoustic codebooks with classifier-free guidance.
The runtime combines a bidirectional Qwen3 backbone with a Higgs-audio v2 codec encoder/decoder.
Optional instructions cover restricted style controls such as accent, age, gender, pitch, and whisper.
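
The note on iterative unmasking with classifier-free guidance can be illustrated with a toy sketch. This is not the speech-swift implementation. Classifier-free guidance is commonly written as `unconditional + scale * (conditional - unconditional)` applied to logits, and masked generation typically leaves fewer positions masked after each step. The function names, the guidance scale, and the cosine schedule below are assumptions chosen for the example; the source specifies none of them.

```swift
import Foundation

/// Common classifier-free guidance blend over a pair of logit vectors.
/// A scale of 1 reproduces the conditional logits; larger values move the
/// result further from the unconditional prediction.
/// The scale actually used by OmniVoice is not stated in the source.
func guidedLogits(conditional: [Float], unconditional: [Float], scale: Float) -> [Float] {
    precondition(conditional.count == unconditional.count, "logit vectors must match in length")
    return zip(conditional, unconditional).map { c, u in u + scale * (c - u) }
}

/// Hypothetical cosine schedule: how many of `total` positions remain masked
/// after `step` of `steps` refinement steps. Everything is masked at step 0
/// and nothing is masked at the final step.
func maskedCount(total: Int, step: Int, steps: Int) -> Int {
    let progress = Double(step) / Double(steps)
    return Int((Double(total) * cos(progress * .pi / 2)).rounded(.down))
}
```

In a loop of this shape, each step would score the masked positions with the guided logits, commit the most confident predictions, and keep `maskedCount` positions masked for the next pass.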