OmniVoice
This Soniqo page documents OmniVoice as implemented in the local speech-swift / speech-core runtime. Hugging Face bundle links follow the integration notes.
Internal page first
Landing cards and docs menus link to this page first; the source model and bundle links remain available here.
Summary
| Model | OmniVoice |
|---|---|
| Role | Massively multilingual zero-shot voice-cloning TTS |
| Backend | MLX int8 default bundle; fp16 bundle available |
| Output | 24 kHz mono waveform |
| Languages | 600+ languages |
| License | Apache-2.0 upstream family |
| Status | Programmatic speech-swift runtime used by Studio sidecar |
| Source | k2-fsa OmniVoice |
| Swift product | OmniVoiceTTS |
| CLI / runtime | Programmatic runtime; not a primary speech speak engine yet |
Usage
The snippet below matches the current speech-swift API or command.
import OmniVoiceTTS

let model = try await OmniVoiceTTSModel.fromPretrained()
let pcm = try model.generate(
    text: "A new sentence in the reference speaker's voice.",
    referenceAudio: URL(fileURLWithPath: "reference.wav"),
    referenceText: "This is the reference voice.",
    language: "en"
)
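The summary lists the output as a 24 kHz mono waveform, so a common next step is saving it to disk. The following is a minimal sketch, not part of speech-swift: `makeWAVData` is a hypothetical helper, and it assumes the generated samples can be treated as an array of `Float` values in the range -1.0 to 1.0, which this page does not confirm.

```swift
import Foundation

/// Hypothetical helper: encodes mono float samples as 16-bit PCM WAV data.
/// Assumption: samples are Floats in -1.0...1.0; the real return type of
/// `generate` is not documented on this page.
func makeWAVData(samples: [Float], sampleRate: UInt32 = 24_000) -> Data {
    let channels: UInt16 = 1
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * (bitsPerSample / 8)
    let byteRate = sampleRate * UInt32(blockAlign)
    let dataSize = UInt32(samples.count) * UInt32(blockAlign)

    var data = Data()

    // Appends an integer in little-endian byte order, as RIFF requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    data.append(contentsOf: Array("RIFF".utf8))
    append(UInt32(36) + dataSize)      // bytes remaining after this field
    data.append(contentsOf: Array("WAVE".utf8))
    data.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))                 // size of the PCM fmt chunk
    append(UInt16(1))                  // audio format 1 = integer PCM
    append(channels)
    append(sampleRate)
    append(byteRate)
    append(blockAlign)
    append(bitsPerSample)
    data.append(contentsOf: Array("data".utf8))
    append(dataSize)

    for sample in samples {
        // Replace non-finite values with silence, then clamp to the valid range.
        let safe: Float = sample.isFinite ? sample : 0
        let clamped = max(-1.0, min(1.0, safe))
        append(Int16((clamped * 32767.0).rounded()))
    }
    return data
}
```

If the samples are available as `[Float]`, the result could then be written with `try makeWAVData(samples: samples).write(to: URL(fileURLWithPath: "output.wav"))`. The 16-bit integer format is chosen here only because most players accept it.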
Model links
Implementation notes
- Download repairs incomplete caches by checking the backbone, tokenizer files, and audio_tokenizer model before loading.
- Generation iteratively unmasks eight acoustic codebooks with classifier-free guidance.
- The runtime combines a bidirectional Qwen3 backbone with a Higgs-audio v2 codec encoder/decoder.
- Optional instructions cover restricted style controls such as accent, age, gender, pitch, and whisper.
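The iterative unmasking with classifier-free guidance mentioned above can be illustrated with a toy sketch. This is the generic textbook form of both ideas, not OmniVoice's actual code: `guidedLogits` and `positionsToUnmask` are hypothetical names, and the confidence-based selection rule is an assumption, since this page does not say how the runtime picks which masked codebook positions to reveal at each step.

```swift
/// Classifier-free guidance in its usual form: move the conditional logits
/// away from the unconditional ones by a factor of `scale`.
/// scale == 1 returns the conditional logits; scale == 0 returns the unconditional ones.
func guidedLogits(conditional: [Float], unconditional: [Float], scale: Float) -> [Float] {
    precondition(conditional.count == unconditional.count, "logit vectors must match in length")
    return zip(conditional, unconditional).map { c, u in u + scale * (c - u) }
}

/// One unmasking step (assumed rule): among the still-masked positions,
/// reveal the `count` positions with the highest confidence.
/// Ties are broken by the lower index so the result is deterministic.
func positionsToUnmask(confidence: [Float], masked: Set<Int>, count: Int) -> [Int] {
    let ranked = masked.sorted { a, b in
        confidence[a] != confidence[b] ? confidence[a] > confidence[b] : a < b
    }
    return Array(ranked.prefix(max(0, count)))
}
```

In a full decoder, a loop would repeat these two steps until no masked positions remain, running the backbone once with and once without conditioning at each iteration.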