Fish Audio S2 Pro
This Soniqo page documents Fish Audio S2 Pro as implemented in the local speech-swift / speech-core stack. Hugging Face bundle links are listed after the integration notes.
Internal page first
Landing cards and docs menus point to this page first; the source model and bundle links remain available here.
Summary
| Model | Fish Audio S2 Pro |
|---|---|
| Role | Experimental multilingual TTS with voice cloning from raw reference audio and inline style markers |
| Backend | MLX fp16 |
| Output | 44.1 kHz mono PCM |
| Languages | Multilingual |
| License | Research / non-commercial bundle; obtain a Fish Audio license before any commercial deployment |
| Status | Programmatic runtime; CLI integration is still pending |
| Source | Fish Audio S2 Pro |
| Swift product | FishAudioTTS |
| CLI / runtime | Programmatic runtime today; planned: `speech speak --engine fish-audio` |
Usage
The snippet below matches the current speech-swift API.
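```swift
import Foundation  // for URL
import FishAudioTTS

// Load the pretrained Fish Audio S2 Pro bundle.
let model = try await FishAudioTTSModel.fromPretrained()

// Clone the voice in reference.wav and speak Hindi text with the [excited] style marker.
let pcm = try await model.generate(
    text: "आज मैं बहुत खुश हूँ। [excited]",           // "Today I am very happy."
    referenceAudioURL: URL(fileURLWithPath: "reference.wav"),
    referenceText: "नमस्ते, यह संदर्भ आवाज है।"     // text spoken in reference.wav: "Hello, this is the reference voice."
)
```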
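The summary table lists the output as 44.1 kHz mono PCM. To save such output to disk, a minimal sketch like the one below wraps the samples in a 16-bit PCM WAV container. It assumes the samples are a `[Float]` array in the range [-1, 1]; the actual return type of `generate` is not stated on this page, and `makeWAV` is a hypothetical helper, not part of FishAudioTTS.

```swift
import Foundation

/// Hypothetical helper (not part of FishAudioTTS): encodes mono Float samples,
/// assumed to lie in [-1, 1], as a 16-bit PCM WAV file. Out-of-range values are clamped.
func makeWAV(samples: [Float], sampleRate: Int = 44_100) -> Data {
    let channels = 1
    let bitsPerSample = 16
    let blockAlign = channels * bitsPerSample / 8
    let byteRate = sampleRate * blockAlign
    let dataSize = samples.count * blockAlign

    var data = Data()
    // RIFF stores integers little-endian.
    func appendLE<T: FixedWidthInteger>(_ value: T) {
        var le = value.littleEndian
        withUnsafeBytes(of: &le) { data.append(contentsOf: $0) }
    }

    data.append(contentsOf: Array("RIFF".utf8))
    appendLE(UInt32(36 + dataSize))        // RIFF chunk size
    data.append(contentsOf: Array("WAVE".utf8))
    data.append(contentsOf: Array("fmt ".utf8))
    appendLE(UInt32(16))                   // fmt chunk size for PCM
    appendLE(UInt16(1))                    // format tag 1 = integer PCM
    appendLE(UInt16(channels))
    appendLE(UInt32(sampleRate))
    appendLE(UInt32(byteRate))
    appendLE(UInt16(blockAlign))
    appendLE(UInt16(bitsPerSample))
    data.append(contentsOf: Array("data".utf8))
    appendLE(UInt32(dataSize))

    for sample in samples {
        let clamped = max(-1, min(1, sample))
        appendLE(Int16(clamped * Float(Int16.max)))
    }
    return data
}
```

If `pcm` turns out to be a `[Float]`, saving it would look like `try makeWAV(samples: pcm).write(to: URL(fileURLWithPath: "out.wav"))`.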
Model links
Implementation notes
- Downloads use an explicit file manifest weighted by byte size, so progress reflects the bytes actually transferred across the multi-GB weight shards.
- Control markers include [pause], [emphasis], [laughing], [excited], [angry], [whisper], [screaming], [shouting], [surprised], and [sad].
- The runtime generates 10 DAC codebook rows, then decodes generated or reference-conditioned codebooks through FishAudioCodec.
- Current tests cover bundle loading, codebook generation, codec encode/decode, Hindi cloning, and ASR round-trip gates.
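The byte-weighted download progress described in the first note can be sketched as follows. `ManifestFile`, `byteWeightedProgress`, and `fileCountProgress` are hypothetical names, and the real manifest format is not documented on this page. The point is the weighting: each file contributes in proportion to its size, so a tiny config file cannot move the bar as much as a multi-GB shard.

```swift
/// Hypothetical manifest entry: one file in the model bundle and its size in bytes.
struct ManifestFile {
    let name: String
    let byteCount: Int64
}

/// Byte-weighted progress: bytes received so far divided by total bytes in the manifest.
func byteWeightedProgress(manifest: [ManifestFile], transferred: [String: Int64]) -> Double {
    let total = manifest.reduce(Int64(0)) { $0 + $1.byteCount }
    guard total > 0 else { return 1.0 }
    let done = manifest.reduce(Int64(0)) { sum, file in
        // Clamp each file to its declared size in case a counter overshoots.
        sum + min(transferred[file.name] ?? 0, file.byteCount)
    }
    return Double(done) / Double(total)
}

/// For comparison: naive progress that counts finished files, each with equal weight.
func fileCountProgress(manifest: [ManifestFile], transferred: [String: Int64]) -> Double {
    guard !manifest.isEmpty else { return 1.0 }
    let finished = manifest.filter { (transferred[$0.name] ?? 0) >= $0.byteCount }.count
    return Double(finished) / Double(manifest.count)
}
```

With a hypothetical two-file manifest of a 100-byte config and a 900-byte shard, finishing only the config gives a file-count progress of 0.5 but a byte-weighted progress of 0.1, which is the behavior the note describes at multi-GB scale.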
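Because the control markers are plain bracketed tags inside the input text, a caller may want to catch misspelled tags before synthesis. The sketch below is a hypothetical client-side check, not part of FishAudioTTS. It assumes markers are written exactly as listed in the notes (lowercase, in square brackets); how the runtime itself treats unknown tags is not documented here.

```swift
/// Control markers listed in the implementation notes.
let documentedMarkers: Set<String> = [
    "pause", "emphasis", "laughing", "excited", "angry",
    "whisper", "screaming", "shouting", "surprised", "sad",
]

/// Hypothetical pre-check: returns every "[tag]" in `text` whose tag is not an
/// exact match for a documented marker, in order of appearance.
/// An unclosed "[" at the end of the text is ignored.
func undocumentedMarkers(in text: String) -> [String] {
    var result: [String] = []
    var current: String? = nil   // non-nil while scanning inside "[...]"
    for ch in text {
        if ch == "[" {
            current = ""         // a new "[" restarts the tag
        } else if ch == "]", let tag = current {
            if !documentedMarkers.contains(tag) {
                result.append(tag)
            }
            current = nil
        } else {
            current?.append(ch)  // only accumulates while inside brackets
        }
    }
    return result
}
```

For example, the usage snippet's text passes with no findings, while `"Hi [pause] there [giggle]"` reports `["giggle"]`.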