Fish Audio S2 Pro

This Soniqo page documents Fish Audio S2 Pro as implemented in the local speech-swift / speech-core stack. Hugging Face bundle links are given after the integration notes.

Internal page first

Landing cards and docs menus link to this page first; the source model and bundle links remain available here.

Summary

Model	Fish Audio S2 Pro
Role	Experimental multilingual TTS with raw-reference cloning and style markers
Backend	MLX fp16
Output	44.1 kHz mono PCM
Languages	Multilingual
License	Research / non-commercial bundle; obtain a Fish Audio license for commercial use
Status	Programmatic runtime; CLI integration is still pending
Source	Fish Audio S2 Pro
Swift product	`FishAudioTTS`
CLI / runtime	Programmatic runtime today; planned CLI: `speech speak --engine fish-audio`

Usage

The snippet below matches the current speech-swift API.

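import Foundation  // provides URL
import FishAudioTTS

// Load the pretrained Fish Audio S2 Pro bundle.
let model = try await FishAudioTTSModel.fromPretrained()

// Synthesize Hindi text in the voice of the reference clip.
let pcm = try await model.generate(
    text: "आज मैं बहुत खुश हूँ। [excited]",  // "I am very happy today." plus a style marker
    referenceAudioURL: URL(fileURLWithPath: "reference.wav"),
    referenceText: "नमस्ते, यह संदर्भ आवाज है।"  // transcript: "Hello, this is the reference voice."
)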
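
Since the output is 44.1 kHz mono PCM, a common next step is saving it as a WAV file. The sketch below packs samples into a 16-bit PCM WAV image using only Foundation. It assumes the samples are available as a `[Float]` in the range -1...1; the page does not state the actual return type of `generate`, so treat that as an assumption. `wavData(fromMonoSamples:sampleRate:)` and `appendLE` are hypothetical helper names, not part of `FishAudioTTS`.

```swift
import Foundation

/// Appends an integer to `data` in little-endian byte order (hypothetical helper).
func appendLE<T: FixedWidthInteger>(_ value: T, to data: inout Data) {
    withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
}

/// Packs mono Float samples (assumed range -1...1) into a 16-bit PCM WAV file image.
func wavData(fromMonoSamples samples: [Float], sampleRate: UInt32 = 44_100) -> Data {
    let channels: UInt16 = 1
    let bytesPerSample: UInt16 = 2
    let dataSize = UInt32(samples.count) * UInt32(bytesPerSample)

    var data = Data()
    // RIFF container header.
    data.append(contentsOf: "RIFF".utf8)
    appendLE(UInt32(36) + dataSize, to: &data)   // size of everything after this field
    data.append(contentsOf: "WAVE".utf8)

    // "fmt " chunk describing uncompressed PCM.
    data.append(contentsOf: "fmt ".utf8)
    appendLE(UInt32(16), to: &data)              // fmt chunk size
    appendLE(UInt16(1), to: &data)               // format tag 1 = integer PCM
    appendLE(channels, to: &data)
    appendLE(sampleRate, to: &data)
    appendLE(sampleRate * UInt32(channels) * UInt32(bytesPerSample), to: &data) // byte rate
    appendLE(channels * bytesPerSample, to: &data)                              // block align
    appendLE(UInt16(16), to: &data)              // bits per sample

    // "data" chunk with the samples, clamped and scaled to Int16.
    data.append(contentsOf: "data".utf8)
    appendLE(dataSize, to: &data)
    for sample in samples {
        let clamped = max(-1, min(1, sample))
        appendLE(Int16((clamped * 32_767).rounded()), to: &data)
    }
    return data
}
```

If `pcm` is indeed a `[Float]`, writing it out could look like `try wavData(fromMonoSamples: pcm).write(to: URL(fileURLWithPath: "out.wav"))`.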

Model links

Implementation notes

Download uses the newer explicit byte-weighted file manifest, so progress reflects real transferred bytes across the multi-GB shards.
Control markers include [pause], [emphasis], [laughing], [excited], [angry], [whisper], [screaming], [shouting], [surprised], and [sad].
The runtime generates 10 DAC codebook rows, then decodes generated or reference-conditioned codebooks through FishAudioCodec.
Current tests cover bundle loading, codebook generation, codec encode/decode, Hindi cloning, and ASR round-trip gates.
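
To illustrate why a byte-weighted manifest matters for the download note above, the sketch below compares progress by bytes with progress by file count. It is not the speech-swift downloader: `ManifestFile`, `ByteWeightedProgress`, and the file names are hypothetical and exist only for this example.

```swift
import Foundation

/// Hypothetical manifest entry: a file name plus its expected size in bytes.
struct ManifestFile {
    let name: String
    let byteCount: Int64
}

/// Tracks download progress weighted by bytes rather than by file count.
struct ByteWeightedProgress {
    private let expected: [String: Int64]
    private var received: [String: Int64] = [:]
    let totalBytes: Int64

    init(files: [ManifestFile]) {
        var sizes: [String: Int64] = [:]
        for file in files { sizes[file.name] = file.byteCount }
        expected = sizes
        totalBytes = sizes.values.reduce(0, +)
    }

    /// Records the cumulative bytes received for one file, clamped to its expected size.
    /// Files that are not in the manifest are ignored.
    mutating func update(file: String, bytesReceived: Int64) {
        guard let size = expected[file] else { return }
        received[file] = min(max(bytesReceived, 0), size)
    }

    /// Fraction of all manifest bytes received so far, in 0...1.
    var fraction: Double {
        guard totalBytes > 0 else { return 0 }
        return Double(received.values.reduce(0, +)) / Double(totalBytes)
    }
}
```

With a 1,000-byte config file and a 999,000-byte weight shard, finishing the config file moves `fraction` to 0.001, whereas a file-count measure would already report 50%. That difference is what makes byte weighting useful when a few multi-GB shards dominate the bundle.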