Fish Audio S2 Pro

This Soniqo page documents Fish Audio S2 Pro as implemented in the local speech-swift / speech-core stack. The Hugging Face bundle link comes after the integration notes.

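Start with the on-site page

The home page cards and the documentation menu point here first; links to the source model and the weight bundle are still provided on this page.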

Overview

Model	Fish Audio S2 Pro
Purpose	Experimental multilingual TTS with raw-reference cloning and style markers
Backend	MLX fp16
Output	44.1 kHz mono PCM
Languages	Multilingual
License	Research / non-commercial bundle; obtain a Fish Audio license before any commercial use
Status	Programmatic runtime; CLI integration is still pending
Source	Fish Audio S2 Pro
Swift product	`FishAudioTTS`
CLI / runtime	Programmatic runtime today; planned `speech speak --engine fish-audio`
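The table lists the output as 44.1 kHz mono PCM. To audition such output with ordinary audio tools, the samples can be wrapped in a WAV container. The following is a minimal sketch, not part of `FishAudioTTS`: it assumes the samples are available as `[Float]` in the range -1...1 (the runtime's actual return type is not stated on this page), and `appendLE` and `wavData` are hypothetical helper names.

```swift
import Foundation

/// Appends a fixed-width integer in little-endian byte order (WAV is little-endian).
func appendLE<T: FixedWidthInteger>(_ value: T, to data: inout Data) {
    withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
}

/// Wraps mono Float samples as a 16-bit PCM WAV file image.
/// Assumption: samples are Float in -1...1; out-of-range values are clamped.
func wavData(monoSamples: [Float], sampleRate: Int = 44_100) -> Data {
    let bytesPerSample = 2                      // 16-bit output
    let dataSize = monoSamples.count * bytesPerSample
    var data = Data()

    // RIFF header: chunk size is everything after these first 8 bytes.
    data.append(contentsOf: Array("RIFF".utf8))
    appendLE(UInt32(36 + dataSize), to: &data)
    data.append(contentsOf: Array("WAVE".utf8))

    // "fmt " chunk describing uncompressed mono PCM.
    data.append(contentsOf: Array("fmt ".utf8))
    appendLE(UInt32(16), to: &data)             // fmt chunk size
    appendLE(UInt16(1), to: &data)              // audio format 1 = PCM
    appendLE(UInt16(1), to: &data)              // channel count (mono)
    appendLE(UInt32(sampleRate), to: &data)
    appendLE(UInt32(sampleRate * bytesPerSample), to: &data) // byte rate
    appendLE(UInt16(bytesPerSample), to: &data) // block align
    appendLE(UInt16(16), to: &data)             // bits per sample

    // "data" chunk with the quantized samples.
    data.append(contentsOf: Array("data".utf8))
    appendLE(UInt32(dataSize), to: &data)
    for sample in monoSamples {
        let clamped: Float = sample.isFinite ? max(-1, min(1, sample)) : 0
        appendLE(Int16((clamped * 32767).rounded()), to: &data)
    }
    return data
}
```

One second of audio at the default rate is 44,100 samples, so the resulting file image is the 44-byte header plus 88,200 bytes of sample data.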

Usage

The snippet below corresponds to the API or commands currently exposed by the speech-swift repository.

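import Foundation  // for URL
import FishAudioTTS

// Load the default pretrained bundle (downloads the weights if needed).
let model = try await FishAudioTTSModel.fromPretrained()

// Clone the voice in reference.wav; the trailing [excited] is a style marker.
let pcm = try await model.generate(
    text: "आज मैं बहुत खुश हूँ। [excited]",
    referenceAudioURL: URL(fileURLWithPath: "reference.wav"),
    referenceText: "नमस्ते, यह संदर्भ आवाज है।"
)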
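The `text` argument above embeds a style marker in square brackets. Since a misspelled marker is easy to miss, a small pre-flight check can be useful. The sketch below is illustrative only: `knownMarkers` is copied from the marker list in the implementation notes on this page, `unknownMarkers(in:)` is a hypothetical helper that is not part of `FishAudioTTS`, and exact lowercase matching is an assumption, since this page does not say how the runtime treats case or unrecognized markers.

```swift
// Markers documented on this page for Fish Audio S2 Pro.
let knownMarkers: Set<String> = [
    "pause", "emphasis", "laughing", "excited", "angry",
    "whisper", "screaming", "shouting", "surprised", "sad",
]

/// Returns every bracketed token in `text` that is not a documented marker.
/// Hypothetical helper; assumes markers are written as "[name]" without nesting.
func unknownMarkers(in text: String) -> [String] {
    var unknown: [String] = []
    var current: String? = nil      // non-nil while scanning inside "[...]"
    for ch in text {
        if ch == "[" {
            current = ""            // start collecting a marker name
        } else if ch == "]", let name = current {
            if !knownMarkers.contains(name) { unknown.append(name) }
            current = nil           // marker closed
        } else {
            current?.append(ch)     // no-op outside brackets
        }
    }
    return unknown
}

// The prompt from the usage snippet contains only a documented marker.
let issues = unknownMarkers(in: "आज मैं बहुत खुश हूँ। [excited]")
print(issues.isEmpty ? "markers OK" : "unknown markers: \(issues)")
```

Running such a check before calling `generate` keeps typos like `[exited]` from silently reaching the model as plain text.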

Model links

Implementation notes

Downloads use the newer explicit, byte-weighted file manifest, so progress reflects the bytes actually transferred across the multi-GB shards.
Control markers include [pause], [emphasis], [laughing], [excited], [angry], [whisper], [screaming], [shouting], [surprised], and [sad].
The runtime generates 10 DAC codebook rows, then decodes generated or reference-conditioned codebooks through FishAudioCodec.
Current tests cover bundle loading, codebook generation, codec encode/decode, Hindi cloning, and ASR round-trip gates.
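To illustrate why a byte-weighted manifest matters for the download progress mentioned in the notes above, here is a minimal sketch of the idea. It is not the speech-swift implementation: `ManifestEntry`, `ByteWeightedProgress`, and the file names and sizes are all hypothetical, chosen only to show how weighting by bytes differs from counting files.

```swift
/// One entry in a hypothetical manifest: a file path and its size in bytes.
struct ManifestEntry {
    let path: String
    let byteCount: Int64
}

/// Tracks overall progress weighted by bytes rather than by file count.
struct ByteWeightedProgress {
    private let sizes: [String: Int64]
    private var received: [String: Int64] = [:]
    let totalBytes: Int64

    init(manifest: [ManifestEntry]) {
        var sizes: [String: Int64] = [:]
        for entry in manifest { sizes[entry.path] = entry.byteCount }
        self.sizes = sizes
        self.totalBytes = sizes.values.reduce(0, +)
    }

    /// Records the cumulative bytes received for one file, clamped to its size.
    mutating func record(path: String, bytesReceived: Int64) {
        guard let size = sizes[path] else { return }   // ignore unknown files
        received[path] = min(max(bytesReceived, 0), size)
    }

    /// Fraction of all manifest bytes received so far, in 0...1.
    var fraction: Double {
        guard totalBytes > 0 else { return 0 }
        return Double(received.values.reduce(0, +)) / Double(totalBytes)
    }
}

// Hypothetical bundle: a tiny config file and one large weight shard.
var progress = ByteWeightedProgress(manifest: [
    ManifestEntry(path: "config.json", byteCount: 1_000),
    ManifestEntry(path: "shard-0.safetensors", byteCount: 999_000),
])
progress.record(path: "config.json", bytesReceived: 1_000)
// One of two files is complete, but that is only 0.1% of the bytes.
print(progress.fraction)
```

A per-file counter would report 50% at this point, which is the kind of misleading jump a byte-weighted manifest avoids when most of the payload sits in a few large shards.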