Fish Audio S2 Pro

This Soniqo page describes Fish Audio S2 Pro as implemented locally in speech-swift / speech-core. The Hugging Face bundle links follow the integration notes.

Internal page first

Landing cards and the docs menu link to this page first; links to the original model and the bundle are kept within this page.

Overview

Model	Fish Audio S2 Pro
Role	Experimental multilingual TTS with raw-reference cloning and style markers
Backend	MLX fp16
Output	44.1 kHz mono PCM
Languages	Multilingual
License	Research / non-commercial bundle; obtain a Fish Audio license for commercial use
Status	Programmatic runtime; CLI integration is still pending
Source	Fish Audio S2 Pro
Swift product	`FishAudioTTS`
CLI / runtime	Programmatic runtime today; planned `speech speak --engine fish-audio`

Usage

The snippet below matches the API and commands in the current speech-swift repository.

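import Foundation  // provides URL
import FishAudioTTS

// Load the pretrained Fish Audio S2 Pro bundle.
let model = try await FishAudioTTSModel.fromPretrained()

// Clone the voice from a raw reference clip and its transcript.
// The trailing [excited] is a style marker.
let pcm = try await model.generate(
    text: "आज मैं बहुत खुश हूँ। [excited]",
    referenceAudioURL: URL(fileURLWithPath: "reference.wav"),
    referenceText: "नमस्ते, यह संदर्भ आवाज है।"
)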
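The example text above ends with a bracketed style marker. As a rough sketch, input text could be checked for unrecognized markers before synthesis. The marker names are taken from the implementation notes on this page; `knownMarkers` and `scanMarkers` are hypothetical helpers written for illustration and are not part of `FishAudioTTS`.

```swift
// Marker names copied from the implementation notes on this page.
let knownMarkers: Set<String> = [
    "pause", "emphasis", "laughing", "excited", "angry",
    "whisper", "screaming", "shouting", "surprised", "sad",
]

/// Hypothetical helper (not a FishAudioTTS API): collects every bracketed
/// token in `text` and sorts it into known and unknown markers.
func scanMarkers(in text: String) -> (known: [String], unknown: [String]) {
    var known: [String] = []
    var unknown: [String] = []
    var current: String? = nil  // non-nil while inside "[...]"
    for ch in text {
        if ch == "[" {
            current = ""  // start (or restart) a token
        } else if ch == "]", let token = current {
            if knownMarkers.contains(token) {
                known.append(token)
            } else {
                unknown.append(token)
            }
            current = nil
        } else {
            current?.append(ch)  // no-op outside brackets
        }
    }
    return (known, unknown)
}

let result = scanMarkers(in: "आज मैं बहुत खुश हूँ। [excited] [happy]")
print(result.known)    // ["excited"]
print(result.unknown)  // ["happy"]
```

How the runtime itself treats an unknown bracketed token is not stated here, so a pre-check like this only flags likely typos; it does not predict model behavior.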

Model links

Implementation notes

Download uses the newer explicit byte-weighted file manifest, so progress reflects real transferred bytes across the multi-GB shards.
Control markers include [pause], [emphasis], [laughing], [excited], [angry], [whisper], [screaming], [shouting], [surprised], and [sad].
The runtime generates 10 DAC codebook rows, then decodes generated or reference-conditioned codebooks through FishAudioCodec.
Current tests cover bundle loading, codebook generation, codec encode/decode, Hindi cloning, and ASR round-trip gates.
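To illustrate the byte-weighted progress idea from the first note, here is a minimal sketch under stated assumptions. `ManifestFile`, `ByteWeightedProgress`, and the file names are hypothetical and do not reflect the actual speech-swift download code; the sketch only shows why weighting by bytes, rather than by file count, keeps the progress value honest when shard sizes differ widely.

```swift
/// Hypothetical manifest entry: a file name plus its size in bytes.
struct ManifestFile {
    let name: String
    let byteCount: Int64
}

/// Hypothetical tracker: overall progress is transferred bytes divided by
/// total manifest bytes, so a multi-GB shard outweighs a small config file.
struct ByteWeightedProgress {
    private let sizes: [String: Int64]
    private var received: [String: Int64] = [:]
    let totalBytes: Int64

    init(manifest: [ManifestFile]) {
        // Keep the first entry if a name appears twice.
        let table = Dictionary(
            manifest.map { ($0.name, $0.byteCount) },
            uniquingKeysWith: { first, _ in first }
        )
        sizes = table
        totalBytes = table.values.reduce(0, +)
    }

    /// Records the cumulative bytes received so far for one file.
    mutating func update(file: String, bytesReceived: Int64) {
        guard let size = sizes[file] else { return }  // not in the manifest
        received[file] = min(max(bytesReceived, 0), size)  // clamp to 0...size
    }

    /// Fraction in 0...1 of all manifest bytes transferred.
    var fraction: Double {
        guard totalBytes > 0 else { return 1.0 }
        return Double(received.values.reduce(0, +)) / Double(totalBytes)
    }
}

// Illustrative file names and sizes only.
var progress = ByteWeightedProgress(manifest: [
    ManifestFile(name: "config.json", byteCount: 1_000),
    ManifestFile(name: "shard-0.safetensors", byteCount: 3_000_000_000),
])
progress.update(file: "config.json", bytesReceived: 1_000)
print(progress.fraction)  // still close to 0: one of two files is done, but almost no bytes
```

A count-based tracker would report 50% at this point, which is the mismatch a byte-weighted manifest avoids.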