Supertonic-3

This Soniqo page documents Supertonic-3 as implemented locally in speech-swift / speech-core. Links to the Hugging Face packages come after the integration notes.

Overview

Model	Supertonic-3
Purpose	G2P-free multilingual text-to-speech
Backends	CoreML (Apple Neural Engine / GPU) and LiteRT
Output	44.1 kHz mono Float32 PCM
Languages	31 languages plus a neutral `na` tag
License	OpenRAIL-M weights, MIT code
Status	Ready for CoreML; LiteRT reference implementation in speech-core
Source	Supertone Supertonic-3
Swift product	`SupertonicTTS`
CLI / runtime	Shared TTS pipeline / server integration; LiteRT C++ wrapper for edge runtimes

Usage

The snippet below reflects the API exposed by the current speech-swift repository.

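import SupertonicTTS

// Load the pretrained Supertonic-3 CoreML graphs, tokenizer, config, and voice styles.
// Must run in an async context (e.g. top-level code or a Task).
let tts = try await SupertonicTTSModel.fromPretrained()

// Synthesize with a preset voice (F1-F5, M1-M5) and a language tag.
// The output is 44.1 kHz mono Float32 PCM.
let pcm = try tts.synthesize(
    text: "Hello from an on-device voice.",
    voiceId: "F1",
    language: "en"
)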
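The model returns raw 44.1 kHz mono Float32 PCM, so a common next step is writing it to a WAV file. The sketch below assumes the samples can be read as a plain `[Float]` in the range [-1, 1]. The helper `wavData(samples:sampleRate:)` is hypothetical and not part of `SupertonicTTS`. It quantizes to 16-bit integer PCM because that format plays almost everywhere.

```swift
import Foundation

/// Hypothetical helper: encode mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file.
/// Assumes the synthesizer output has been converted to `[Float]`.
func wavData(samples: [Float], sampleRate: Int = 44_100) -> Data {
    var data = Data()
    // Append an integer in little-endian byte order, as RIFF requires.
    func appendLE<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    let bytesPerSample = 2
    let dataSize = samples.count * bytesPerSample

    data.append(contentsOf: Array("RIFF".utf8))
    appendLE(UInt32(36 + dataSize))                 // RIFF chunk size
    data.append(contentsOf: Array("WAVE".utf8))
    data.append(contentsOf: Array("fmt ".utf8))
    appendLE(UInt32(16))                            // fmt chunk size
    appendLE(UInt16(1))                             // audio format: integer PCM
    appendLE(UInt16(1))                             // channels: mono
    appendLE(UInt32(sampleRate))                    // sample rate
    appendLE(UInt32(sampleRate * bytesPerSample))   // byte rate
    appendLE(UInt16(bytesPerSample))                // block align
    appendLE(UInt16(16))                            // bits per sample
    data.append(contentsOf: Array("data".utf8))
    appendLE(UInt32(dataSize))

    for s in samples {
        // Treat NaN as silence and clamp everything else before quantizing.
        let clamped: Float = s.isNaN ? 0 : max(-1, min(1, s))
        appendLE(Int16(clamped * Float(Int16.max)))
    }
    return data
}
```

At 44.1 kHz, one second of speech is 44,100 samples (about 88 KB after 16-bit quantization). Keep the Float32 buffer if you need to process the audio further.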

Model links

Implementation notes

No espeak, phonemizer, or lexicon: text is NFKD-normalized and mapped through a Unicode index table.
The downloader uses explicit CoreML package globs to fetch the four graphs plus the tokenizer, config, and voice-style JSON files.
The Apple export uses dynamic latent length; the current LiteRT path uses fixed graph shapes and chunks longer text.
Voices are precomputed style presets F1-F5 and M1-M5; on-device voice cloning is out of scope because the style extractor has not been released.
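The normalization note above can be illustrated with a minimal sketch. Text is NFKD-normalized, and each resulting Unicode scalar is then looked up in an index table. The `UnicodeIndexTokenizer` type, the `[UInt32: Int]` table layout, and the `unknownId` fallback are hypothetical. The real table ships in the tokenizer JSON, and its format and out-of-vocabulary handling may differ.

```swift
import Foundation

/// Hypothetical G2P-free tokenizer: NFKD normalization followed by a
/// per-scalar lookup. No phonemizer or lexicon is involved.
struct UnicodeIndexTokenizer {
    let table: [UInt32: Int]   // Unicode scalar value -> model input id (assumed layout)
    let unknownId: Int         // assumed fallback for scalars missing from the table

    func encode(_ text: String) -> [Int] {
        // NFKD splits precomposed letters into base + combining marks and
        // folds compatibility characters (e.g. ligatures) into plain ones.
        let normalized = text.decomposedStringWithCompatibilityMapping
        return normalized.unicodeScalars.map { table[$0.value] ?? unknownId }
    }
}

// Toy table: "e", combining acute accent, and "f" only.
let tok = UnicodeIndexTokenizer(table: [0x65: 1, 0x301: 2, 0x66: 3], unknownId: 0)
let accented = tok.encode("\u{00E9}")   // precomposed é -> "e" + U+0301
let ligature = tok.encode("\u{FB01}")   // "ﬁ" ligature -> "f" + "i"
```

With this toy table, `accented` is `[1, 2]`. `ligature` is `[3, 0]`, because "i" is not in the table and falls back to `unknownId`. This is why the table only has to cover decomposed scalars.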
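The download note describes fetching only files that match explicit globs. A small sketch shows how such an allow-list filters a repository listing. The patterns, file names, and the `globMatch` helper are all hypothetical. They do not come from the actual Supertonic-3 package layout, and in this simplified matcher `*` also crosses `/`.

```swift
/// Hypothetical minimal glob matcher supporting `*` (any run, including "/")
/// and `?` (exactly one character).
func globMatch(_ pattern: String, _ path: String) -> Bool {
    let p = Array(pattern), s = Array(path)
    func match(_ i: Int, _ j: Int) -> Bool {
        if i == p.count { return j == s.count }
        if p[i] == "*" {
            // Try every possible length for the run that `*` consumes.
            var k = j
            while true {
                if match(i + 1, k) { return true }
                if k == s.count { return false }
                k += 1
            }
        }
        guard j < s.count else { return false }
        if p[i] == "?" || p[i] == s[j] { return match(i + 1, j + 1) }
        return false
    }
    return match(0, 0)
}

// Hypothetical allow-list: compiled CoreML graphs plus the JSON side files.
let allowPatterns = [
    "*.mlmodelc/*",
    "tokenizer.json",
    "config.json",
    "voices/*.json",
]

func shouldDownload(_ path: String) -> Bool {
    allowPatterns.contains { globMatch($0, path) }
}

// Made-up repository listing that mixes in files the app does not need.
let repoFiles = [
    "text_encoder.mlmodelc/model.mil",
    "onnx/text_encoder.onnx",
    "tokenizer.json",
    "voices/F1.json",
    "README.md",
]
let selected = repoFiles.filter(shouldDownload)
```

In this made-up listing, `selected` keeps the CoreML file, the tokenizer, and the voice style. The ONNX file and README are skipped. An explicit allow-list like this stops a download from pulling every format that happens to share the repository.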
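The shape note says the LiteRT path uses fixed graph shapes and chunks longer text. A minimal sketch of that idea splits token ids into windows of a fixed length and pads the final window. It also records how many entries are real, so a mask or length input could be derived. `FixedShapeChunk`, `chunkForFixedShape`, `maxLen`, and `padId` are hypothetical names. A production chunker would more likely split at sentence or word boundaries than at an arbitrary token index.

```swift
/// Hypothetical fixed-shape input window for a graph with a static sequence length.
struct FixedShapeChunk: Equatable {
    let ids: [Int]        // always exactly `maxLen` entries
    let validLength: Int  // number of real (non-padding) ids
}

/// Split `ids` into consecutive windows of `maxLen`, padding the last one with `padId`.
func chunkForFixedShape(_ ids: [Int], maxLen: Int, padId: Int) -> [FixedShapeChunk] {
    precondition(maxLen > 0, "maxLen must be positive")
    var chunks: [FixedShapeChunk] = []
    var start = 0
    while start < ids.count {
        let end = min(start + maxLen, ids.count)
        let real = Array(ids[start..<end])
        let padded = real + Array(repeating: padId, count: maxLen - real.count)
        chunks.append(FixedShapeChunk(ids: padded, validLength: real.count))
        start = end
    }
    return chunks
}

let chunks = chunkForFixedShape(Array(1...7), maxLen: 4, padId: 0)
```

Here `chunks` holds two windows: `[1, 2, 3, 4]` (4 valid) and `[5, 6, 7, 0]` (3 valid). The dynamic-length CoreML export needs no such step, because the graph accepts the full latent length directly.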
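Because voices are a closed set of ten precomputed presets rather than extracted styles, a caller can validate `voiceId` up front. The `PresetVoice` enum and the `voice_styles/<ID>.json` path below are hypothetical. They mirror the preset names listed above, not a confirmed type or file layout in `SupertonicTTS`.

```swift
/// Hypothetical enumeration of the ten preset voice styles (F1-F5, M1-M5).
enum PresetVoice: String, CaseIterable {
    case F1, F2, F3, F4, F5
    case M1, M2, M3, M4, M5

    /// Hypothetical location of the precomputed style JSON for this preset.
    var styleFileName: String { "voice_styles/\(rawValue).json" }
}

/// Returns nil for anything outside the preset list. With no released
/// style extractor, there is no way to build a style for a new speaker.
func resolveVoice(_ id: String) -> PresetVoice? {
    PresetVoice(rawValue: id)
}
```

For example, `resolveVoice("M3")` yields `.M3`, whose style file would be `voice_styles/M3.json`. An unknown ID such as `"X9"` yields `nil` instead of failing later inside synthesis.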