Speech Studio

Open-source desktop app for local voice cloning and multi-speaker dialog generation. Drop in a voice sample, clone it, write a scene, and synthesize, all on your own machine. No API keys, no cloud, no per-character pricing.

Source: github.com/soniqo/speech-studio (Apache 2.0)

A 30-second blind test: a real voice, the same voice cloned locally by Speech Studio, and the same voice cloned by ElevenLabs in the cloud. Can you tell which is which?

What it does

Voice cloning from a short reference — drop in a few seconds of speech, clone the voice locally.
Multi-speaker dialog generation — write a scene with multiple speakers, synthesize all of them in one pass.
Runs entirely on your machine: VoxCPM2 for cloning (MLX on macOS, LiteRT on Windows and Linux) and DeepFilterNet3 for noise suppression, with no network needed after the first-run model download.
Open source under Apache 2.0 — fork it, embed it, build on it.

Requirements

macOS 15+, Windows 10+, or Linux
Apple Silicon on Mac; any modern 64-bit (x64) CPU on Windows and Linux
8 GB RAM minimum (16 GB recommended)
~3–5 GB disk for the speech models (downloaded on first run)
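
The requirements above can be checked before downloading anything. The following is a minimal sketch, not part of Speech Studio: the function names are hypothetical, the 5 GB threshold is the upper end of the range listed above, and checking the home directory assumes the model cache lives on the same volume, which this page does not confirm. It does not check the OS version or installed RAM.

```python
import os
import platform
import shutil

# Upper end of the "~3-5 GB" model footprint listed in the requirements.
MIN_FREE_GB = 5.0


def platform_supported(system: str, machine: str) -> bool:
    """Return True if the OS/CPU pair matches the listed requirements."""
    machine = machine.lower()
    if system == "Darwin":
        # Apple Silicon reports itself as arm64.
        return machine == "arm64"
    if system in ("Windows", "Linux"):
        # x64 is reported as AMD64 on Windows and x86_64 on Linux.
        return machine in ("x86_64", "amd64")
    return False


def enough_disk(path: str, min_gb: float = MIN_FREE_GB) -> bool:
    """Return True if the volume holding `path` has at least min_gb free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= min_gb


if __name__ == "__main__":
    # Assumption: the model cache ends up on the same volume as the home directory.
    home = os.path.expanduser("~")
    print("platform ok:", platform_supported(platform.system(), platform.machine()))
    print("disk ok:", enough_disk(home))
```

A Python interpreter running under Rosetta on an Apple Silicon Mac reports x86_64, so a failed platform check there is worth confirming by hand.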

Install

Download the build for your platform from GitHub Releases (macOS .dmg, Windows .msi or .exe, Linux .deb or .AppImage), then launch it.

The builds are unsigned: on macOS open via right-click → Open (or System Settings → Privacy & Security → Open anyway); on Windows choose More info → Run anyway in SmartScreen. First launch downloads the VoxCPM2 speech model (~2.75 GB on macOS, ~4.6 GB on Windows/Linux) and caches it; later launches reuse the cache.

Prefer the CLI?

The same voice cloning pipeline ships in the speech CLI. Install it with brew install speech, then run:

    speech speak --engine voxcpm2 --voxcpm2-ref-audio reference.wav -o cloned.wav "Hello, this is my cloned voice."

This is useful for scripting or pre-rendering batches. See the voice cloning guide for the full flow.
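
For batch pre-rendering, the CLI invocation above can be wrapped in a small script. This is a sketch under stated assumptions: it uses only the flags shown on this page (--engine, --voxcpm2-ref-audio, -o), while the helper names and the line_NNN.wav naming scheme are hypothetical. It defaults to a dry run that prints the commands instead of executing them.

```python
import shlex
import subprocess
from pathlib import Path


def build_speak_command(ref_audio: str, text: str, out_path: str) -> list[str]:
    """Build one `speech speak` invocation using the flags shown on this page."""
    return [
        "speech", "speak",
        "--engine", "voxcpm2",
        "--voxcpm2-ref-audio", ref_audio,
        "-o", out_path,
        text,
    ]


def render_batch(ref_audio: str, lines: list[str], out_dir: str,
                 dry_run: bool = True) -> list[list[str]]:
    """Render each line to its own WAV file; returns the commands used."""
    out = Path(out_dir)
    commands = []
    for index, text in enumerate(lines, start=1):
        # Hypothetical naming scheme: line_001.wav, line_002.wav, ...
        target = out / f"line_{index:03d}.wav"
        cmd = build_speak_command(ref_audio, text, str(target))
        commands.append(cmd)
        if dry_run:
            # Print a shell-quoted version for inspection only.
            print(shlex.join(cmd))
        else:
            out.mkdir(parents=True, exist_ok=True)
            # check=True stops the batch on the first failed synthesis.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    render_batch("reference.wav", ["Hello, this is my cloned voice.",
                                   "And this is a second line."], "rendered")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the text contains apostrophes or quotation marks. Set dry_run=False once the speech CLI is installed and the printed commands look right.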

Status

Speech Studio is in active preview (v0.0.4), with installers for macOS, Windows, and Linux — macOS clones via MLX, Windows and Linux via speech-core's LiteRT VoxCPM2 engine. The source repo at github.com/soniqo/speech-studio tracks the GUI app; star/watch it for release notifications.

Runner Agent

Speech Studio creates and clones voices; Runner uses the same local speech stack to connect microphone input, VAD, speech-to-text, an on-device language model, and Supertonic TTS into a live voice companion.

What it's built on

Speech Studio is a thin GUI on top of speech-swift, the open-source Swift library that ships every model used in the demo:

VoxCPM2 — the voice cloning model (zero-shot, short reference)
DeepFilterNet3 — denoise the reference + cloned output
Qwen3-ASR — align speech to text (used in the demo's blind-test build pipeline)
Forced Alignment — word-level timestamps for editing
Voice Cloning guide — full overview of the pipeline

Roadmap

Today: macOS, Windows, and Linux.
Next: signed & notarized builds (no Gatekeeper/SmartScreen prompts).
After that: deeper editing surface, plugin support for swappable cloning models.

Feedback

Open an issue at github.com/soniqo/speech-studio/issues — every one gets read.