Parakeet TDT

Parakeet TDT is NVIDIA's speech recognition model, adapted to run on Apple Silicon's Neural Engine via CoreML. It uses a FastConformer encoder paired with a Token-and-Duration Transducer (TDT) decoder for accurate, efficient transcription.

Architecture

The model is split across three CoreML model files that work together during inference:

| Component | Description |
| --- | --- |
| Encoder | FastConformer: convolutional + self-attention layers for audio feature extraction |
| Decoder | Prediction network that maintains a text token history |
| Joint | Combines encoder and decoder outputs to produce token probabilities |
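
How the three components interact during inference can be sketched as a greedy TDT decoding loop. This is a minimal illustration, not the project's implementation: the CoreML models are replaced by toy Python stand-ins (`toy_predictor`, `toy_joint`), and the blank id, vocabulary size, duration set, and per-frame symbol cap are all assumed values chosen for the example. The part specific to TDT is that the joint step yields a duration alongside each token, so the loop can jump several encoder frames at once.

```python
BLANK = 0                      # assumed blank token id
VOCAB_SIZE = 8                 # toy vocabulary size
DURATIONS = [0, 1, 2, 3, 4]    # assumed duration classes (frames to advance)
MAX_SYMBOLS_PER_FRAME = 10     # assumed guard against stalling on one frame


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def greedy_tdt_decode(encoder_frames, predictor, joint):
    """Greedy TDT decoding over precomputed encoder frames."""
    tokens = []
    t = 0
    symbols_at_t = 0
    dec_out = predictor(tokens)            # decoder output for empty history
    while t < len(encoder_frames):
        token_logits, duration_logits = joint(encoder_frames[t], dec_out)
        token = argmax(token_logits)
        skip = DURATIONS[argmax(duration_logits)]
        if token != BLANK:
            tokens.append(token)
            dec_out = predictor(tokens)    # decoder only updates on real tokens
            symbols_at_t += 1
        # Never stall: force progress on "blank with zero duration" or when
        # too many symbols have been emitted on the same frame.
        if skip == 0 and (token == BLANK or symbols_at_t >= MAX_SYMBOLS_PER_FRAME):
            skip = 1
        if skip > 0:
            symbols_at_t = 0
        t += skip
    return tokens


def toy_predictor(history):
    """Stand-in prediction network: summarizes history as the last token."""
    return history[-1] if history else BLANK


def toy_joint(frame, last_token):
    """Stand-in joint network. Each toy frame is (token_hint, duration_hint)."""
    token_hint, duration_hint = frame
    # Once the hinted token has been emitted, predict blank so decoding moves on.
    token = BLANK if token_hint == last_token else token_hint
    return (one_hot(token, VOCAB_SIZE),
            one_hot(DURATIONS.index(duration_hint), len(DURATIONS)))


frames = [(3, 2), (BLANK, 1), (5, 0), (BLANK, 1)]
print(greedy_tdt_decode(frames, toy_predictor, toy_joint))  # [3, 5]
```

In this toy run, the first frame emits token 3 with duration 2, so the second frame is never evaluated. A real pipeline would run the encoder once per audio chunk and call the decoder and joint models inside the loop.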

In the INT4 variant, all three models are INT4-quantized to reduce memory footprint and speed up Neural Engine execution; an INT8 variant is also available (see Model Variants).

Model Variants

| Model | Size | Hugging Face repository |
| --- | --- | --- |
| Parakeet-TDT-0.6B (CoreML INT4) | 315 MB | aufklarer/Parakeet-TDT-v3-CoreML-INT4 |
| Parakeet-TDT-0.6B (CoreML INT8) | 500 MB | aufklarer/Parakeet-TDT-v3-CoreML-INT8 |

Performance

| Metric | Value |
| --- | --- |
| Real-time factor | ~32x real-time on Apple Silicon Neural Engine |
| Compute target | Neural Engine (via CoreML) |
| Quantization | INT4 |
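
To make the real-time figure concrete, the short calculation below reads "~32x real-time" as a speed-up (audio duration divided by processing time). The helper name `processing_seconds` is made up for this example, and actual timings will vary with hardware and audio.

```python
def processing_seconds(audio_seconds, real_time_factor):
    """Estimated wall-clock time to transcribe audio at a given speed-up."""
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return audio_seconds / real_time_factor


one_hour = 60 * 60  # seconds of audio
print(f"{processing_seconds(one_hour, 32):.1f} s")  # 112.5 s
```

At that rate, an hour of audio would take a little under two minutes to transcribe.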

CLI Usage

Use the --engine parakeet flag to select Parakeet TDT instead of the default Qwen3-ASR:

.build/release/audio transcribe recording.wav --engine parakeet
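
For scripted use, the same command can be driven from Python. The wrapper below is a sketch: it uses only the binary path, subcommand, and `--engine parakeet` flag shown above, and it assumes the transcript is written to standard output, which this page does not confirm. The function names are illustrative.

```python
import subprocess

AUDIO_BINARY = ".build/release/audio"  # path taken from the command above


def build_transcribe_command(audio_path, engine="parakeet", binary=AUDIO_BINARY):
    """Build the argument list; engine=None leaves the CLI on its default."""
    command = [binary, "transcribe", audio_path]
    if engine is not None:
        command += ["--engine", engine]
    return command


def transcribe(audio_path, engine="parakeet"):
    """Run the CLI and return whatever it prints (assumed to be the transcript)."""
    result = subprocess.run(
        build_transcribe_command(audio_path, engine),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.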

CoreML vs MLX

Parakeet TDT uses CoreML to run on the Neural Engine, while Qwen3-ASR uses MLX to run on the Metal GPU. The two approaches have different trade-offs:

| | Parakeet TDT (CoreML) | Qwen3-ASR (MLX) |
| --- | --- | --- |
| Compute target | Neural Engine | Metal GPU |
| Speed | ~32x real-time | ~17x real-time |
| Architecture | FastConformer + TDT | Encoder-decoder transformer |
| Language support | English-focused | Multilingual |
| Quantization | INT4 | 4-bit (MLX) |

Important

These CoreML models run on the Neural Engine, which operates independently of the GPU. Parakeet TDT can therefore run concurrently with GPU-based tasks such as TTS without competing for GPU resources.
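
At the application level, benefiting from this mostly means not serializing the two jobs. The sketch below starts a transcription task and a GPU task at the same time and waits for both. The two callables are placeholders for whatever launches each workload (for example, a CLI invocation); nothing here is part of the project's API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(transcribe_task, gpu_task):
    """Start both callables at once and return their results as a pair."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        transcription = pool.submit(transcribe_task)
        gpu_result = pool.submit(gpu_task)
        # .result() blocks until each task finishes and re-raises its errors.
        return transcription.result(), gpu_result.result()
```

Threads are sufficient when each task spends its time waiting on an external process or a native framework call rather than running Python code.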