# Parakeet TDT
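Parakeet TDT is a speech recognition model from NVIDIA, adapted to run on the Apple Silicon Neural Engine via CoreML. It pairs a FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder. Alongside each output token, the decoder predicts how many encoder frames that token spans, so decoding can skip frames rather than visiting every one. This keeps transcription both accurate and efficient.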
## Architecture
The model is split across three CoreML model files that work together during inference:
| Component | Description |
|---|---|
| Encoder | FastConformer — convolutional + self-attention layers for audio feature extraction |
| Decoder | Prediction network that maintains a text token history |
| Joint | Combines encoder and decoder outputs to produce token and duration probabilities |
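The loop below sketches how the three models in the table cooperate during greedy TDT decoding. It is a minimal Python illustration, not the project's Swift implementation. `decoder` and `joint` are hypothetical callables standing in for the CoreML Decoder and Joint models. The duration set `(0, 1, 2, 3, 4)` frames is an assumption and was not read from the model files. Two guards keep the loop finite:

- A blank token must always advance at least one frame.
- Only a limited number of tokens may be emitted on a single frame.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-ins for the CoreML Decoder and Joint models.
DecoderFn = Callable[[List[int]], object]
JointFn = Callable[[object, object], Tuple[Sequence[float], Sequence[float]]]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def tdt_greedy_decode(
    encoder_frames: Sequence[object],
    decoder: DecoderFn,
    joint: JointFn,
    blank_id: int,
    durations: Sequence[int] = (0, 1, 2, 3, 4),  # assumed duration set, in frames
    max_symbols_per_frame: int = 10,
) -> List[int]:
    """Greedy TDT decoding: each joint call yields a token AND a frame skip."""
    tokens: List[int] = []
    state = decoder(tokens)  # prediction-network state for the empty history
    t = 0
    emitted_here = 0  # tokens emitted without advancing past frame t
    while t < len(encoder_frames):
        token_logits, duration_logits = joint(encoder_frames[t], state)
        token = argmax(token_logits)
        skip = durations[argmax(duration_logits)]
        if token != blank_id:
            tokens.append(token)
            state = decoder(tokens)  # only non-blank tokens update the history
            emitted_here += 1
        if skip == 0 and (token == blank_id or emitted_here >= max_symbols_per_frame):
            skip = 1  # a blank, or too many tokens on one frame, must advance
        if skip > 0:
            t += skip  # frames in between are never sent to the joint network
            emitted_here = 0
    return tokens


# Toy demo: each "frame" is (token, duration_index) and the joint network is
# faked with one-hot logits; the real models operate on feature tensors.
BLANK = 0


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(size)]


def toy_decoder(history: List[int]) -> Optional[int]:
    return history[-1] if history else None  # "state" = last emitted token


def toy_joint(frame: Tuple[int, int], last: Optional[int]):
    token, dur_index = frame
    if token == last:  # frame's token already emitted: output blank, advance 1
        token, dur_index = BLANK, 1
    return one_hot(token, 10), one_hot(dur_index, 5)


if __name__ == "__main__":
    frames = [(3, 2), (8, 1), (4, 0), (5, 0), (BLANK, 1)]
    print(tdt_greedy_decode(frames, toy_decoder, toy_joint, blank_id=BLANK))
    # → [3, 4, 5]  (frame 1 is skipped, so token 8 is never emitted)
```

In the toy run, the first frame predicts a duration of 2, so frame 1 is never evaluated. This frame skipping is the efficiency gain TDT has over a conventional transducer, which visits every encoder frame.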
In the INT4 variant, all three models are quantized to 4-bit integers for a minimal memory footprint and fast Neural Engine execution. An INT8 variant is also available (see Model Variants).
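To make "INT4 quantized" concrete, the sketch below shows a generic symmetric, per-group 4-bit weight quantization. Each group of weights stores 4-bit signed codes plus one scale. This scheme was chosen for illustration only. The exact scheme used to produce these CoreML models (granularity, scale format, group size) is not documented here, so `group_size=32` and 16-bit scales are assumptions.

```python
from typing import List, Sequence, Tuple


def quantize_int4(weights: Sequence[float], group_size: int = 32) -> Tuple[List[int], List[float]]:
    """Symmetric per-group INT4 quantization: 4-bit codes plus one scale per group."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # largest magnitude maps to +/-7
        scales.append(scale)
        # Round to the nearest step and clamp to the signed 4-bit range [-8, 7].
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes: Sequence[int], scales: Sequence[float], group_size: int = 32) -> List[float]:
    """Recover approximate weights: each code times its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def bits_per_weight(group_size: int, scale_bits: int = 16) -> float:
    """Storage cost per weight: the 4-bit code plus its share of the group scale."""
    return 4 + scale_bits / group_size
```

Under these assumptions each weight costs `bits_per_weight(32) == 4.5` bits instead of 16 for FP16, roughly a 3.5x reduction. The reconstruction error per weight is at most half of its group's scale.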
## Model Variants
| Model | Size | Hugging Face repo |
|---|---|---|
| Parakeet-TDT-0.6B (CoreML INT4) | 315 MB | aufklarer/Parakeet-TDT-v3-CoreML-INT4 |
| Parakeet-TDT-0.6B (CoreML INT8) | 500 MB | aufklarer/Parakeet-TDT-v3-CoreML-INT8 |
## Performance
| Metric | Value |
|---|---|
| Speed | ~32x real-time on the Apple Silicon Neural Engine |
| Compute target | Neural Engine (via CoreML) |
| Quantization | INT4 (INT8 variant also available) |
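The ~32x figure is a speed multiple: seconds of audio transcribed per second of processing. The conventional real-time factor (RTF) is its inverse, processing time divided by audio duration. The helpers below are hypothetical and only do the nominal arithmetic. Actual timings depend on the chip, the audio, and whether model loading is counted.

```python
def processing_seconds(audio_seconds: float, speed_x: float) -> float:
    """Wall-clock transcription time at a given multiple of real time."""
    return audio_seconds / speed_x


def real_time_factor(audio_seconds: float, processing_secs: float) -> float:
    """Conventional RTF: processing time / audio duration (below 1.0 = faster than real time)."""
    return processing_secs / audio_seconds


if __name__ == "__main__":
    hour = 3600.0
    print(processing_seconds(hour, 32))  # 112.5 s at ~32x (Parakeet TDT)
    print(round(processing_seconds(hour, 17), 1))  # 211.8 s at ~17x (Qwen3-ASR)
    print(real_time_factor(hour, 112.5))  # 0.03125, i.e. 1/32
```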
## CLI Usage
Use the `--engine parakeet` flag to select Parakeet TDT instead of the default engine, Qwen3-ASR:

```bash
.build/release/audio transcribe recording.wav --engine parakeet
```
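For batch jobs, a small wrapper can run the same command once per file. This is a minimal Python sketch with hypothetical helper names. It passes exactly the arguments shown above and stores whatever the CLI prints to stdout, because the output format is not described here.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Sequence

# Binary path taken from the CLI example; adjust if the build lives elsewhere.
AUDIO_BIN = Path(".build/release/audio")


def parakeet_command(wav: Path, binary: Path = AUDIO_BIN) -> List[str]:
    """argv for: audio transcribe <file> --engine parakeet"""
    return [str(binary), "transcribe", str(wav), "--engine", "parakeet"]


def transcribe_all(wavs: Sequence[Path], binary: Path = AUDIO_BIN) -> Dict[Path, str]:
    """Run the CLI once per file and keep its raw stdout (format not assumed)."""
    results: Dict[Path, str] = {}
    for wav in wavs:
        proc = subprocess.run(
            parakeet_command(wav, binary),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if a transcription fails
        )
        results[wav] = proc.stdout
    return results


if __name__ == "__main__":
    for path, text in transcribe_all(sorted(Path(".").glob("*.wav"))).items():
        print(f"{path}: {text.strip()}")
```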
## CoreML vs MLX
Parakeet TDT uses CoreML to run on the Neural Engine, while Qwen3-ASR uses MLX to run on the Metal GPU. The two approaches have different trade-offs:
| | Parakeet TDT (CoreML) | Qwen3-ASR (MLX) |
|---|---|---|
| Compute target | Neural Engine | Metal GPU |
| Speed | ~32x real-time | ~17x real-time |
| Architecture | FastConformer + TDT | Encoder-decoder transformer |
| Multilingual | English-focused | Multilingual |
| Quantization | INT4 | 4-bit (MLX) |
CoreML models run on the Neural Engine, which is a separate accelerator from the GPU. Parakeet TDT can therefore run concurrently with GPU-based tasks such as TTS without competing for GPU compute.