Parakeet TDT

Parakeet TDT is NVIDIA's speech recognition model, adapted to run on Apple Silicon's Neural Engine via CoreML. It uses a FastConformer encoder paired with a Token-and-Duration Transducer (TDT) decoder for accurate, efficient transcription.

Architecture

The model is split across three CoreML model files that work together during inference:

Component	Description
Encoder	FastConformer — convolutional + self-attention layers for audio feature extraction
Decoder	Prediction network that maintains a text token history
Joint	Combines encoder and decoder outputs to produce token probabilities

The encoder is INT8 quantized for minimal memory footprint and fast Neural Engine execution. The decoder and joint network are small enough that quantization is not needed.

Model Variants

Model	Size	HuggingFace
Parakeet-TDT-0.6B (CoreML INT8)	500 MB	aufklarer/Parakeet-TDT-v3-CoreML-INT8-30s

Performance

Metric	Value
Real-time factor	~32x real-time on Apple Silicon Neural Engine
Compute target	Neural Engine (via CoreML)
Quantization	INT8

CLI Usage

Use the --engine parakeet flag to select Parakeet TDT instead of the default Qwen3-ASR:

.build/release/speech transcribe recording.wav --engine parakeet

CoreML vs MLX

Parakeet TDT uses CoreML to run on the Neural Engine, while Qwen3-ASR uses MLX to run on the Metal GPU. The two approaches have different trade-offs:

	Parakeet TDT (CoreML)	Qwen3-ASR (MLX)
Compute target	Neural Engine	Metal GPU
Speed	~32x real-time	~17x real-time
Architecture	FastConformer + TDT	Encoder-decoder transformer
Multilingual	English-focused	Multilingual
Quantization	INT8	4-bit (MLX)

Important

CoreML models run on the Neural Engine, which operates independently from the GPU. This means Parakeet TDT can run concurrently with GPU-based tasks like TTS without contention.

Streaming variant

For real-time dictation and live captioning, see Parakeet-EOU-120M — a smaller (120 MB) RNN-T variant with an explicit end-of-utterance head, designed to run incrementally on 640 ms audio chunks. It shares the same SentencePiece vocabulary as Parakeet TDT 0.6B but is optimized for sub-second partial latency rather than peak throughput.

Also available on Android, Linux & Windows via ONNX Runtime.