RCLI runs a complete AI pipeline entirely on your Mac, using Apple Silicon with Metal GPU acceleration. All models are stored locally and run without any cloud dependency.
Model Architecture
RCLI uses five model types working together in a multi-threaded pipeline:
- STT (Speech-to-Text): converts voice input to text using Zipformer (streaming) or Whisper/Parakeet (offline)
- LLM (Language Model): processes requests and generates responses using Qwen3 or Liquid LFM2 models
- TTS (Text-to-Speech): synthesizes natural speech output using Piper, Kokoro, or KittenTTS voices
- VAD (Voice Activity Detection): detects speech vs. silence in real time using Silero VAD (0.6 MB)
- Embeddings: generates text embeddings for RAG using Snowflake Arctic Embed S (34 MB)
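To make the data flow concrete, here is a minimal sketch of how such stages could be chained with worker threads and queues. It is not RCLI's implementation: it assumes a linear VAD → STT → LLM → TTS order, and the stage functions (`fake_vad`, `fake_stt`, `fake_llm`, `fake_tts`) are hypothetical string-based stand-ins so the example runs without any models. The embedding model is left out to keep the sketch short.

```python
import queue
import threading

# Sentinel that tells each stage the input stream has ended.
STOP = object()


# Hypothetical stand-ins for the real models; each maps one item to the next.
def fake_vad(chunk):
    # Pass through chunks that "contain speech"; silence becomes None and is dropped.
    return chunk if chunk.strip() else None


def fake_stt(chunk):
    return f"text({chunk})"


def fake_llm(text):
    return f"reply({text})"


def fake_tts(reply):
    return f"audio({reply})"


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward non-None results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        result = fn(item)
        if result is not None:
            outbox.put(result)


def run_pipeline(chunks):
    fns = [fake_vad, fake_stt, fake_llm, fake_tts]
    # One queue between each pair of stages, plus input and output queues.
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for chunk in chunks:
        queues[0].put(chunk)
    queues[0].put(STOP)

    # Collect final outputs until the shutdown sentinel reaches the last queue.
    outputs = []
    while (item := queues[-1].get()) is not STOP:
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs


print(run_pipeline(["hello", "   ", "open safari"]))
```

Because each stage runs on its own thread, a slow stage (typically the LLM) does not block the earlier stages from accepting new audio, and FIFO queues keep the results in input order.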
Storage Location
All models are stored in a single directory on your Mac:
Total default install size: ~1 GB. Models are downloaded once during `rcli setup` and persist across sessions.
Default Models
The default model set installed by `rcli setup` is optimized for speed, quality, and disk efficiency:
| Component | Default Model | Size | Key Feature |
|---|---|---|---|
| LLM | Liquid LFM2 1.2B Tool | 731 MB | Excellent tool calling (~180 tokens/s) |
| STT (Streaming) | Zipformer | 50 MB | Real-time streaming for live mic |
| STT (Offline) | Whisper base.en | 140 MB | ~5% WER, fast batch transcription |
| TTS | Piper Lessac | 60 MB | Fast synthesis, clear English voice |
| VAD | Silero VAD | 0.6 MB | Real-time speech detection |
| Embeddings | Snowflake Arctic Embed S | 34 MB | High-quality text embeddings for RAG |
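As a quick check on the ~1 GB install figure, the sizes in the table add up as follows. This sketch assumes decimal units (1 GB = 1000 MB); actual on-disk usage may differ slightly.

```python
# Sizes in MB, copied from the default-model table.
DEFAULT_MODELS_MB = {
    "LLM (Liquid LFM2 1.2B Tool)": 731,
    "STT streaming (Zipformer)": 50,
    "STT offline (Whisper base.en)": 140,
    "TTS (Piper Lessac)": 60,
    "VAD (Silero VAD)": 0.6,
    "Embeddings (Snowflake Arctic Embed S)": 34,
}

total_mb = sum(DEFAULT_MODELS_MB.values())
# Decimal conversion assumed: 1 GB = 1000 MB.
print(f"{total_mb:.1f} MB ~ {total_mb / 1000:.2f} GB")  # 1015.6 MB ~ 1.02 GB
```

The LLM accounts for roughly 72% of the total, so swapping to a larger or smaller language model is what moves the install size the most.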
Model Inference Engines
RCLI uses optimized inference engines for each model type:
- llama.cpp: LLM and embedding inference with Metal GPU acceleration, KV caching, and Flash Attention
- sherpa-onnx: STT, TTS, and VAD via ONNX Runtime with Metal acceleration
- USearch: HNSW vector index for fast hybrid retrieval (~4 ms)
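The page does not say how RCLI combines results in its hybrid retrieval. A common approach, assumed here, is reciprocal rank fusion (RRF) of a keyword ranking and a vector-similarity ranking. The sketch below swaps in brute-force cosine similarity for USearch's approximate HNSW search, a word-overlap count for a real keyword scorer such as BM25, and toy 3-dimensional vectors for Snowflake Arctic embeddings; every name in it is hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_rank(query, docs):
    """Rank doc ids by how many query words they contain (crude BM25 stand-in)."""
    terms = set(query.lower().split())
    scores = {d: len(terms & set(text.lower().split())) for d, text in docs.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def vector_rank(query_vec, vectors):
    """Rank doc ids by exact cosine similarity; an HNSW index approximates this."""
    return sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def hybrid_search(query, query_vec, docs, vectors, top_k=3):
    return rrf([keyword_rank(query, docs), vector_rank(query_vec, vectors)])[:top_k]


# Toy corpus and hand-made embeddings (hypothetical).
DOCS = {
    "a": "metal gpu acceleration on apple silicon",
    "b": "offline speech to text with whisper",
    "c": "vector index for retrieval",
}
VECTORS = {"a": [0.9, 0.1, 0.0], "b": [0.1, 0.9, 0.0], "c": [0.2, 0.1, 0.9]}

print(hybrid_search("whisper speech model", [0.2, 0.8, 0.1], DOCS, VECTORS))
```

RRF only uses ranks, not raw scores, so it avoids having to normalize keyword scores and cosine similarities onto a common scale; `k=60` is the constant commonly used with RRF and dampens the weight of top ranks.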
Context Windows
The LLMs support different context sizes:
| Model Family | Default Context | Max Context |
|---|---|---|
| Qwen3 / Qwen3.5 | 4,096 tokens | 32K to 262K tokens |
| Liquid LFM2 | 4,096 tokens | 128K tokens |
To use a different context size, pass the `--ctx-size <n>` flag.
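When picking a value for `--ctx-size`, the context must hold everything the model sees (system prompt, conversation history, any retrieved RAG chunks) plus room for the reply. The helper below is hypothetical and not part of RCLI: it picks the smallest power of two that fits, never below the 4,096-token default and never above the model's maximum. Power-of-two rounding is only a convention assumed here, and "128K" is assumed to mean 131,072 tokens.

```python
DEFAULT_CTX = 4096  # default context from the table above


def choose_ctx_size(prompt_tokens: int, reply_tokens: int, max_ctx: int) -> int:
    """Return a context size (for --ctx-size) that fits prompt plus reply."""
    needed = prompt_tokens + reply_tokens
    if needed > max_ctx:
        raise ValueError(f"{needed} tokens exceed the model maximum of {max_ctx}")
    # Double from the default until the budget fits, then cap at the model maximum.
    size = DEFAULT_CTX
    while size < needed:
        size *= 2
    return min(size, max_ctx)


# Liquid LFM2: 128K maximum, assumed to be 131,072 tokens.
print(choose_ctx_size(prompt_tokens=3000, reply_tokens=512, max_ctx=131072))   # 4096
print(choose_ctx_size(prompt_tokens=6000, reply_tokens=1024, max_ctx=131072))  # 8192
```

Larger contexts cost more memory for the KV cache and slow down prompt processing, which is why a modest default such as 4,096 tokens makes sense for voice interactions.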
Model Selection Persistence
Active model choices persist across sessions in:
Next Steps
- LLM Models: compare all 9 language models with specs and benchmarks
- STT Models: explore speech-to-text options (Zipformer, Whisper, Parakeet)
- TTS Models: browse 6 TTS voices with quality ratings
- Switching Models: learn how to hot-swap models without restarting