RCLI supports 20+ AI models across LLM, STT, and TTS modalities. Use these commands to manage your local model collection.
Commands Overview
rcli models # Interactive model browser (all modalities)
rcli models llm # Jump to LLM management
rcli models stt # Jump to STT management
rcli models tts # Jump to TTS (same as `rcli voices`)
rcli voices # Manage TTS voices
rcli upgrade-llm # Guided LLM upgrade
rcli upgrade-stt # Upgrade to Parakeet TDT
rcli cleanup # Remove unused models
rcli info # Show active models and engine info
Interactive Model Browser
`rcli models` launches a full-screen TUI with:
- LLM Models — 9 models (350M to 4B parameters)
- STT Models — 2 offline models (Whisper, Parakeet)
- TTS Voices — 6 voice models (1 to 103 speakers)
Navigation
Up/Down — Navigate list
Enter — Select or download model
ESC — Close panel
Model States
- Active — Currently loaded (green checkmark)
- Installed — Available locally
- Not installed — Available for download (grayed out)
- Default — Included in `rcli setup`
- Recommended — Best for most users
Switching Models
Press Enter on an installed model to switch to it. What happens next depends on the modality:
LLM Hot-Swap (Runtime)
# In TUI: M → select Qwen3.5 2B → Enter
# Model switches immediately without restart
Switching to Qwen3.5 2B...
Switched to Qwen3.5 2B
The LLM is hot-swapped at runtime:
- Unloads current model
- Loads new model to Metal GPU
- Re-detects model profile (Qwen3/LFM2/etc.)
- Re-caches system prompt with correct tool format
- Persists selection to `~/.rcli/config/model_selection.json`
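The hot-swap sequence listed above can be sketched roughly as follows. This is an illustration only, not RCLI's actual implementation: `HotSwapper`, `detect_profile`, and the `load`/`unload` callbacks are hypothetical names, detecting the profile from a name prefix is an assumption, and the Metal GPU load is stood in for by a plain callback.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def detect_profile(model_id: str) -> str:
    """Guess a model family from its identifier (assumed naming scheme)."""
    lowered = model_id.lower()
    if lowered.startswith("qwen3"):
        return "qwen3"
    if lowered.startswith("lfm2"):
        return "lfm2"
    return "generic"


class HotSwapper:
    """Hypothetical coordinator mirroring the five steps listed above."""

    def __init__(self, load: Callable[[str], object],
                 unload: Callable[[object], None],
                 selection_file: Path) -> None:
        self._load = load
        self._unload = unload
        self._selection_file = selection_file
        self.model: Optional[object] = None
        self.profile = "generic"
        self.cached_prompt = ""

    def switch(self, model_id: str, system_prompt: str) -> None:
        if self.model is not None:
            self._unload(self.model)             # 1. unload the current model
        self.model = self._load(model_id)        # 2. load the new model
        self.profile = detect_profile(model_id)  # 3. re-detect the profile
        # 4. re-cache the system prompt, tagged with the detected profile
        self.cached_prompt = f"[{self.profile}] {system_prompt}"
        # 5. persist the selection, keeping any other keys already in the file
        data = {}
        if self._selection_file.exists():
            data = json.loads(self._selection_file.read_text())
        data["llm"] = model_id
        self._selection_file.parent.mkdir(parents=True, exist_ok=True)
        self._selection_file.write_text(json.dumps(data, indent=2))
```

Unloading before loading keeps only one model resident at a time, which matters when the two models together would not fit in memory.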
STT/TTS Selection (Next Launch)
# In TUI: M → select Parakeet TDT → Enter
Selected: Parakeet TDT. Restart RCLI to apply.
STT and TTS selections take effect only after RCLI is restarted.
Downloading Models
Press Enter on an uninstalled model to download:
Downloading Qwen3.5 2B (1200 MB)...
[====================] 100%
Download complete!
Models are downloaded from Hugging Face via curl.
LLM Models
| Model | Size | Speed | License | Features |
|---|---|---|---|---|
| LFM2 1.2B Tool | 731 MB | ~180 t/s | LFM Open | Tool calling, default |
| LFM2 350M | 219 MB | ~350 t/s | LFM Open | Fastest, 128K ctx |
| LFM2.5 1.2B Instruct | 731 MB | ~180 t/s | LFM Open | 128K ctx |
| LFM2 2.6B | 1.5 GB | ~120 t/s | LFM Open | Better conversational |
| Qwen3 0.6B | 456 MB | ~250 t/s | Apache 2.0 | Ultra-fast |
| Qwen3.5 0.8B | 600 MB | ~220 t/s | Apache 2.0 | Qwen3.5 generation |
| Qwen3.5 2B | 1.2 GB | ~150 t/s | Apache 2.0 | Recommended |
| Qwen3 4B | 2.5 GB | ~80 t/s | Apache 2.0 | Smart reasoning |
| Qwen3.5 4B | 2.7 GB | ~75 t/s | Apache 2.0 | Best small model, 262K ctx |
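When choosing from the table above, it can help to filter by a disk budget and a minimum speed. The sketch below copies the sizes and approximate speeds from the table; `LlmModel` and `candidates` are illustrative names and not part of RCLI, and GB values are converted at 1 GB = 1000 MB to match the download messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmModel:
    name: str
    size_mb: int         # download size from the table (1 GB = 1000 MB)
    tokens_per_sec: int  # approximate speed from the table


# Values copied from the LLM table above.
LLM_MODELS = [
    LlmModel("LFM2 1.2B Tool", 731, 180),
    LlmModel("LFM2 350M", 219, 350),
    LlmModel("LFM2.5 1.2B Instruct", 731, 180),
    LlmModel("LFM2 2.6B", 1500, 120),
    LlmModel("Qwen3 0.6B", 456, 250),
    LlmModel("Qwen3.5 0.8B", 600, 220),
    LlmModel("Qwen3.5 2B", 1200, 150),
    LlmModel("Qwen3 4B", 2500, 80),
    LlmModel("Qwen3.5 4B", 2700, 75),
]


def candidates(max_size_mb: int, min_tokens_per_sec: int) -> list[LlmModel]:
    """Return models within both limits, largest first."""
    fits = [m for m in LLM_MODELS
            if m.size_mb <= max_size_mb and m.tokens_per_sec >= min_tokens_per_sec]
    return sorted(fits, key=lambda m: m.size_mb, reverse=True)
```

For example, `candidates(1500, 150)` puts Qwen3.5 2B first, since it is the largest model that fits in 1500 MB while still reaching roughly 150 t/s.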
Upgrade LLM
`rcli upgrade-llm` starts an interactive wizard that guides you through upgrading to a larger LLM:
Upgrade LLM
Current: LFM2 1.2B Tool (731 MB)
Recommended upgrades:
1. Qwen3.5 2B 1200 MB Better reasoning
2. Qwen3.5 4B 2700 MB Best small model, 262K context
3. LFM2 2.6B 1500 MB Stronger conversational
Select an option (1-3) or q to quit: 1
Downloading Qwen3.5 2B (1200 MB)...
[====================] 100%
Download complete!
Switch to Qwen3.5 2B now? (y/n): y
Switched to Qwen3.5 2B.
STT Models
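RCLI uses two STT models in parallel: an always-active streaming model (listed as Zipformer in the `rcli info` output) and a selectable offline model. The streaming model: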
- Purpose — Real-time transcription during speech
- Accuracy — Good (~5% WER)
- Speed — ~50ms latency
- Size — ~50 MB
- Included in — `rcli setup` (always active)
Offline STT Models
| Model | Size | Accuracy | License | Features |
|---|---|---|---|---|
| Whisper base.en | 140 MB | ~5% WER | MIT | English, default |
| Parakeet TDT 0.6B v3 | 640 MB | ~1.9% WER | CC-BY-4.0 | 25 languages, auto-punctuation |
Upgrade STT
`rcli upgrade-stt` upgrades the offline STT model to Parakeet TDT (best accuracy):
Upgrade STT
Current: Whisper base.en (~5% WER, 140 MB)
Upgrade: Parakeet TDT 0.6B v3 (~1.9% WER, 640 MB)
Parakeet offers:
• 3x better accuracy (~1.9% WER vs ~5%)
• 25 languages (vs English-only)
• Auto-punctuation
• Slightly slower (~60ms vs ~40ms)
Download Parakeet TDT? (y/n): y
Downloading Parakeet TDT (640 MB)...
[====================] 100%
Download complete!
Restart RCLI to use Parakeet TDT.
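The "3x better accuracy" line in the wizard output is a rounded figure. A short calculation from the two approximate WER values gives the underlying numbers, assuming both were measured on comparable test data (the source does not say); `wer_comparison` is an illustrative helper, not an RCLI function.

```python
def wer_comparison(baseline_wer: float, improved_wer: float) -> tuple[float, float]:
    """Return (error ratio, relative reduction) for two word error rates."""
    ratio = baseline_wer / improved_wer
    relative_reduction = (baseline_wer - improved_wer) / baseline_wer
    return ratio, relative_reduction


ratio, reduction = wer_comparison(5.0, 1.9)
print(f"{ratio:.1f}x fewer errors, {reduction:.0%} relative reduction")
# prints "2.6x fewer errors, 62% relative reduction"
```

In other words, Parakeet TDT makes roughly 2.6 times fewer word errors than Whisper base.en on these figures.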
TTS Voices
`rcli voices` lists all TTS voices:
Voices (auto-detect)
# Voice Size Arch Speakers Status
1 Piper Lessac (default) 60 MB Piper 1 * active
2 Piper Amy 60 MB Piper 1 installed
3 KittenTTS Nano 90 MB Kitten 8 not installed
4 Matcha LJSpeech 100 MB Matcha 1 not installed
5 Kokoro English v0.19 310 MB Kokoro 11 not installed
6 Kokoro Multi-lang v1.1 500 MB Kokoro 103 not installed
Tip: Run `rcli voices` to switch voices.
| Voice | Size | Speakers | License | Features |
|---|---|---|---|---|
| Piper Lessac | 60 MB | 1 | MIT | Clear English, default |
| Piper Amy | 60 MB | 1 | MIT | Warm female voice |
| KittenTTS Nano | 90 MB | 8 | Apache 2.0 | 4M/4F voices |
| Matcha LJSpeech | 100 MB | 1 | MIT | HiFi-GAN vocoder |
| Kokoro English v0.19 | 310 MB | 11 | Apache 2.0 | Best English quality |
| Kokoro Multi-lang v1.1 | 500 MB | 103 | Apache 2.0 | Chinese + English |
Multi-Speaker Voices
KittenTTS and Kokoro support multiple speakers. Choose one by setting `speaker_id` in ~/.rcli/config/tts.json:
{
  "model": "kokoro-en-v0_19",
  "speaker_id": 3
}
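If you generate this file from a script, checking the speaker index against the voice's speaker count avoids writing an unusable value. The sketch below is a hedged example: `write_tts_config` is a hypothetical helper, and it assumes speaker IDs are zero-based, which the source does not confirm.

```python
import json
from pathlib import Path


def write_tts_config(config_path: Path, model: str, speaker_id: int,
                     speaker_count: int) -> dict:
    """Validate the speaker index and write the TTS config file."""
    # Assumption: speaker IDs are zero-based, so valid values are 0..count-1.
    if not 0 <= speaker_id < speaker_count:
        raise ValueError(
            f"speaker_id {speaker_id} out of range for {speaker_count} speakers")
    config = {"model": model, "speaker_id": speaker_id}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `write_tts_config(Path.home() / ".rcli" / "config" / "tts.json", "kokoro-en-v0_19", 3, speaker_count=11)` would reproduce the example above, with 11 taken from the Kokoro English row of the voice table.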
Cleanup Unused Models
`rcli cleanup` opens an interactive TUI that lists all installed models:
Model Cleanup
Arrow keys to navigate, ENTER to delete, ESC to close
> Qwen3 0.6B [LLM] 456 MB
Whisper base.en [STT] 140 MB
Piper Amy [TTS] 60 MB
LFM2 1.2B Tool [LLM] 731 MB (active)
[Up/Down] navigate [Enter] delete [ESC] close
Press Enter to delete the selected model:
- Active models — Cannot be deleted (switch first)
- Inactive models — Deleted immediately
Selection preferences are updated automatically.
Engine Info
`rcli info` displays the active models and hardware:
RCLI Engine Info
Version: 0.4.0
Models:
LLM: Qwen3.5 2B (1200 MB)
STT: Whisper base.en (offline) | Zipformer (streaming)
TTS: Piper Lessac
Embeddings: Snowflake Arctic Embed S (34 MB)
Hardware:
Chip: Apple M3 Max
CPU: 14 cores (10P+4E)
GPU: 30 cores
RAM: 36 GB
ANE: 16-core Neural Engine
Paths:
Models: ~/Library/RCLI/models
Config: ~/.rcli/config
Index: ~/Library/RCLI/index
Model Storage Locations
~/Library/RCLI/models/
├── qwen3.5-2b-q4_k_m.gguf # LLM
├── lfm2-1.2b-tool-q4_k_m.gguf # LLM
├── whisper-base-en/ # STT
│ ├── encoder.onnx
│ ├── decoder.onnx
│ └── tokens.txt
├── parakeet-tdt-0.6b-v3/ # STT
├── zipformer-streaming/ # STT (streaming)
├── piper-lessac-medium/ # TTS
│ ├── model.onnx
│ └── config.json
├── kokoro-en-v0_19/ # TTS
├── silero-vad.onnx # VAD
└── arctic-embed-s.gguf # Embeddings
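To see how much space each entry in this directory takes, a small script can walk the layout shown above. This is a minimal sketch using only the standard library; `entry_size_bytes` and `models_disk_usage` are illustrative names, not RCLI functions.

```python
from pathlib import Path


def entry_size_bytes(entry: Path) -> int:
    """Size of a file, or the total size of all files under a directory."""
    if entry.is_file():
        return entry.stat().st_size
    return sum(p.stat().st_size for p in entry.rglob("*") if p.is_file())


def models_disk_usage(models_dir: Path) -> dict[str, int]:
    """Map each top-level entry in the models directory to its size in bytes."""
    return {entry.name: entry_size_bytes(entry)
            for entry in sorted(models_dir.iterdir())}
```

For example, `models_disk_usage(Path.home() / "Library" / "RCLI" / "models")` returns one entry per model file or model directory, which makes it easy to spot candidates for `rcli cleanup`.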
Model Selection Persistence
User preferences are saved to:
~/.rcli/config/model_selection.json
{
  "llm": "qwen3.5-2b",
  "stt": "parakeet-tdt-0.6b-v3",
  "tts": "piper-lessac-medium"
}
To reset to defaults:
rm ~/.rcli/config/model_selection.json
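Deleting the file works as a reset presumably because the loader falls back to built-in defaults when no selection is stored. A minimal sketch of that pattern follows; `load_selection` is a hypothetical function, and the default identifiers `lfm2-1.2b-tool` and `whisper-base-en` are assumptions inferred from the storage layout rather than values confirmed by the source.

```python
import json
from pathlib import Path

# Assumed default identifiers; only "piper-lessac-medium" appears in the
# example selection file, the other two are inferred from file names.
DEFAULTS = {
    "llm": "lfm2-1.2b-tool",
    "stt": "whisper-base-en",
    "tts": "piper-lessac-medium",
}


def load_selection(path: Path) -> dict[str, str]:
    """Return saved selections, falling back to defaults for missing keys."""
    try:
        saved = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        saved = {}
    if not isinstance(saved, dict):
        saved = {}
    return {key: saved.get(key, default) for key, default in DEFAULTS.items()}
```

Treating an unreadable or malformed file the same as a missing one means a damaged config degrades to defaults instead of blocking startup.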
Benchmarking Models
Compare All LLMs
rcli bench --all-llm --suite llm
# Output:
--- LLM Benchmark (All Models) ---
Qwen3 0.6B: TTFT 18ms 250 tok/s
Qwen3.5 0.8B: TTFT 20ms 220 tok/s
Qwen3.5 2B: TTFT 25ms 150 tok/s
LFM2 1.2B Tool: TTFT 22ms 180 tok/s
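To compare runs over time, the benchmark lines can be parsed into structured rows. The sketch below assumes the output format matches the sample above exactly (`<model>: TTFT <n>ms <n> tok/s`); real output may differ in spacing or fields, and `parse_llm_bench` is an illustrative helper, not part of RCLI.

```python
import re
from typing import NamedTuple


class LlmBenchRow(NamedTuple):
    model: str
    ttft_ms: int
    tokens_per_sec: int


# Matches lines such as "Qwen3.5 2B: TTFT 25ms 150 tok/s".
_ROW = re.compile(
    r"^\s*(?P<model>.+?):\s+TTFT\s+(?P<ttft>\d+)ms\s+(?P<tps>\d+)\s+tok/s\s*$")


def parse_llm_bench(output: str) -> list[LlmBenchRow]:
    """Extract one row per model line, skipping headers and other text."""
    rows = []
    for line in output.splitlines():
        match = _ROW.match(line)
        if match:
            rows.append(LlmBenchRow(match["model"], int(match["ttft"]),
                                    int(match["tps"])))
    return rows
```

With rows in hand, `max(rows, key=lambda r: r.tokens_per_sec)` picks the fastest model and `min(rows, key=lambda r: r.ttft_ms)` the one with the lowest time to first token.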
Compare All TTS
rcli bench --all-tts --suite tts
# Output:
--- TTS Benchmark (All Voices) ---
Piper Lessac: 142ms 0.8x RT
Piper Amy: 138ms 0.7x RT
Kokoro English: 189ms 1.1x RT
Troubleshooting
Model Download Fails
Error: Failed to download model
curl: (56) Recv failure: Connection reset by peer
Solution: Check your internet connection, then retry the download.
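If you fetch model files yourself, transient network errors like the one above are usually handled by retrying with a growing delay. The sketch below shows one way to do that around curl; it is not how RCLI itself downloads or retries (the source does not document that), and `curl_download` and `download_with_retry` are hypothetical helpers. The curl flags used (`--fail`, `--location`, `--continue-at -`, `--output`) are standard curl options.

```python
import subprocess
import time
from typing import Callable


def curl_download(url: str, dest: str) -> bool:
    """Run curl once, resuming a partial file; True if curl exits with 0."""
    result = subprocess.run(
        ["curl", "--fail", "--location", "--continue-at", "-",
         "--output", dest, url])
    return result.returncode == 0


def download_with_retry(url: str, dest: str, attempts: int = 3,
                        base_delay: float = 1.0,
                        download: Callable[[str, str], bool] = curl_download,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a download with exponential backoff between failed attempts."""
    for attempt in range(attempts):
        if download(url, dest):
            return True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return False
```

The `download` and `sleep` parameters are injectable so the retry logic can be exercised without network access or real delays.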
Out of Disk Space
Error: Not enough disk space (need 1.2 GB, have 500 MB)
Solution: Run `rcli cleanup` to remove unused models and free space.
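You can check free space before starting a large download. The following is a minimal sketch using the standard library; `has_room_for` is an illustrative helper, and the 100 MB safety margin is an arbitrary assumption.

```python
import shutil
from pathlib import Path


def has_room_for(required_mb: int, directory: Path, margin_mb: int = 100) -> bool:
    """True if the filesystem holding `directory` has space for the download."""
    free_mb = shutil.disk_usage(directory).free // 1_000_000
    return free_mb >= required_mb + margin_mb
```

For example, `has_room_for(1200, Path.home())` checks whether the Qwen3.5 2B download (1200 MB) would fit on the volume that holds your home directory. The directory passed in must already exist.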
Model Not Found After Download
Error: Model file not found at ~/Library/RCLI/models/qwen3.5-2b.gguf
Solution: Re-run `rcli models` and download the model again.
LLM Switch Fails
Failed to switch to Qwen3.5 2B
Error: llama_model_load: failed to load model
Solution: The model file may be corrupted. Delete it and download it again.
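Before deleting a suspect LLM file, a quick header check can tell you whether it is a GGUF file at all, for instance when an interrupted download saved an error page instead. GGUF files begin with the four ASCII bytes `GGUF`; `looks_like_gguf` is an illustrative helper, and passing this check does not prove the file is complete, since a truncated file still has a valid header.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def looks_like_gguf(path: Path) -> bool:
    """Cheap sanity check: the file is readable and starts with the GGUF magic."""
    try:
        with path.open("rb") as handle:
            return handle.read(4) == GGUF_MAGIC
    except OSError:
        return False
```

If the check returns `False` for a `.gguf` file in the models directory, re-downloading is the likely fix.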