Model Management

RCLI supports 20+ AI models across LLM, STT, and TTS modalities. Use these commands to manage your local model collection.

Commands Overview

rcli models              # Interactive model browser (all modalities)
rcli models llm          # Jump to LLM management
rcli models stt          # Jump to STT management
rcli models tts          # Jump to TTS (same as `rcli voices`)
rcli voices              # Manage TTS voices
rcli upgrade-llm         # Guided LLM upgrade
rcli upgrade-stt         # Upgrade to Parakeet TDT
rcli cleanup             # Remove unused models
rcli info                # Show active models and engine info

Interactive Model Browser

rcli models

Launches a full-screen TUI with:

LLM Models — 9 models (350M to 4B parameters)
STT Models — 2 offline models (Whisper, Parakeet)
TTS Voices — 6 voice models (1 to 103 speakers each)

Navigation

Up/Down — Navigate list
Enter — Select or download model
ESC — Close panel

Model States

Active — Currently loaded (green checkmark)
Installed — Available locally
Not installed — Available for download (grayed out)
Default — Included in rcli setup
Recommended — Best for most users

Switching Models

Press Enter on an installed model to select it. What happens next depends on the modality:

LLM Hot-Swap (Runtime)

# In TUI: M → select Qwen3.5 2B → Enter
# Model switches immediately without restart

Switching to Qwen3.5 2B...
Switched to Qwen3.5 2B

The LLM is hot-swapped at runtime:

Unloads current model
Loads new model to Metal GPU
Re-detects model profile (Qwen3/LFM2/etc.)
Re-caches system prompt with correct tool format
Persists selection to ~/.rcli/config/model_selection.json

STT/TTS Selection (Next Launch)

# In TUI: M → select Parakeet TDT → Enter

Selected: Parakeet TDT. Restart RCLI to apply.

STT and TTS selections are saved immediately but take effect only after RCLI is restarted.

Downloading Models

Press Enter on an uninstalled model to download:

Downloading Qwen3.5 2B (1200 MB)...
[====================] 100%
Download complete!

Models are downloaded from Hugging Face via curl.

LLM Models

Model	Size	Speed	License	Features
LFM2 1.2B Tool	731 MB	~180 t/s	LFM Open	Tool calling, default
LFM2 350M	219 MB	~350 t/s	LFM Open	Fastest, 128K ctx
LFM2.5 1.2B Instruct	731 MB	~180 t/s	LFM Open	128K ctx
LFM2 2.6B	1.5 GB	~120 t/s	LFM Open	Better conversational
Qwen3 0.6B	456 MB	~250 t/s	Apache 2.0	Ultra-fast
Qwen3.5 0.8B	600 MB	~220 t/s	Apache 2.0	Qwen3.5 generation
Qwen3.5 2B	1.2 GB	~150 t/s	Apache 2.0	Recommended
Qwen3 4B	2.5 GB	~80 t/s	Apache 2.0	Smart reasoning
Qwen3.5 4B	2.7 GB	~75 t/s	Apache 2.0	Best small model, 262K ctx
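
To choose from the table programmatically, a minimal sketch follows. It copies the sizes and approximate speeds from the table above and returns the largest model that fits a download budget and an optional speed floor. The names LLMS and largest_within are illustrative only and are not part of RCLI.

```python
from typing import NamedTuple, Optional

class Llm(NamedTuple):
    name: str
    size_mb: int      # download size from the table (GB rows converted at 1 GB = 1000 MB)
    tok_per_s: int    # approximate speed from the table

# Figures copied from the LLM Models table; speeds are approximate.
LLMS = [
    Llm("LFM2 1.2B Tool", 731, 180),
    Llm("LFM2 350M", 219, 350),
    Llm("LFM2.5 1.2B Instruct", 731, 180),
    Llm("LFM2 2.6B", 1500, 120),
    Llm("Qwen3 0.6B", 456, 250),
    Llm("Qwen3.5 0.8B", 600, 220),
    Llm("Qwen3.5 2B", 1200, 150),
    Llm("Qwen3 4B", 2500, 80),
    Llm("Qwen3.5 4B", 2700, 75),
]

def largest_within(budget_mb: int, min_tok_per_s: int = 0) -> Optional[Llm]:
    """Return the largest model that fits the size budget and speed floor, or None."""
    fits = [m for m in LLMS if m.size_mb <= budget_mb and m.tok_per_s >= min_tok_per_s]
    return max(fits, key=lambda m: m.size_mb, default=None)

choice = largest_within(1300)
print(choice.name if choice else "nothing fits")  # Qwen3.5 2B
```

Raising min_tok_per_s trades capability for speed: with a 1300 MB budget and a floor of 200 t/s, the same helper returns Qwen3.5 0.8B.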

Upgrade LLM

rcli upgrade-llm

An interactive wizard guides you through upgrading to a larger LLM:

  Upgrade LLM

  Current: LFM2 1.2B Tool (731 MB)

  Recommended upgrades:

    1. Qwen3.5 2B       1200 MB   Better reasoning
    2. Qwen3.5 4B       2700 MB   Best small model, 262K context
    3. LFM2 2.6B        1500 MB   Stronger conversational

  Select an option (1-3) or q to quit: 1

  Downloading Qwen3.5 2B (1200 MB)...
  [====================] 100%
  Download complete!

  Switch to Qwen3.5 2B now? (y/n): y
  Switched to Qwen3.5 2B.

STT Models

RCLI runs two STT models side by side: the Zipformer streaming model and one offline model (Whisper or Parakeet).

Zipformer (Streaming)

Purpose — Real-time transcription during speech
Accuracy — Good (~5% WER)
Speed — ~50ms latency
Size — ~50 MB
Included in — rcli setup (always active)

Offline STT Models

Model	Size	Accuracy	License	Features
Whisper base.en	140 MB	~5% WER	MIT	English, default
Parakeet TDT 0.6B v3	640 MB	~1.9% WER	CC-BY-4.0	25 languages, auto-punctuation

Upgrade STT

rcli upgrade-stt

Upgrades to Parakeet TDT (best accuracy):

  Upgrade STT

  Current: Whisper base.en (~5% WER, 140 MB)
  Upgrade: Parakeet TDT 0.6B v3 (~1.9% WER, 640 MB)

  Parakeet offers:
    • 3x better accuracy (~1.9% WER vs ~5%)
    • 25 languages (vs English-only)
    • Auto-punctuation
    • Slightly slower (~60ms vs ~40ms)

  Download Parakeet TDT? (y/n): y

  Downloading Parakeet TDT (640 MB)...
  [====================] 100%
  Download complete!

  Restart RCLI to use Parakeet TDT.
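
The "3x better accuracy" line in the wizard output is a rounded figure. As a worked check using only the two approximate WER values quoted above (both are themselves estimates), the ratio and the relative error reduction can be computed like this:

```python
from typing import Tuple

def wer_comparison(baseline_wer: float, upgraded_wer: float) -> Tuple[float, float]:
    """Return (error ratio, relative reduction) between two word error rates."""
    ratio = baseline_wer / upgraded_wer
    reduction = (baseline_wer - upgraded_wer) / baseline_wer
    return ratio, reduction

# ~5% WER for Whisper base.en, ~1.9% WER for Parakeet TDT 0.6B v3
ratio, reduction = wer_comparison(5.0, 1.9)
print(f"{ratio:.1f}x fewer errors, {reduction:.0%} relative reduction")
# prints: 2.6x fewer errors, 62% relative reduction
```

On these figures the improvement is closer to 2.6x than 3x, which is roughly 62% fewer word errors.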

TTS Voices

rcli voices

Lists all TTS voices:

  Voices  (auto-detect)

  #  Voice                          Size      Arch      Speakers    Status
  1  Piper Lessac (default)         60 MB     Piper     1           * active
  2  Piper Amy                      60 MB     Piper     1           installed
  3  KittenTTS Nano                 90 MB     Kitten    8           not installed
  4  Matcha LJSpeech                100 MB    Matcha    1           not installed
  5  Kokoro English v0.19           310 MB    Kokoro    11          not installed
  6  Kokoro Multi-lang v1.1         500 MB    Kokoro    103         not installed

  Tip: Run `rcli voices` to switch voices.

Voice	Size	Speakers	License	Features
Piper Lessac	60 MB	1	MIT	Clear English, default
Piper Amy	60 MB	1	MIT	Warm female voice
KittenTTS Nano	90 MB	8	Apache 2.0	4M/4F voices
Matcha LJSpeech	100 MB	1	MIT	HiFi-GAN vocoder
Kokoro English v0.19	310 MB	11	Apache 2.0	Best English quality
Kokoro Multi-lang v1.1	500 MB	103	Apache 2.0	Chinese + English

Multi-Speaker Voices

KittenTTS and Kokoro support multiple speakers. Configure via ~/.rcli/config/tts.json:

{
  "model": "kokoro-en-v0_19",
  "speaker_id": 3
}
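
Before editing tts.json by hand, it can help to check that the speaker index is in range for the chosen voice. The sketch below assumes speaker_id is zero-based, which this page does not state, and only the kokoro-en-v0_19 id is taken from the example above; the other model ids are hypothetical placeholders.

```python
import json

# Speaker counts copied from the voices table. Only "kokoro-en-v0_19" appears in
# the documentation; the other ids are guesses and may not match RCLI's names.
SPEAKER_COUNTS = {
    "kokoro-en-v0_19": 11,
    "kokoro-multi-lang-v1_1": 103,   # hypothetical id
    "kittentts-nano": 8,             # hypothetical id
}

def check_tts_config(text: str) -> str:
    """Validate a tts.json document, ASSUMING speaker_id is zero-based."""
    cfg = json.loads(text)
    model = cfg["model"]
    speaker = cfg.get("speaker_id", 0)
    count = SPEAKER_COUNTS.get(model)
    if count is None:
        return f"unknown model id: {model}"
    if not isinstance(speaker, int) or not 0 <= speaker < count:
        return f"speaker_id {speaker} out of range 0..{count - 1} for {model}"
    return "ok"

print(check_tts_config('{"model": "kokoro-en-v0_19", "speaker_id": 3}'))  # ok
```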

Cleanup Unused Models

rcli cleanup

An interactive TUI lists all installed models:

  Model Cleanup
  Arrow keys to navigate, ENTER to delete, ESC to close

   > Qwen3 0.6B  [LLM]  456 MB
     Whisper base.en  [STT]  140 MB
     Piper Amy  [TTS]  60 MB
     LFM2 1.2B Tool  [LLM]  731 MB (active)

  [Up/Down] navigate  [Enter] delete  [ESC] close

Press Enter to delete the selected model:

Active models — Cannot be deleted (switch first)
Inactive models — Deleted immediately

Selection preferences are updated automatically.

Engine Info

rcli info

Displays active models and hardware:

  RCLI Engine Info

  Version: 0.4.0

  Models:
    LLM: Qwen3.5 2B (1200 MB)
    STT: Whisper base.en (offline) | Zipformer (streaming)
    TTS: Piper Lessac
    Embeddings: Snowflake Arctic Embed S (34 MB)

  Hardware:
    Chip: Apple M3 Max
    CPU: 14 cores (10P+4E)
    GPU: 30 cores
    RAM: 36 GB
    ANE: 16-core Neural Engine

  Paths:
    Models: ~/Library/RCLI/models
    Config: ~/.rcli/config
    Index: ~/Library/RCLI/index

Model Storage Locations

~/Library/RCLI/models/
  ├── qwen3.5-2b-q4_k_m.gguf            # LLM
  ├── lfm2-1.2b-tool-q4_k_m.gguf        # LLM
  ├── whisper-base-en/                  # STT
  │   ├── encoder.onnx
  │   ├── decoder.onnx
  │   └── tokens.txt
  ├── parakeet-tdt-0.6b-v3/             # STT
  ├── zipformer-streaming/              # STT (streaming)
  ├── piper-lessac-medium/              # TTS
  │   ├── model.onnx
  │   └── config.json
  ├── kokoro-en-v0_19/                  # TTS
  ├── silero-vad.onnx                   # VAD
  └── arctic-embed-s.gguf               # Embeddings
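
Because models are stored either as a single file or as a directory, a quick way to see what is taking up space outside of rcli cleanup is to walk the models folder yourself. This is a rough sketch that relies only on the layout shown above; list_models and entry_size are illustrative helper names.

```python
from pathlib import Path
from typing import List, Tuple

def entry_size(path: Path) -> int:
    """Size in bytes of a file, or the total of all files under a directory."""
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())

def list_models(root: Path) -> List[Tuple[str, int]]:
    """Return (name, bytes) for each top-level entry, largest first."""
    if not root.is_dir():
        return []
    entries = [(p.name, entry_size(p)) for p in root.iterdir()]
    return sorted(entries, key=lambda e: e[1], reverse=True)

if __name__ == "__main__":
    models_dir = Path.home() / "Library" / "RCLI" / "models"
    for name, size in list_models(models_dir):
        print(f"{size / 1_000_000:8.1f} MB  {name}")
```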

Model Selection Persistence

User preferences are saved to:

~/.rcli/config/model_selection.json

{
  "llm": "qwen3.5-2b",
  "stt": "parakeet-tdt-0.6b-v3",
  "tts": "piper-lessac-medium"
}

To reset to defaults:

rm ~/.rcli/config/model_selection.json
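
For scripts that need to know which models are selected, the file can be read directly. The sketch below falls back to defaults when the file is missing or unreadable, mirroring the reset behaviour described above. The default ids are an assumption inferred from the file names under ~/Library/RCLI/models; RCLI's actual identifiers may differ.

```python
import json
from pathlib import Path

# ASSUMED ids for the documented defaults (LFM2 1.2B Tool, Whisper base.en,
# Piper Lessac), inferred from the storage layout rather than confirmed.
DEFAULTS = {"llm": "lfm2-1.2b-tool", "stt": "whisper-base-en", "tts": "piper-lessac-medium"}

def load_selection(path: Path) -> dict:
    """Read model_selection.json, falling back to defaults for anything missing."""
    try:
        saved = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        saved = {}
    if not isinstance(saved, dict):
        saved = {}
    return {key: saved.get(key, default) for key, default in DEFAULTS.items()}

if __name__ == "__main__":
    print(load_selection(Path.home() / ".rcli" / "config" / "model_selection.json"))
```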

Benchmarking Models

Compare All LLMs

rcli bench --all-llm --suite llm

# Output:
--- LLM Benchmark (All Models) ---
  Qwen3 0.6B:      TTFT 18ms   250 tok/s
  Qwen3.5 0.8B:    TTFT 20ms   220 tok/s
  Qwen3.5 2B:      TTFT 25ms   150 tok/s
  LFM2 1.2B Tool:  TTFT 22ms   180 tok/s
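
To compare runs over time, the benchmark output can be parsed into tuples. This sketch assumes the output format matches the sample above exactly; the regular expression would need adjusting if a given RCLI version prints additional columns.

```python
import re
from typing import List, Tuple

# Matches lines such as "  Qwen3.5 2B:      TTFT 25ms   150 tok/s"
LINE = re.compile(r"^\s*(?P<model>.+?):\s+TTFT\s+(?P<ttft>\d+)ms\s+(?P<tps>\d+)\s+tok/s\s*$")

def parse_llm_bench(output: str) -> List[Tuple[str, int, int]]:
    """Extract (model, ttft_ms, tok_per_s) tuples; non-matching lines are skipped."""
    rows = []
    for line in output.splitlines():
        m = LINE.match(line)
        if m:
            rows.append((m["model"], int(m["ttft"]), int(m["tps"])))
    return rows

sample = """--- LLM Benchmark (All Models) ---
  Qwen3 0.6B:      TTFT 18ms   250 tok/s
  Qwen3.5 0.8B:    TTFT 20ms   220 tok/s
  Qwen3.5 2B:      TTFT 25ms   150 tok/s
  LFM2 1.2B Tool:  TTFT 22ms   180 tok/s
"""
fastest = max(parse_llm_bench(sample), key=lambda r: r[2])
print(fastest[0])  # Qwen3 0.6B
```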

Compare All TTS

rcli bench --all-tts --suite tts

# Output:
--- TTS Benchmark (All Voices) ---
  Piper Lessac:    142ms   0.8x RT
  Piper Amy:       138ms   0.7x RT
  Kokoro English:  189ms   1.1x RT

Troubleshooting

Model Download Fails

Error: Failed to download model
curl: (56) Recv failure: Connection reset by peer

Solution: Check your internet connection, then retry the download.
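
Connection resets are often transient, so a script that wraps a download step can retry with increasing pauses. The helper below is a generic sketch of exponential backoff; it does not describe how RCLI itself retries, and flaky_download is a stand-in that simulates two resets before succeeding.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(action: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action until it succeeds, doubling the wait after each OSError."""
    for attempt in range(attempts):
        try:
            return action()
        except OSError:          # includes ConnectionResetError
            if attempt == attempts - 1:
                raise            # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")

calls = []

def flaky_download() -> str:
    """Hypothetical download step that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionResetError("Connection reset by peer")
    return "done"

print(retry(flaky_download, sleep=lambda s: None))  # done
```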

Out of Disk Space

Error: Not enough disk space (need 1.2 GB, have 500 MB)

Solution: Run rcli cleanup to remove unused models and free space.
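
To avoid hitting this error partway through a large download, free space can be checked up front. This is a minimal sketch using the standard library; has_room and the 200 MB safety margin are illustrative choices, not RCLI behaviour.

```python
import shutil
from pathlib import Path

def has_room(target_dir: Path, needed_mb: int, margin_mb: int = 200) -> bool:
    """True if the volume holding target_dir has needed_mb plus a margin free."""
    probe = target_dir
    while not probe.exists():        # walk up to the nearest existing parent
        probe = probe.parent
    free_mb = shutil.disk_usage(probe).free / 1_000_000
    return free_mb >= needed_mb + margin_mb

if __name__ == "__main__":
    models_dir = Path.home() / "Library" / "RCLI" / "models"
    # 1200 MB is the Qwen3.5 2B download size listed on this page.
    print("enough space" if has_room(models_dir, 1200) else "run rcli cleanup first")
```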

Model Not Found After Download

Error: Model file not found at ~/Library/RCLI/models/qwen3.5-2b.gguf

Solution: Re-run rcli models and download the model again.

LLM Switch Fails

Failed to switch to Qwen3.5 2B
Error: llama_model_load: failed to load model

Solution: The model file may be corrupted. Delete it and download it again.
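
One quick way to spot an obviously bad LLM file before re-downloading is to check its header: GGUF files begin with the four ASCII bytes "GGUF". The sketch below is only a cheap sanity check. A file that passes can still be truncated further in, so comparing its size against the listed download size is a sensible second step.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file

def looks_like_gguf(path: Path) -> bool:
    """Cheap sanity check: the file is readable and starts with the GGUF magic."""
    try:
        with path.open("rb") as f:
            return f.read(4) == GGUF_MAGIC
    except OSError:
        return False

if __name__ == "__main__":
    model = Path.home() / "Library" / "RCLI" / "models" / "qwen3.5-2b-q4_k_m.gguf"
    print("header ok" if looks_like_gguf(model) else "missing or not a GGUF file")
```

A failed download that saved an HTML error page instead of the model, for example, would fail this check immediately.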
