RCLI is organized into distinct modules for voice processing, RAG, actions, and the CLI. This guide explains the directory structure and how components interact.
High-Level Architecture
- Engines — ML inference wrappers (STT, LLM, TTS, VAD, embeddings)
- Pipeline — Orchestrator coordinates data flow between engines
- RAG — Hybrid retrieval (vector + BM25) over local documents
- Actions — 43 macOS integrations via AppleScript and shell
- CLI — Interactive TUI and command-line interface
Directory Structure
src/ Modules
engines/
ML inference wrappers for each modality:
| File | Purpose |
|---|---|
| `stt_engine.cpp/.h` | Speech-to-text via sherpa-onnx (Zipformer, Whisper, Parakeet) |
| `llm_engine.cpp/.h` | LLM inference via llama.cpp with Metal GPU |
| `tts_engine.cpp/.h` | Text-to-speech via sherpa-onnx (Piper, Kokoro, KittenTTS) |
| `vad_engine.cpp/.h` | Voice activity detection (Silero VAD) |
| `embedding_engine.cpp/.h` | Text embeddings for RAG (Snowflake Arctic Embed) |
| `model_profile.cpp/.h` | Model metadata, chat templates, tool call parsing |
- Each engine wraps a C API (`llama.cpp`, `sherpa-onnx`)
- Engines are initialized once and reused across queries
- Metal GPU acceleration for LLM and embeddings
- ONNX Runtime for STT/TTS/VAD
pipeline/
Orchestrates data flow between engines:
| File | Purpose |
|---|---|
| `orchestrator.cpp/.h` | Central class that owns all engines and coordinates the pipeline |
| `sentence_detector.cpp/.h` | Accumulates LLM tokens and flushes complete sentences to TTS |
| `text_sanitizer.h` | Removes non-speech text (markdown, XML tags) before TTS |
- Manages pipeline state (IDLE → LISTENING → PROCESSING → SPEAKING)
- Runs STT/LLM/TTS threads
- Dispatches tool calls to `ActionRegistry`
- Maintains conversation history with token-budget trimming
- System prompt KV caching for fast response
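Token-budget trimming of the conversation history can be illustrated with a minimal sketch. The `Message` struct and `trim_history` function are hypothetical names chosen for this example, not RCLI's actual types, and the sketch assumes each message already carries a token count from the tokenizer.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical message record; RCLI's real history type may differ.
struct Message {
    std::string role;
    std::string text;
    std::size_t tokens;  // assumed to be precomputed by the tokenizer
};

// Drops the oldest messages until the total token count fits the budget.
// The most recent message is always kept so the current turn survives.
void trim_history(std::deque<Message>& history, std::size_t budget) {
    std::size_t total = 0;
    for (const auto& m : history) total += m.tokens;
    while (history.size() > 1 && total > budget) {
        total -= history.front().tokens;
        history.pop_front();
    }
}
```

Dropping whole messages from the front keeps the remaining turns intact. A real implementation would likely also pin the system prompt outside the trimmed range.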
rag/
Hybrid retrieval system for local documents:
| File | Purpose |
|---|---|
| `vector_index.cpp/.h` | HNSW vector search via USearch |
| `bm25_index.cpp/.h` | Full-text search with BM25 ranking |
| `hybrid_retriever.cpp/.h` | Combines vector + BM25 via Reciprocal Rank Fusion |
| `document_processor.cpp/.h` | Chunks documents (PDF, DOCX, TXT) into 512-token segments |
| `index_builder.cpp/.h` | Builds and persists indices |
- Query is embedded via `embedding_engine`
- Vector search (HNSW) finds nearest chunks
- BM25 search finds keyword-matching chunks
- Results fused via RRF (Reciprocal Rank Fusion)
- Top-k chunks injected into LLM context
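The fusion step can be sketched as follows. This is a generic Reciprocal Rank Fusion implementation, not the code in `hybrid_retriever.cpp`. The function name `rrf_fuse` and the constant `k = 60` are assumptions, since RCLI's actual value is not stated here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: each input ranking lists chunk ids, best first.
// Every appearance of a chunk adds 1 / (k + rank) to its score, with rank starting at 1.
std::vector<std::pair<std::string, double>> rrf_fuse(
    const std::vector<std::vector<std::string>>& rankings,
    std::size_t top_k,
    double k = 60.0) {  // assumed constant, commonly used for RRF
    std::unordered_map<std::string, double> scores;
    for (const auto& ranking : rankings) {
        for (std::size_t i = 0; i < ranking.size(); ++i) {
            scores[ranking[i]] += 1.0 / (k + static_cast<double>(i + 1));
        }
    }
    std::vector<std::pair<std::string, double>> fused(scores.begin(), scores.end());
    std::sort(fused.begin(), fused.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;  // higher score first
        return a.first < b.first;                              // deterministic tie-break
    });
    if (fused.size() > top_k) fused.resize(top_k);
    return fused;
}
```

Because RRF uses only rank positions, the vector and BM25 scores never need to be normalized onto a common scale.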
core/
Core types and utilities:
| File | Purpose |
|---|---|
| `types.h` | Shared types (`ToolCall`, `ToolResult`, `PipelineState`, etc.) |
| `ring_buffer.h` | Lock-free ring buffer for zero-copy audio transfer |
| `memory_pool.h` | Pre-allocated 64 MB arena (no runtime malloc) |
| `hardware_profile.h` | Detects P-cores, E-cores, Metal GPU, RAM |
| `log.h` | Logging macros (`LOG_INFO`, `LOG_ERROR`) |
| `base64.h` | Base64 encoding/decoding |
| `string_utils.h` | String manipulation utilities |
| `file_utils.h` | File I/O helpers |
- Lock-free ring buffer — zero-copy audio passing between threads
- Pre-allocated memory pool — 64 MB arena allocated at init
- Hardware profiling — adapts thread count and GPU layers to hardware
audio/
CoreAudio microphone and speaker I/O:
| File | Purpose |
|---|---|
| `audio_io.cpp/.h` | CoreAudio input/output streams |
| `mic_permission.h/.mm` | Microphone permission request (Objective-C) |
- 16 kHz mono capture for STT
- 24 kHz mono playback for TTS
- Buffer size: 512 samples (32ms at 16 kHz)
- Minimal latency configuration
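The 32 ms figure is simply the buffer length divided by the sample rate. A small sketch of that arithmetic, using a hypothetical helper name:

```cpp
// Duration in milliseconds of a buffer holding `samples` frames at `sample_rate_hz`.
constexpr double buffer_duration_ms(int samples, int sample_rate_hz) {
    return 1000.0 * samples / sample_rate_hz;
}

// 512 samples at 16 kHz is exactly 32 ms of audio.
static_assert(buffer_duration_ms(512, 16000) == 32.0, "capture buffer is 32 ms");
```

The same 512-sample buffer at the 24 kHz playback rate would cover about 21.3 ms.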
tools/
Tool calling engine:
| File | Purpose |
|---|---|
| `tool_engine.cpp/.h` | Parses LLM tool calls and dispatches to `ActionRegistry` |
- LLM generates a tool call in model-native format (e.g., Qwen3’s `<tool_call>`)
- `ToolEngine` parses it via `ModelProfile::parse_tool_calls()`
- Dispatches to `ActionRegistry::execute()`
- Returns the result to the LLM
Tool call formats:
- Qwen3: `<tool_call>{...}</tool_call>`
- LFM2: `<|tool_call_start|>{...}<|tool_call_end|>`
- Generic JSON: `{"name": "...", "arguments": {...}}`
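Extracting the payload from a tag-delimited format can be sketched with plain string search. This is not `ModelProfile::parse_tool_calls()`, whose signature is not shown here. The function `extract_tool_payloads` and the action name `open_app` in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the raw text found between each open/close tag pair in the model output.
// The payload is left unparsed; a JSON parser would handle it in a later step.
std::vector<std::string> extract_tool_payloads(const std::string& output,
                                               const std::string& open_tag,
                                               const std::string& close_tag) {
    std::vector<std::string> payloads;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = output.find(open_tag, pos);
        if (start == std::string::npos) break;
        start += open_tag.size();
        const std::size_t end = output.find(close_tag, start);
        if (end == std::string::npos) break;  // unterminated call: ignore it
        payloads.push_back(output.substr(start, end - start));
        pos = end + close_tag.size();
    }
    return payloads;
}
```

Calling it with `"<tool_call>"` and `"</tool_call>"` covers the Qwen3 format, and with `"<|tool_call_start|>"` and `"<|tool_call_end|>"` the LFM2 format. For example, the output `Sure.<tool_call>{"name":"open_app"}</tool_call>` yields a single payload containing the JSON object.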
bench/
Benchmark harness:
| File | Purpose |
|---|---|
| `benchmark.cpp/.h` | Runs STT, LLM, TTS, E2E, RAG, tools, memory benchmarks |
- `stt`: Transcription latency and accuracy
- `llm`: Time to first token, throughput (tok/s)
- `tts`: Synthesis latency
- `e2e`: Voice-in to audio-out latency
- `rag`: Retrieval latency (vector + BM25)
- `tools`: Tool calling accuracy and latency
- `memory`: Peak memory usage
- `all`: All suites
actions/
macOS action implementations:
| File | Purpose |
|---|---|
| `action_registry.cpp/.h` | Registers actions and dispatches execution |
| `action_helpers.h` | JSON parsing, string escaping utilities |
| `applescript_executor.cpp/.h` | Executes AppleScript and shell commands |
| `register_all.cpp` | Calls all registration functions |
| Category files: | |
| `notes_actions.cpp/.h` | Apple Notes integration |
| `reminders_actions.cpp/.h` | Reminders integration |
| `messages_actions.cpp/.h` | Messages/iMessage |
| `app_control_actions.cpp/.h` | Open/quit apps |
| `window_actions.cpp/.h` | Window management |
| `system_actions.cpp/.h` | System settings (volume, dark mode, lock) |
| `media_actions.cpp/.h` | Spotify/Apple Music |
| `web_actions.cpp/.h` | Web search |
| `browser_actions.cpp/.h` | Safari/Chrome control |
| `clipboard_actions.cpp/.h` | Clipboard read/write |
| `files_actions.cpp/.h` | File search |
| `navigation_actions.cpp/.h` | Maps integration |
| `communication_actions.cpp/.h` | FaceTime |
api/
Public C API:
| File | Purpose |
|---|---|
| `rcli_api.h` | Public C API header (all engine functionality) |
| `rcli_api.cpp` | API implementation |
- `rcli_init()`: Initialize pipeline
- `rcli_query()`: One-shot text query
- `rcli_start_listen()`: Start continuous voice mode
- `rcli_stop_listen()`: Stop listening
- `rcli_cleanup()`: Shutdown pipeline
cli/
CLI and TUI:
| File | Purpose |
|---|---|
| `main.cpp` | Entry point, argument parsing, command dispatch |
| `tui_dashboard.h` | Interactive TUI dashboard (FTXUI) |
| `tui_app.h` | TUI event loop |
| `actions_cli.h` | Actions panel (browse, enable/disable, execute) |
| `model_pickers.h` | Model management (LLM, STT, TTS) |
| `help.h` | CLI help text |
| `setup_cmds.h` | `rcli setup`, `rcli cleanup` commands |
| `visualizer.h` | Waveform visualizer |
| `cli_common.h` | Shared CLI utilities |
- Push-to-talk (SPACE bar)
- Models panel (M) — browse, download, hot-swap
- Actions panel (A) — enable/disable actions
- Benchmarks panel (B) — run performance tests
- RAG panel (R) — ingest documents
- Cleanup panel (D) — remove unused models
- Tool call trace (T) — debug LLM tool calls
models/
Model registries:
| File | Purpose |
|---|---|
| `model_registry.h` | LLM model definitions (id, URL, size, speed, tool calling) |
| `tts_model_registry.h` | TTS voice definitions |
| `stt_model_registry.h` | STT model definitions |
- Download URL (Hugging Face)
- Size (MB)
- Speed estimate (tokens/sec)
- Tool calling capability
- Default/recommended flags
test/
Test harness:
| File | Purpose |
|---|---|
| `test_pipeline.cpp` | Pipeline integration tests |
- `--actions-only`: Fast, no models needed
- `--llm-only`: LLM inference tests
- `--stt-only`: STT transcription tests
- `--tts-only`: TTS synthesis tests
- `--api-only`: C API tests
Key Design Patterns
Orchestrator Pattern
The `Orchestrator` class owns all engines and coordinates data flow (see `src/pipeline/orchestrator.h`):
- Single source of truth for pipeline state
- Simplified thread coordination
- Easy to add new engines
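Keeping the pipeline state in a single atomic value is one way to get a single source of truth across threads. The sketch below assumes a `PipelineState` enum with the four states named earlier in this guide. The `transition` helper is hypothetical and not taken from `orchestrator.h`.

```cpp
#include <atomic>

// Assumed shape of the enum; the real definition lives in src/core/types.h.
enum class PipelineState { IDLE, LISTENING, PROCESSING, SPEAKING };

// Moves the state to `to` only if it currently equals `from`.
// Returns false when another thread has already changed the state.
bool transition(std::atomic<PipelineState>& state, PipelineState from, PipelineState to) {
    return state.compare_exchange_strong(from, to);
}
```

With compare-and-swap, two threads racing to leave the same state cannot both succeed, so each transition happens exactly once.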
Lock-Free Ring Buffer
Zero-copy audio transfer between threads (see `src/core/ring_buffer.h`):
- No mutex contention
- Zero-copy (pointers only)
- Fixed allocation (no runtime malloc)
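A minimal single-producer, single-consumer ring buffer shows how audio can cross threads without a mutex. This is a generic sketch, not the contents of `ring_buffer.h`. The class name `SpscRingBuffer` is hypothetical, and unlike the pointer-passing design described above, it copies values into fixed slots.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Holds at most Capacity - 1 items; one slot stays empty to tell "full" from "empty".
template <typename T, std::size_t Capacity>
class SpscRingBuffer {
public:
    // Call from the producer thread only. Returns false when the buffer is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the written slot
        return true;
    }

    // Call from the consumer thread only. Returns false when the buffer is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

The release store on one index paired with the acquire load on the other ensures the consumer sees a slot's contents before it sees the index that exposes it.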
Pre-Allocated Memory Pool
64 MB arena allocated at init (see `src/core/memory_pool.h`):
- No runtime malloc during inference
- Predictable latency
- Cache-friendly (contiguous memory)
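The arena idea can be sketched as a bump allocator: one up-front allocation, then pointer arithmetic for every later request. The `Arena` class below is a hypothetical illustration and not the interface of `memory_pool.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

class Arena {
public:
    // The only heap allocation happens here, at construction time.
    explicit Arena(std::size_t bytes)
        : buffer_(new std::uint8_t[bytes]), capacity_(bytes) {}

    // Hands out the next aligned region, or nullptr when the arena is exhausted.
    // `alignment` must be a power of two and is applied relative to the buffer start.
    void* allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t)) {
        const std::size_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || bytes > capacity_ - aligned) return nullptr;
        offset_ = aligned + bytes;
        return buffer_.get() + aligned;
    }

    // Releases everything at once; individual frees are not supported.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<std::uint8_t[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

Under this design, a 64 MB pool would be created once as `Arena pool(64 * 1024 * 1024)` and reset between queries rather than freed piece by piece.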
System Prompt KV Caching
Reuses the llama.cpp KV cache across queries (see `src/engines/llm_engine.cpp`):
- Avoids reprocessing system prompt (saves ~20-30ms)
- Lower latency for multi-turn conversations
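The idea behind prompt caching is that only tokens after the shared prefix need to be evaluated again. The sketch below shows just that prefix comparison on token ids. It makes no llama.cpp calls, and `shared_prefix_length` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the leading tokens shared by the cached sequence and the new prompt.
// Tokens from this index onward are the ones that still need evaluation.
std::size_t shared_prefix_length(const std::vector<std::int32_t>& cached,
                                 const std::vector<std::int32_t>& prompt) {
    std::size_t n = 0;
    while (n < cached.size() && n < prompt.size() && cached[n] == prompt[n]) ++n;
    return n;
}
```

When every query starts with the same system prompt, the shared prefix covers at least that prompt, which is where the saving comes from.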
Sentence-Level TTS Scheduling
TTS synthesizes complete sentences, not token-by-token (see `src/pipeline/sentence_detector.cpp`):
- Natural prosody (TTS sees full sentences)
- Double-buffered playback (next sentence synthesizes while current plays)
- Lower latency than waiting for full LLM response
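Sentence-level scheduling can be sketched as a buffer that accumulates streamed tokens and emits text whenever a sentence terminator arrives. The `SentenceBuffer` class is hypothetical and deliberately simple: it treats every `.`, `!`, or `?` as a boundary, so abbreviations and decimals would be split incorrectly.

```cpp
#include <string>
#include <vector>

class SentenceBuffer {
public:
    // Appends one streamed token and returns any sentences it completed.
    std::vector<std::string> feed(const std::string& token) {
        std::vector<std::string> done;
        for (char c : token) {
            pending_ += c;
            if (c == '.' || c == '!' || c == '?') {
                done.push_back(trim(pending_));
                pending_.clear();
            }
        }
        return done;
    }

    // Returns any trailing text once generation has finished.
    std::string flush() {
        std::string rest = trim(pending_);
        pending_.clear();
        return rest;
    }

private:
    // Strips leading and trailing whitespace.
    static std::string trim(const std::string& s) {
        const auto first = s.find_first_not_of(" \n\t");
        if (first == std::string::npos) return "";
        const auto last = s.find_last_not_of(" \n\t");
        return s.substr(first, last - first + 1);
    }

    std::string pending_;
};
```

Each returned sentence can be handed to TTS immediately, so synthesis starts while the LLM is still generating the rest of the response.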
Threading Model
Three threads run concurrently in live mode:
STT Thread
- Captures mic audio via CoreAudio
- Runs Silero VAD to filter silence
- Detects speech endpoints
- Transcribes via Zipformer (streaming) or Whisper (batch)
- Signals LLM thread when transcription is ready
LLM Thread
- Waits for STT output (`std::condition_variable`)
- Generates tokens via llama.cpp with Metal GPU
- Parses tool calls and dispatches to `ActionRegistry`
- Feeds sentences to TTS via `SentenceDetector`
- Maintains conversation history with token-budget trimming
Synchronization:
- `std::condition_variable` for thread wakeup
- `std::atomic<PipelineState>` for state transitions
- Lock-free ring buffers for audio transfer
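The condition-variable wakeup between the STT and LLM threads can be sketched as a one-slot handoff. `TranscriptSlot` and `run_handoff_demo` are hypothetical names for this example, and the sketch assumes C++17 for `std::optional`.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>

class TranscriptSlot {
public:
    // Called by the STT side when a transcription is ready.
    void publish(std::string text) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            text_ = std::move(text);
        }
        ready_.notify_one();  // wake the waiting consumer
    }

    // Called by the LLM side; blocks until a transcription arrives.
    std::string wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wakeups and early publishes.
        ready_.wait(lock, [this] { return text_.has_value(); });
        std::string out = std::move(*text_);
        text_.reset();
        return out;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::optional<std::string> text_;
};

// Runs one handoff: a consumer thread waits while the caller publishes.
std::string run_handoff_demo() {
    TranscriptSlot slot;
    std::string received;
    std::thread llm([&] { received = slot.wait(); });
    slot.publish("open safari");
    llm.join();
    return received;
}
```

The consumer sleeps inside `wait()` instead of polling, so it uses no CPU until the producer publishes.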
Dependencies
Vendored (deps/)
Cloned by `scripts/setup.sh`:
- llama.cpp — LLM + embedding inference with Metal GPU
- sherpa-onnx — STT/TTS/VAD via ONNX Runtime
Fetched by CMake
Fetched automatically via `FetchContent`:
- USearch v2.16.5 — HNSW vector index (header-only)
- FTXUI v5.0.0 — Terminal UI library
macOS System Frameworks
- CoreAudio, AudioToolbox, AudioUnit
- Foundation, AVFoundation
- IOKit (hardware monitoring)
- Metal, MetalKit (GPU acceleration)
Build Outputs
Configuration Files
Runtime configuration is stored in `~/Library/RCLI/`.
Next Steps
Building from Source
Build and install RCLI locally
Adding Actions
Extend RCLI with custom macOS actions
Contributing
Submit changes and improvements