1. Docs
  2. Developers
  3. AI systems

AI overview

The local inference stack, model tiers, gating, and what is real versus stubbed.

The AI subsystem is shipped but experimental. It is fully local by default, off by default, and gated behind one setting. This page is the map; RAG and Skills and tools go deeper.

The stack

Inference runs through llama.cpp server processes, not in-process bindings. LlamaCppServerManager spawns llama-server.exe per model role and LlamaCppHttpTextService talks to it over the local OpenAI-compatible HTTP API. Processes start on the first generation request, never at app boot, and ResourceGovernor serializes model use and unloads idle models after the AI.UnloadTimeout.

Models live under %LocalAppData%\mnemo\models\ (lowercase root), scanned by ModelRegistry:

RoleFolderPortPurpose
Managertext/manager8000Routing, skill classification, conversation summaries
Lowtext/low8001Fast chat, vision when an mmproj is present
Midtext/mid8002Reasoning tier
Hightext/high8003Reasoning tier for strong hardware
Embeddingembedding/bge-smallnoneONNX, see RAG
STTaudio/STTnoneWhisper tiny model for dictation

AIModelsSetupService downloads model zips from the project’s GitHub releases; HardwareDetector and HardwareTierEvaluator pick a recommended tier. GPU use is a setting (AI.GpuAcceleration).

There are no OpenAI, Anthropic, or Ollama clients in the codebase, and no API key surface. The one cloud path is the developer-only Vertex Gemini “teacher” client used for dataset work, gated behind hidden developer settings plus explicit credentials.

A chat turn

flowchart TD
  UI["ChatView / right sidebar"] --> CSH[ChatStreamingHelper]
  CSH --> ORC[AIOrchestrator]
  ORC --> OLS["OrchestrationLayerService<br/>routing + skill selection"]
  OLS --> SPC["SkillSystemPromptComposer<br/>prompt + tool schemas"]
  ORC --> TGS[DelegatingTextGenerationService]
  TGS --> LLM["LlamaCppHttpTextService<br/>localhost"]
  ORC --> TD[ToolDispatcher]
  TD --> FR["IFunctionRegistry<br/>feature tool handlers"]
  ORC --> MEM["ConversationMemoryInjector<br/>summaries + recall"]

The manager model (or the teacher, for developers) classifies each message and picks which skills to inject. The main model then streams, possibly emitting tool calls; ToolDispatcher executes them and feeds results back, up to eight rounds. Conversation memory summarizes every three turns and adds vector recall for long chats.

Gating

Everything hangs off AI.EnableAssistant, default false. The setting controls the Chat and Learning Path sidebar entries, navigation gating of those routes, the right-sidebar assistant, and lazy loading of tools and skill manifests through AiAssistantToolHost. Disabling the assistant unloads tools and redirects away from AI routes.

Functional versus stubbed

Verified against the code; useful before promising anything:

CapabilityStatus
Local streaming chat with toolsworks
Learning path generationworks
Whisper dictationworks once the STT model is installed
Vision attachmentsworks with the optional image model variants
Conversation memoryworks
Chat file attachments feeding RAGnot wired; files attach in UI but are not ingested
create_learning_path AI tooldisabled in its manifest
Flashcard generation, OCR, note summarizationdo not exist
AIOrchestrator.GetRagContextAsyncdead code, never called

Privacy

User content, chat history, embeddings, and inference all stay on the machine. Network traffic from the AI stack is limited to model downloads from GitHub. Opt-in developer features (teacher model, dataset logging) are the only exceptions and are off by default.

Where the code lives

ConcernPath
OrchestrationMnemo.Infrastructure/Services/AI/ (AIOrchestrator, OrchestrationLayerService, ToolDispatcher)
InferenceLlamaCppServerManager.cs, LlamaCppHttpTextService.cs (same folder)
Models and setupMnemo.Infrastructure/Services/ModelRegistry.cs, AIModelsSetupService.cs
Chat UIMnemo.UI/Modules/Chat/, Mnemo.UI/Components/RightSidebar/, Mnemo.UI/Services/ChatStreamingHelper.cs
GatingMnemo.UI/Services/AiAssistantToolHost.cs, Mnemo.UI/Services/NavigationService.cs
Training tooling (not shipped)training/, docs/datasets/ in the main repo