AI overview, Mnemo developers docs

The local inference stack, model tiers, gating, and what is real versus stubbed.

The AI subsystem is shipped but experimental. It is fully local by default, off by default, and gated behind one setting. This page is the map; RAG and Skills and tools go deeper.

The stack

Inference runs through llama.cpp server processes, not in-process bindings. LlamaCppServerManager spawns llama-server.exe per model role and LlamaCppHttpTextService talks to it over the local OpenAI-compatible HTTP API. Processes start on the first generation request, never at app boot, and ResourceGovernor serializes model use and unloads idle models after the AI.UnloadTimeout.

Models live under %LocalAppData%\mnemo\models\ (lowercase root), scanned by ModelRegistry:

Role	Folder	Port	Purpose
Manager	`text/manager`	8000	Routing, skill classification, conversation summaries
Low	`text/low`	8001	Fast chat, vision when an `mmproj` is present
Mid	`text/mid`	8002	Reasoning tier
High	`text/high`	8003	Reasoning tier for strong hardware
Embedding	`embedding/bge-small`	none	ONNX, see RAG
STT	`audio/STT`	none	Whisper tiny model for dictation

AIModelsSetupService downloads model zips from the project’s GitHub releases; HardwareDetector and HardwareTierEvaluator pick a recommended tier. GPU use is a setting (AI.GpuAcceleration).

There are no OpenAI, Anthropic, or Ollama clients in the codebase, and no API key surface. The one cloud path is the developer-only Vertex Gemini “teacher” client used for dataset work, gated behind hidden developer settings plus explicit credentials.

A chat turn

flowchart TD
  UI["ChatView / right sidebar"] --> CSH[ChatStreamingHelper]
  CSH --> ORC[AIOrchestrator]
  ORC --> OLS["OrchestrationLayerService<br/>routing + skill selection"]
  OLS --> SPC["SkillSystemPromptComposer<br/>prompt + tool schemas"]
  ORC --> TGS[DelegatingTextGenerationService]
  TGS --> LLM["LlamaCppHttpTextService<br/>localhost"]
  ORC --> TD[ToolDispatcher]
  TD --> FR["IFunctionRegistry<br/>feature tool handlers"]
  ORC --> MEM["ConversationMemoryInjector<br/>summaries + recall"]

The manager model (or the teacher, for developers) classifies each message and picks which skills to inject. The main model then streams, possibly emitting tool calls; ToolDispatcher executes them and feeds results back, up to eight rounds. Conversation memory summarizes every three turns and adds vector recall for long chats.

Gating

Everything hangs off AI.EnableAssistant, default false. The setting controls the Chat and Learning Path sidebar entries, navigation gating of those routes, the right-sidebar assistant, and lazy loading of tools and skill manifests through AiAssistantToolHost. Disabling the assistant unloads tools and redirects away from AI routes.

Functional versus stubbed

Verified against the code; useful before promising anything:

Capability	Status
Local streaming chat with tools	works
Learning path generation	works
Whisper dictation	works once the STT model is installed
Vision attachments	works with the optional image model variants
Conversation memory	works
Chat file attachments feeding RAG	not wired; files attach in UI but are not ingested
`create_learning_path` AI tool	disabled in its manifest
Flashcard generation, OCR, note summarization	do not exist
`AIOrchestrator.GetRagContextAsync`	dead code, never called

Privacy

User content, chat history, embeddings, and inference all stay on the machine. Network traffic from the AI stack is limited to model downloads from GitHub. Opt-in developer features (teacher model, dataset logging) are the only exceptions and are off by default.

Where the code lives

Concern	Path
Orchestration	`Mnemo.Infrastructure/Services/AI/` (`AIOrchestrator`, `OrchestrationLayerService`, `ToolDispatcher`)
Inference	`LlamaCppServerManager.cs`, `LlamaCppHttpTextService.cs` (same folder)
Models and setup	`Mnemo.Infrastructure/Services/ModelRegistry.cs`, `AIModelsSetupService.cs`
Chat UI	`Mnemo.UI/Modules/Chat/`, `Mnemo.UI/Components/RightSidebar/`, `Mnemo.UI/Services/ChatStreamingHelper.cs`
Gating	`Mnemo.UI/Services/AiAssistantToolHost.cs`, `Mnemo.UI/Services/NavigationService.cs`
Training tooling (not shipped)	`training/`, `docs/datasets/` in the main repo