Architecture
How DocuMind moves from raw docs to grounded answers without feeding the LLM your entire company wiki on every request.
System Architecture
The architecture is intentionally boring in the best way: clear boundaries, predictable flow, minimal magic. We ingest once, retrieve fast, answer with context, and score output quality so we know when things drift.
Ingest: Accept docs via dashboard or API
Parse: Normalize raw source into clean text
Chunk: Split into overlap-aware segments
Embed: Generate vectors using the selected embedding profile
Store: Save vectors in Actian and metadata in SQLite
Retrieve: Semantic or hybrid search with filters
Ground: Answer with sources, then score quality
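The stages above can be sketched end to end in a few lines. Everything here is illustrative: the bag-of-words "embeddings", the in-memory store, and all function names are assumptions for the sketch, not DocuMind's real API or Actian's.

```python
from collections import Counter
import math

def parse(raw: str) -> str:
    return " ".join(raw.split())                     # Parse: normalize to clean text

def chunk(text: str, size: int = 20, overlap: int = 5) -> list:
    words = text.split()
    step = size - overlap                            # Chunk: overlap-aware segments
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())             # Embed: toy word-count "vector"

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ingest(doc: str, store: list) -> None:
    for c in chunk(parse(doc)):
        store.append((embed(c), c))                  # Store: vector + chunk text

def retrieve(question: str, store: list, k: int = 2) -> list:
    qv = embed(question)
    ranked = sorted(store, key=lambda e: similarity(qv, e[0]), reverse=True)
    return [text for _, text in ranked[:k]]          # Retrieve: top-k chunks only

def ground(question: str, store: list) -> str:
    ctx = "\n".join(retrieve(question, store))       # Ground: inject only relevant context
    return f"Context:\n{ctx}\n\nQuestion: {question}"
```

Swap the toy pieces for real ones (a tokenizer-based chunker, an embedding model, Actian for storage) and the shape of the flow stays the same.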
Data Flows
Ingestion Flow
Upload docs, parse text, split into overlap-aware chunks (commonly around 512 tokens with overlap), embed, and upsert into Actian with metadata. Translation: we turn document chaos into searchable memory.
It supports docs and conversation-like sources, because yes, team knowledge also lives in chat logs and random transcript dumps.
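A minimal sketch of the overlap-aware split described above. The 512/64 numbers mirror the "around 512 tokens with overlap" note; the exact defaults and the function name are illustrative, not DocuMind's configuration.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 64) -> list:
    """Split a token sequence into fixed-size chunks that share `overlap` tokens."""
    step = size - overlap                    # advance by less than a full chunk
    end = max(len(tokens) - overlap, 1)      # avoid emitting a tiny trailing duplicate
    return [tokens[i:i + size] for i in range(0, end, step)]
```

Because consecutive chunks share the overlap region, a sentence split at a chunk boundary still appears whole in at least one chunk, which keeps retrieval from returning half-thoughts.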
Query Flow
A user asks a question; the system embeds the query, runs semantic or hybrid retrieval, picks the top chunks, and injects only that context into the answer prompt.
This is the opposite of paste-everything-and-pray prompting. Cheaper, faster, and way less hallucinatory.
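The "inject only that context" step can look something like this. The prompt wording and the helper name are assumptions for illustration; the point is that the prompt carries the top chunks, numbered for citation, and nothing else.

```python
def build_prompt(question: str, top_chunks: list) -> str:
    """Build an answer prompt from retrieved chunks only (illustrative format)."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(top_chunks))
    return (
        "Answer using ONLY the sources below. Cite sources by [number].\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks lets the model cite which source backed each claim, which is what makes the final answer auditable.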
Memory Flow
Conversation history can be ingested into a dedicated `conversation_memory` index with session and user metadata. At query time we retrieve only relevant memory snippets, not the full transcript novel.
Net effect: the assistant feels consistent across sessions instead of acting like it had a hard reset every hour.
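A sketch of the memory-retrieval step: filter the `conversation_memory`-style index down to one user's records, then rank by similarity and keep only the top few snippets. The record fields (`user_id`, `session_id`, `vec`, `text`), the `sim` callable, and the helper name are all assumptions, not DocuMind's actual schema.

```python
def relevant_memory(memory_index: list, user_id: str, query_vec, sim, k: int = 3) -> list:
    """Return the top-k memory snippets for one user, ranked by similarity."""
    mine = [m for m in memory_index if m["user_id"] == user_id]   # metadata filter first
    ranked = sorted(mine, key=lambda m: sim(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:k]]                        # snippets, not the transcript
```

Filtering on metadata before ranking is what keeps one user's history from leaking into another's answers, while still letting memory span sessions.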
Observability Flow
Every query cycle gets scored for retrieval quality, chunk relevance, and hallucination rate. Alerts fire on threshold breaches so bad answers do not quietly become normal.
Because seeming fine in one demo is not a monitoring strategy.
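Threshold-based alerting on those three scores could be as simple as the sketch below. The metric names come from the paragraph above; the threshold values and the direction conventions (quality scores alert when they fall below, hallucination rate alerts when it rises above) are illustrative assumptions.

```python
# Illustrative thresholds, not DocuMind's shipped defaults.
THRESHOLDS = {
    "retrieval_quality": 0.6,    # alert when BELOW
    "chunk_relevance": 0.5,      # alert when BELOW
    "hallucination_rate": 0.2,   # alert when ABOVE
}

def check_metrics(metrics: dict) -> list:
    """Return one alert string per breached threshold for a scored query cycle."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics[name]
        breached = value > limit if name == "hallucination_rate" else value < limit
        if breached:
            alerts.append(f"{name}={value:.2f} breached threshold {limit}")
    return alerts
```

Wire the returned alerts into whatever pager or dashboard you already run; the important part is that every query cycle passes through the check, so drift shows up as a trend instead of a surprise.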