Architecture
How DocuMind moves from raw docs to grounded answers without feeding the LLM your entire company wiki on every request.
System Architecture
The architecture is intentionally boring in the best way: clear boundaries, predictable flow, minimal magic. We ingest once, retrieve fast, answer with context, and score output quality so we know when things drift.
Ingest: Accept docs via dashboard or API
Parse: Normalize raw source into clean text
Chunk: Split into overlap-aware segments
Embed: Generate vectors using the selected embedding profile
Store: Save vectors in Actian and metadata in SQLite
Retrieve: Semantic or hybrid search with filters
Ground: Answer with sources, then score quality
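The stages above can be sketched end to end in a few lines. Everything here is illustrative: the bag-of-words "embeddings", the in-memory store, and all function names are assumptions for the sketch, not DocuMind's real API or Actian's.

```python
from collections import Counter
import math

def parse(raw: str) -> str:
    return " ".join(raw.split())                     # Parse: normalize to clean text

def chunk(text: str, size: int = 20, overlap: int = 5) -> list:
    words = text.split()
    step = size - overlap                            # Chunk: overlap-aware segments
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())             # Embed: toy word-count "vector"

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ingest(doc: str, store: list) -> None:
    for c in chunk(parse(doc)):
        store.append((embed(c), c))                  # Store: vector + chunk text

def retrieve(question: str, store: list, k: int = 2) -> list:
    qv = embed(question)
    ranked = sorted(store, key=lambda e: similarity(qv, e[0]), reverse=True)
    return [text for _, text in ranked[:k]]          # Retrieve: top-k chunks only

def ground(question: str, store: list) -> str:
    ctx = "\n".join(retrieve(question, store))       # Ground: inject only relevant context
    return f"Context:\n{ctx}\n\nQuestion: {question}"
```

Swap the toy pieces for real ones (a tokenizer-based chunker, an embedding model, Actian for storage) and the shape of the flow stays the same.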
Data Flows
Ingestion Flow
Upload docs, parse text, split into overlap-aware chunks (commonly around 512 tokens with overlap), embed, and upsert into Actian with metadata. Translation: we turn document chaos into searchable memory.
It supports docs and conversation-like sources, because yes, team knowledge also lives in chat logs and random transcript dumps.
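A minimal sketch of the overlap-aware split described above. The 512/64 numbers mirror the "around 512 tokens with overlap" note; the exact defaults and the function name are illustrative, not DocuMind's configuration.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 64) -> list:
    """Split a token sequence into fixed-size chunks that share `overlap` tokens."""
    step = size - overlap                    # advance by less than a full chunk
    end = max(len(tokens) - overlap, 1)      # avoid emitting a tiny trailing duplicate
    return [tokens[i:i + size] for i in range(0, end, step)]
```

Because consecutive chunks share the overlap region, a sentence split at a chunk boundary still appears whole in at least one chunk, which keeps retrieval from returning half-thoughts.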
Query Flow
A user asks a question; the system embeds the query, runs semantic or hybrid retrieval, picks the top chunks, and injects only that context into the answer prompt.
This is the opposite of paste-everything-and-pray prompting. Cheaper, faster, and way less hallucinatory.
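The "inject only that context" step can look something like this. The prompt wording and the helper name are assumptions for illustration; the point is that the prompt carries the top chunks, numbered for citation, and nothing else.

```python
def build_prompt(question: str, top_chunks: list) -> str:
    """Build an answer prompt from retrieved chunks only (illustrative format)."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(top_chunks))
    return (
        "Answer using ONLY the sources below. Cite sources by [number].\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks lets the model cite which source backed each claim, which is what makes the final answer auditable.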
Memory Flow
Conversation history can be ingested into a dedicated `conversation_memory` index with session and user metadata. At query time we retrieve only relevant memory snippets, not the full transcript novel.
Net effect: the assistant feels consistent across sessions instead of acting like it had a hard reset every hour.
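A sketch of the memory-retrieval step: filter the `conversation_memory`-style index down to one user's records, then rank by similarity and keep only the top few snippets. The record fields (`user_id`, `session_id`, `vec`, `text`), the `sim` callable, and the helper name are all assumptions, not DocuMind's actual schema.

```python
def relevant_memory(memory_index: list, user_id: str, query_vec, sim, k: int = 3) -> list:
    """Return the top-k memory snippets for one user, ranked by similarity."""
    mine = [m for m in memory_index if m["user_id"] == user_id]   # metadata filter first
    ranked = sorted(mine, key=lambda m: sim(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:k]]                        # snippets, not the transcript
```

Filtering on metadata before ranking is what keeps one user's history from leaking into another's answers, while still letting memory span sessions.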
Observability Flow
Every query cycle gets scored for retrieval quality, chunk relevance, and hallucination rate. Alerts fire on threshold breaches so bad answers do not quietly become normal.
Because seeming fine in one demo is not a monitoring strategy.
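Threshold-based alerting on those three scores could be as simple as the sketch below. The metric names come from the paragraph above; the threshold values and the direction conventions (quality scores alert when they fall below, hallucination rate alerts when it rises above) are illustrative assumptions.

```python
# Illustrative thresholds, not DocuMind's shipped defaults.
THRESHOLDS = {
    "retrieval_quality": 0.6,    # alert when BELOW
    "chunk_relevance": 0.5,      # alert when BELOW
    "hallucination_rate": 0.2,   # alert when ABOVE
}

def check_metrics(metrics: dict) -> list:
    """Return one alert string per breached threshold for a scored query cycle."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics[name]
        breached = value > limit if name == "hallucination_rate" else value < limit
        if breached:
            alerts.append(f"{name}={value:.2f} breached threshold {limit}")
    return alerts
```

Wire the returned alerts into whatever pager or dashboard you already run; the important part is that every query cycle passes through the check, so drift shows up as a trend instead of a surprise.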