Components
The moving parts behind DocuMind, explained like you are reading the architecture with coffee in one hand and logs in the other.
DocuMind API
backend/app/main.py
FastAPI app that exposes instances, knowledge bases, resources, query, memory, and observability endpoints.
Primary endpoint groups
/health, /collections
/instances, /knowledge-bases, /resources
/search, /query, /search/advanced, /query/advanced
/observability/scores, /observability/alerts
Runtime Container
backend/app/runtime.py
Dependency wiring for the vector DB client, control-plane store, routing, ingestion, retrieval, agent, and observability services.
Wired services
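As a rough picture of what "dependency wiring" means here, the sketch below builds shared clients once and hands them to the services that need them. Every class and attribute name is a hypothetical placeholder, not the actual contents of backend/app/runtime.py, and only a few of the services are shown.

```python
from dataclasses import dataclass


# All names below are hypothetical stand-ins for illustration only.
class VectorDBClient:
    """Placeholder for the vector DB client."""


class ControlPlaneStore:
    """Placeholder for the control-plane store."""


@dataclass
class RetrievalService:
    vector_db: VectorDBClient


@dataclass
class AgentService:
    retrieval: RetrievalService


@dataclass
class Runtime:
    vector_db: VectorDBClient
    store: ControlPlaneStore
    retrieval: RetrievalService
    agent: AgentService


def build_runtime() -> Runtime:
    # Create each shared dependency exactly once...
    vector_db = VectorDBClient()
    store = ControlPlaneStore()
    # ...then pass it into the services that depend on it.
    retrieval = RetrievalService(vector_db=vector_db)
    agent = AgentService(retrieval=retrieval)
    return Runtime(vector_db=vector_db, store=store, retrieval=retrieval, agent=agent)
```

The point of a container like this is that the agent and the retrieval layer share one vector DB client instead of each opening their own.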
Retrieval Layer
backend/app/services/retrieval.py
Semantic retrieval plus hybrid fusion (`rrf`/`dbsf`) with metadata filters for grounded lookup.
Retrieval modes
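To make the `rrf` option concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists of chunk IDs. It illustrates the general technique, not DocuMind's own code: the function name `rrf_fuse` and the constant `k = 60` (a commonly used default) are assumptions, and `dbsf` is not shown.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every ID it contains,
    where rank starts at 1 for the top hit.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a semantic pass and a keyword pass.
semantic_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = rrf_fuse([semantic_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Because RRF only looks at ranks, it does not care that the two passes produce scores on different scales, which is what makes it a convenient default for hybrid search.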
It also supports filters (`eq`, `any_of`, `between`, `gt/gte/lt/lte`, `text`) so you can scope retrieval without writing weird post-filter logic.
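The operator names above can be read as simple predicates over chunk metadata. The sketch below spells out one plausible reading of them. The function `matches`, the inclusive bounds for `between`, and the case-insensitive substring behaviour of `text` are all assumptions. It evaluates a condition in memory purely to show the semantics, whereas the service itself presumably applies filters as part of the retrieval query.

```python
import operator
from typing import Any

# Comparison operators share one code path.
_COMPARATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(metadata: dict[str, Any], field: str, op: str, value: Any) -> bool:
    """Return True if metadata[field] satisfies the (op, value) condition."""
    if field not in metadata:
        return False
    actual = metadata[field]
    if op == "eq":
        return actual == value
    if op == "any_of":
        return actual in value
    if op == "between":
        low, high = value  # assumed inclusive on both ends
        return low <= actual <= high
    if op in _COMPARATORS:
        return _COMPARATORS[op](actual, value)
    if op == "text":
        # Assumed: case-insensitive substring match.
        return str(value).lower() in str(actual).lower()
    raise ValueError(f"unsupported operator: {op}")


meta = {"year": 2023, "team": "platform", "title": "Quarterly Retrieval Report"}
print(matches(meta, "year", "between", (2020, 2024)))  # True
print(matches(meta, "team", "any_of", ["search", "infra"]))  # False
```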
Observability Layer
backend/app/services/observability.py
Per-query quality summary and alert endpoints so we can catch retrieval drift and hallucination spikes early.
Scores we track
Retrieval quality score (did we fetch the right chunks?)
Chunk relevance score (did those chunks actually answer the question?)
Hallucination rate (did answer claims stay grounded?)
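One way to picture how these three scores could feed an alert is sketched below. The `QueryScores` shape, the definition of hallucination rate as the share of ungrounded answer claims, and the threshold values are illustrative assumptions, not the formulas or defaults used in observability.py.

```python
from dataclasses import dataclass


@dataclass
class QueryScores:
    retrieval_quality: float   # 0.0 to 1.0, higher is better
    chunk_relevance: float     # 0.0 to 1.0, higher is better
    hallucination_rate: float  # 0.0 to 1.0, lower is better


def hallucination_rate(claim_grounded: list[bool]) -> float:
    """Share of answer claims that were NOT backed by a retrieved chunk."""
    if not claim_grounded:
        return 0.0
    return claim_grounded.count(False) / len(claim_grounded)


def triggered_alerts(
    scores: QueryScores,
    min_quality: float = 0.6,        # hypothetical threshold
    min_relevance: float = 0.6,      # hypothetical threshold
    max_hallucination: float = 0.2,  # hypothetical threshold
) -> list[str]:
    """Return the names of the scores that crossed their threshold."""
    triggered = []
    if scores.retrieval_quality < min_quality:
        triggered.append("retrieval_quality")
    if scores.chunk_relevance < min_relevance:
        triggered.append("chunk_relevance")
    if scores.hallucination_rate > max_hallucination:
        triggered.append("hallucination_rate")
    return triggered


rate = hallucination_rate([True, True, False, True])  # 1 of 4 claims ungrounded
print(triggered_alerts(QueryScores(0.9, 0.5, rate)))
# ['chunk_relevance', 'hallucination_rate']
```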
DCLI Interface
backend/documind_cli.py
CLI-first interface for context-aware workflows (`instance_id + namespace_id`) with two output modes: human-readable for people and JSON for bots.
DCLI is intentionally context-aware. You can set active context once and stop copy-pasting IDs into every command like it is 2016.
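The "set it once" behaviour can be sketched as a small resolution step: explicit IDs win, and anything omitted falls back to the active context. The `Context` class and the `resolve_context` and `render` helpers are hypothetical names for illustration and do not reflect the actual internals or flags of documind_cli.py.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    instance_id: str
    namespace_id: str


def resolve_context(
    active: Optional[Context],
    instance_id: Optional[str] = None,
    namespace_id: Optional[str] = None,
) -> Context:
    """Explicit IDs override the active context; missing ones fall back to it."""
    resolved_instance = instance_id or (active.instance_id if active else None)
    resolved_namespace = namespace_id or (active.namespace_id if active else None)
    if resolved_instance is None or resolved_namespace is None:
        raise ValueError("no active context set and IDs were not passed explicitly")
    return Context(resolved_instance, resolved_namespace)


def render(result: dict, as_json: bool) -> str:
    """Format a command result for a person or for a bot."""
    if as_json:
        return json.dumps(result, sort_keys=True)
    return "\n".join(f"{key}: {value}" for key, value in result.items())


active = Context(instance_id="inst-1", namespace_id="ns-a")
ctx = resolve_context(active, namespace_id="ns-b")
print(render({"instance_id": ctx.instance_id, "namespace_id": ctx.namespace_id}, as_json=True))
# {"instance_id": "inst-1", "namespace_id": "ns-b"}
```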
MCP Server
backend/mcp_server/server.py
FastMCP tool surface that lets assistants search, ask, ingest, and manage context with safety guardrails.
MCP exposes the same power to AI clients, with guardrails for risky actions. The assistant gets tools, not unchecked admin access, and that is on purpose.
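A guardrail of this kind can be as simple as refusing a destructive tool unless the caller confirms explicitly. The sketch below uses plain Python rather than the FastMCP API, and the `risky` decorator, the `confirm` argument, and both tool functions are hypothetical. The real guardrails in server.py may work differently.

```python
from functools import wraps
from typing import Callable


class GuardrailError(Exception):
    """Raised when a risky tool is called without explicit confirmation."""


def risky(tool: Callable) -> Callable:
    """Wrap a tool so it only runs when called with confirm=True."""
    @wraps(tool)
    def wrapper(*args, confirm: bool = False, **kwargs):
        if not confirm:
            raise GuardrailError(f"{tool.__name__} requires confirm=True")
        return tool(*args, **kwargs)
    return wrapper


def search(query: str) -> str:
    # Read-only tool: no guardrail needed.
    return f"searching for {query!r}"


@risky
def delete_resource(resource_id: str) -> str:
    # Destructive tool: gated behind explicit confirmation.
    return f"deleted {resource_id}"


print(search("retention policy"))
try:
    delete_resource("res-42")
except GuardrailError as err:
    print(f"blocked: {err}")
print(delete_resource("res-42", confirm=True))
```

Read-only tools stay frictionless, while anything destructive needs a deliberate second signal from the caller.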