Components

The moving parts behind DocuMind, explained like you are reading the architecture with coffee in one hand and logs in the other.

DocuMind API

backend/app/main.py

FastAPI app that exposes endpoints for instances, knowledge bases, resources, querying, memory, and observability.

System: /health, /collections

Control Plane: /instances, /knowledge-bases, /resources

Retrieval: /search, /query, /search/advanced, /query/advanced

Quality: /observability/scores, /observability/alerts

backend/app/runtime.py

Dependency wiring for vector DB client, control-plane store, routing, ingestion, retrieval, agent, and observability services.

vectordb, store, routing, ingestion, retrieval, agent, observability
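To make the wiring idea concrete, here is a minimal sketch of a runtime container that builds leaf dependencies first and hands them to the services on top. The `Service` stand-in, the `Runtime` dataclass, `build_runtime`, and the specific dependency edges are all assumptions for illustration; the real `runtime.py` may wire things differently.

```python
from dataclasses import dataclass
from typing import Any


class Service:
    """Stand-in for a real service; it only records what it was given."""

    def __init__(self, name: str, **deps: Any) -> None:
        self.name = name
        self.deps = deps


@dataclass(frozen=True)
class Runtime:
    """Hypothetical container; field names mirror the service list above."""

    vectordb: Service
    store: Service
    routing: Service
    ingestion: Service
    retrieval: Service
    agent: Service
    observability: Service


def build_runtime() -> Runtime:
    # Leaf dependencies first (assumed to need nothing else).
    vectordb = Service("vectordb")
    store = Service("store")
    # Assumed edges: routing reads the control-plane store, and both
    # ingestion and retrieval need the vector DB plus routing.
    routing = Service("routing", store=store)
    ingestion = Service("ingestion", vectordb=vectordb, routing=routing)
    retrieval = Service("retrieval", vectordb=vectordb, routing=routing)
    observability = Service("observability", store=store)
    agent = Service("agent", retrieval=retrieval, observability=observability)
    return Runtime(
        vectordb=vectordb,
        store=store,
        routing=routing,
        ingestion=ingestion,
        retrieval=retrieval,
        agent=agent,
        observability=observability,
    )
```

Building everything in one place means each service receives shared instances instead of constructing its own clients, which is usually what makes swapping in test doubles cheap.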

backend/app/services/retrieval.py

Semantic retrieval plus hybrid fusion (`rrf`/`dbsf`) with metadata filters for grounded lookup.

Semantic, Hybrid + RRF, Hybrid + DBSF
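For readers new to the `rrf` mode, the sketch below shows standard reciprocal rank fusion over two ranked lists of chunk IDs. The function name `rrf_fuse` and the constant `k=60` (a common default in the literature) are assumptions; in DocuMind the fusion may well happen inside the vector DB rather than in application code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per ID."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]  # hypothetical dense-vector ranking
keyword_hits = ["b", "d", "a"]   # hypothetical sparse/keyword ranking
fused = rrf_fuse([semantic_hits, keyword_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

RRF only looks at ranks, so it does not care that the two retrievers score on different scales. DBSF, by contrast, works on normalized scores.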

It also supports filters (`eq`, `any_of`, `between`, `gt/gte/lt/lte`, `text`) so you can scope retrieval without writing weird post-filter logic.
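The operator semantics can be illustrated with a small in-memory evaluator. The filter shape (`field`, `op`, `value`) and the helper names are hypothetical, and the real service presumably pushes filters down into the vector DB query rather than filtering in Python; this only shows what each operator is expected to mean.

```python
import operator
from typing import Any, Dict, List

# Comparison operators from the list above, mapped to stdlib functions.
_COMPARATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}
_SUPPORTED = {"eq", "any_of", "between", "text", *_COMPARATORS}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if one (assumed-shape) filter accepts the metadata."""
    op, expected = flt["op"], flt["value"]
    if op not in _SUPPORTED:
        raise ValueError(f"unsupported filter op: {op}")
    actual = metadata.get(flt["field"])
    if actual is None:
        return False  # a missing field never matches
    if op == "eq":
        return actual == expected
    if op == "any_of":
        return actual in expected
    if op == "between":
        low, high = expected
        return low <= actual <= high  # inclusive bounds (assumption)
    if op == "text":
        return str(expected).lower() in str(actual).lower()
    return _COMPARATORS[op](actual, expected)


def apply_filters(chunks: List[Dict[str, Any]], filters: List[Dict[str, Any]]) -> List[str]:
    """Keep the IDs of chunks whose metadata satisfies every filter."""
    return [c["id"] for c in chunks if all(matches(c["metadata"], f) for f in filters)]


chunks = [
    {"id": "c1", "metadata": {"lang": "en", "year": 2023, "title": "Install Guide"}},
    {"id": "c2", "metadata": {"lang": "de", "year": 2021, "title": "API Reference"}},
    {"id": "c3", "metadata": {"lang": "en", "year": 2019, "title": "Upgrade guide"}},
]
scoped = apply_filters(chunks, [
    {"field": "lang", "op": "eq", "value": "en"},
    {"field": "year", "op": "between", "value": [2020, 2024]},
])
print(scoped)  # ['c1']
```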

backend/app/services/observability.py

Per-query quality summary and alert endpoints so we can catch retrieval drift and hallucination spikes early.

Retrieval quality score (did we fetch the right chunks?)

Chunk relevance score (did those chunks actually answer the question?)

Hallucination rate (did answer claims stay grounded?)
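As a rough sketch of how these three signals could roll up into a summary and alerts: the `QueryScore` record, the field names, the alert labels, and the thresholds (`0.7` and `0.1`) are all assumptions for illustration, not the service's actual schema or defaults.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class QueryScore:
    """Hypothetical per-query record."""

    retrieval_quality: float  # 0..1, did we fetch the right chunks?
    chunk_relevance: float    # 0..1, did the chunks answer the question?
    total_claims: int         # claims made in the generated answer
    ungrounded_claims: int    # claims with no supporting chunk


def summarize(scores: List[QueryScore]) -> Dict[str, float]:
    if not scores:
        raise ValueError("need at least one scored query")
    total = sum(s.total_claims for s in scores)
    ungrounded = sum(s.ungrounded_claims for s in scores)
    return {
        "retrieval_quality": mean(s.retrieval_quality for s in scores),
        "chunk_relevance": mean(s.chunk_relevance for s in scores),
        # Share of answer claims that were not grounded in retrieved chunks.
        "hallucination_rate": ungrounded / total if total else 0.0,
    }


def alerts(summary: Dict[str, float], min_quality: float = 0.7,
           max_hallucination: float = 0.1) -> List[str]:
    """Return alert labels for any metric past its (assumed) threshold."""
    fired = []
    if summary["retrieval_quality"] < min_quality:
        fired.append("retrieval_quality_low")
    if summary["chunk_relevance"] < min_quality:
        fired.append("chunk_relevance_low")
    if summary["hallucination_rate"] > max_hallucination:
        fired.append("hallucination_rate_high")
    return fired


window = [QueryScore(0.9, 0.8, 4, 0), QueryScore(0.3, 0.9, 6, 3)]
print(alerts(summarize(window)))
```

Computing the hallucination rate over claims rather than over queries keeps one long, mostly grounded answer from being counted the same as a short, fully invented one.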

backend/documind_cli.py

CLI-first interface for context-aware workflows (`instance_id + namespace_id`) with human-readable and JSON (bot) output modes.

DCLI (the DocuMind CLI) is intentionally context-aware. You can set the active context once and stop copy-pasting IDs into every command like it is 2016.
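One plausible way to implement "set it once" is a small context file that later commands fall back to when no IDs are passed. This is a sketch under that assumption: the helper names and the JSON file layout are hypothetical, and the actual CLI may store context elsewhere.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def save_context(path: Path, instance_id: str, namespace_id: str) -> None:
    """Persist the active context so later commands can omit the IDs."""
    payload = {"instance_id": instance_id, "namespace_id": namespace_id}
    path.write_text(json.dumps(payload), encoding="utf-8")


def resolve_context(path: Path, instance_id: Optional[str] = None,
                    namespace_id: Optional[str] = None) -> Dict[str, str]:
    """Explicit arguments win; otherwise fall back to the saved context."""
    stored = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    resolved = {
        "instance_id": instance_id or stored.get("instance_id"),
        "namespace_id": namespace_id or stored.get("namespace_id"),
    }
    missing = [key for key, value in resolved.items() if not value]
    if missing:
        raise ValueError(f"no active context for: {', '.join(missing)}")
    return resolved
```

With this shape, a command that receives only `namespace_id` still inherits the saved `instance_id`, and a command run with no saved context fails loudly instead of guessing.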

backend/mcp_server/server.py

FastMCP tool surface that lets assistants search, ask, ingest, and manage context with safety guardrails.

MCP exposes the same power to AI clients, with guardrails for risky actions. The assistant gets tools, not unchecked admin access, and that is on purpose.
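The guardrail idea can be sketched without any MCP library: read-only tools run freely, while destructive ones refuse to act unless the caller explicitly confirms. The decorator, the `confirm` flag, and both tool functions are hypothetical and do not reflect the server's actual tool signatures or the FastMCP API.

```python
from functools import wraps
from typing import Any, Callable, Dict


class GuardrailError(Exception):
    """Raised when a risky tool is called without explicit confirmation."""


def requires_confirmation(func: Callable[..., Any]) -> Callable[..., Any]:
    """Block the wrapped tool unless the caller passes confirm=True."""

    @wraps(func)
    def wrapper(*args: Any, confirm: bool = False, **kwargs: Any) -> Any:
        if not confirm:
            raise GuardrailError(
                f"{func.__name__} is destructive; pass confirm=True to proceed"
            )
        return func(*args, **kwargs)

    return wrapper


def search(query: str) -> Dict[str, Any]:
    # Read-only tool: no guardrail needed.
    return {"tool": "search", "query": query}


@requires_confirmation
def delete_resource(resource_id: str) -> Dict[str, Any]:
    # Destructive tool: only runs after explicit confirmation.
    return {"tool": "delete_resource", "deleted": resource_id}
```

Calling `delete_resource("r1")` raises `GuardrailError`, while `delete_resource("r1", confirm=True)` goes through, so an assistant has to state its intent before anything is removed.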