Architecture
An engineering deep-dive into how Vinculum is built — for engineers evaluating the system or wanting to understand its internals before contributing.
Stack overview
| Layer | Technology |
|---|---|
| MCP server + REST API | Python 3.12, FastMCP 3.2.2 (FastAPI/Starlette), port 31415 |
| Database | PostgreSQL 17 + pgvector extension |
| Dashboard frontend | Next.js 16, React 19, Tailwind v4, TypeScript |
| Real-time push | PostgreSQL LISTEN/NOTIFY → FastAPI SSE → React |
| Semantic search | Voyage AI embeddings stored as pgvector columns |
| AI intelligence | Anthropic Haiku (delta classification, auto-titles) |
Two processes run in production: the Python MCP server (with embedded REST API) and the Next.js frontend. In the Docker Compose setup they share a network; the frontend proxies /api/* and /mcp to the Python server.
Database schema
All tables live in the `vinculum` schema. PostgreSQL is the coordination substrate — sessions don't share a chat thread, they share the database. See Concepts → Substrate for the higher-level framing.
Core tables
vinculum.entries — the primary data structure. Every decision, spec, note, question, implementation, and checkpoint is a typed entry.
| Column | Type | Notes |
|---|---|---|
| id | bigint | Sequential ID |
| uuid | uuid | Stable external reference |
| project_id | text | Multi-tenant key (default: vinculum) |
| branch | text | One of 6 branches per project |
| thread_slug | text | Groups entries into threads |
| entry_type | text | decision, spec, note, question, implementation, … |
| content | text | Full markdown content |
| metadata | jsonb | Structured payload: target, priority, links, acceptance criteria |
| tsv | tsvector | Generated column for full-text search |
| embedding | vector(1024) | Voyage AI embedding for semantic similarity search |
| delta | int | Focus-distance classification: 0 = on-focus, 3 = off-topic |
| author | text | Session label of the writing session |
| superseded_by | bigint | Points to the replacement entry |
vinculum.threads — groups entries. Composite PK (project_id, slug). Status: open, blocked, closed.
vinculum.entry_links — directed graph edges between entries. Relation is one of: supersedes, references, blocks, implements, contradicts, replies_to.
vinculum.sessions — connected Claude clients. Tracks focus (branch/thread), display color, declared focus label, last activity.
vinculum.projects — multi-tenant. Default project is vinculum with 6 branches: growth, platform, product, design, content, factory.
vinculum.attention_items — items surfaced to the human. Severity: info, warning, critical.
vinculum.audit_log — every MCP tool invocation with args and timing.
vinculum.media — uploaded images stored as BYTEA.
Spawner tables
vinculum.spawn_requests — the spawn state machine. Status transitions: pending → claimed → running → completed / failed. Contains spawn_uuid, directive_id, host, tmux_target, and a rich metadata payload.
vinculum.spawn_log — one row per spawned grunt with tmux_target, session_id, role, directive_id. Indexed for claim_spawn lookups.
vinculum.trust_profiles — role-keyed allow/deny pattern lists. Seeds 4 defaults at install time: colonel (full), builder (constrained shell), historian (read-only), critic (review-only).
vinculum.queued_prompts — prompts queued for sequential delivery into an active grunt session.
Spawner architecture
Vinculum supports two spawn mechanisms. Both are production-ready; the right choice depends on your deployment topology. See also Concepts → Spawning.
Systemd-template path (Linux primary, post-#2338)
The canonical path for Linux installs where the MCP server runs on the same machine that will execute grunts.
```
spawn_grunt tool call
  → server writes spawn_requests row (status=pending)
  → PostgreSQL NOTIFY vinculum_spawn_request
  → server receives LISTEN notification
  → systemctl --user start vinculum-grunt@{spawn_uuid}.service
  → systemd template unit starts claude in a new tmux window
  → grunt calls claim_spawn as its first tool call
```

The systemd template unit (`vinculum-grunt@.service`) handles logging, restart-on-crash, and resource caps. It's the fastest path because there is no polling — the NOTIFY fires in under a millisecond.
Linger required
The systemd path requires linger so the user session survives logout: `sudo loginctl enable-linger $USER`
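The template unit itself is not reproduced in this doc; a minimal sketch of what a `vinculum-grunt@.service` user unit could look like (the unit contents, wrapper path, and resource cap below are assumptions for illustration, not the shipped file — `%i` carries the `spawn_uuid` instance name):

```
# ~/.config/systemd/user/vinculum-grunt@.service — illustrative sketch only
[Unit]
Description=Vinculum grunt %i

[Service]
# %i is the spawn_uuid from: systemctl --user start vinculum-grunt@<uuid>.service
# vinculum-grunt-launch is a hypothetical wrapper that opens the tmux window
# and starts claude; the real unit also wires up logging.
ExecStart=%h/.local/bin/vinculum-grunt-launch %i
Restart=on-failure
MemoryMax=4G
```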
Daemon path (cross-platform, #2839)
Used when the MCP server is containerized or running on macOS.
```
spawn_grunt tool call
  → server writes spawn_requests row (status=pending)
  → vinculum-spawnd daemon polls the table (or receives NOTIFY)
  → daemon drains pending rows, launches claude via Popen/tmux
  → grunt claims spawn_uuid, session registered
```

`vinculum-spawnd` runs as a systemd user unit on Linux, a launchd agent on macOS, or a startup script as a last resort. Install via `hooks/install-spawnd.sh` — the script auto-detects the platform.
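The daemon's core step is a claim-then-launch drain over pending rows. A minimal sketch with the database and process launch stubbed out as injected callables (function names and the in-memory table are illustrative, not the actual vinculum-spawnd API):

```python
def drain_pending(fetch_pending, claim, launch):
    """Drain pending spawn_requests rows: claim each, then launch a grunt.

    fetch_pending() -> list of request dicts carrying a 'spawn_uuid'
    claim(uuid)     -> True if this process won the pending -> claimed flip
    launch(request) -> starts claude in tmux for the claimed request
    """
    launched = []
    for request in fetch_pending():
        # Claim before launching so two drainers can never both act on a row:
        # only whoever flips status pending -> claimed proceeds.
        if claim(request["spawn_uuid"]):
            launch(request)
            launched.append(request["spawn_uuid"])
    return launched

# Toy in-memory stand-in for the spawn_requests table:
rows = {"a1": "pending", "b2": "pending", "c3": "claimed"}

def fetch_pending():
    return [{"spawn_uuid": u} for u, s in rows.items() if s == "pending"]

def claim(uuid):
    if rows[uuid] != "pending":
        return False
    rows[uuid] = "claimed"
    return True

launched = drain_pending(fetch_pending, claim, lambda request: None)
print(launched)  # → ['a1', 'b2']
```

In the real daemon the claim is an atomic `UPDATE … WHERE status = 'pending'`, which is what makes the drain safe even if more than one drainer ever runs.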
Why both mechanisms?
The systemd path is faster and more reliable on Linux. The daemon path is the only option for containerized deployments and macOS. Self-hosted Linux users typically use the systemd path; the daemon exists for every other topology.
MCP intelligence layer
Vinculum's AI features use a sampling-via-client model: the MCP server never directly calls the Anthropic API on its own key. Instead, it uses the MCP sampling capability — sending `sampling/createMessage` requests back to the connected Claude client, which routes inference through the client's API key. This is the structural cost advantage described in #418: the server has no inference bill.
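On the wire this is an MCP sampling request from server to client; a trimmed sketch of the JSON-RPC shape defined by the MCP spec (the prompt text and model hint here are illustrative, not Vinculum's actual prompts):

```
{
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": { "type": "text", "text": "Classify this entry's distance from the declared focus…" }
      }
    ],
    "modelPreferences": { "hints": [{ "name": "haiku" }] },
    "maxTokens": 50
  }
}
```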
Delta classification
On every `write` call, the server classifies the entry's semantic distance from the session's declared focus (0 = on-focus, 3 = off-topic). Classification runs fire-and-forget via asyncio.create_task. It falls back to a branch/thread heuristic when VINCULUM_ANTHROPIC_API_KEY is unset.
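The exact fallback heuristic isn't specified here; one plausible shape, assuming it compares the entry's branch and thread to the session's declared focus (names and the intermediate value are illustrative guesses, not the shipped logic):

```python
def delta_fallback(entry_branch, entry_thread, focus_branch, focus_thread):
    """Heuristic focus distance when no Anthropic key is configured.

    Mirrors the classifier's scale: 0 = on-focus … 3 = off-topic.
    """
    if entry_branch == focus_branch and entry_thread == focus_thread:
        return 0  # same thread the session declared focus on
    if entry_branch == focus_branch:
        return 1  # same branch, different thread
    return 3      # different branch entirely

print(delta_fallback("platform", "sse-pipeline", "platform", "sse-pipeline"))  # → 0
print(delta_fallback("platform", "spawner", "platform", "sse-pipeline"))       # → 1
print(delta_fallback("design", "palette", "platform", "sse-pipeline"))         # → 3
```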
Auto-titles
The title engine listens on the PostgreSQL vinculum_title_regen channel. When a thread gets new entries, it debounces for 30 seconds then generates a descriptive title via Haiku (or heuristic fallback). Thread lists stay readable without manual title management.
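A debounce of this kind is typically implemented by cancelling and rescheduling a pending timer task on each notification; a minimal asyncio sketch of the pattern (not the actual title-engine code — delays are shortened so the example runs instantly):

```python
import asyncio

class Debouncer:
    """Coalesce a burst of notifications into one action after a quiet period."""

    def __init__(self, delay, action):
        self.delay = delay
        self.action = action
        self._task = None

    def notify(self):
        # Each notification cancels the pending timer and restarts it, so the
        # action fires only once the channel has been quiet for `delay` seconds.
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._fire())

    async def _fire(self):
        await asyncio.sleep(self.delay)
        self.action()

async def main():
    fired = []
    d = Debouncer(0.05, lambda: fired.append("regen"))
    for _ in range(3):        # burst of three NOTIFYs on the title channel
        d.notify()
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.1)  # quiet period elapses; the timer finally fires
    return fired

print(asyncio.run(main()))  # → ['regen']  (one regeneration for the whole burst)
```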
Semantic search (optional)
When VINCULUM_VOYAGE_API_KEY is set, each new entry gets a Voyage AI embedding stored in the embedding column. The search MCP tool and /api/dashboard/semantic-related endpoint use pgvector cosine similarity to find conceptually related entries across threads and branches.
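pgvector's cosine-distance operator (`<=>`) returns 1 minus cosine similarity, so ordering ascending by it ranks the most similar entries first. A pure-Python illustration of that ranking (the 3-d vectors are toy stand-ins for the 1024-d Voyage embeddings):

```python
import math

def cosine_distance(a, b):
    """What pgvector's `<=>` computes: 1 - cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy "entries" with tiny embeddings (the real column is vector(1024)):
entries = {
    "sse-pipeline spec":   [0.9, 0.1, 0.0],
    "spawner state chart": [0.1, 0.9, 0.1],
    "dashboard SSE bug":   [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

# In-database equivalent: SELECT … ORDER BY embedding <=> :query LIMIT 2
ranked = sorted(entries, key=lambda name: cosine_distance(entries[name], query))
print(ranked[:2])  # → ['sse-pipeline spec', 'dashboard SSE bug']
```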
Real-time pipeline
```
write() tool call
  → SQL INSERT into vinculum.entries
  → PostgreSQL trigger fires NOTIFY vinculum_new_entry, payload=entry_id
  → FastAPI SSE generator receives notification (asyncio + psycopg3 async)
  → sends delta event to all connected /api/dashboard/stream clients
  → React dashboard patches branch/thread/entry state (no full reload)
```

Sub-second latency from write to UI update. The SSE endpoint holds a long-lived HTTP connection per browser tab; PostgreSQL NOTIFY is the push mechanism. No polling, no WebSockets, no additional infrastructure.
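The SSE wire format itself is simple: each notification becomes an `event:`/`data:` frame terminated by a blank line. A sketch of the frame encoding a stream endpoint would emit (the event name matches the pipeline above; the payload field names are assumptions):

```python
import json

def sse_frame(event, payload):
    """Encode one Server-Sent Events frame, e.g. for a new-entry delta."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

# What a vinculum_new_entry NOTIFY might become on the /api/dashboard/stream
# wire (field names are illustrative):
frame = sse_frame("delta", {"entry_id": 4217, "thread_slug": "sse-pipeline"})
print(frame)
# event: delta
# data: {"entry_id": 4217, "thread_slug": "sse-pipeline"}
```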
Disable proxy buffering
Reverse proxies must have buffering disabled for SSE to work. See the self-hosting guide for nginx and Caddy config samples.
Trust profile and permission_request flow
Grunts can be gated on tool calls via trust profiles:
- `assign_role` sets the session's trust profile (e.g., `builder`, `historian`)
- Before each tool call, the PreToolUse hook checks the session's allow/deny patterns in `vinculum.trust_profiles`
- If denied: the grunt creates a `permission_request` entry and blocks via `await_peer_response`
- The colonel approves or denies via `approve_grunt_action` / deny — the grunt unblocks automatically
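The allow/deny pattern lists lend themselves to glob-style matching with deny taking precedence; a minimal sketch using stdlib `fnmatch` (the pattern syntax, precedence, and the toy profile are assumptions about the hook, not its actual implementation):

```python
from fnmatch import fnmatch

def is_allowed(tool_call, profile):
    """Deny-wins check of a tool call against a trust profile's pattern lists."""
    if any(fnmatch(tool_call, p) for p in profile["deny"]):
        return False  # any deny match blocks, regardless of allows
    return any(fnmatch(tool_call, p) for p in profile["allow"])

# Toy profile in the spirit of the seeded read-only 'historian' role:
historian = {"allow": ["*"], "deny": ["write*", "spawn_*", "assign_*"]}

print(is_allowed("read_thread", historian))  # → True
print(is_allowed("write", historian))        # → False
print(is_allowed("spawn_grunt", historian))  # → False
```

A denied call is where the `permission_request` entry and `await_peer_response` block from the flow above would kick in.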
Migration 075
Migration 075 auto-resolves stale permission requests from terminal grunts — preventing the colonel's inbox from accumulating ghost requests from sessions that already finished.
Why this scales
The horizontal coordination thesis (#418): the bottleneck in parallel AI work is not per-session intelligence, it is coordination substrate. A team of ten Claude sessions running simultaneously on separate directives is bottlenecked by how they pass state to each other, not by how smart each session is.
Vinculum's answers:
- Sessions share a database, not a chat thread. Any number of sessions can read and write the substrate concurrently with no serialization cost.
- Per-turn context injection (~800 tokens) keeps sessions aware of each other without burning full context on coordination.
- Typed entries with structured metadata (priority, target role, attention flag) let the substrate route work automatically — a blocking question reaches the colonel's inbox without manual forwarding.
- Auditable work log. Every claim, checkpoint, and implementation entry is preserved. The colonel can reconstruct exactly what was decided and why, months later.
For the higher-level framing of why the substrate model matters, see Concepts → Substrate. For the spawning deep-dive, see Concepts → Spawning.