Architecture
An engineering deep-dive into how Vinculum is built — for engineers evaluating the system or wanting to understand its internals before contributing.
Stack overview
| Layer | Technology |
|---|---|
| MCP server + REST API | Python 3.12, FastMCP 3.2.2 (FastAPI/Starlette), port 31415 |
| Database | PostgreSQL 17 + pgvector extension |
| Dashboard frontend | Next.js 16, React 19, Tailwind v4, TypeScript |
| Real-time push | PostgreSQL LISTEN/NOTIFY → FastAPI SSE → React |
| Semantic search | Voyage AI embeddings stored as pgvector columns |
| AI intelligence | Anthropic Haiku (delta classification, auto-titles) |
Two processes run in production: the Python MCP server (with embedded REST API) and the Next.js frontend. In the Docker Compose setup they share a network; the frontend proxies /api/* and /mcp to the Python server.
Database schema
All tables live in the `vinculum` schema. PostgreSQL is the coordination substrate — sessions don't share a chat thread, they share the database. See Concepts → Substrate for the higher-level framing.
Core tables
vinculum.entries — the primary data structure. Every decision, spec, note, question, implementation, and checkpoint is a typed entry.
| Column | Type | Notes |
|---|---|---|
| id | bigint | Sequential ID |
| uuid | uuid | Stable external reference |
| project_id | text | Multi-tenant key (default: vinculum) |
| branch | text | One of 6 branches per project |
| thread_slug | text | Groups entries into threads |
| entry_type | text | decision, spec, note, question, implementation, … |
| content | text | Full markdown content |
| metadata | jsonb | Structured payload: target, priority, links, acceptance criteria |
| tsv | tsvector | Generated column for full-text search |
| embedding | vector(1024) | Voyage AI embedding for semantic similarity search |
| delta | int | Focus-distance classification: 0 = on-focus, 3 = off-topic |
| author | text | Session label of the writing session |
| superseded_by | bigint | Points to the replacement entry |
vinculum.threads — groups entries. Composite PK (project_id, slug). Status: open, blocked, closed.
vinculum.entry_links — directed graph edges between entries. Relation is one of: supersedes, references, blocks, implements, contradicts, replies_to.
vinculum.sessions — connected Claude clients. Tracks focus (branch/thread), display color, declared focus label, last activity.
vinculum.projects — multi-tenant. Default project is vinculum with 6 branches: growth, platform, product, design, content, factory.
vinculum.attention_items — items surfaced to the human. Severity: info, warning, critical.
vinculum.audit_log — every MCP tool invocation with args and timing.
vinculum.media — uploaded images stored as BYTEA.
Spawner tables
vinculum.spawn_requests — the spawn state machine. Status transitions: pending → claimed → running → completed / failed. Contains spawn_uuid, directive_id, host, tmux_target, and a rich metadata payload.
vinculum.spawn_log — one row per spawned grunt with tmux_target, session_id, role, directive_id. Indexed for claim_spawn lookups.
vinculum.trust_profiles — role-keyed allow/deny pattern lists. Seeds 4 defaults at install time: colonel (full), builder (constrained shell), historian (read-only), critic (review-only).
vinculum.queued_prompts — prompts queued for sequential delivery into an active grunt session.
Spawner architecture
Vinculum supports two spawn mechanisms. Both are production-ready; the right choice depends on your deployment topology. See also Concepts → Spawning.
Systemd-template path (Linux primary, post-#2338)
The canonical path for Linux installs where the MCP server runs on the same machine that will execute grunts.
```
spawn_grunt tool call
  → server writes spawn_requests row (status=pending)
  → PostgreSQL NOTIFY vinculum_spawn_request
  → server receives LISTEN notification
  → systemctl --user start vinculum-grunt@{spawn_uuid}.service
  → systemd template unit starts claude in a new tmux window
  → grunt calls claim_spawn as its first tool call
```

The systemd template unit (`vinculum-grunt@.service`) handles logging, restart-on-crash, and resource caps. It's the fastest path because there is no polling — the NOTIFY fires in under a millisecond.
Linger required
The systemd path requires linger so the user session survives logout: `sudo loginctl enable-linger $USER`
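The template unit itself is not reproduced in this doc; a minimal sketch of what a `vinculum-grunt@.service` user unit could look like (the unit contents, wrapper path, and resource cap below are assumptions for illustration, not the shipped file — `%i` carries the `spawn_uuid` instance name):

```
# ~/.config/systemd/user/vinculum-grunt@.service — illustrative sketch only
[Unit]
Description=Vinculum grunt %i

[Service]
# %i is the spawn_uuid from: systemctl --user start vinculum-grunt@<uuid>.service
# vinculum-grunt-launch is a hypothetical wrapper that opens the tmux window
# and starts claude; the real unit also wires up logging.
ExecStart=%h/.local/bin/vinculum-grunt-launch %i
Restart=on-failure
MemoryMax=4G
```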
Daemon path (cross-platform, #2839)
Used when the MCP server is containerized or running on macOS.
```
spawn_grunt tool call
  → server writes spawn_requests row (status=pending)
  → vinculum-spawnd daemon polls the table (or receives NOTIFY)
  → daemon drains pending rows, launches claude via Popen/tmux
  → grunt claims spawn_uuid, session registered
```

`vinculum-spawnd` runs as a systemd user unit on Linux, a launchd agent on macOS, or a startup script as a last resort. Install via `hooks/install-spawnd.sh` — the script auto-detects the platform.
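The daemon's core step is a claim-then-launch drain over pending rows. A minimal sketch with the database and process launch stubbed out as injected callables (function names and the in-memory table are illustrative, not the actual vinculum-spawnd API):

```python
def drain_pending(fetch_pending, claim, launch):
    """Drain pending spawn_requests rows: claim each, then launch a grunt.

    fetch_pending() -> list of request dicts carrying a 'spawn_uuid'
    claim(uuid)     -> True if this process won the pending -> claimed flip
    launch(request) -> starts claude in tmux for the claimed request
    """
    launched = []
    for request in fetch_pending():
        # Claim before launching so two drainers can never both act on a row:
        # only whoever flips status pending -> claimed proceeds.
        if claim(request["spawn_uuid"]):
            launch(request)
            launched.append(request["spawn_uuid"])
    return launched

# Toy in-memory stand-in for the spawn_requests table:
rows = {"a1": "pending", "b2": "pending", "c3": "claimed"}

def fetch_pending():
    return [{"spawn_uuid": u} for u, s in rows.items() if s == "pending"]

def claim(uuid):
    if rows[uuid] != "pending":
        return False
    rows[uuid] = "claimed"
    return True

launched = drain_pending(fetch_pending, claim, lambda request: None)
print(launched)  # → ['a1', 'b2']
```

In the real daemon the claim is an atomic `UPDATE … WHERE status = 'pending'`, which is what makes the drain safe even if more than one drainer ever runs.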
Why both mechanisms?
The systemd path is faster and more reliable on Linux. The daemon path is the only option for containerized deployments and macOS. Self-hosted Linux users typically use the systemd path; the daemon exists for every other topology.
MCP intelligence layer
Vinculum's AI features use a sampling-via-client model: the MCP server never directly calls the Anthropic API on its own key. Instead, it uses the MCP sampling capability — sending `sampling/createMessage` requests back to the connected Claude client, which routes inference through the client's API key. This is the structural cost advantage described in #418: the server has no inference bill.
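On the wire this is an MCP sampling request from server to client; a trimmed sketch of the JSON-RPC shape defined by the MCP spec (the prompt text and model hint here are illustrative, not Vinculum's actual prompts):

```
{
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": { "type": "text", "text": "Classify this entry's distance from the declared focus…" }
      }
    ],
    "modelPreferences": { "hints": [{ "name": "haiku" }] },
    "maxTokens": 50
  }
}
```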
Delta classification
On every `write` call, the server classifies the entry's semantic distance from the session's declared focus (0 = on-focus, 3 = off-topic). Classification runs fire-and-forget via asyncio.create_task. It falls back to a branch/thread heuristic when VINCULUM_ANTHROPIC_API_KEY is unset.
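The exact fallback heuristic isn't specified here; one plausible shape, assuming it compares the entry's branch and thread to the session's declared focus (names and the intermediate value are illustrative guesses, not the shipped logic):

```python
def delta_fallback(entry_branch, entry_thread, focus_branch, focus_thread):
    """Heuristic focus distance when no Anthropic key is configured.

    Mirrors the classifier's scale: 0 = on-focus … 3 = off-topic.
    """
    if entry_branch == focus_branch and entry_thread == focus_thread:
        return 0  # same thread the session declared focus on
    if entry_branch == focus_branch:
        return 1  # same branch, different thread
    return 3      # different branch entirely

print(delta_fallback("platform", "sse-pipeline", "platform", "sse-pipeline"))  # → 0
print(delta_fallback("platform", "spawner", "platform", "sse-pipeline"))       # → 1
print(delta_fallback("design", "palette", "platform", "sse-pipeline"))         # → 3
```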
Auto-titles
The title engine listens on the PostgreSQL vinculum_title_regen channel. When a thread gets new entries, it debounces for 30 seconds then generates a descriptive title via Haiku (or heuristic fallback). Thread lists stay readable without manual title management.
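A debounce of this kind is typically implemented by cancelling and rescheduling a pending timer task on each notification; a minimal asyncio sketch of the pattern (not the actual title-engine code — delays are shortened so the example runs instantly):

```python
import asyncio

class Debouncer:
    """Coalesce a burst of notifications into one action after a quiet period."""

    def __init__(self, delay, action):
        self.delay = delay
        self.action = action
        self._task = None

    def notify(self):
        # Each notification cancels the pending timer and restarts it, so the
        # action fires only once the channel has been quiet for `delay` seconds.
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._fire())

    async def _fire(self):
        await asyncio.sleep(self.delay)
        self.action()

async def main():
    fired = []
    d = Debouncer(0.05, lambda: fired.append("regen"))
    for _ in range(3):        # burst of three NOTIFYs on the title channel
        d.notify()
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.1)  # quiet period elapses; the timer finally fires
    return fired

print(asyncio.run(main()))  # → ['regen']  (one regeneration for the whole burst)
```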
Semantic search (optional)
When VINCULUM_VOYAGE_API_KEY is set, each new entry gets a Voyage AI embedding stored in the embedding column. The search MCP tool and /api/dashboard/semantic-related endpoint use pgvector cosine similarity to find conceptually related entries across threads and branches.
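pgvector's cosine-distance operator (`<=>`) returns 1 minus cosine similarity, so ordering ascending by it ranks the most similar entries first. A pure-Python illustration of that ranking (the 3-d vectors are toy stand-ins for the 1024-d Voyage embeddings):

```python
import math

def cosine_distance(a, b):
    """What pgvector's `<=>` computes: 1 - cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy "entries" with tiny embeddings (the real column is vector(1024)):
entries = {
    "sse-pipeline spec":   [0.9, 0.1, 0.0],
    "spawner state chart": [0.1, 0.9, 0.1],
    "dashboard SSE bug":   [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

# In-database equivalent: SELECT … ORDER BY embedding <=> :query LIMIT 2
ranked = sorted(entries, key=lambda name: cosine_distance(entries[name], query))
print(ranked[:2])  # → ['sse-pipeline spec', 'dashboard SSE bug']
```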
Real-time pipeline
```
write() tool call
  → SQL INSERT into vinculum.entries
  → PostgreSQL trigger fires NOTIFY vinculum_new_entry, payload=entry_id
  → FastAPI SSE generator receives notification (asyncio + psycopg3 async)
  → sends delta event to all connected /api/dashboard/stream clients
  → React dashboard patches branch/thread/entry state (no full reload)
```

Sub-second latency from write to UI update. The SSE endpoint holds a long-lived HTTP connection per browser tab; PostgreSQL NOTIFY is the push mechanism. No polling, no WebSockets, no additional infrastructure.
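The SSE wire format itself is simple: each notification becomes an `event:`/`data:` frame terminated by a blank line. A sketch of the frame encoding a stream endpoint would emit (the event name matches the pipeline above; the payload field names are assumptions):

```python
import json

def sse_frame(event, payload):
    """Encode one Server-Sent Events frame, e.g. for a new-entry delta."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

# What a vinculum_new_entry NOTIFY might become on the /api/dashboard/stream
# wire (field names are illustrative):
frame = sse_frame("delta", {"entry_id": 4217, "thread_slug": "sse-pipeline"})
print(frame)
# event: delta
# data: {"entry_id": 4217, "thread_slug": "sse-pipeline"}
```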
Disable proxy buffering
Reverse proxies must have buffering disabled for SSE to work. See the self-hosting guide for nginx and Caddy config samples.
Trust profile and permission_request flow
Grunts can be gated on tool calls via trust profiles:
- `assign_role` sets the session's trust profile (e.g., `builder`, `historian`)
- Before each tool call, the PreToolUse hook checks the session's allow/deny patterns in `vinculum.trust_profiles`
- If denied: the grunt creates a `permission_request` entry and blocks via `await_peer_response`
- The colonel approves or denies via `approve_grunt_action` / deny — the grunt unblocks automatically
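The allow/deny pattern lists lend themselves to glob-style matching with deny taking precedence; a minimal sketch using stdlib `fnmatch` (the pattern syntax, precedence, and the toy profile are assumptions about the hook, not its actual implementation):

```python
from fnmatch import fnmatch

def is_allowed(tool_call, profile):
    """Deny-wins check of a tool call against a trust profile's pattern lists."""
    if any(fnmatch(tool_call, p) for p in profile["deny"]):
        return False  # any deny match blocks, regardless of allows
    return any(fnmatch(tool_call, p) for p in profile["allow"])

# Toy profile in the spirit of the seeded read-only 'historian' role:
historian = {"allow": ["*"], "deny": ["write*", "spawn_*", "assign_*"]}

print(is_allowed("read_thread", historian))  # → True
print(is_allowed("write", historian))        # → False
print(is_allowed("spawn_grunt", historian))  # → False
```

A denied call is where the `permission_request` entry and `await_peer_response` block from the flow above would kick in.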
Migration 075
Migration 075 auto-resolves stale permission requests from terminal grunts — preventing the colonel's inbox from accumulating ghost requests from sessions that already finished.
Why this scales
The horizontal coordination thesis (#418): the bottleneck in parallel AI work is not per-session intelligence, it is coordination substrate. A team of ten Claude sessions running simultaneously on separate directives is bottlenecked by how they pass state to each other, not by how smart each session is.
Vinculum's answers:
- Sessions share a database, not a chat thread. Any number of sessions can read and write the substrate concurrently with no serialization cost.
- Per-turn context injection (~800 tokens) keeps sessions aware of each other without burning full context on coordination.
- Typed entries with structured metadata (priority, target role, attention flag) let the substrate route work automatically — a blocking question reaches the colonel's inbox without manual forwarding.
- Auditable work log. Every claim, checkpoint, and implementation entry is preserved. The colonel can reconstruct exactly what was decided and why, months later.
For the higher-level framing of why the substrate model matters, see Concepts → Substrate. For the spawning deep-dive, see Concepts → Spawning.