
Architecture

An engineering deep-dive into how Vinculum is built — for engineers evaluating the system or wanting to understand its internals before contributing.

Stack overview

| Layer | Technology |
|---|---|
| MCP server + REST API | Python 3.12, FastMCP 3.2.2 (FastAPI/Starlette), port 31415 |
| Database | PostgreSQL 17 + pgvector extension |
| Dashboard frontend | Next.js 16, React 19, Tailwind v4, TypeScript |
| Real-time push | PostgreSQL LISTEN/NOTIFY → FastAPI SSE → React |
| Semantic search | Voyage AI embeddings stored as pgvector columns |
| AI intelligence | Anthropic Haiku (delta classification, auto-titles) |

Two processes run in production: the Python MCP server (with embedded REST API) and the Next.js frontend. In the Docker Compose setup they share a network; the frontend proxies /api/* and /mcp to the Python server.

Database schema

All tables live in the vinculum schema. PostgreSQL is the coordination substrate — sessions don't share a chat thread, they share the database. See Concepts → Substrate for the higher-level framing.

Core tables

vinculum.entries — the primary data structure. Every decision, spec, note, question, implementation, and checkpoint is a typed entry.

| Column | Type | Notes |
|---|---|---|
| id | bigint | Sequential ID |
| uuid | uuid | Stable external reference |
| project_id | text | Multi-tenant key (default: vinculum) |
| branch | text | One of 6 branches per project |
| thread_slug | text | Groups entries into threads |
| entry_type | text | decision, spec, note, question, implementation, … |
| content | text | Full markdown content |
| metadata | jsonb | Structured payload: target, priority, links, acceptance criteria |
| tsv | tsvector | Generated column for full-text search |
| embedding | vector(1024) | Voyage AI embedding for semantic similarity search |
| delta | int | Focus-distance classification: 0 = on-focus, 3 = off-topic |
| author | text | Session label of the writing session |
| superseded_by | bigint | Points to the replacement entry |
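As a concrete illustration of how a typed entry maps onto these columns, the sketch below builds a parameter payload and psycopg-style INSERT for vinculum.entries. The helper name, defaults, and the exact column list are illustrative assumptions, not the server's actual code (generated columns like tsv are omitted since PostgreSQL fills them).

```python
import json

# Hypothetical sketch of an entry write; column names follow the schema above.
ENTRY_TYPES = {"decision", "spec", "note", "question", "implementation", "checkpoint"}

def build_entry_params(project_id, branch, thread_slug, entry_type,
                       content, author, metadata=None):
    if entry_type not in ENTRY_TYPES:
        raise ValueError(f"unknown entry_type: {entry_type}")
    return {
        "project_id": project_id,
        "branch": branch,
        "thread_slug": thread_slug,
        "entry_type": entry_type,
        "content": content,
        "author": author,
        # jsonb column: structured payload (target, priority, links, …)
        "metadata": json.dumps(metadata or {}),
    }

INSERT_SQL = """
INSERT INTO vinculum.entries
    (project_id, branch, thread_slug, entry_type, content, author, metadata)
VALUES
    (%(project_id)s, %(branch)s, %(thread_slug)s, %(entry_type)s,
     %(content)s, %(author)s, %(metadata)s::jsonb)
RETURNING id, uuid
"""
```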

vinculum.threads — groups entries. Composite PK (project_id, slug). Status: open, blocked, closed.

vinculum.entry_links — directed graph edges between entries. Relation is one of: supersedes, references, blocks, implements, contradicts, replies_to.

vinculum.sessions — connected Claude clients. Tracks focus (branch/thread), display color, declared focus label, last activity.

vinculum.projects — multi-tenant. Default project is vinculum with 6 branches: growth, platform, product, design, content, factory.

vinculum.attention_items — items surfaced to the human. Severity: info, warning, critical.

vinculum.audit_log — every MCP tool invocation with args and timing.

vinculum.media — uploaded images stored as BYTEA.

Spawner tables

vinculum.spawn_requests — the spawn state machine. Status transitions: pending → claimed → running → completed / failed. Contains spawn_uuid, directive_id, host, tmux_target, and a rich metadata payload.
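The status transitions above form a small state machine; a minimal sketch of the transition rules follows. The map mirrors pending → claimed → running → completed/failed; the guard function is illustrative, not the server's actual code.

```python
# Allowed spawn_requests status transitions, per the lifecycle described above.
SPAWN_TRANSITIONS = {
    "pending":   {"claimed"},
    "claimed":   {"running"},
    "running":   {"completed", "failed"},
    "completed": set(),   # terminal
    "failed":    set(),   # terminal
}

def advance(status: str, new_status: str) -> str:
    """Validate a status change against the state machine."""
    allowed = SPAWN_TRANSITIONS.get(status)
    if allowed is None:
        raise ValueError(f"unknown status: {status}")
    if new_status not in allowed:
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status
```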

vinculum.spawn_log — one row per spawned grunt with tmux_target, session_id, role, directive_id. Indexed for claim_spawn lookups.

vinculum.trust_profiles — role-keyed allow/deny pattern lists. Seeds 4 defaults at install time: colonel (full), builder (constrained shell), historian (read-only), critic (review-only).

vinculum.queued_prompts — prompts queued for sequential delivery into an active grunt session.

Spawner architecture

Vinculum supports two spawn mechanisms. Both are production-ready; the right choice depends on your deployment topology. See also Concepts → Spawning.

Systemd-template path (Linux primary, post-#2338)

The canonical path for Linux installs where the MCP server runs on the same machine that will execute grunts.

```text
spawn_grunt tool call
  → server writes spawn_requests row (status=pending)
  → PostgreSQL NOTIFY vinculum_spawn_request
  → server receives LISTEN notification
  → systemctl --user start vinculum-grunt@{spawn_uuid}.service
  → systemd template unit starts claude in a new tmux window
  → grunt calls claim_spawn as first tool call
```

The systemd template unit (vinculum-grunt@.service) handles logging, restart-on-crash, and resource caps. It's the fastest path because there is no polling — the NOTIFY fires in under a millisecond.

Linger required

The systemd path requires linger so the user session survives logout:

```shell
sudo loginctl enable-linger $USER
```
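The launch step in the flow above boils down to instantiating the template unit with the spawn UUID. A minimal sketch, assuming the server shells out via subprocess; the helper name is illustrative:

```python
import subprocess

def grunt_start_command(spawn_uuid: str) -> list[str]:
    # Instantiates the vinculum-grunt@.service template for one spawn,
    # matching the systemctl line in the flow above.
    return ["systemctl", "--user", "start", f"vinculum-grunt@{spawn_uuid}.service"]

def launch_grunt(spawn_uuid: str) -> None:
    # Fire-and-forget: systemd owns logging, restart policy, and resource caps.
    subprocess.run(grunt_start_command(spawn_uuid), check=True)
```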

Daemon path (cross-platform, #2839)

Used when the MCP server is containerized or running on macOS.

```text
spawn_grunt tool call
  → server writes spawn_requests row (status=pending)
  → vinculum-spawnd daemon polls the table (or receives NOTIFY)
  → daemon drains pending rows, launches claude via Popen/tmux
  → grunt claims spawn_uuid, session registered
```

vinculum-spawnd runs as a systemd user unit on Linux, a launchd agent on macOS, or a startup script as a last resort. Install via hooks/install-spawnd.sh — the script auto-detects the platform.
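The daemon's drain step can be sketched as a pure function over pending rows. This assumes rows are claimed oldest-first; the row shape and function name are illustrative, and the real daemon would do this atomically in SQL (e.g. UPDATE … RETURNING) rather than in Python.

```python
def drain_pending(rows: list[dict]) -> list[dict]:
    """Return pending spawn rows in claim order, marking each claimed."""
    pending = sorted((r for r in rows if r["status"] == "pending"),
                     key=lambda r: r["id"])
    for row in pending:
        row["status"] = "claimed"   # in the real daemon: a single atomic UPDATE
    return pending
```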

Why both mechanisms?

The systemd path is faster and more reliable on Linux. The daemon path is the only option for containerized deployments and macOS. Self-hosted Linux users typically use the systemd path; the daemon exists for every other topology.

MCP intelligence layer

Vinculum's AI features use a sampling-via-client model: the MCP server never directly calls the Anthropic API on its own key. Instead, it uses the MCP sampling capability — sending sampling/createMessage requests back to the connected Claude client, which routes inference through the client's API key. This is the structural cost advantage described in #418: the server has no inference bill.

Delta classification

On every write call, the server classifies the entry's semantic distance from the session's declared focus (0 = on-focus, 3 = off-topic). Classification runs fire-and-forget via asyncio.create_task and falls back to a branch/thread heuristic when VINCULUM_ANTHROPIC_API_KEY is unset.
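One plausible shape for the branch/thread heuristic fallback — the exact rule isn't specified here, so this is an assumption: same thread is on-focus, same branch is near, anything else is off-topic.

```python
def heuristic_delta(session_focus: dict, entry: dict) -> int:
    """Assumed fallback classification when no API key is configured."""
    if entry["thread_slug"] == session_focus["thread_slug"]:
        return 0   # on-focus
    if entry["branch"] == session_focus["branch"]:
        return 1   # same branch, different thread
    return 3       # off-topic
```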

Auto-titles

The title engine listens on the PostgreSQL vinculum_title_regen channel. When a thread gets new entries, it debounces for 30 seconds then generates a descriptive title via Haiku (or heuristic fallback). Thread lists stay readable without manual title management.
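The debounce described above can be sketched with asyncio task cancellation: each new entry for a thread resets its timer, and regeneration fires only once the thread goes quiet. The class name and callback shape are illustrative, not the server's actual code.

```python
import asyncio

class TitleDebouncer:
    """Per-thread debounce: regenerate once, after the quiet window elapses."""
    def __init__(self, delay: float, regenerate):
        self.delay = delay
        self.regenerate = regenerate   # async callback(thread_slug)
        self._timers: dict[str, asyncio.Task] = {}

    def notify(self, thread_slug: str) -> None:
        old = self._timers.get(thread_slug)
        if old and not old.done():
            old.cancel()               # new entry: reset the window
        self._timers[thread_slug] = asyncio.ensure_future(self._fire(thread_slug))

    async def _fire(self, thread_slug: str) -> None:
        try:
            await asyncio.sleep(self.delay)
            await self.regenerate(thread_slug)
        except asyncio.CancelledError:
            pass                       # superseded by a newer entry
```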

Semantic search (optional)

When VINCULUM_VOYAGE_API_KEY is set, each new entry gets a Voyage AI embedding stored in the embedding column. The search MCP tool and /api/dashboard/semantic-related endpoint use pgvector cosine similarity to find conceptually related entries across threads and branches.
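A similarity lookup against the embedding column might look like the sketch below. pgvector's <=> operator is cosine distance; the table and column names follow the schema above, but the exact query the server runs (and the parameter helper) are assumptions.

```python
# Hypothetical pgvector cosine-similarity query over vinculum.entries.
SEMANTIC_SEARCH_SQL = """
SELECT id, thread_slug, entry_type,
       embedding <=> %(query_vec)s::vector AS distance
FROM vinculum.entries
WHERE project_id = %(project_id)s
  AND embedding IS NOT NULL
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s
"""

def semantic_search_params(query_vec, project_id="vinculum", k=10):
    # psycopg-style named parameters; the vector uses pgvector's text input form.
    return {"query_vec": "[" + ",".join(str(x) for x in query_vec) + "]",
            "project_id": project_id,
            "k": k}
```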

Real-time pipeline

```text
write() tool call
  → SQL INSERT into vinculum.entries
  → PostgreSQL trigger fires NOTIFY vinculum_new_entry, payload=entry_id
  → FastAPI SSE generator receives notification (asyncio + psycopg3 async)
  → sends delta event to all connected /api/dashboard/stream clients
  → React dashboard patches branch/thread/entry state (no full reload)
```

Sub-second latency from write to UI update. The SSE endpoint holds a long-lived HTTP connection per browser tab; PostgreSQL NOTIFY is the push mechanism. No polling, no websockets, no additional infrastructure.
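Each notification reaching the browser is one Server-Sent Events frame. A minimal sketch of the frame format the stream generator would emit per NOTIFY payload — the function name and payload shape are illustrative:

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one SSE frame: an event name line, a data line, and a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```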

Disable proxy buffering

Reverse proxies must have buffering disabled for SSE to work. See the self-hosting guide for nginx and Caddy config samples.

Trust profile and permission_request flow

Grunts can be gated on tool calls via trust profiles:

  1. assign_role sets the session's trust profile (e.g., builder, historian)
  2. Before each tool call, the PreToolUse hook checks the session's allow/deny patterns in vinculum.trust_profiles
  3. If denied: the grunt creates a permission_request entry and blocks via await_peer_response
  4. The colonel approves or denies via approve_grunt_action / deny — the grunt unblocks automatically
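Step 2's pattern check can be sketched with glob matching. Whether the real hook gives deny patterns precedence over allow patterns, and the example historian profile, are assumptions:

```python
from fnmatch import fnmatch

def is_allowed(tool_call: str, profile: dict) -> bool:
    """Check a tool call against a trust profile's allow/deny pattern lists.
    Deny patterns win; anything not explicitly allowed is denied."""
    if any(fnmatch(tool_call, p) for p in profile.get("deny", [])):
        return False
    return any(fnmatch(tool_call, p) for p in profile.get("allow", []))

# Illustrative read-only profile in the spirit of the seeded historian role.
historian = {"allow": ["read*", "search*"], "deny": ["write*", "Bash(*)"]}
```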

Migration 075

Migration 075 auto-resolves stale permission requests left by grunts that have reached a terminal state, preventing the colonel's inbox from accumulating ghost requests from sessions that already finished.

Why this scales

The horizontal coordination thesis (#418): the bottleneck in parallel AI work is not per-session intelligence, it is coordination substrate. A team of ten Claude sessions running simultaneously on separate directives is bottlenecked by how they pass state to each other, not by how smart each session is.

Vinculum's answers:

  • Sessions share a database, not a chat thread. Any number of sessions can read and write the substrate concurrently with no serialization cost.
  • Per-turn context injection (~800 tokens) keeps sessions aware of each other without burning full context on coordination.
  • Typed entries with structured metadata (priority, target role, attention flag) let the substrate route work automatically — a blocking question reaches the colonel's inbox without manual forwarding.
  • Auditable work log. Every claim, checkpoint, and implementation entry is preserved. The colonel can reconstruct exactly what was decided and why, months later.

For the higher-level framing of why the substrate model matters, see Concepts → Substrate. For the spawning deep-dive, see Concepts → Spawning.