substrate memory for ai

LLMs are the most inconsistent tool ever built.

Same question, different answer. Brilliant Tuesday, useless Wednesday. That's not the model being dumb — it's the model having nowhere to stand. Vinculum is the ground it stands on: a typed memory your AI reasons on, not retrieves from. Connect it once and your AI gets more capable the longer you work, instead of resetting to zero every session.
AGPL v3self-hosted or vinculum.runworks with any MCP client
lineage graph · live
loading the graph…
the problem

The work doesn't carry.

Every session in a project cold-starts. The model can hold a conversation, not a project — so the same thinking has to happen again every time you come back to it. A pattern figured out last Tuesday gets re-figured-out this Wednesday. A decision made in one chat doesn't reach the next. The work doesn't compound; it just gets re-done.

01

Same project, cold-start every time

You opened a chat last week and worked something out. You open a new one today and the model has no idea. Project memory is the operator's job, every session, from scratch.

02

The pattern repeats

Whatever you figured out before, the model figures out again. Same approach, same architectural choice, same trade-off — reasoned into existence twice. Or three times. Or every Tuesday.

03

Decisions don't propagate

A direction you set in one session doesn't reach the next. You re-explain. You re-justify. You re-decide. The substrate of the project lives in your head, and it leaks every chat.

04

The tax compounds

Long-running projects pay re-derivation tax every session. The bill scales with project age, not with how much work you actually need done.

the thesis

Not a wrapper. The layer underneath.

Other AI memory tools wrap a session in a visual workspace. You install them, you open them, you do your AI work inside them. The orchestration lives in their application — so if you want six sessions, you open six windows, and you're the message bus again.

Vinculum sits underneath whatever clients you already run. Your chat session becomes the conductor. Your CLI workers become the workforce. Every decision any session makes gets written back. Every session that joins reads what came before. You direct. The substrate does the relaying.

One human. One chat. N workers. Zero clipboards.

loading entries…
the field

Memory tools remember you.
Vinculum remembers the work.

There's a whole field of AI memory tools. They remember the person — what you like, what you said. Vinculum remembers the project — what got decided, what it supersedes, and why.

Mem0
remembers the user

Preferences, facts, personalization — carried across your chats.

Letta
remembers the agent

Self-editing memory inside one long-running runtime.

Zep
remembers the conversation

Long-term chat history and the facts extracted from it.

Vinculum
remembers the work

Decisions, specs, implementations, and the links between them — shared across every session and every tool.

And you don't have to choose. Keep Mem0 for personalization, keep Letta for your runtime — Vinculum sits underneath all of it. The substrate doesn't compete with the rest of your stack. It's the layer everything else was riding on top of without knowing.

how it works

Five roles.
Defined by what they do.

Each role is a job— what a session is responsible for. The model tier it runs on is a property of that job, not the headline. You don't pick a model and get a role; you have work to do, and the role names it.

Generalthat's you

The human. You set direction, decide what gets built and when, and read the dashboard to keep a picture of the whole project. Not a spawned session — the person the five roles below work for.

ColonelOpus · directs

Translates intent into routed work.Every interactive chat window — claude.ai, the desktop app, or CLI Claude in a terminal. Takes what you said, writes directives, and points them at the right workers. Directs; doesn't implement.

MajorOpus · narrow mission

A colonel pointed at one thing. A recon-and-report aide, or a heavy task that needs Opus-grade judgment to execute rather than delegate. Distinguished by its assignment, not by being a lesser colonel.

LieutenantSonnet · decomposes

Breaks a brief into directives and directs.Takes a colonel-sized brief, splits it into atomic worker directives, dispatches them, and directs the workers that claim them. That's the whole job — decomposition and direction.

SergeantSonnet · executes

Executes directives, judgment included. Claims a directive and does the work, writing back everything it decides — implementations, questions, blockers — so the LT and colonel stay current without polling.

PrivateHaiku · executes

Executes the mechanical work.Same job as a Sergeant, for work where the path is clear and judgment isn't the bottleneck. The role is identical; the model tier is the only difference.

The spawner closes the loop. Your chat session calls spawn and a worker process materializes on your host — stdin pipe open, focus declared, ready to work. No terminal ceremony, no manual session management.Under the hood the API token is role="grunt" — the plumbing word for any spawned worker. The role you actually reason about — Sergeant, Private, Lieutenant — is set by the model tier you pass alongside it.spawn_grunt(role="grunt", model="claude-sonnet-4-6", session_label="builder-auth-flow")
loading workers…
why now

The agents shipped. The memory underneath them didn't.

Anthropic shipped the fan-out.

Dynamic workflows run hundreds of subagents in parallel, converge, and hand back an answer — then discard the reasoning that produced it. A hundred thousand tokens of thinking, compressed to a summary and thrown away. The orchestration is solved. The memory of it isn't. Vinculum is where that thinking lands.

The bill came due.

Teams are learning that pointing more AI at a problem doesn't compound. Every session starts cold and re-derives what the last one knew. The waste isn't the compute — it's the thinking you paid for and discarded.

A 70-line text file got 220,000 stars.

Developers are starring plain-text rule files that beg the model to not assume and not forget— because nobody gave it a way to actually remember. The answer was never a better sticky note. It's a substrate.

The memory layer is the empty lane. Vinculum is in it.

get started

Sixty seconds.
No API key. No new bill.

Live intelligence routes through MCP sampling — your existing Claude subscription does the inference, with no extra server-side charges. Self-hosted is free forever under AGPL v3; hosted starts at $5/mo and handles the ops for you.

# 1 — run the server
$ uvx vinculum-mcp --bind :31415

# 2 — add to Claude Code
$ claude mcp add vinculum http://localhost:31415

# 3 — open claude.ai in a browser tab
# say "check my vinculum inbox"
# that's your conductor interface

dashboard → localhost:31415/login
self-hosted · free forever

AGPL v3. Your data, your box. Run it on the same machine as your Claude Code sessions.

uvx vinculum-mcp
hosted · vinculum.run

Managed auth, multi-tenant, team coordination. Same codebase, same data model. No self-hosting required.

vinculum.run →

source

Everything is open. github.com/whalefall-media/vinculum →

the dashboard

Every session. Every write.
One navigable graph.

The dashboard is the workspace, not a picture of it. Nodes are entries, branches are threads, edges are relations. Mission Control sits underneath, surfacing what the substrate notices — drift, re-derivation, distillation — as the work happens. Live via SSE; new entries appear as you work.

vinculum.run/dashboard
loading mission control…
loading lineage graph…
who it's for

One assistant or six.
Either way, the work carries.

casual · $5 / mo hosted

You want AI that remembers without running infrastructure.

You use Claude in chat. You want it to remember your project across sessions without you recapping. You don't want to think about servers or hosting. Five bucks a month, hosted.

  • Hosted at vinculum.run — no setup
  • Unlimited projects, unlimited retention
  • Works with claude.ai, Claude Code, any MCP client
get substrate →
developer · free self-host

You like running your own box.

You're comfortable with uvx, you want your data on hardware you control, and you don't want to depend on someone else's infrastructure. Self-hosted Vinculum is free forever under AGPL v3.

  • uvx vinculum-mcp — up in 60 seconds
  • Same codebase as the hosted tier
  • Your box, your data, no feature gate
self-host free →
power user · $20 / mo hosted

You direct parallel sessions.

You already pay for Claude Max. You run multiple sessions, sometimes many. You want to describe what needs doing and supervise a dashboard while workers execute — instead of being the message bus between them.

  • Spawn workers from chat — no terminal ceremony
  • Real-time Mission Control: every session, every write, drift sensors live
  • Background intelligence on your Anthropic key — auto-summaries, delta, monthly clustering
go pro →
the expansion pack

Nothing to rip out.
Nothing to migrate.

Just strap it on. Vinculum sits underneath what you already run and makes all of it compound. No bridge to burn.

Your memory tool keeps remembering.
Your IDE keeps editing.
Your workflows keep running.

Strap it on. You're off to the races, and you gave up nothing.

pricing

Free to self-host.
$5 to skip the ops. $20 to run parallel.

Flat monthly, never a cut of your Claude bill. Self-hosted is free forever under AGPL v3; the hosted tiers add managed ops and background intelligence. Full pricing breakdown →

Free
$0

Self-host or 1-project hosted. AGPL v3, free forever (commercial license for enterprises).

Substrate
$5/mo

Unlimited projects. Unlimited retention. Live intelligence.

Team
$100/mo

Up to 10 members. Shared projects. Audit log.

Self-host AGPL v3 · no vendor lock-in · export your data any time · compare plans →

Enterprise — SSO · on-prem · custom RBAC · SLA · Contact us →

faq

The questions you're
already asking.

How is this different from mcp-memory-keeper?
Memory keepers give a single session a longer memory. Vinculum gives multiple sessions a shared memory they all read from and write to simultaneously. It's the difference between a notebook one person carries and a whiteboard a team updates in real time. You're not extending one session's context — you're giving all your sessions a shared memory, so they stay coherent without you in the middle.
Why not just use a bigger context window?
A 200K context window is a big clipboard. It still belongs to one session. The coordination problem isn't “can one session remember more” — it's “how do six sessions stay coherent when each one cold-starts with a fraction of the project's history?” Vinculum solves the horizontal problem. Bigger windows solve the vertical one. They're not the same problem.
Is this just for Claude Code?
No. Vinculum works with any MCP client — Claude Code, claude.ai, Cursor, Zed, Cline, anything. The typical setup is claude.ai as the conductor and Claude Code as the workers, but the substrate doesn't care what client connects. If it speaks MCP, it can read and write entries.
Can I self-host?
Yes — and self-hosted is free under the AGPL v3 license. uvx vinculum-mcpand you're up in about 60 seconds. The hosted service at vinculum.run runs the same code with managed auth, multi-tenant Postgres, and background intelligence on our infrastructure. Use it if you don't want to run a box; skip it if you do.
What's the open-source license?
AGPL v3. Full source at github.com/whalefall-media/vinculum. You can fork it, run it, modify it, use it commercially — the license lets you. The hosted service is the same codebase plus deployment config and managed ops.
Who built this?
Built and published by Whalefall Media. It started as a monitoring itch — I was running six parallel Claude sessions and couldn't see what any of them were doing — and the memory underneath the dashboard turned out to be the whole product. The stack runs on a single bare-metal box in Hillsboro, Oregon: Postgres 17, pgvector, a FastMCP server, Next.js 16. No Kubernetes, no cold starts, no VC runway. The full story's on the about page.

Give your sessions the memory they've been missing.

Self-hosted is free and always will be. The hosted tier starts at $5/mo and handles all the ops. Either way, you're running parallel Claude sessions with a substrate that actually works.

No credit card for free tier · AGPL v3 · cancel anytime

vinculum — the bar that binds

Latin for bond, link, that which binds. In mathematics, the bar over a repeating decimal — the mark that says these digits recur, indefinitely, as a unit. That is the architectural claim: a substrate that accumulates a project's intent across its entire life, holding parallel sessions together not by synchronizing them, but by giving them a single authoritative surface to read from and write to. The sessions are ephemeral. The graph is not.

— the substrate that persists —