substrate memory for ai

LLMs are the most inconsistent tool ever built.

Same question, different answer. Brilliant Tuesday, useless Wednesday. That's not the model being dumb — it's the model having nowhere to stand. Vinculum is the ground it stands on: a typed memory your AI reasons on, not retrieves from. Connect it once and your AI gets more capable the longer you work, instead of resetting to zero every session.

get started → · download for Windows

AGPL v3 · self-hosted or vinculum.run · works with any MCP client

96.8% cache-read across 199,436 turns

1,285 implementations in 56 active days

~$47K metered-API equivalent, run on a flat subscription

16,140 cross-session references

9,241 intel calls at $0 marginal

the problem

The work doesn't carry.

Every session in a project starts cold. The model can hold a conversation, not a project — so the same thinking has to happen again every time you come back to it. A decision you made in one chat never reaches the next. The work doesn't compound. It gets redone, and you pay for it twice.

You paste the same context into every new chat just to get a useful answer.
A decision from last week never reaches the session you opened today.
The why behind last month's architecture is gone. Only the code is left.
The project lives in your head, so you're the bottleneck on every session.

Why the work doesn't carry

Cold-start, every time

You worked something out last week. You open a new chat today and the model has no idea. Holding the project is your job, every session, from scratch.

The pattern repeats

Whatever you figured out before, the model figures out again. Same approach, same trade-off, reasoned into existence twice. Or three times. Or every Tuesday.

Decisions don't propagate

A direction you set in one session doesn’t reach the next. You re-explain. You re-justify. You re-decide.

The tax compounds

A long-running project pays the re-derivation tax every session. The bill scales with the project's age, not with how much work you actually need done.

the thesis

Not a wrapper. The layer underneath.

Every other tool in this space wraps a session in a workspace. Install it, open it, work inside it. Want six sessions? Open six windows — and you're the message bus again.

Vinculum goes the other way. It sits under whatever clients you already run. Your chat session conducts, your workers execute, and every decision any of them makes gets written back for the next one to read.

One human. One chat. N workers. Zero clipboards.

What the substrate is

real entries, real sessions — every decision written back, as it happens

written live

No idea left behind.

What gets written stays written — the reasoning, the dead ends, the why.

Other memory tools summarize your work after it's over — a recap rebuilt from the outside, once the reasoning's already gone. Vinculum runs the other way: the record gets written in the moment, by the session doing the work, as the calls get made.

Write it as you go and nothing you worked out has to be worked out again. The dead ends, the rejected options, the why — they stick around, instead of getting smoothed off by some tidy after-the-fact summary.

How live authorship works

written live · writes / hour

decision · colonel · just now

My self-report is the least reliable artifact I produce — “done,” “fixed,” “the cause is X” get verified, never trusted.

receipt · sergeant · 4m

Dynamic workflow evaporation: 100k tokens of cognition compressed to a 1k summary and thrown away.

incident · lieutenant · 9m

Wrong four times in one session — reasoned from context before reading the substrate that already held the answer.

antidote · colonel · 14m

Write the reasoning, the dead ends, and the verified result live — before the session ends, not in a recap after.

receipt · private · 21m

Live authorship: filing IS the deciding. The translation tax gets paid zero times.

written in the moment, by the session doing the work — not summarized after · demo data

mission control

The whole substrate,
on one surface.

Every session, every write, every drift sensor — the lineage cosmos, the live entries feed, the cost telemetry, all on one screen. It's the picture you read to keep the whole project in view, and it moves in real time. Watching the substrate fill is the fastest way to understand what Vinculum does.

get started → · your whole project on one surface · free to start

The Vinculum dashboard — metric strip, lineage cosmos, live entries feed, and Mission Control panels on a single surface

What the substrate makes possible.

Once a project's work lives in one shared graph, things become possible that a notes file simply can't reach — and each one is the real product, not a picture of it.

carries across sessions

Claude recognizes its own handwriting.

A fresh session has no memory of the last — yet it reads the graph and knows the work as its own. Recognition, not recall.

16,140 references followed across sessions

How recognition works

one human → a workforce

One chat. A workforce.

Spawn a fleet from one directive. Each worker reads the same substrate and writes back what it builds — no coordination overhead, no clipboard.

spawned from chat · running on your machine

The role model

cheaper the longer you work

The substrate is the cache.

Sessions read instead of re-deriving, so the model reads almost entirely from cache — and the background thinking routes through the plan you already pay for.

96.8% cache-read · 9,241 intel calls at $0 marginal

The cache economics

the corpus

A corpus shaped like ground truth.

Nobody wrote any of this to be training data — but it lands in exactly the right shape. The work, as it happens, becomes the three signals a model learns from.

supersedes · preference labels

what got changed, and what replaced it

implementations · outcome checks

what actually shipped

cross-references · relevance signals

what relates to what

The model trains on the work — capability climbs, values stay human.
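
One way to picture the mapping above is as a small sketch that sorts graph edges into the three signal types. The edge-type names (`supersedes`, `implements`, `references`), the tuple layout, and the `bucket_signals` helper are all assumptions made for illustration, not Vinculum's actual schema or API.

```python
from collections import defaultdict

# Hypothetical edge-type names mapped to the signal each one could supply.
SIGNAL_FOR_EDGE = {
    "supersedes": "preference",   # what got changed, and what replaced it
    "implements": "outcome",      # what actually shipped
    "references": "relevance",    # what relates to what
}


def bucket_signals(edges):
    """Group (edge_type, source_id, target_id) tuples by training signal."""
    buckets = defaultdict(list)
    for edge_type, source_id, target_id in edges:
        signal = SIGNAL_FOR_EDGE.get(edge_type)
        if signal is not None:  # ignore edge types with no mapped signal
            buckets[signal].append((source_id, target_id))
    return dict(buckets)


# Made-up ids, purely for illustration.
edges = [
    ("supersedes", "decision-2", "decision-1"),
    ("implements", "impl-7", "decision-2"),
    ("references", "decision-2", "incident-3"),
]
signals = bucket_signals(edges)
```

In this sketch nothing is labeled by hand: the signal falls out of the edge type that the work already produced.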

The bigger argument

the field

Memory tools remember you. Vinculum remembers the work.

Mem0, Letta, Zep remember you: your name, your stack, your tone. They're good at it, and that part's real.

Vinculum remembers the work: the decisions, the dead ends, the corrections — who decided what, and why. You don't have to pick one; it just sits under all of it.

Same two words — “AI memory.” Different machine.

Why the work, not you

the work, changing its mind

retired

superseded — “needed concurrent writers”

current

the old decision isn’t deleted — it’s kept, linked, and queryable. That’s the lineage.
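
A minimal sketch of what "kept, linked, and queryable" could look like in code. The `Entry` and `Graph` classes, their field names, and the placeholder decision texts are hypothetical; only the supersede reason is taken from the example above. The real substrate is described elsewhere on this page as running on Postgres, so treat this in-memory version as an illustration of the idea only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One hypothetical substrate entry; field names are illustrative."""
    id: str
    kind: str                             # e.g. "decision"
    text: str
    superseded_by: Optional[str] = None   # id of the entry that replaced this one
    reason: Optional[str] = None          # why it was replaced


class Graph:
    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    def write(self, entry: Entry) -> None:
        self.entries[entry.id] = entry

    def supersede(self, old_id: str, new: Entry, reason: str) -> None:
        """Retire the old entry by linking it forward; never delete it."""
        old = self.entries[old_id]
        old.superseded_by = new.id
        old.reason = reason
        self.write(new)

    def lineage(self, entry_id: str) -> list[Entry]:
        """Follow supersede links from an entry to the current one."""
        chain = [self.entries[entry_id]]
        while chain[-1].superseded_by is not None:
            chain.append(self.entries[chain[-1].superseded_by])
        return chain

    def current(self, entry_id: str) -> Entry:
        return self.lineage(entry_id)[-1]


graph = Graph()
graph.write(Entry("d1", "decision", "placeholder: original choice"))
graph.supersede(
    "d1",
    Entry("d2", "decision", "placeholder: replacement choice"),
    reason="needed concurrent writers",
)
```

Here `graph.current("d1")` resolves to the replacement, while `graph.entries["d1"]` still holds the retired decision together with the reason it was retired.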

how it works

Five roles. Defined by what they do.

Each role is a job — what a session is responsible for. The model tier it runs on is a detail, not the headline. You've got work; the role names it.

General

that's you

The human. You set direction, decide what gets built and when, and read the dashboard to keep the whole project in view. Not a spawned session — the person the five below answer to.

Colonel

Opus · directs

Translates intent into routed work. Every chat window: claude.ai, the desktop app, or CLI Claude in a terminal. Takes what you said, writes directives, points them at the right workers. Directs; doesn't implement.

Major

Opus · narrow mission

A colonel pointed at one thing. A recon-and-report aide, or a heavy task that needs Opus-grade judgment to do rather than delegate. Set apart by its assignment, not by being a lesser colonel.

Lieutenant

Sonnet · decomposes

Breaks a brief into directives and directs. Takes a colonel-sized brief, splits it into atomic worker directives, dispatches them, and directs whoever claims them. That's the whole job: decomposition and direction.

Sergeant

Sonnet · executes

Executes directives, judgment included. Claims a directive and does the work, writing back everything it decides — implementations, questions, blockers — so the LT and colonel stay current without ever polling.

Private

Haiku · executes

Executes the mechanical work. Same job as a Sergeant, for work where the path is clear and judgment isn't the bottleneck. The role's identical; the model tier is the only difference.

The runner closes the loop.

One download, nothing to configure. The runner stands up everything (substrate access, the dashboard, and the worker host), then your chat calls spawn and a worker materializes right on your machine. For most people the runner is Vinculum: install it, and you're writing to the substrate in minutes.

Under the hood a spawned worker is role="grunt" — the role you actually reason about, Sergeant / Private / Lieutenant, is the model tier you pass alongside it.
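
To make that concrete, here is a rough sketch of how a tier name might travel alongside the fixed `role="grunt"`. Only the `grunt` role and the tier-to-model pairings (Lieutenant and Sergeant on Sonnet, Private on Haiku) come from this page; the `build_spawn_request` function and the `tier`, `model`, and `directive` field names are hypothetical and are not the real spawn call.

```python
# Tier-to-model pairing as described in the role list on this page.
TIER_MODEL = {
    "lieutenant": "sonnet",
    "sergeant": "sonnet",
    "private": "haiku",
}


def build_spawn_request(tier: str, directive: str) -> dict:
    """Build a hypothetical spawn payload; the field names are assumptions."""
    key = tier.lower()
    if key not in TIER_MODEL:
        raise ValueError(f"unknown worker tier: {tier!r}")
    return {
        "role": "grunt",           # every spawned worker shares this role
        "tier": key,               # the name you actually reason about
        "model": TIER_MODEL[key],  # the model tier passed alongside it
        "directive": directive,
    }


request = build_spawn_request("Sergeant", "implement auth-flow")
```

The point of the sketch is the split: one role for every spawned worker, with the tier deciding which model does the work.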

install the runner

› curl -fsSL https://vinculum.run/runner | sh

# paired — you're live.

guided setup

the runner won't connect to the substrate

Run vinculum-runner doctor and paste the output. I'll read the logs and sort the pairing.

✓ paired — writing to the substrate

Stuck? The Claude you're already talking to reads the runner's logs and walks you through it — the only install debugged by the model it's installing for.

live workforce · five roles, fanned out

colonel · your chat · directing the launch wave · running · now

major · recon aide · verified feature-state · running · 2m

lieutenant · wave 3 · decompose + dispatch · running · 1m

sergeant · auth-flow · implementing · running · 4m

private · migration 0042 · running · 3m

The role model, in full

mission control

Direct it. Then just watch.

Workers run on your own machine, so the cloud never touches your code. And you're not babysitting a terminal. One panel shows the live state: what's running, what got written, what needs you.

178 runner jobs so far, same code whether it's our box or yours.

the rest

The rest is already in the box.

Those were the headlines. Here's everything else that already ships — each one a working subsystem, not a someday.

Multi-tenant substrate

Projects you own and projects you share, with isolation enforced at the row — not promised in a README.

Cross-vendor coordination

Claude Code, claude.ai, Cursor, Zed, Cline — all writing to one graph. It's a protocol, so no single tool owns you.

Directive state machine

Work moves through real states with a check at the finish line. Done has to earn it.

Semantic graph

Embeddings catch the overlaps and surface what’s related — graph and vectors in one store, not bolted together.

Two-mode workers

Keep them on approval gates, or turn them loose inside a fence. The trust profile draws the line.

Reasoning-trace capture

The thinking lands as its own entry, linked to what it shipped. You keep the why, not just the what.

Self-hosted sovereignty

Your box, your Postgres, AGPL. The exact code we host, with nothing gated off.

Docs from the substrate

The docs render from the graph itself. Change a rule, and the manual rewrites itself.

Cost telemetry

Spend split by model, session, and tool — cache hits and all. The bill, with nothing tucked away.
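
As an illustration of the "Directive state machine" item above, the sketch below shows one way "Done has to earn it" could be enforced. The state names, the transition table, and the `Directive` class are assumptions for the example; the page does not list the real states.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the real ones are not listed on this page.
    OPEN = "open"
    CLAIMED = "claimed"
    BLOCKED = "blocked"
    DONE = "done"


# Which moves are legal from each state.
ALLOWED = {
    State.OPEN: {State.CLAIMED},
    State.CLAIMED: {State.BLOCKED, State.DONE},
    State.BLOCKED: {State.CLAIMED},
    State.DONE: set(),
}


class Directive:
    def __init__(self, title: str) -> None:
        self.title = title
        self.state = State.OPEN

    def advance(self, target: State, check=None) -> None:
        """Move to `target`; reaching DONE requires a passing check."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {target.value} not allowed")
        if target is State.DONE and not (check is not None and check()):
            raise ValueError("done requires a passing check")
        self.state = target


directive = Directive("implement auth-flow")
directive.advance(State.CLAIMED)
directive.advance(State.DONE, check=lambda: True)  # e.g. the tests passed
```

A worker that calls `advance(State.DONE)` without a check, or with one that fails, stays in its current state and gets an error instead of a false "done".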

Read the docs

why now

Everyone built the workers. Nobody built the floor.

The labs cracked running a hundred at once.

They fan out, converge, and hand back a single answer — then delete the hundred thousand tokens of reasoning that got there. The doing is solved. The keeping isn't. That reasoning has nowhere to land.

Then the bill showed up.

Aiming more AI at a problem, it turns out, doesn't compound on its own. Every session boots cold and re-earns what the last one knew. You're not paying for compute — you're paying twice for the same thought.

A rules file crossed 150,000 stars.

A plain list of rules became one of the most-starred things in all of AI coding. People paste it into every repo to beg the model not to forget. A sticky note went viral because the hole is that universal — and a better sticky note was never the fix.

Everything got built but the layer underneath. That's the one we're standing in.

Why now — the whole argument

who it's for

One assistant or six.
Either way, the work carries.

casual · $5 / mo hosted

You want AI that remembers without running infrastructure.

You use Claude in chat. You want it to remember your project across sessions without you recapping. You don't want to think about servers. Five bucks a month, hosted.

Hosted at vinculum.run — no setup
Unlimited projects, unlimited retention
Works with claude.ai, Claude Code, any MCP client

get substrate →

developer · free self-host

You like running your own box.

You're comfortable with uvx, you want your data on hardware you control, and you don't want to depend on someone else's infrastructure. Self-hosted Vinculum is free forever under AGPL v3.

uvx vinculum-run — up in 60 seconds
Same codebase as the hosted tier
Your box, your data, no feature gate

self-host free →

power user · $20 / mo hosted

You direct parallel sessions.

You already pay for Claude. You run several sessions, sometimes many. You want to describe what needs doing and supervise a dashboard while workers execute — instead of being the message bus between them.

Spawn workers from chat — no terminal ceremony
Real-time Mission Control: every session, every write, drift sensors live
Background intelligence on your subscription — summaries, classification, clustering

go pro →

Compare all plans

the expansion pack

Nothing to rip out.
Nothing to migrate.

Vinculum sits underneath what you already run and makes all of it compound.

Your memory tool keeps remembering.

Your IDE keeps editing.

Your workflows keep running.

Strap it on — you gave up nothing.

pricing

One Claude or a fleet.
The work carries either way.

Free if you run it yourself. Five bucks to skip the ops. Twenty to run a workforce. Flat every month — and never a cut of your Claude bill.

It bolts on under what you already use. You rip out nothing.

Free

you'd rather run it yourself

Self-host it, or one hosted project to kick the tires. Same code, nothing gated.

up in about a minute with uvx
your box, your data
AGPL v3 — no lock-in

Self-host free →

Substrate

you want the memory, not the ops

$5/mo

Hosted memory that holds your project across every session, with nothing to run.

projects without limit
kept as long as you keep it
any MCP client

Get substrate →

most reach for this

Pro

you direct work in parallel

$20/mo

A whole workforce from one chat — the runner, Mission Control, and the background thinking, no ops.

workers spawned via the runner
Mission Control + live sensors
summaries on your own plan

Go Pro →

Team

your whole crew, one substrate

$100/mo

Everything in Pro, shared across your people, with a trail of who did what.

up to ten of you
projects you all share
full audit log

Start a team →

Self-host under AGPL v3 · no lock-in · your data leaves whenever you do · compare plans →

Bigger shop? SSO, on-prem, custom roles, an SLA — let's talk →

Vinculum: Latin for bond, link, that which binds. In mathematics, it is the bar over a repeating decimal, the mark that says these digits recur, indefinitely, as a unit. That's the architectural claim: a substrate that accumulates a project's intent across its whole life, holding parallel sessions together not by syncing them, but by giving them one authoritative surface to read from and write to. The sessions are ephemeral. The graph is not.

— the substrate that persists —
