Compile What Compounds, Summarize What Speculates

Mihir Wagle June 9, 2026 8 min read

agentic AImemorycuration

My knowledge vault nags me about the wrong things. It pings me that a two-year-old brainstorm has "gone stale" — a brainstorm, the most disposable thing I own — while quietly serving up a pricing number that was true last quarter and is a lie today. The tool is doing exactly what I built it to do. What I built was wrong, and it took me embarrassingly long to see why.

I'm not diagnosing this from the outside. I built the thing — Daftari, a markdown knowledge vault I expose to AI agents over MCP — so when the curation behaved stupidly, the bad assumption was mine to find.

It looks like a tuning problem. It isn't.

The instinct is to reach for the dials. Better staleness thresholds. Smarter TTLs. A confidence-decay curve with a few more knobs. I spent real time there, and it doesn't work — not because the dial is set wrong, but because there is only one dial, and it's being asked to govern two kinds of knowledge that have nothing in common.

The cleaner way to see it — the way that turned out to matter — is that the vault was treating everything as one kind of memory. Minds don't. We carry at least two.

Two kinds of memory

Psychologists have a name for the split — Endel Tulving's semantic versus episodic memory — and once you have it, the bug is obvious.

Semantic memory is the stuff you revise. The capital of France, how a hash map works, where last quarter's pricing actually landed. You keep it by revisiting it. It compounds precisely because you come back to it, and when it drifts out of date that's a defect — you want to be told. This is accumulation knowledge: a competitive-intelligence note, a researched comparison, a number you keep current. Compiled once, then maintained, then relied on by everything around it.

Episodic memory is what you learned once, from a unique experience. Your first day at a new job. The one outage that taught you to never deploy on a Friday. You don't run flashcards on an episode. It's formative and single-shot, it fades over time, and the fading is correct — it was only ever a snapshot of one moment. This is generative knowledge: a moonshot sketch, a "what if," the raw exhaust of a brainstorm. Provisional by design.

Now the failure is symmetric and stupid in both directions. Treat an episode like a fact you must keep current, and you'll torture yourself auditing memories that were never meant to last. Treat a fact like an episode, and you'll trust a number that quietly went wrong months ago. My vault was doing both at once — because it had one notion of memory, and the world has two.

The hidden assumption: everything wants to be permanent

Here's the assumption I'd been making without noticing: that every document wants to be true forever. Semantic by default. No author ever said that; the schema decided it on their behalf and held everything to the permanent-record standard.

I've written before about the envelope problem — the way a system takes an ambiguous input and silently narrows it into a specific choice nobody asked for. This is that, wearing different clothes. The vault received "store this document," an ambiguous instruction, and silently filled in "...as permanent fact." It filled the envelope. And like every filled envelope, it looked perfectly reasonable right up until it produced something dumb. The fix for a filled envelope is never a better guess. It's making the choice explicit.

Revision is the mechanism, not the metaphor

This is where the memory framing earns its keep, and where a tidier analogy — a batting average, a balance sheet — would quietly mislead you. An average doesn't become anything; it's just a tally. A memory does become something, through revision, and revision is a real mechanism with real rules.

Learning science is blunt about this. You do not build durable memory by re-reading. Re-reading feels productive and mostly isn't — it's watching the teacher solve the problem again. What builds strong memory is re-deriving the thing yourself: the generation effect, the testing effect, the whole family of "desirable difficulties" (Bjork; Roediger and Karpicke). And you don't re-derive everything every day — you revisit what's at risk of being forgotten, on a schedule keyed to the forgetting curve. That's spaced repetition, and its entire economy is revise what's fragile, leave what's solid alone.

That reframes the tired "RAG versus compile" debate too. Retrieval — keep the raw material, pull chunks at query time — is re-reading the textbook on every question. Compilation is having actually studied it. And studying isn't storing; it's revising until the knowledge is earned. Knowledge that compounds is knowledge you revise under variation.

Name what that costs, because it isn't free. Revision means doing the derivation again — more passes, more compute — when the cheap move is to re-read your own last answer and call it confirmed. But re-reading your own prior conclusion is cramming, and cramming is correlated error dressed up as agreement: you converge on the same wrong thing twice and feel more sure. Real revision demands variation — a different angle each pass — or the convergence means nothing. You pay for independence on purpose.

Make the kind a field, not a guess

In Daftari this stopped being philosophy and became a schema decision. Every document declares a domain, and the two values that carry the weight are accumulation and generative. That one field decides the thing that actually matters: which knowledge is even a candidate for revision.

Staleness. An accumulation doc past its TTL is due for review — a point on the forgetting curve. A generative doc past its TTL is left in peace.
Contradiction. Two accumulation docs that disagree is a defect to surface aggressively; the point of accumulation is converging on one trustworthy answer. Two generative docs that disagree are usually two live hypotheses, and forcing a resolution destroys information.
Investment. Accumulation knowledge earns the work of staying canonical — promotion through a lifecycle, cross-references, the inbound links other docs hang on it. Generative knowledge gets summarized and left provisional; no curation budget is spent hardening a sketch.

The tempting shortcut is to infer the kind from the folder, the tags, the file's age. I refused, on purpose. Which kind a document is isn't a property you reverse-engineer later; it's a decision the author makes at the moment of writing — the same way you declare a value nullable instead of inferring nullability from whether you've happened to see a null yet. A guessed type is a type you can't trust, and the moment the metadata is untrustworthy you're back to one dial, governing by vibes.

What this is the front door to

The domain field is small. What it's the foundation for is not.

The next thing I'm building is a consolidation loop — the vault revising itself, the way a mind consolidates memory in sleep: revisit what's due, re-derive it instead of re-reading it, and let trust be earned by surviving that re-derivation rather than declared once and frozen. That's its own post, with an eval to say whether it actually moves the needle. I'll write it when the number's in.

But the loop only has somewhere to stand because of the distinction above — which is the real reason the domain field exists: you only consolidate what's meant to be remembered. Run a revision loop over an episode, drill a brainstorm until it's "strong," and you've manufactured confidence in a guess. The field is what tells the loop where to point. Get the two kinds of memory right first, and the consolidation has a safe place to happen.

Honest assessment

Everything this post actually argues is real and in use: the two kinds of memory, and the domain field that encodes them. I'm not asking you to take the consolidation loop on faith, because the argument doesn't rest on it — the loop is the sequel, and its burden of proof travels with it. When I claim revising-what-compounds beats leaving it alone, that's a number I owe you, and I'll bring it to that post or not make the claim.

What I am staking here is the distinction itself, so here are the bets, plainly enough that future-me can check them. If authors can't reliably classify a document at write time, the explicit field is worth less than an honest heuristic, and the argument inverts. And if two domains turn out to be too few — if a real third kind of knowledge keeps getting jammed into the wrong bucket — then the clean binary I'm selling is itself an envelope I filled too eagerly. I don't think either is true. But that's what I'd be wrong about, and roughly how I'd find out.

Name the kind, and the curation falls out

The mistake was never the staleness rule or the confidence score. Those were fine. The mistake was a missing distinction — one set of rules asked to govern the things you mean to remember and the things you only meant to have happen to you once.

Name which kind a document is, and everything downstream - what you revise, what you let fade, what earns a place in the loop - stops being a tuning problem and becomes a consequence. Compile what compounds, summarize what speculates. Or, the way a mind would put it: revise what you mean to keep, and let a once-in-a-lifetime stay once.

Daftari is an open-source MCP server that exposes a curated markdown vault to AI agents. The domain distinction here is the foundation for its consolidation loop — the vault revising itself the way memory consolidates in sleep — which is the thing I'm building next.