I studied Chemistry for five years in college. But like everyone else, I had memorized the periodic table long before that. Our chemistry teacher had a unique way of helping us memorize it - one that ensured it was more than a sequence of letters. You could actually explain why an element sits where it does, and what that position means for how it behaves. You knew that sodium would react violently with water not because you had memorized that fact, but because you understood where sodium sits and what that implies. That's memory.
Now imagine someone memorized all 118 elements in order but couldn't tell you why potassium is below sodium or what that means for reactivity. That would feel like storage. That's what most agent memory systems look like today.
Memory is a cognitive science word. It has a full definition: understand something, process it, store it, retrieve it along with the context that makes retrieval accurate. That's a combined system. But visit the website of any startup selling agents and you'd think memory is a commoditized feature. And the way it's presented, you'd think retrieval has been fundamentally solved - that the only remaining problem was storage.
So let's slow down and actually understand what memory requires - because AI washing is widening the gap between what's being sold and what's actually needed.
Memory is a faculty (not storage)
Storage is one part of the system called memory. But memory is not storage. Storage is a precondition for memory the same way a library is a precondition for knowledge. The books exist. That's not the hard part. The hard part is whether the right book opens in the right person's hands when they actually need it.
Let's face it - teams have gotten very good at the storage part over the last few years. Systems for storage have become amazingly well-built: markdown files, vector embeddings, retrieval pipelines, knowledge graphs. The infrastructure works. But the infrastructure isn't supposed to remember. That's not its job.
Memory, when you break it down, means processing information, storing it with context, and retrieving it with that same context intact. Remove the second half and you have nothing but a database. We internally call this the MD file trap - having a markdown file doesn't mean retrieval is fundamentally solved. The model hits the same bottleneck with MD files as it does with everything else. Sure, it's more token efficient. But that's about it.
When you hear "AI" in a technology meeting, you don't retrieve the word and then look up what it means. You retrieve AI-as-artificial-intelligence in one motion. The context was already there, sitting with the storage, inseparable from it. That's how memory works in any system that works correctly - human or otherwise. Storage and context aren't two steps. They're one unit. Retrieval pulls both simultaneously, which is why recall is accurate.
Most enterprise AI implementations separate these. Chunked knowledge in a vector DB. Context in a system prompt. Instructions in a markdown file. Retrieval pulls the fact, then context gets applied afterward. That gap - between what was retrieved and the context it should have carried - is precisely where “confident wrongness” lives. The system isn't lying to you. It recalled correctly from storage that was structured wrong.
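To make that concrete, here's a minimal sketch in Python - the names and schema are mine, invented purely for illustration, not anyone's actual API. The first pattern splits the fact from its context; the second stores them as one unit, so retrieval pulls both in one motion:

```python
from dataclasses import dataclass

# Separated pattern: the fact lives in one store, the context in another.
# Retrieval returns the fact alone; context gets bolted on afterward.
vector_db = {"doc_117": "Discharge requires a completed medication reconciliation."}
system_prompt = "You are a clinical assistant. 'Discharge' means hospital discharge."

# Bound pattern: storage and context are one unit, retrieved together.
@dataclass
class MemoryRecord:
    content: str   # the fact itself
    domain: str    # the domain it was stored under
    sense: str     # what key terms mean *in this record*

record = MemoryRecord(
    content="Discharge requires a completed medication reconciliation.",
    domain="hospital_operations",
    sense="discharge = patient discharge from hospital",
)
```

In the first pattern, nothing guarantees the system prompt's context matches the retrieved chunk. In the second, it can't drift - the context travels with the fact.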
Fixing this should be straightforward, right?
It’s natural to think that the first fix is at the retrieval layer. Before we get there, let’s take a look at storage - whether context was embedded with what was stored in the first place.
For the last three years, RAG has been the answer to this larger problem: chunk, embed, retrieve relevant chunks at query time, inject them into context, generate. It was a real advance - it moved storage closer to retrieval, which is the right instinct. It also has a hard ceiling.
RAG retrieves what is semantically proximate to the query. That's not the same as what is situationally relevant to the moment. A chunk can score well on similarity and still be the wrong context for what the system actually needs to do right now.
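A toy illustration of that ceiling, with made-up embedding vectors: similarity scoring alone can rank a superseded chunk above the current one, because nothing in the vector knows which document is current.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up embeddings. Both chunks discuss discharge criteria, so both
# sit close to the query in embedding space.
query   = np.array([0.90, 0.10, 0.30])  # "current discharge criteria"
chunk_a = np.array([0.89, 0.11, 0.29])  # from the 2022 policy (superseded)
chunk_b = np.array([0.80, 0.20, 0.40])  # from the 2025 policy (current)

# Similarity alone ranks the superseded chunk first. The score has no
# idea which document is current - that was never stored with the vector.
print(cosine(query, chunk_a))  # ~0.9999 -> retrieved
print(cosine(query, chunk_b))  # ~0.98   -> skipped
```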
Markdown files didn't solve this either. They add a static layer - instructions, rules, domain knowledge - loaded unconditionally regardless of situation. Memory is situational. Markdown is not. The combination looks like progress because there's more context in the system. The underlying retrieval problem is identical. The industry seems to have built upward. The problem was always one layer down.
How you store is more important than where you store
I didn't have a clean framework for this for a while; it came together gradually. First things first - memory isn't a layer. It's what emerges when five things work together. Remove any one and you don't get degraded memory - you are back to storage.
It starts with storage. One would think that's the interesting part - it's not. Storage is a solved problem.
What's far more important is how you store. Governed ingestion - what decides what is worth storing, with what domain context, and in what structure. Not everything that enters a system deserves equal treatment. Without governance, contradiction gets baked in from the start. We have all had our fair share of outdated knowledge bases. This is obviously a layer that conventional RAG approaches miss.
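Here's a rough sketch of what a governance gate could look like - the schema, approved domains, and conflict rule are all hypothetical, just to show where the decisions sit: before storage, not at query time.

```python
APPROVED_DOMAINS = {"clinical_operations", "billing", "compliance"}

class GovernedStore:
    def __init__(self):
        self.docs = {}

    def find_conflicting(self, doc):
        # Same topic in the same domain is a potential contradiction.
        for prior in self.docs.values():
            if prior["topic"] == doc["topic"] and prior["domain"] == doc["domain"]:
                return prior
        return None

    def ingest(self, doc):
        if not doc.get("effective_date"):
            return ("rejected", "no effective date - it would age silently")
        if doc.get("domain") not in APPROVED_DOMAINS:
            return ("rejected", "no approved domain context")
        prior = self.find_conflicting(doc)
        if prior:
            doc["supersedes"] = prior["id"]  # bake the resolution in now
        self.docs[doc["id"]] = doc
        return ("stored", doc["id"])

store = GovernedStore()
print(store.ingest({"id": "p9", "topic": "visitor_hours",
                    "domain": "clinical_operations",
                    "effective_date": "2025-06-01"}))
```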
The one that took me longest to articulate: ontology anchoring. This is where context gets embedded into what is stored, not applied afterward. When something is retrieved, it carries the relational meaning it was stored with - not a meaning the retrieval mechanism guesses at query time. This is the difference between knowing that "discharge" means hospital discharge vs clinical discharge.
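A sketch of the idea, with a hypothetical ontology: the sense of "discharge" is resolved when the record is written, so it travels with the record instead of being guessed at query time.

```python
# Hypothetical ontology mapping (term, domain) -> the sense it carries there.
ONTOLOGY = {
    ("discharge", "hospital_operations"):    "patient discharge from hospital",
    ("discharge", "clinical_documentation"): "bodily discharge noted as a symptom",
}

def anchor(term, domain):
    # Resolve the sense at storage time and embed it in the record.
    return {"term": term, "domain": domain, "sense": ONTOLOGY[(term, domain)]}

record = {
    "content": "Discharge requires a completed medication reconciliation.",
    "anchors": [anchor("discharge", "hospital_operations")],
}

# Whoever retrieves this record gets the meaning it was stored with.
print(record["anchors"][0]["sense"])  # patient discharge from hospital
```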
Then there is episodic and relational memory - what connects facts to each other and to the situations in which they matter. A fact in isolation is trivia. A fact connected to other facts, in the context of a specific situation, is knowledge you can act on.
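In code, this could be as simple as facts plus typed edges - a toy graph, with invented facts and relation names, nothing more.

```python
# Facts are nodes; edges bind them to each other and to the
# situations where they matter.
facts = {
    "f1": "Patient is allergic to penicillin",
    "f2": "Amoxicillin is a penicillin-class antibiotic",
    "f3": "Amoxicillin prescribed on 2025-03-02",
}
edges = [
    ("f3", "involves_drug_class_of", "f2"),
    ("f2", "conflicts_with",         "f1"),
]

def related(fact_id):
    # Everything a fact is connected to, with the type of connection.
    return [(rel, dst) for src, rel, dst in edges if src == fact_id]

# f3 alone is trivia. f3 with its edges is actionable: following them
# surfaces the allergy conflict.
print(related("f3"))  # [('involves_drug_class_of', 'f2')]
```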
And finally, continuous revalidation. This is what makes the whole thing alive instead of a one-time snapshot. Knowledge is dynamic. Policies update. Guidelines get revised. A system that can't revalidate what it knows will retrieve something that was true with the same confidence it retrieves something that is true. It can't tell the difference.
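One way to picture it, with invented fields: every record carries when it was last confirmed against its source, and a periodic sweep separates "still known true" from "merely once true".

```python
from datetime import date

# Hypothetical records: each claim remembers when it was last
# confirmed against its source of truth.
memory = [
    {"id": "sop_12", "claim": "Max infusion rate: 50mg/h",
     "last_validated": date(2025, 9, 1), "source": "pharmacy_policy_v7"},
    {"id": "sop_48", "claim": "Visitor hours end at 20:00",
     "last_validated": date(2023, 1, 15), "source": "ops_manual_v3"},
]

MAX_AGE_DAYS = 180  # arbitrary freshness budget for this sketch

def stale(record, today=date(2025, 11, 1)):
    return (today - record["last_validated"]).days > MAX_AGE_DAYS

for r in memory:
    if stale(r):
        # Don't serve it with full confidence; queue it for a re-check.
        print(f"revalidate {r['id']} against {r['source']}")
```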
That's the architecture gap. The Boundless team wrote about this here.
All five, working together, produce memory. I'm still not sure if that's the complete list or if there's a sixth thing I haven't hit yet. But I am sure that dropping any one of these five gets you back to storage.
Wait, we've been here before
This is a mini déjà vu moment, within a very short span of two years. Two years back, nobody called this memory. In the context of AI applications, it was called RAG. To counter RAG's challenges came better approaches - improved versions of RAG, then step-up upgrades like neuro-symbolic RAG. Then agents arrived. The interface changed. The underlying retrieval problem remains as it was.
What has changed: three years back, teams were dealing with SharePoint files and PDF dumps. RAG was step one (not the answer), but at least the intent was there: retrieve chunks, inject them into context, generate. The instinct was right. It also had a very clear ceiling - most of those projects didn't make it to production.
What's different today is the format. Instead of SharePoint files we have markdown files. The agent has more capable tools at its disposal. Tool calling has gotten dramatically better. The AI stack has genuinely built upward.
But changing the format doesn't solve for context or retrieval. The core retrieval problem still hasn't been addressed.
Before we close
I'll leave you with the same five boring questions you would have been asking a few years back.
How does the system know which document to retrieve? Your knowledge base has 400 SOPs. The agent needs one. Does it find the right one, or the most similar one?
How does the system handle conflicts within documents — what to trust, what to disregard, what to ignore? Your 2022 and 2025 policy manuals disagree. Does the system know which one is current?
How does the system understand what a word means in a specific context? A clinician asks about "discharge." Is that hospital discharge or electrical discharge?
How does the system know what supersedes what? A patient was on 50mg in January, 100mg in August. Does it know January is outdated?
How does the system bring in new knowledge, update what's changed, and retire what's old? A regulation changed last Tuesday. Nobody flagged it. Does your system know?
If you're struggling with any of these today, reconsider the terms: you might not be dealing with memory at all. You might be dealing with storage. Which is where we started.
If you've read this far, let me know if this made sense. Just reply "got it." If something didn't land, tell me that too. The fun part is figuring out how to say complicated stuff in a way that actually sticks.
And if you liked it - maybe send it to someone who's wrangling with agent memory?
Best,
Vivek K

