#7 — Agent Memory: When It Helps, When It Hurts, and How to Manage It - B.O.R.I.S

Your AI agent just forgot everything. Again. You taught it your stack, your conventions, your preferences — and tomorrow it wakes up like it never happened. Memory is supposed to fix that. Here's what nobody tells you: it can also make things worse. This blog post breaks down exactly how agent memory works, when it helps, when it quietly poisons your sessions, and what to do about it.

This is episode #7 of Agentic AI in DevOps. Read the article below, or listen to the episode.

Watch on YouTube

Spotify Apple Podcasts RSS

Every session starts with amnesia

Without memory, every agent session starts from zero. It’s like onboarding the same new colleague every morning — you explain how the team deploys, where the code lives, what the conventions are. They nod. Next day, they remember nothing.

Memory is the mechanism that moves information between sessions: decisions made, tools used, preferences voiced, context accumulated. It’s what transforms a generic LLM into something that understands your team’s specific environment.

But memory isn’t free. Add it carelessly and you get an agent whose behavior you can’t predict, can’t audit, and can’t easily fix.

This blog post is about both sides of that tradeoff.

The four types of agent memory

The CoALA framework (Cognitive Architectures for Language Agents) classifies agent memory into four types. Each has different risk and control characteristics.

Procedural memory is skills, runbooks, and SOPs — the type covered in Skills, Powers, SOPs post. It’s curated and human-authored: written down, version-controlled, loaded dynamically when relevant. Because a person wrote it and a person can update it, procedural memory is the most trustworthy kind.

Semantic memory is what the agent generates itself from interactions. When you correct the agent, voice a preference, or establish a convention, it extracts that and stores it for future use. This is where the interesting problems start.

Episodic memory captures specific events. The agent remembers that last time it ran terraform plan, there was drift. It factors that into the next run. Useful — but only if the world still matches what was recorded.

Working memory is what the agent holds during a single session — the in-context window. Overloading it is like giving a junior engineer a 200-page runbook and ten simultaneous tasks. By the end of the day, the brain is exploding.

The memory lifecycle: capture, manage, retrieve

Every memory system has three stages. Most implementations get one right.

Capture — the agent decides what’s worth remembering and extracts it. Seems simple. In practice, the agent must distinguish between facts that generalize across sessions (worth keeping) and details that only matter now (not worth keeping). Bad capture means either too little (learning is lost) or too much (noise accumulates).

Management — how memory stays useful over time. This is the stage most systems skip. Memory grows unbounded, and without pruning it becomes noise faster than it becomes signal. Think of it like sleep: the brain compacts daily experience into long-term memory overnight, discarding what’s no longer relevant. Claude Code has a similar “dreaming” step where the agent consolidates a session into long-term memory, potentially removing what’s outdated. Without this you need to teach the agent to forget.

Retrieval — loading the right memory at the right moment. Context windows are finite; you can’t dump everything in. The agent must select what’s relevant to the current task. Poor retrieval means the agent either misses critical context or fills its working memory with irrelevant history that derails the task.

When memory becomes a liability

The troublesome type is semantic memory — the kind the agent writes for itself.

Without memory, at least you know your starting state: the agent knows nothing except what you provide. Every variable is visible. Add self-generated memory and the starting state becomes opaque. You don’t know exactly what the agent carries into a session unless you go read the files.

Think of ad targeting: search for something once, and it follows you everywhere. The same thing happens with AI assistants — one conversation topic starts bleeding into every subsequent session. The memory gets poisoned.

If a team changes its approach but the agent’s memory still reflects the old way, it will persistently revert to outdated practices. With file-based memory systems like Claude Code’s markdown files, at least you can tell the agent to update its own memory. But this requires active management. Memory has a lifecycle. Neglect it and it degrades the agent over time, invisibly.

Memory as a security surface

Procedural memory — CLAUDE.md files, skill definitions, instruction files — is loaded automatically and treated as authoritative. The agent doesn’t verify it. It trusts it the way a person trusts their own memory.

If an attacker can inject content into those files, the agent executes the injected instructions. This isn’t just a correctness concern. It’s a genuine attack surface.

Combined with the supply-chain risks in Skills, Powers, SOPs post — third-party skills that a developer adds to an agent without reading carefully — the attack surface for memory poisoning is broader than most teams realize. You’re not just trusting the model. You’re trusting everything that gets loaded into its context.

What B.O.R.I.S does with memory

Two concrete patterns from building B.O.R.I.S, a context layer for infrastructure:

Noise suppression. Some AWS health notifications — ECS Fargate task retirements, for example — are routine for services running multiple instances. A user tells B.O.R.I.S “I don’t care about this class of notification.” The agent stores that preference and consults it before surfacing future alerts.

Opinionated infrastructure. AWS certifications teach a canonical way of building — the running joke is “choose the answer that makes most money for AWS and you won’t be wrong” — but real-world configurations often differ for economic or technical reasons. Memory lets teams tell B.O.R.I.S “we do it this way because of that” so it stops flagging intentional deviations as problems. Without memory, every session becomes a debate about best practices the team has already resolved.

How to get agent memory right

Start with procedural memory. Skills, runbooks, and SOPs are the safest form — human-authored, version-controlled, easy to audit. Build this before any autonomous memory system.
Treat semantic memory as a liability until proven otherwise. Every fact the agent writes for itself is a starting-state variable you can’t fully control. Scope it tightly. Review it periodically.
Build memory management into the system, not as an afterthought. Set a schedule for reviewing and pruning memory files. The agent needs to learn to forget.
Know your retrieval strategy. Loading all memory on every session is not retrieval — it’s context poisoning at scale. The agent should load only what’s relevant to the current task.
Consider fresh sessions with loaded context for production workloads. Semantic Layers vs Context Layers post #12 makes the case directly: live context that checks reality is more reliable than accumulated memory that doesn’t.
Treat memory files as a security surface. If an attacker can write to them, they can control agent behavior. Apply the same access controls you’d apply to code.

Resources

CoALA: Cognitive Architectures for Language Agents (arXiv) — the foundational paper defining the four-type memory taxonomy (procedural, semantic, episodic, working) referenced throughout this blog post.
Hermes Agent (GitHub) — Nous Research’s open-source self-improving agent with built-in persistent memory (MEMORY.md, USER.md, SQLite session search), skill generation, and multi-platform support.
Mem0 — drop-in memory infrastructure for AI agents: conversation history compressed into compact, retrievable memories, with SOC 2 and HIPAA compliance.
Claude Code Memory Documentation — official documentation for Claude Code’s dual memory system: human-authored CLAUDE.md files and agent-written auto memory with session consolidation.
Using Agent Memory — Claude Managed Agents Documentation — Anthropic’s guide to managed-agent memory stores: creation, versioning, audit trails, and compliance controls.
Post #12 — Semantic Layers, Context Layers, and Agents That Stop Guessing — the case for live context over persistent memory in production workloads.
Post #3 — Skills, Powers, SOPs — procedural memory in depth: how curated skills and runbooks are the trustworthy alternative to autonomous semantic memory.

Join B.O.R.I.S Slack Playground

#7 — Agent Memory: When It Helps, When It Hurts, and How to Manage It