#7 — When Agent Memory Helps and When It Hurts - B.O.R.I.S

Every session, your agent wakes up with amnesia — the same mistakes, the same rediscovery, the same wasted tokens. Memory is how teams solve this, but as the hosts of Agentic AI in DevOps argue, it is both a superpower and a liability. Andrey goes so far as to say that semantic memory makes an already non-deterministic LLM "even less deterministic," while Fernando warns that a poisoned memory file is trusted implicitly — just like a human trusts their own recollections. In this final episode of the foundations series, the hosts unpack the full memory lifecycle — capture, management, retrieval — and share hard-won lessons from building B.O.R.I.S, their agentic DevOps teammate where memory plays a central role.

Spotify Apple Podcasts RSS

Key Topics

News: Hermes Agent and the Rise of Memory-First Agents

Andrey introduces Hermes Agent, the self-improving open-source agent from Nous Research that positions memory as its centerpiece. Like OpenClaw — the self-improving personal assistant released earlier in 2026 — Hermes can build its own skills and connect through multiple communication platforms (Telegram, Discord, Slack, WhatsApp). The difference is that Hermes ships with a built-in memory system: a MEMORY.md file (approximately 800 tokens) where the agent stores environmental facts, conventions, and learnings, and a USER.md file (approximately 500 tokens) for profile and preferences, both stored in the home directory under .hermes/memories. Past conversations persist in SQLite with full-text search and LLM summarization. Memory plugins like Mem0 can augment the built-in system. Andrey urges listeners to keep the Hermes memory architecture in mind as a reference point for the main discussion.

News: Anthropic Managed Agents Get Memory

Fernando highlights that Anthropic’s cloud-managed agents — where developers define an agent and Anthropic hosts and runs it — now support developer-pushed custom memory. The update includes compliance mechanisms so organizations in regulated industries can track who last pushed a memory record and audit how memory influences agent behavior. This matters because, as the episode explains in depth, memory directly shapes how an agent behaves: it may skip asking for confirmation because the answer already exists in memory, making proper management and auditability critical.

News: Claude Code Memory, Anthropic on AWS, and the Copilot Gap

Andrey points out that Claude Code itself recently gained a file-based memory system. Project instructions live in CLAUDE.md files within the repository, while agent-generated auto memory is stored as text files under ~/.claude/projects/<project>/memory/. He encourages listeners to explore the .claude directory in their home directory — “there are quite a lot of interesting stuff.” Fernando notes related features like the recap summary that appears when returning to a dormant terminal session, which Andrey clarifies is more of a UX convenience than a memory system proper.

The hosts briefly discuss Anthropic’s announcement that it is bringing the Claude Platform to AWS — distinct from Bedrock — which will let builders access Anthropic models through AWS billing and governance without separate Anthropic credentials. As of the recording, this integration is announced but not yet generally available. Fernando issues an important compliance warning: unlike Bedrock, where data stays within AWS infrastructure and is not shared with model providers, using the Anthropic platform on AWS would still route data to Anthropic. Vladimir frames it as a lift-and-shift path — start with Anthropic APIs on AWS billing, then potentially migrate to Bedrock later.

The conversation touches on GitHub Copilot’s current state. Fernando asks whether anyone still finds it competitive, and Andrey reports that a conversation the day before confirmed it “feels like it was one year ago.” With GitHub recently announcing a shift from flat subscriptions to usage-based billing with AI credits for Copilot, the hosts see this as another data point in the pricing squeeze discussed in episode #6.

What Is Agent Memory?

Fernando provides the foundational definition: without memory, every agent session starts stateless. It is like onboarding a new colleague, spending a full day teaching them how the team deploys, where the code lives, and how things work — only to have them return the next morning remembering nothing. Memory is the mechanism that moves information between sessions: decisions made, tools used, preferences expressed, context accumulated. It transforms a generic LLM into something that understands a team’s specific environment.

The Four Types of Agent Memory

Andrey walks through the memory taxonomy from the CoALA framework (Cognitive Architectures for Language Agents), which classifies agent memory into four types:

Procedural memory is the skills, runbooks, and SOPs discussed in episode #3. This is curated memory — written by humans, explaining concepts and procedures the agent can load dynamically into context. Because it is authored and controlled, procedural memory is trustworthy.

Semantic memory is what the agent generates itself from interactions. When a user voices preferences, corrects the agent, or establishes conventions, the agent extracts and stores those snippets for future use. This is where things get interesting — and dangerous.

Episodic memory captures specific events and interactions. Fernando illustrates with a Terraform example: the agent remembers that last time it ran terraform plan, there was drift, and factors that into the next run.

Working memory (in-context memory) is what the agent holds during a single session — analogous to what a person keeps in their head during a workday. Fernando warns that overloading this is like giving a junior engineer a 200-page runbook and ten tasks: “by the end of the day, the brain is exploding.”

Memory Lifecycle: Capture, Manage, Retrieve

Andrey outlines three stages that every memory system must address:

Capture — how does the agent decide what to remember? During a session, the agent identifies snippets of information worth persisting.

Management — how does the agent maintain memory over time? This includes summarization, removal of irrelevant entries, and consolidation. The hosts draw a direct analogy to human sleep: just as the brain compacts the day’s experiences into long-term memory during sleep, agents need a similar consolidation process. Andrey references a feature in Claude Code — which the hosts refer to as “dreaming” — where the agent compacts what happened during a session into long-term memory, potentially discarding what is no longer useful. As Andrey notes, the feature “is called different ways in other agents,” but the principle is the same. Without management, memory grows unbounded and becomes noise rather than signal. As Andrey puts it, “you need to teach the agent to forget.”

Retrieval — how does the agent load the right memory at the right time? Context windows are limited, so the agent must selectively load relevant memory rather than dumping everything in. Poor retrieval means the agent either misses critical context or fills its working memory with irrelevant information that diverts it from the task at hand.

Memory as a Liability

This is where the hosts get spicy. Andrey argues that semantic memory — the kind the agent generates on its own — makes an already non-deterministic system even less predictable. Without memory, at least you know the agent’s starting state: it has no idea about anything, and the context you provide is the only variable. Add self-generated memories and the starting state becomes opaque.

Vladimir shares the analogy of web browsing and targeted ads: you search for something once, and suddenly every ad follows you. Agent memory works the same way. Fernando confirms this from personal experience with ChatGPT, where a single conversation topic caused the model to try correlating every subsequent chat back to that topic — “the memory gets poisoned.”

If a team changes its approach but the agent’s memory still reflects the old way, it will persistently revert to outdated practices. Fernando notes that with file-based memory systems (like Claude Code’s markdown files), at least you can tell the agent to update its own memory. But this requires active management — memory has a lifecycle, and neglecting it degrades the agent over time.

Memory as a Security Vector

Fernando raises a critical point: certain types of memory are trusted implicitly. Files like CLAUDE.md or skill definitions are loaded automatically and treated as authoritative. If an attacker can inject content into procedural memory, the agent will trust and execute it — “just like a human trusts their own recollections.” This makes memory not just a correctness concern but a genuine security surface. Combined with the supply-chain risks discussed in episode #3 around third-party skills, the attack surface for memory poisoning is significant.

Practical Memory Patterns from B.O.R.I.S

The hosts share concrete examples from building B.O.R.I.S, their agentic DevOps teammate. One pattern involves AWS health notifications: some notifications (like ECS Fargate task retirements) are routine and harmless for workloads running multiple instances. A user can tell B.O.R.I.S “I don’t care about this,” and the agent stores that preference, consulting memory before surfacing future notifications.

Another pattern addresses opinionated infrastructure. AWS certifications teach a particular way of building (“choose the answer that makes most money for AWS and you won’t be wrong”), but real-world configurations often differ for economic or technical reasons. Memory lets teams tell the agent “we do it this way because of that” so it stops flagging intentional deviations as problems. Without that memory, every session becomes a debate about best practices the team has already resolved.

The End of the Foundations Series

Andrey closes by noting this is the final episode in the foundations series. Over seven episodes, the show has built up from AI’s evolution in DevOps (episode #1), through context windows and tool layers (episode #2), skills (episode #3), harness engineering (episode #4), hooks and the agentic loop (episode #5), economics (episode #6), and now memory. With these foundations in place, the next episodes will shift to applications — where agentic systems are actually deployed in DevOps workflows.

Resources

Hermes Agent (GitHub) — Nous Research’s open-source self-improving agent with built-in persistent memory (MEMORY.md and USER.md files plus SQLite session search), optional memory plugins, skill generation, and multi-platform communication support.
OpenClaw (GitHub) — The open-source personal AI assistant that preceded Hermes, featuring a gateway architecture for connecting communication platforms but lacking built-in memory at launch.
CoALA: Cognitive Architectures for Language Agents (arXiv) — The foundational paper by Sumers, Yao, Narasimhan, and Griffiths that defines the four-type memory taxonomy (procedural, semantic, episodic, working) referenced throughout the episode.
Mem0 — Drop-in memory infrastructure for AI agents that compresses conversation history into compact memories, with SOC 2 and HIPAA compliance for enterprise use.
Claude Code Memory Documentation — Official documentation covering Claude Code’s dual memory system: human-authored CLAUDE.md files and agent-written auto memory with consolidation.
Using Agent Memory — Claude Managed Agents Documentation — Anthropic’s guide to memory stores for managed agents, covering creation, attachment, versioning, audit trails, and compliance controls like redaction.

Join B.O.R.I.S Slack Playground

#7 — When Agent Memory Helps and When It Hurts