#13 — cmux vs iTerm with Viktor Vedmich - B.O.R.I.S - The Context Layer for Infrastructure

cmux terminal workflows: when iTerm slows down AI-agent work, Viktor Vedmich shows how workspaces, session restore, and socket APIs make Claude Code more practical. Andrey Devyatkin and Fernando Gonçalves talk with Viktor Vedmich about his terminal-first agentic stack, Obsidian vault setup, Claude Code on Bedrock, AWS Kiro, spec-driven workflows, MCP search tools, rising token bills, and why open-weight model competition matters for DevOps teams.


Summary

Ask any engineer what terminal they use and you’ll get an answer with feelings attached — and that small obsession turns out to be a useful window into how someone actually works with AI agents all day. This interview episode of Agentic AI in DevOps swaps the usual lineup: Vladimir Samoylov is out, and Viktor Vedmich — a Senior Solutions Architect at Amazon Web Services (AWS), Kubernetes book author, and host of the DevOps Kitchen Talks podcast — joins Andrey Devyatkin and Fernando Gonçalves for a guided tour of his daily agentic stack and his read of the industry. The throughline: Viktor runs almost his whole work life through the terminal and an Obsidian vault, with Claude Code as the agent, Amazon Bedrock or a subscription behind it, and AWS’s own Kiro in the mix. Along the way the conversation gets blunt about the things nobody puts in a launch post — iTerm being painfully slow (“I’m switching after the recording,” Fernando says mid-episode), Claude Code winning on mind share rather than being the best tool, customers whose Bedrock bills jumped from a few hundred dollars to hundreds of thousands in months, a single hour of Fable 5 costing Viktor $300, and a half-joking “go China” plea for open-weight models to force prices down.

Key Topics

The terminal rabbit hole: from iTerm to cmux

The episode opens on the question engineers love and Viktor takes seriously: which terminal, and why. His honest answer is that he spends roughly half his day in one — reading and answering email, preparing reports, doing nearly everything through it — so raw performance is not a nicety. That requirement is what pushed him off iTerm2, which he describes as slow and GPU-starved, sometimes taking many seconds just to open. From there he walked the well-known path: WezTerm for performance, then Kitty (picked up from the Russian-language Radio-T podcast), which fell down for him on multi-language keyboard shortcuts when switching between Cyrillic, English, and German layouts. He tried Ghostty — the GPU-accelerated terminal written in Zig by HashiCorp founder Mitchell Hashimoto, which is what Andrey uses and is happy with — but found it didn’t give him the flexibility of Kitty or WezTerm.

Where he landed, “today, on the 26th of June in 2026,” is cmux: a young, native macOS terminal built on top of Ghostty’s rendering engine that adds named workspaces, splittable panes, tabs, session restore, and an embedded browser. The feature that sold him is workspace isolation. As a solutions architect juggling multiple customers, he keeps each engagement in its own workspace and switches context instead of piling everything into one window. cmux also restores state on relaunch — Viktor’s striking example is that it remembers a running Claude Code session’s working directory and session ID, so quitting and reopening resumes exactly where he left off.

The deepest part is cmux’s socket API. Viktor configures Claude Code so that, when it needs to run something interactively — his recurring case is a Slidev presentation server, which he prefers to run in a visible window rather than as a background process — the agent can open a real new terminal window and drive it over that protocol. cmux can also split a browser pane next to the terminal for Playwright-style verification, though here Viktor disagrees with the built-in option: he prefers driving his own Chrome through Playwright because, inside AWS, he needs the browser carrying his existing SSO and Midway authentication and his installed extensions, which a fresh automated browser lacks. Viktor notes that Claude Code’s “teams” sub-agent splitting historically only worked well in iTerm — so cmux solving it without iTerm’s performance penalty is what makes that workflow practical. Fernando’s reaction was immediate: “I have been doing a lot of Claude Code with iTerm and… this shit is so slow. My God. So you’re giving me now — okay, I’m switching after the recording.” Andrey is more measured: he’d opened cmux before, didn’t feel a missing piece in his life, and closed it to go back to Ghostty — but says the masterclass earned it a second try.
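The general pattern described above, an agent asking a terminal to open a window by sending a message over a local socket, can be sketched in a few lines. This is an illustration only and not cmux's actual protocol: the newline-delimited JSON framing, the `open_window` action, the reply fields, and the `fake_terminal_handle` stand-in are all hypothetical names invented here, and the command string is just an example.

```python
import json
import socket


def encode_command(action, **params):
    """Frame one command as a single line of JSON (hypothetical wire format)."""
    return (json.dumps({"action": action, "params": params}) + "\n").encode("utf-8")


def read_message(sock):
    """Read one newline-terminated JSON message.

    Assumes one message in flight at a time, which holds for this demo.
    """
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("socket closed before a full message arrived")
        buf.extend(chunk)
    return json.loads(buf.decode("utf-8"))


def fake_terminal_handle(sock):
    """Stand-in for the terminal side: handle one command and send a reply."""
    cmd = read_message(sock)
    if cmd["action"] == "open_window":
        reply = {"ok": True, "window": 1, "ran": cmd["params"].get("command")}
    else:
        reply = {"ok": False, "error": "unknown action: " + cmd["action"]}
    sock.sendall((json.dumps(reply) + "\n").encode("utf-8"))


def request(agent_sock, terminal_sock, action, **params):
    """Send one command from the agent side and return the terminal's reply."""
    agent_sock.sendall(encode_command(action, **params))
    fake_terminal_handle(terminal_sock)
    return read_message(agent_sock)


if __name__ == "__main__":
    # socketpair() gives two connected sockets in one process, so the
    # round trip can be shown without a real terminal listening.
    agent, terminal = socket.socketpair()
    try:
        print(request(agent, terminal, "open_window", command="npx slidev"))
    finally:
        agent.close()
        terminal.close()
```

In a real setup the terminal would own the listening socket and the agent would connect to it; the in-process pair only keeps the sketch self-contained.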

Agent, harness, model: where the lines actually are

A small but useful disagreement surfaces over vocabulary. Andrey frames “the harness” as Claude Code itself. Viktor disagrees: in his model, the agent is Claude Code, and the harness is everything he configures around it — skills, MCP tools, even the terminal — that together steer how the agent behaves. Andrey concedes it’s “a line in metaphysics,” but the distinction matters in practice, because it separates the thing that runs the model from the accumulated configuration that makes it useful.

The model underneath comes from different places depending on context. For work, Viktor runs Claude Code on Amazon Bedrock, AWS’s managed foundation-model service; on his personal laptop, he uses a Claude subscription. That split has consequences he returns to later: Claude Code on Bedrock doesn’t ship the full hosted-subscription feature set — notably the built-in deep web research — which is precisely the gap his MCP tooling fills.

This is also where Kiro enters: AWS’s own agentic IDE and CLI, which Viktor uses daily partly because, as an internal product, it’s effectively free for him. The honest comparison is that the models are the same — Kiro uses Claude models too — but Claude Code wins on community. The volume of extensions, plugins, skills, and MCP servers around Claude Code dwarfs what’s grown up around Kiro, which Viktor and Andrey agree remains more of a niche with a tight following among AWS Community Builders.

Kiro, and the case for spec-driven, test-heavy agents

Andrey gives Kiro a genuine shout-out grounded in months of use since its IDE preview. He credits the team with being early on spec-driven design, and praises a more recent addition: a “run all tasks” button that spawns a sub-agent per task, preserving the main agent’s context window while it works through a long spec. In his experience it ran through dozens of tasks across nearly 36 hours, pausing mainly to ask permission. That pausing is also his main complaint — the allow-list prompting feels like babysitting, and he wants Kiro to adopt something like Claude Code’s auto mode with a classifier so it can keep going safely without constant approval.

His read on Kiro’s character is that it’s tuned for a different audience than Cursor. Cursor optimizes for speed; Kiro leans toward enterprise and internal AWS use, which shows up as a heavy emphasis on writing tests and a strong tendency to follow its steering instructions — likely reinforced with hooks that keep it from drifting. The trade-off is blunt: “I like Kiro, but it’s slow. It is slow, but it’s nice.” Viktor’s wishlist item is model breadth as a competitive wedge. Most people use Claude models today, but with OpenAI models now available on Bedrock, he’d like to see them — and others — land in Kiro. Andrey’s caveat is practical: OpenAI’s distinct API format means it’s real integration work, not a flip of a switch.

A vault as the interface: Obsidian, Johnny Decimal, and skills as bash-in-English

The most distinctive part of Viktor’s setup is that his harness’s primary interface is an Obsidian vault — the same Markdown notes app he uses as a second brain. Because agents work natively in Markdown, his notes and his work surface are the same place. The vault is organized with the Johnny Decimal system, where every top-level folder carries a fixed number (10 for AWS, 20 calendar, 30 projects, 60 knowledge, 70 research), and each folder holds its own scoped CLAUDE.md. With roughly 10,000 files, the folder structure does real work: pointing the agent at a specific customer folder constrains the context it needs to gather, instead of forcing it to rediscover everything each session. As Andrey reframes it, Viktor isn’t feeding context manually — he’s pre-arranged it, then uses skills “as a bash script in human English” to say what to do.
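One way to picture how a numbered folder tree with per-folder CLAUDE.md files narrows context is the sketch below. It is an assumption-laden illustration, not Viktor's setup or Claude Code's own loading logic: the helper names, the `customer-a` folder, and the demo vault layout are hypothetical, and only the top-level numbers (10 for AWS, 30 for projects) come from the episode.

```python
import tempfile
from pathlib import Path


def scoped_instruction_files(vault_root, target, name="CLAUDE.md"):
    """Return instruction files on the path from the vault root to `target`.

    Files are ordered outermost first, so narrower folders can refine
    whatever the broader ones say. Sibling folders are never visited.
    """
    root = Path(vault_root).resolve()
    target = Path(target).resolve()
    parts = target.relative_to(root).parts  # ValueError if target is outside the vault
    folders = [root] + [root.joinpath(*parts[:i]) for i in range(1, len(parts) + 1)]
    return [folder / name for folder in folders if (folder / name).is_file()]


def build_demo_vault(base):
    """Create a tiny hypothetical vault for the demo and return its root."""
    vault = Path(base) / "vault"
    for folder in ("10 AWS/customer-a", "30 Projects"):
        (vault / folder).mkdir(parents=True)
    for rel in ("", "10 AWS", "10 AWS/customer-a", "30 Projects"):
        (vault / rel / "CLAUDE.md").write_text("# instructions for " + (rel or "vault") + "\n")
    return vault


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vault = build_demo_vault(tmp)
        for path in scoped_instruction_files(vault, vault / "10 AWS" / "customer-a"):
            print(path.relative_to(vault.resolve()))
```

Pointing the helper at the customer folder picks up the root, area, and customer files while leaving the unrelated `30 Projects` instructions out, which is the scoping effect the folder structure is doing.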

To search across that volume, Viktor relies on a tool he calls QMD, which combines classic BM25 keyword search with a RAG system and reranking — useful precisely because his documentation and customer history are large. Following an Andrej Karpathy-style habit, he ends every session with a summary, so he can later recall what he did over the past two weeks and periodically audit where to add a new skill or tighten a sub-folder’s CLAUDE.md. He interacts largely by voice through a Whisper push-to-talk setup, piping unstructured thinking-out-loud through a “prompt optimizer” skill that restructures it before handing it to, say, a deep-research run.
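For readers unfamiliar with the retrieval side, here is a minimal sketch of the two-stage idea: BM25 keyword scoring to pick candidates, then a second pass that reorders them. It is not QMD's implementation. The embedding part of a RAG system is left out entirely, the BM25 variant and its `k1`/`b` defaults are a common textbook choice, the sample notes are made up, and the phrase-match reranker is a hypothetical stand-in for a learned model.

```python
import math
import re
from collections import Counter

# Made-up notes standing in for vault files.
DOCS = [
    "EKS cluster upgrade notes for customer A",
    "Slidev presentation outline for the EKS workshop",
    "Weekly calendar summary and meeting notes",
    "Cluster upgrade: EKS upgrade checklist and rollback plan",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 with the non-negative IDF variant, over a small in-memory corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.n_docs = len(docs)
        self.avg_len = sum(self.lengths) / self.n_docs
        # Document frequency: in how many documents each term appears.
        self.doc_freq = Counter(t for c in self.term_counts for t in c)

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query, i):
        counts, length = self.term_counts[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            if tf:
                norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        scored = [(self.score(query, i), i) for i in range(self.n_docs)]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:k]]


def phrase_reranker(query, doc):
    """Hypothetical stand-in reranker: counts exact phrase matches."""
    return doc.lower().count(query.lower())


def search(index, docs, query, rerank, k=3):
    """Stage 1: BM25 candidates. Stage 2: stable reorder by the reranker's score."""
    candidates = index.top_k(query, k)
    return sorted(candidates, key=lambda i: -rerank(query, docs[i]))


if __name__ == "__main__":
    index = BM25(DOCS)
    for i in search(index, DOCS, "eks upgrade", phrase_reranker):
        print(DOCS[i])
```

The split matters at scale: the cheap keyword pass trims thousands of files to a handful, so the more expensive second stage only ever sees the shortlist.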

On the perennial “what skills and MCP servers should I use” question, Viktor is deliberately cautious: he won’t hand over a list, because a setup is personal and every skill or server he adopts gets adjusted to his own configuration rather than used as-is. What he will name are two web-search MCP servers that fill the Bedrock deep-research gap — Jina and Tavily (he notes Tavily’s free tier of 1,000 credits a month covers his usage) — and, for development structure, the Superpowers and GSD (“Getting Stuff Done”) plugins that push him toward spec-driven work, asking questions, and doing research first. A concrete walkthrough ties it together: to prep a Slidev demo for a customer, he opens the relevant Obsidian folder, has the agent read the customer’s Markdown description, activates an AWS Slidev skill, uses the project folder as the presentation source, and spins up the two Kubernetes clusters his demo needs from a dedicated infrastructure folder — and when something is repeatable, he captures it as a skill so he never re-explains it.

What the room looks like: vibe coding, mind share, and broken trust

Asked for his read of the industry from conference floors and customer conversations, Viktor’s blunt summary is that “everyone nowadays is doing vibe coding, even product managers.” He sees the field splitting into groups: some still use AI at the chatbot level — asking questions and nothing more — while others have moved well ahead, standing up internal marketplaces of pre-approved skills and MCP servers so a frontend developer, for instance, gets the right tooling pre-installed for a project. On tooling, both hosts and guest agree Claude Code’s lead is about mind share rather than being objectively best — “It’s what you see everywhere,” as Fernando puts it.

Fernando presses on the gap between executive enthusiasm and engineer reality, citing a report on the disconnect: leadership hands everyone Claude Code, while engineers say they don’t fully trust the output and are seeing more incidents. Viktor’s experience matches. His added nuance is a timeline — in his view, models crossed into genuinely strong territory around the Opus and Sonnet 4.5 releases late in 2025. On air Andrey corrects him to 4.6 and ties it to his birthday, but the dates favor Viktor’s version: the model that actually shipped on November 24th, 2025 — Andrey’s birthday — was Opus 4.5, while Opus 4.6 didn’t arrive until February 2026. Before that crossover, Viktor says, agents were a junior-replacement that could “change the color”; after it, they could do a lot more — which is exactly what makes the next problem dangerous.

The token bill nobody budgeted for

The sharpest economic argument is about dependency and cost. Viktor’s worry is that vendors are increasingly selling tokens rather than subscriptions, and that heavy adoption creates lock-in: teams move so fast with AI that going back to the old pace becomes impossible, which makes them dependent just as the price climbs. He’s watched customers start on Bedrock at a couple hundred dollars and reach thousands, then hundreds of thousands, within months. What subscriptions provide that raw API consumption doesn’t is a natural brake: the five-hour usage window that forces a pause. His own cautionary number: a single hour of Fable 5 cost him around $300 on Bedrock, enough, as Andrey adds, to make you “question your choices.”

This connects directly to the team’s episode #6, “The Big AI Squeeze”, which argued the subsidized-subscription era would eventually give way to real economics. Andrey’s counterpoint is that inference margins appear high once you own the chips — the marginal cost trends toward electricity, cooling, and the model itself — which raises the question of how long premium API pricing survives against commoditization. His framing: if you’re merely consuming a model, it’s close to a commodity and you can switch to something like a GLM model; the moat is in the platform features built around it, such as Anthropic’s built-in advisor pattern you’d otherwise have to build yourself. Viktor reports customers already asking when GLM 5.2 will reach Bedrock, and likens the wait to how EKS and Kubernetes need time to clear AWS’s enterprise security quality gates before going generally available.
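To put the bill figures from this section in perspective, the arithmetic can be sketched as follows. Only the $300-per-hour figure and the "couple hundred dollars to hundreds of thousands" range come from the episode; the two hours a day, 21 workdays a month, and the month-over-month doubling are hypothetical inputs chosen purely for illustration.

```python
def monthly_cost(hourly_usd, hours_per_day, workdays_per_month=21):
    """Projected monthly spend for one person at a steady hourly burn rate."""
    return hourly_usd * hours_per_day * workdays_per_month


def months_to_reach(start_usd, target_usd, monthly_growth):
    """Months of compound growth needed for a bill to go from start to target."""
    if monthly_growth <= 1:
        raise ValueError("monthly_growth must be greater than 1")
    months, bill = 0, start_usd
    while bill < target_usd:
        bill *= monthly_growth
        months += 1
    return months


if __name__ == "__main__":
    # One engineer at the quoted hourly rate, two hours a day (assumed).
    print(monthly_cost(300, 2))
    # A $200 bill doubling every month (assumed) until it passes $200,000.
    print(months_to_reach(200, 200_000, 2.0))
```

Under those assumptions a single heavy user lands in five figures a month, and a bill that doubles monthly crosses from hundreds to hundreds of thousands of dollars in under a year, which is consistent with the "within months" trajectory Viktor describes.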

The closing note is a half-serious rallying cry for competition. “Go China,” Andrey says — rooting for Chinese labs to keep pressure on OpenAI and Anthropic, especially ahead of likely IPOs. Viktor’s sharper concern is licensing, though he flags it as uncertain: he says he may be misremembering, but his worry is that MiniMax may have restricted cloud providers like AWS from reselling its models, leaving “run it on your local machine” as cold comfort for anyone without hundreds of gigabytes of memory. (No public source confirms such a policy change, and Viktor himself hedges the claim.) Andrey calls that pattern “the Elasticsearch mistake” — echoing the licensing turn that led AWS to fork OpenSearch — and both hope open-weight models stay genuinely open so inference providers can serve teams that can’t pay full Opus or Fable prices. Viktor’s parting line, for listeners to keep in mind: cmux, Claude Code, Kiro, Bedrock.

Resources

cmux (manaflow-ai/cmux on GitHub) — the macOS terminal Viktor now uses, built on Ghostty (the GPU-accelerated Zig terminal by Mitchell Hashimoto that Andrey uses), with workspaces, session restore, and the socket API that lets Claude Code open and drive real terminal windows. The core of his terminal masterclass.
Kiro — AWS’s spec-driven agentic IDE and CLI, central to the comparison with Claude Code, including the “run all tasks” sub-agent feature and its test-heavy, enterprise-tuned behavior.
Slidev — the Markdown-based presentation tool Viktor runs in a dedicated window via cmux’s socket API; the concrete example behind the agent-opens-a-real-terminal workflow.
Tavily MCP server and Jina AI MCP — the two web-search MCP servers Viktor recommends to fill the deep-research gap when running Claude Code on Bedrock rather than a hosted subscription.
Superpowers (obra/superpowers on GitHub) — one of the spec-driven Claude Code skill frameworks Viktor uses to structure development; representative of the community ecosystem he says gives Claude Code its edge over Kiro.
The Johnny.Decimal system — the numbered-folder scheme behind Viktor’s ~10,000-file Obsidian vault, the structure that lets him scope an agent’s context per customer and project.
Episode #6 — The Big AI Squeeze — the team’s earlier argument that subsidized subscriptions would give way to token economics, which this episode revisits through real customer bills and the Fable 5 cost shock.
Cracking the Kubernetes Interview (Viktor Vedmich) and the DevOps Kitchen Talks podcast — the guest’s book and long-running show; see also his GitHub profile (vedmichv) for more of his work on Kubernetes and cloud-native platforms.

