<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[agentbuild.ai]]></title><description><![CDATA[AgentBuild brings you clear tips, true stories, real connections, and handy tools to help you learn about AI, build great agents, and share your wins. Read each issue, act on one idea, and together we’ll guide AI toward a brighter tomorrow.]]></description><link>https://newsletter.agentbuild.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!OIBg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3453368a-da00-4960-b174-e3313b941314_256x256.png</url><title>agentbuild.ai</title><link>https://newsletter.agentbuild.ai</link></image><generator>Substack</generator><lastBuildDate>Fri, 19 Jun 2026 23:50:38 GMT</lastBuildDate><atom:link href="https://newsletter.agentbuild.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sandipan Bhaumik]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[sanbhaumik@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[sanbhaumik@substack.com]]></itunes:email><itunes:name><![CDATA[Sandipan Bhaumik]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sandipan Bhaumik]]></itunes:author><googleplay:owner><![CDATA[sanbhaumik@substack.com]]></googleplay:owner><googleplay:email><![CDATA[sanbhaumik@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sandipan Bhaumik]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Context Intelligence: Why Your Agent Passes Every Test and Fails in Production]]></title><description><![CDATA[Today: The model is no longer the hard part. 
The advantage now sits in whether your agents know how your business actually works.]]></description><link>https://newsletter.agentbuild.ai/p/context-intelligence-why-your-agent</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/context-intelligence-why-your-agent</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Sat, 13 Jun 2026 12:22:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/aRNPLi7qNFA" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>An agent quotes a customer an SLA that expired eighteen months ago. It sounded confident, the document it cited was real, and it still got the answer wrong, because the current contract never reached it.</p><p>The model did its job. What failed was everything around the model. The context. And that&#8217;s the shift worth getting your head around: the intelligence is largely a solved problem now, and the bottleneck has moved to context.</p><p>Prukalpa Sankar, Founder and Co-CEO of Atlan, puts it bluntly: with AI, context might be everything, because &#8220;the intelligence is already here.&#8221; The model is the easy part. The hard, durable, defensible part is whether your agents understand how your business actually works. That understanding is what people are starting to call context intelligence.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What context intelligence actually is</h2><p>It isn&#8217;t a bigger prompt or a better retrieval setup. It&#8217;s the infrastructure that gives an agent shared, governed, current knowledge of your organisation, plus a memory of the decisions it and its predecessors have made. Two ingredients: a map of how things relate, and a record of why things happened.</p><p>That map and that record are the context graph. A context graph is a living model of your business as a set of entities and the relationships between them, this customer, that contract, this SLA, that exception, joined to the decisions taken against them. <a href="https://foundationcapital.com/ideas/context-graphs-ais-trillion-dollar-opportunity">Foundation Capital, in their </a><em><a href="https://foundationcapital.com/ideas/context-graphs-ais-trillion-dollar-opportunity">Context Graphs</a></em><a href="https://foundationcapital.com/ideas/context-graphs-ais-trillion-dollar-opportunity"> essay</a> by Jaya Gupta and Ashu Garg, made the sharpest version of the argument: the durable asset isn&#8217;t the data an agent reads, it&#8217;s the decision trace it leaves behind. What was gathered, what rule applied, why an action was allowed. Capture that, and precedent becomes something an agent can look up instead of guess at. The agent stops having data with no judgement and starts having judgement.</p><div><hr></div><h2>Why this is the accuracy story</h2><p>Most enterprise agents fail on a trust gap, not a model gap. The SLA agent didn&#8217;t need a cleverer model. 
It needed to know which contract was current, that it was allowed to act on it, and what had been decided in similar cases before. None of that lives in the model. All of it lives in the context.</p><p>Without it, you get what Prukalpa calls context sprawl: every agent building its own private, partial view of the world, none of them agreeing on what &#8220;active customer&#8221; even means. Fifty agents, fifty versions of the truth, no shared map. Accuracy in that environment isn&#8217;t a model property. It&#8217;s an infrastructure property.</p><p>This is where the runtime and the infrastructure get confused. <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic&#8217;s work on context engineering</a> covers the runtime half: context is finite, and the discipline is fitting in the fewest high-value tokens at the moment of inference, not the most. That&#8217;s real, but it assumes the right context already exists to be selected. Context intelligence is the layer below it, the one that decides what context exists, whether it can be trusted, and whether the agent may use it. One is what you put in the window. The other is what&#8217;s available to put there at all.</p><div><hr></div><h2>How to think about building it</h2><p>You can read most failures here as three debts coming due. </p><ul><li><p><strong>Data Debt:</strong> no single governed source of what&#8217;s true, so your agents disagree.</p></li><li><p><strong>Decision Debt:</strong> nobody captured why past actions were taken, so the context graph has no memory. </p></li><li><p><strong>Evaluation Debt:</strong> no framework to check whether the context an agent actually used was the right context.</p></li></ul><p>Work out which one is biting and you know where to start.</p><p>The build order matters more than the architecture diagram. The instinct is to spend two years plumbing every system into a perfect context layer before anything ships. 
Prukalpa&#8217;s advice is the opposite, and it&#8217;s right: bootstrap from the systems you already have, the CRM, the ERP, the BI definitions, get the context layer roughly 80% of the way there, and let the flywheel start turning. Every decision an agent makes then becomes institutional memory the next agent inherits. You don&#8217;t design the context graph up front. You grow it.</p><p>Which is why this isn&#8217;t really an AI problem at heart. It&#8217;s the next turn of data engineering: <strong>context as a governed product, with owners, versions, and tests, sitting between your data and your agents.</strong> The teams who treated data as a product a decade ago have a head start. The ones still treating context as something you cram into a prompt are about to learn the difference in production.</p><blockquote><p>The open question, the one nobody at the table has a clean answer to yet, is who inside the enterprise actually owns this layer. </p></blockquote><p>I got into exactly that with Prukalpa Sankar on the podcast. She&#8217;s been arguing for the context layer longer than almost anyone, and it&#8217;s the clearest thinking I&#8217;ve heard on where this is heading. </p><div id="youtube2-aRNPLi7qNFA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;aRNPLi7qNFA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/aRNPLi7qNFA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Also, read Prukalpa&#8217;s article - <a href="https://atlan.com/context-and-chaos/issue/what-an-enterprise-context-layer-actually-is/">What an Enterprise Context Layer Actually Is</a></p><p>Enjoy your weekend.</p><p>Talk soon, <br>Sandi</p><div><hr></div><p><em>P.S. 
If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How to Build an AI Agent Registry — Part 2: Risk Class and Autonomy Level]]></title><description><![CDATA[Today: Agent Registry - The enforcement layer that makes your registry more than a catalogue. What is it? 
How to build it?]]></description><link>https://newsletter.agentbuild.ai/p/how-to-build-an-ai-agent-registry-408</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/how-to-build-an-ai-agent-registry-408</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Wed, 10 Jun 2026 14:12:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2f7cc492-0927-4dfa-8106-3be6bea02709_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Where We Left Off</h3><p><a href="https://newsletter.agentbuild.ai/p/how-to-build-an-ai-agent-registry?r=36xwjn">Part 1 covered Schema and Ownership</a>: the description and accountability layer of an Agent Registry. Unity Catalog as the agent registry. AWS IAM for per-agent identity. CloudTrail for audit trails. A stack that answers the questions an audit demands: what exists, who owns it, what it can do.</p><p>That is necessary, but it is not sufficient.</p><p>Schema and Ownership describe an agent. They do not constrain it. An agent with a well-documented schema and a clear owner can still make a consequential decision without human approval, access data it should not touch, or operate at an autonomy level the business never signed off on. 
That is where <strong>Risk Class</strong> and <strong>Autonomy Level </strong>come in and where the registry becomes a governance control rather than a filing system.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;524d8c03-8dad-4574-8aac-45032e14b1b9&quot;,&quot;caption&quot;:&quot;The Problem&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to Build an AI Agent Registry &#8212; Part 1: Schema and Ownership &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:193058051,&quot;name&quot;:&quot;Sandipan Bhaumik&quot;,&quot;bio&quot;:&quot;I&#8217;ve spent almost 2 decades building Data &amp; AI foundations. Now, through AgentBuild Weekly, I share how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/651c04c2-d92e-4a2e-905f-a59346e3e950_1024x1024.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-06T13:01:27.753Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dcff23d1-a3ff-456d-950a-d704252082f7_1280x720.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://newsletter.agentbuild.ai/p/how-to-build-an-ai-agent-registry&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:200870763,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:5,&quot;publication_id&quot;:2211527,&quot;publication_name&quot;:&quot;agentbuild.ai&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!OIBg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3453368a-da00-4960-b174-e331
3b941314_256x256.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>Component 3: Risk Class - Not All Agents Carry Equal Risk</h3><p>Risk classification is the decision that determines what governance an agent receives. Get it wrong and you either over-govern low-risk agents until teams route around the process, or under-govern high-risk agents until something goes wrong in production.</p><p>The four risk categories worth working with in a regulated environment:</p><ol><li><p><strong>Low Risk</strong>: bounded output, read-only, no consequential action. An agent that summarises documents or answers questions from a knowledge base.</p></li><li><p><strong>Medium Risk</strong>: supervised action. The agent can write or update records, but a human reviews before execution.</p></li><li><p><strong>High Risk</strong>: consequential action inside a tightly constrained boundary. 
The agent can execute without per-action approval, but only against approved tools, approved data paths, and fully audited systems.</p></li><li><p><strong>Critical Risk</strong>: autonomous consequential action across multiple systems, with downstream effects that are difficult or impossible to reverse.</p></li></ol><p>In Databricks, risk class is best treated as governed metadata attached to the agent assets you register, for example Unity Catalog models, model versions, and functions, all of which support tagging. That makes risk class visible, queryable, and reviewable across the lifecycle. If an agent gains new tools, new data access, or a broader execution boundary, the classification should be reassessed before the next promotion.</p><p>So, if you are on Databricks, use Unity Catalog to register the agent, its model versions, its functions, and its metadata. Use tags to record risk class. Use Unity Catalog privileges, managed authentication, and deployment-time permission checks to enforce which tools and data paths are actually reachable at runtime. If you need an explicit risk-policy engine that says, for example, a High Risk agent may call Tool A but never Tool B, that policy layer still sits outside Databricks - in an API gateway, middleware layer, or external authoriser.</p><p>That is the line between registry and policy engine. Databricks gives you the governed assets, the permissions model, and the audit surface. 
If you need deterministic policy decisions over agent behaviour itself, you compose that on top.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D2-T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D2-T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 424w, https://substackcdn.com/image/fetch/$s_!D2-T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 848w, https://substackcdn.com/image/fetch/$s_!D2-T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 1272w, https://substackcdn.com/image/fetch/$s_!D2-T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D2-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png" width="1456" height="972" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:972,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:173493,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/201449801?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D2-T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 424w, https://substackcdn.com/image/fetch/$s_!D2-T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 848w, https://substackcdn.com/image/fetch/$s_!D2-T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 1272w, https://substackcdn.com/image/fetch/$s_!D2-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25bffc3-fa81-4151-beec-3b723b3f77d0_1486x992.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Conceptual: Runtime flow of enforcing risk class rules on Agents</figcaption></figure></div><p><strong>On the AWS-native path:</strong> <a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-understanding-cedar.html">Amazon Bedrock AgentCore Gateway with Cedar policies </a>provides a similar enforcement pattern - deterministic allow/deny decisions on every tool call, with Lambda interceptors for dynamic validation. Cedar policies are authored in a declarative language and evaluated against principal, action, and resource with optional conditions over request context. Worth monitoring as it moves towards general availability.</p><div><hr></div><h3>Component 4: Autonomy Level - How Independently an Agent Operates</h3><p>Risk class tells you what an agent is allowed to do. 
Autonomy level tells you how much independent judgement it is permitted to exercise in doing it. These are related but distinct. A High Risk agent can still operate at L1 with human approval, or at L3 with bounded multi-step execution. The risk class constrains the action space. The autonomy level constrains the operating model.</p><p>The five levels from the registry:</p><ul><li><p>L0 - <strong>Assistive only</strong>. Generates output for human review. No execution.</p></li><li><p>L1 - <strong>Human-approved actions</strong>. The agent proposes; a human confirms before execution.</p></li><li><p>L2 - <strong>Semi-autonomous workflows</strong>. Executes within defined boundaries without per-action approval.</p></li><li><p>L3 - <strong>Goal-driven execution</strong>. Plans and executes multi-step tasks. Human oversight at checkpoints, not per action.</p></li><li><p>L4 - <strong>Multi-agent autonomy</strong>. Coordinates with other agents, spawns sub-agents, orchestrates complex workflows with minimal human intervention.</p></li></ul><p>In Databricks, autonomy level should be treated as an explicit registry attribute you manage in your own governance model, not as a built-in Unity Catalog field with native enforcement semantics. The platform gives you the places to record it, such as model and model-version metadata, but the meaning of L0 through L4 remains an operating policy you define and then enforce through workflow design, approvals, deployment controls, and runtime boundaries.<br><br>MLflow Tracing captures the full execution path - every tool call, every decision point, every input and output - linked to the agent version and autonomy level at the time of execution. 
In a regulatory investigation, that trace is the evidence.<br><br>So, autonomy level is declared in the registry, promotion is gated through MLflow deployment workflows, runtime execution is bounded by Databricks permissions and isolated tool execution, and trace evidence is captured through MLflow. The registry records the autonomy decision; the surrounding control plane makes it real.<br></p><p><strong>Open source alternative:</strong> Microsoft released the Agent Governance Toolkit, an open source project that addresses runtime security governance for autonomous agents, in April 2026 under the MIT licence. It includes execution rings inspired by CPU privilege levels, kill switches for emergency agent termination, and circuit breakers. It maps directly to the <a href="https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/">OWASP Top 10 for Agentic Applications (2026)</a> and is the most purpose-built open source option currently available for autonomy-level enforcement.</p><div><hr></div><h3>What the Registry Now Does</h3><p>Across both parts, the four components work as a single control system.</p><p>Schema tells the registry what the agent is. Ownership tells it who is accountable. Risk Class tells it what level of control the agent requires, and that classification is carried as governed metadata across the agent&#8217;s registered assets. Autonomy Level tells it how much independent judgement the agent is permitted to exercise, and that decision is enforced through release workflows, runtime permissions, and execution boundaries rather than left implicit in code.</p><p>The audit request scenario from Part 1 - fourteen agents, nobody can answer - is no longer possible. Every agent has a schema record, an owner, a declared risk class, an explicit autonomy level, governed access to tools and data, and an execution trail that can be reconstructed through MLflow and platform audit logs. The registry is not a reporting artefact. 
It is the control plane that makes autonomous operation in a regulated environment defensible.</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em><br><br>Talk soon,<br>Sandi.</p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><p>Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How to Build an AI Agent Registry — Part 1: Schema and Ownership ]]></title><description><![CDATA[This week: Agent Registry - The infrastructure layer that turns an agent sprawl problem into a governance capability. What is it? How to build it?]]></description><link>https://newsletter.agentbuild.ai/p/how-to-build-an-ai-agent-registry</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/how-to-build-an-ai-agent-registry</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Sat, 06 Jun 2026 13:01:27 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/dcff23d1-a3ff-456d-950a-d704252082f7_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>The Problem</h3><p>An organisation deploys 50 agents across 6 teams over the course of a year. Without any central register, without any ownership mapping, without any shared schema.</p><p>Then the audit request arrives.</p><p>The auditors ask the following questions:</p><ul><li><p>Which agents are live? </p></li><li><p>Who approved the one touching customer financial data? </p></li><li><p>What has it been told in its system prompt? <br></p></li></ul><p>Nobody can answer.<br><br>This is not a technology failure. The agents worked fine, but the governance infrastructure was never built.</p><div><hr></div><h3>A Note on the Choice of Technology</h3><p>Before we proceed, I want to make a note on my technology choices here. The architecture in this issue is Databricks-on-AWS. 
That is not a neutral choice, but it reflects where I spend most of my time. I work with Tier 1 UK financial institutions, and the patterns I see repeatedly across those engagements have shaped this stack.</p><p>Databricks gives me Unity Catalog for governance, lineage, and access control across the full data and AI estate. AWS gives me IAM for identity and CloudTrail for audit trails. These tools are production-stable, work in hybrid deployments, and hold up under compliance scrutiny.</p><p>Where open source alternatives exist and are worth knowing about, I have noted them inline. But the primary recommendation here is the stack I have seen hold up under real compliance scrutiny.</p><div><hr></div><h3>What an Agent Registry Actually Is</h3><p>An Agent Registry is the control layer that makes an agent estate governable. It is not a dashboard. It is not a catalog you update manually. It is infrastructure, the layer that sits beneath your agents and answers the questions an audit demands: what exists, who owns it, what it can do, and how much autonomy it operates with.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cYGa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cYGa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 424w, 
https://substackcdn.com/image/fetch/$s_!cYGa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 848w, https://substackcdn.com/image/fetch/$s_!cYGa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!cYGa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cYGa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png" width="936" height="1384" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1384,&quot;width&quot;:936,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1082439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/200870763?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!cYGa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 424w, https://substackcdn.com/image/fetch/$s_!cYGa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 848w, https://substackcdn.com/image/fetch/$s_!cYGa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!cYGa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e2a288-6e7e-4e83-8dc3-9074689cd41e_936x1384.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">I posted this on <a href="https://www.linkedin.com/posts/sandipanbhaumik_an-audit-request-arrives-%F0%9D%9F%8F%F0%9D%9F%92-%F0%9D%90%9A%F0%9D%90%A0%F0%9D%90%9E%F0%9D%90%A7%F0%9D%90%AD-activity-7468582493735260160-IoDM?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAASFSfgBvs8z6304VU0bKtIIWJHdDqqIil4">LinkedIn</a></figcaption></figure></div><p><strong>It has four components.</strong> This issue covers the first two: Schema and Ownership. Part 2, out on Wednesday, covers Risk Class and Autonomy Level - <em>the enforcement layer.</em></p><p>I am splitting this deliberately. </p><ul><li><p>Schema and Ownership are about description and accountability - what an agent is and who controls it.</p></li><li><p>Risk Class and Autonomy Level are about enforcement - what an agent is allowed to do and what stops it. </p></li></ul><p>These are different engineering problems. Collapsing them into one piece does neither justice.</p><div><hr></div><h3>Component 1: Schema &#8212; How an Agent Describes Itself</h3><p>Schema is the agent&#8217;s self-declaration. Without it, agents are opaque. Orchestration breaks. Integration fails silently. 
You cannot route work to an agent you cannot describe.</p><p>A schema record needs to capture: capabilities and skills, the APIs and tools the agent can call, input and output formats, memory and context handling behaviour, access permissions, and communication protocols.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RZa9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RZa9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 424w, https://substackcdn.com/image/fetch/$s_!RZa9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 848w, https://substackcdn.com/image/fetch/$s_!RZa9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 1272w, https://substackcdn.com/image/fetch/$s_!RZa9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RZa9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png" width="1456" height="990" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:990,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:165120,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/200870763?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RZa9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 424w, https://substackcdn.com/image/fetch/$s_!RZa9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 848w, https://substackcdn.com/image/fetch/$s_!RZa9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 1272w, https://substackcdn.com/image/fetch/$s_!RZa9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a314844-1c5f-42ec-b862-baba73e51971_1568x1066.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Agent Schema Components</figcaption></figure></div><p><a href="https://www.databricks.com/product/unity-catalog">Unity Catalog (UC)</a> is the agent registry in this stack. It governs agent tools as registered, versioned functions - each one a securable object with access control, lineage tracking, and metadata surfaced through <a href="https://docs.databricks.com/aws/en/catalog-explorer/">Catalog Explorer</a> or the REST API. It supports attribute-based access control on tags, which means you can attach metadata directly to agent capability records and build access policies around them. 
If you think about it, UC becomes the same governance layer that already covers your data assets, your ML models, and your pipelines, just extended to agents without adding a separate system.</p><p>Unity Catalog also works across Databricks workspaces on AWS, Azure, and GCP. If your agent estate spans cloud environments, governance travels with it through the same control plane. That&#8217;s powerful.</p><p><strong><a href="https://aws.amazon.com/blogs/machine-learning/the-future-of-managing-agents-at-scale-aws-agent-registry-now-in-preview/">AWS AgentCore Agent Registry</a> </strong>is worth knowing about as a complementary discovery layer. It stores agent records across frameworks and clouds and supports semantic search and approval workflows - and is useful if your estate includes agents running entirely outside Databricks and you need a single cross-platform catalogue. It is a discoverability tool, not a governance layer. For most <strong>Databricks-on-AWS </strong>deployments, Unity Catalog covers the registry function without it.</p><p><em><strong>Open source alternative: </strong>purpose-built open source agent registry tooling is immature right now. The practical path is a lightweight service catalogue like Backstage for discoverability combined with OPA for policy enforcement - neither of which was designed for agents, but both of which work today without significant custom engineering. 
I have found this <a href="https://github.com/agentoperations/agent-registry">GitHub repo</a> but have not tried or explored it; you might want to have a look.<br><br>If you find or know of one, let me know in the comments.</em></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ibdV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ibdV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 424w, https://substackcdn.com/image/fetch/$s_!ibdV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 848w, https://substackcdn.com/image/fetch/$s_!ibdV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!ibdV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ibdV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png" width="1456" height="1121" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1121,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:217939,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/200870763?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ibdV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 424w, https://substackcdn.com/image/fetch/$s_!ibdV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 848w, https://substackcdn.com/image/fetch/$s_!ibdV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!ibdV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b0c877-4acf-49a1-87ff-218f36e24e15_1660x1278.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3>Component 2: Ownership &#8212; Who Controls the Agent</h3><p>Ownership is where shadow AI comes from. An agent without an owner is an agent nobody is responsible for. In regulated environments, that is not an abstract risk.</p><p>An ownership record needs to capture: the team or business owner, accountability mapping, the access governance model, the escalation path, approval authority, and lifecycle responsibility - including who decommissions the agent and when.</p><p>On the identity side, AWS IAM is the enforcement mechanism. Each agent should operate under a dedicated IAM role with least-privilege permissions - no shared credentials, no roles that accumulate access over time. 
<a href="https://aws.amazon.com/blogs/security/iam-policy-autopilot-an-open-source-tool-that-brings-iam-policy-expertise-to-builders-and-ai-coding-assistants/">IAM Policy Autopilot</a> allows AI coding tools to generate baseline IAM policies directly from application code, reducing the gap between what an agent was built to do and the permissions it actually holds.</p><p>Accountability trails run through <a href="https://aws.amazon.com/cloudtrail/">AWS CloudTrail</a>. Every registry access and administrative action is logged. In AgentCore Agent Registry, CloudTrail integration is built in, meaning you have an auditable record of who approved an agent, when it was registered, and when its record was last modified.</p><p>Unity Catalog extends this further. Audit logs capture every agent action. The <a href="https://www.databricks.com/product/artificial-intelligence/ai-gateway">Unity AI Gateway</a> release introduced MCP server governance: controlling which agents can access which external systems and tracking how that data is used. Ownership in the registry is not just a field in a database. It maps directly to the IAM role, the Unity Catalog access policy, and the audit trail.</p><p><em><strong>Open source alternative: </strong><a href="https://www.openpolicyagent.org/">Open Policy Agent (OPA)</a> provides policy enforcement for ownership and access governance in environments not running on the Databricks or AWS managed stack. It is cloud-agnostic and widely used in regulated industries.<br><br>If you find a good open-source stack, please comment here. I am looking for one as well.</em></p><div><hr></div><h3>Coming Wednesday: Risk Class and Autonomy Level</h3><p>Schema and Ownership tell you what an agent is and who is responsible for it. 
</p><p>That is necessary but not sufficient.</p><p>The harder question is what an agent is allowed to do and what stops it when it operates outside its boundaries.</p><p>Part 2 covers Risk Class: how risk classification moves from a metadata tag in Unity Catalog to a live runtime control via Unity AI Gateway and AgentCore Policy. I will also cover Autonomy Level, the scale from assistive-only (L0) to multi-agent autonomy (L4), and the kill switch infrastructure that makes higher autonomy levels safe enough to deploy in a regulated environment.</p><p>This is the part of the registry most organisations skip entirely. It is also the part that determines whether your governance is real or decorative.<br><br>Talk soon,<br>Sandi.</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><p>Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How to Fund the Right AI Use Case ]]></title><description><![CDATA[This week: New video on YouTube - how to find the right AI use cases to fund, and how to make data-driven decisions based on infrastructure gaps and business value.]]></description><link>https://newsletter.agentbuild.ai/p/how-to-fund-the-right-ai-use-case</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/how-to-fund-the-right-ai-use-case</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Sat, 30 May 2026 13:03:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/TPfHtbTne78" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey everyone,</p><p>Many organisations I talk to gather 50-100 AI use cases. They hold a big &#8220;priority workshop.&#8221; They vote on business value, strategic fit, and stakeholder excitement. They pick the top three, hand them to the engineering team, and wait for the magic to happen.</p><p>Six months later, the project is quietly shelved. </p><p>The reason? 
&#8220;The data infrastructure doesn&#8217;t support it.&#8221;</p><p>In my latest video, I talk about why this happens and, more importantly, how to stop it using a framework I&#8217;ve been developing for production AI.<br><br></p><div id="youtube2-TPfHtbTne78" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;TPfHtbTne78&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/TPfHtbTne78?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h3>The &#8220;Human-to-Agent&#8221; Gap</h3><p>The core problem is something I call the &#8220;Human vs. Agent&#8221; data standard.</p><p>As technical people, we often forget that data built for a human analyst is completely different from data built for an AI agent. </p><ul><li><p><strong>Humans</strong> can reconcile messy schemas and ask a colleague for help.</p></li><li><p><strong>AI Agents</strong> cannot. They either fail or, worse, fail silently.</p></li></ul><p>If your infrastructure is built for BI dashboards, it simply won&#8217;t sustain a fully autonomous AI agent.</p><div><hr></div><h3>The 8-Dimension Assessment</h3><p>To fix this, I&#8217;ve broken down AI readiness into <strong>8 key dimensions</strong> across two groups:</p><p>1.  <strong>Data Infrastructure</strong>: Consumers, Access Latency, Schema Rigor, and Metadata.</p><p>2.  <strong>AI Operations</strong>: Error Handling, Memory/State, Evaluation, and Observability.</p><p>The goal isn&#8217;t to be &#8220;perfect&#8221; at all of them. The goal is to be <strong>honest</strong>. 
When you score your current setup (1 to 4) against what a use case actually requires, the &#8220;readiness&#8221; of your project stops being a debate and starts being a number.</p><div><hr></div><h3>Finding Your &#8220;Beachhead&#8221;</h3><p>The most actionable part of this framework is finding your <strong>Beachhead Use Case</strong>. </p><p>A beachhead isn&#8217;t your most ambitious goal (like a fully autonomous loan negotiator). It&#8217;s the use case where:</p><p>1.  The infrastructure gap is <strong>small</strong> (you can ship in weeks, not months).</p><p>2.  The business value is <strong>standalone</strong> (it pays for itself immediately).</p><p>By starting here, you aren&#8217;t just &#8220;doing a pilot&#8221; - you&#8217;re building the foundation that makes the ambitious stuff possible later.</p><div><hr></div><h3>Want to run this assessment yourself?</h3><p>I&#8217;ve put together a full walkthrough of these 8 dimensions and how to build your own &#8220;Gap Map.&#8221; If you&#8217;re tired of the &#8220;AI priority workshops&#8221; that lead nowhere, this might be the most useful 15 minutes of your week.</p><p><strong><a href="https://docs.google.com/spreadsheets/d/1i6e4Dfa543HuGCc4nrWWSXtKcZ5-ph-YiKRFDz1QMoc/edit?usp=drive_link">Check out the full breakdown here.</a></strong><br></p><p>I&#8217;d love to hear your thoughts&#8212;drop a comment on the video and let me know which of the 8 dimensions is currently the biggest blocker in your org.</p><p>Have a great weekend,</p><p>Sandi.</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. 
The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><p>Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[MCP: What’s Actually Working, What’s Breaking, and How to Do It Right]]></title><description><![CDATA[This week: Honest look at the Model Context Protocol and what history says about where this goes next. 
A decision tree, an architecture pattern, and several best practices.]]></description><link>https://newsletter.agentbuild.ai/p/mcp-whats-actually-working-whats</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/mcp-whats-actually-working-whats</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Sat, 23 May 2026 13:01:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cb30b2a9-2996-402a-aef2-b89d112fe862_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This article is too long for email - it might have been truncated. Please read on Substack.</em><strong><br><br>TL;DR</strong></p><ul><li><p>MCP is the right abstraction for standardising tool access across multiple agents, but most teams are deploying it without the governance it needs</p></li><li><p>Five failure modes keep appearing in regulated environments: hardcoded credentials, no authorisation layer between model and tool, invisible tool calls, server sprawl, and untracked data residency</p></li><li><p>MCP is not always the right choice. Direct function calling, existing APIs, and async queues are better fits for several common patterns</p></li><li><p>The fix isn&#8217;t complicated: treat MCP servers as infrastructure, instrument every tool call at the boundary, and never let the model be your policy engine</p></li></ul><div><hr></div><h2>MCP is at that USB moment</h2><p>Something shifted in 2024. 
Developers started asking, &#8220;can we connect an LLM to our tools?&#8221; and then wondering, &#8220;how do we do it without building a different integration for every model, every framework, every team?&#8221;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>These are the questions MCP answers. The <a href="https://modelcontextprotocol.io/docs/getting-started/intro">Model Context Protocol</a>, originally developed by Anthropic and released in November 2024, is now gaining ground as a <em>de facto</em> standard.</p><p>By mid-2025, MCP had moved from research-adjacent to actively deployed. GitHub Copilot, Cursor, Claude, and a growing list of enterprise agent frameworks had either adopted or announced support. The server ecosystem - registries, SDKs in Python and TypeScript, community-contributed connectors for everything from PostgreSQL to Salesforce - had expanded fast. 
Almost every other product has its MCP.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OVSZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OVSZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 424w, https://substackcdn.com/image/fetch/$s_!OVSZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 848w, https://substackcdn.com/image/fetch/$s_!OVSZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 1272w, https://substackcdn.com/image/fetch/$s_!OVSZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OVSZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png" width="1095" height="615" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:615,&quot;width&quot;:1095,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OVSZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 424w, https://substackcdn.com/image/fetch/$s_!OVSZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 848w, https://substackcdn.com/image/fetch/$s_!OVSZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 1272w, https://substackcdn.com/image/fetch/$s_!OVSZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8382e34c-7c87-49a5-adcb-a78880f2a186_1095x615.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That speed should give you pause.</p><p><strong>This is a pattern we&#8217;ve seen before. </strong>When USB was introduced in 1996, it solved a real problem: a dozen incompatible port types on the back of every PC. It was the right abstraction. And within a few years, &#8220;plug and play&#8221; had become a running joke because the driver ecosystem moved faster than the discipline around it. Devices connected. Systems crashed. Enterprise IT spent years cleaning up what consumer enthusiasm had shipped. While the protocol was fine, the deployment culture was not.</p><p><strong>MCP is at that USB moment.</strong> The abstraction is right. The ecosystem is moving faster than the engineering rigour around it. Teams are shipping MCP servers in sprints, demoing them to CTOs, and having them in production eight weeks later. 
Six months after that, nobody can tell you what tools the agent is calling, the credentials haven&#8217;t been rotated since go-live, and there is no audit trail that would survive a compliance review.</p><p>The protocol didn&#8217;t fail them. The deployment pattern did.</p><p>This article is about the difference.</p><div><hr></div><h2>Tool Definition: A Contract with Non-Determinism</h2><p>The fundamental difference between an MCP tool and a standard API is that an MCP tool is a contract between a traditional, deterministic backend system and a non-deterministic LLM agent. Standard API engineering assumes a consumer will call a function exactly as documented. Agent tooling requires developers to accept that the model will interpret the description and choose the arguments itself, and may do so differently from one call to the next. This non-deterministic usage is why building for agents requires a higher degree of protective rigour around entitlement, validation, and audit than standard integration patterns.</p><div><hr></div><h2>What MCP gets right</h2><p>MCP is solving a real problem, and the core of it is genuinely well-designed.</p><h4>The integration tax is real, and MCP eliminates it</h4><p>Before MCP, connecting an agent to a tool meant writing bespoke integration code. Every model had its own function-calling format. Every framework had its own abstraction layer. If you wanted to switch from one agent framework to another, you rebuilt your integrations. If you wanted the same capability accessible across multiple agents, you duplicated the logic and prayed for consistency.</p><p>Without a shared protocol, every team builds its own integration in the shape of its own constraints - its available libraries, its preferred auth pattern, its interpretation of what the downstream system needs. The result is integration sprawl that compounds with every new team that touches the same system. MCP breaks that coupling. A single MCP server exposes a set of tools, resources, and prompts under a standardised interface. 
Any MCP-compatible client regardless of which model or framework it uses, can discover and invoke those tools through the same protocol. Write once, expose everywhere.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GaJV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GaJV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 424w, https://substackcdn.com/image/fetch/$s_!GaJV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 848w, https://substackcdn.com/image/fetch/$s_!GaJV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 1272w, https://substackcdn.com/image/fetch/$s_!GaJV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GaJV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png" width="1108" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56224518-2b04-4b43-8f22-603903daa70d_1108x533.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:1108,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GaJV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 424w, https://substackcdn.com/image/fetch/$s_!GaJV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 848w, https://substackcdn.com/image/fetch/$s_!GaJV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 1272w, https://substackcdn.com/image/fetch/$s_!GaJV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56224518-2b04-4b43-8f22-603903daa70d_1108x533.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is what happened when REST became the default for web APIs in the early 2000s, displacing the chaos of SOAP, WSDL, and proprietary RPC formats. REST won because it was simple enough that teams could independently build to the same standard and have things actually work. MCP is attempting the same move at the agent-tool layer.</p><h4>Runtime capability discovery changes the architecture</h4><p>One of the underappreciated features of MCP is the tools/list endpoint. Think of it like DNS for tools - you don&#8217;t hardcode IP addresses into your application, you resolve them at runtime. An agent queries the server on startup, gets a schema-described list of available tools, and decides which to invoke based on the task. Your agent architecture can evolve without redeployment every time a tool is added or changed.</p><h4>Where it genuinely shines</h4><p>MCP is strongest when the problem is standardised access to well-defined internal systems. 
Four use cases stand out, followed by one design guideline that applies to all of them:</p><ul><li><p><strong>Internal tool registries.</strong> A platform team builds and owns MCP servers for canonical internal capabilities - search, data retrieval, workflow triggers. Agent teams consume them without needing to understand the underlying integration. The boundary is clean, the ownership is clear, and the interface is versioned. This is the internal developer platform model applied to agent tooling.</p></li><li><p><strong>Governed data catalogue access.</strong> An agent needs to query dataset metadata, lineage, or schema. Exposing a data catalogue (Unity Catalog, Alation, Collibra, DataHub) via MCP gives the agent a structured, permissioned interface without direct database access. The server enforces what the agent can see. This matters enormously in financial services, where an agent browsing raw schema can inadvertently surface data it has no business touching.</p></li><li><p><strong>Regulated workflow triggers.</strong> An agent initiates a downstream process - raises a ticket, submits a form, triggers a notification. MCP provides a typed, auditable interface for those triggers. The tool schema documents exactly what inputs are required; the server enforces them. The schema is the contract.</p></li><li><p><strong>Multi-agent orchestration.</strong> A supervisor agent delegates to specialist subagents, each with its own MCP server exposing its capabilities. The supervisor discovers what each subagent can do and orchestrates accordingly. This is where capability discovery really earns its keep; it makes composition between agents tractable without tight coupling.</p></li><li><p><strong>Tool Design for LLM Token Efficiency.</strong> A tool is only as effective as the use the model can make of its output inside the LLM's context window. Prioritise tool interfaces that minimise the total volume of tokens consumed. For instance, prefer a <code>search_datasets(query: str)</code> tool to a generic <code>list_all_datasets()</code> tool. 
Furthermore, design tools with parameters for pagination and truncation so that the tool output - the data returned to the LLM - is as concise and high-signal as possible. Refer to Anthropic&#8217;s blog: <a href="https://www.anthropic.com/engineering/writing-tools-for-agents">Writing effective tools for agents &#8212; with agents</a>.</p></li></ul><p>A minimal MCP server for a governed data catalogue wraps your catalogue API behind a typed <code>search_datasets</code> tool, enforcing domain filters, capping result limits, and keeping the agent away from raw schema access entirely. The tool definition is the contract; the server enforces it.</p><p>This is MCP doing what it&#8217;s designed for.</p><div><hr></div><h2>What risks MCP brings</h2><p>MCP moves fast to production. The failure modes tend to follow shortly after because the protocol makes it easy to ship something that works in a POC before the operational questions have been answered.</p><p>These are the five patterns you should understand.</p><h4>Developers often forget &#8220;credential best practices&#8221;</h4><p>Hardcoded credentials are not a new problem. What MCP changes is the rate at which new service boundaries get created. Spinning up an MCP server takes minutes - a few lines of Python, a decorator, done. That speed means teams are creating new integration points faster than their credential management habits have been built to handle. The result is more hardcoded credentials, in more places, with less visibility than traditional integration patterns would produce.</p><p>For a UK financial institution whose EU operations fall within scope of the EU&#8217;s Digital Operational Resilience Act (DORA), this creates ICT risk management obligations that a missing credential policy would directly fail to meet. 
A credential with no owner and no rotation policy fails that bar.</p><p><strong>DO NOT DO THIS: hardcoded credential in MCP server</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;1877be29-8bb0-4052-893a-e17311eeae2c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Anti-pattern: the secret is committed with the source and ships with every build
DATABASE_URL = &quot;postgresql://svc_agent:SuperSecret123@prod-db:5432/customers&quot;</code></pre></div><p>The credential is now wherever this server runs. If the server is containerised, the credential is in the image or the environment. If the image is pushed to a registry, it may be in the layer history. The blast radius of a compromise is the entire downstream system, not just the agent. This is the digital equivalent of writing your vault combination on a Post-it and sticking it to the outside of the vault.<br></p><h4>Models usually have access to every tool in an MCP server - this is not <em>least privilege</em></h4><p>MCP puts tool invocation decisions in the hands of the model. The model reads the tool schemas, decides which tool to call, and constructs the arguments. Nothing in the base protocol validates whether that decision was appropriate, whether the arguments are safe, or whether the calling agent had the entitlement to invoke that tool for that user in that context.</p><blockquote><p>What this violates is Saltzer and Schroeder&#8217;s <em>principle of least privilege</em> - articulated in their 1975 paper <a href="https://www.cs.virginia.edu/~evans/cs551/saltzer/">&#8220;The Protection of Information in Computer Systems&#8221;</a> and <strong>still the foundation of access control design</strong>. <br><br>The principle states that every component should operate with only the permissions it actually needs. A model that has access to every tool in an MCP server, for every user, at all times, is a maximal privilege configuration. 
It is the opposite of least privilege.</p></blockquote><p>In regulated industries, this matters concretely. An agent that can invoke a <code>transfer_funds</code> or <code>update_credit_limit</code> tool should not be making that invocation based solely on what the model infers from a user message. The trust chain is broken.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;17c48e96-80ea-44f1-8b9f-a6bfccabeba2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">

# Dangerous: model output routes directly to tool execution.
# anthropic_client, mcp_tools, and mcp_session are assumed to be
# initialised elsewhere, when the agent connects to the MCP server.
async def run_agent(user_message: str, user_id: str):
    response = anthropic_client.messages.create(
        model=&quot;claude-sonnet-4-20250514&quot;,
        max_tokens=1024,  # required by the Messages API
        tools=mcp_tools,  # All tools. No entitlement check. No context.
        messages=[{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: user_message}]
    )
    for block in response.content:
        if block.type == &quot;tool_use&quot;:
            # The model decided. The server executes. Nothing in between.
            # Note that user_id is never consulted before the call.
            result = await mcp_session.call_tool(block.name, block.input)
</code></pre></div><p>This code grants the model access to all tools (<code>tools=mcp_tools</code>), establishing a maximal privilege configuration. The model&#8217;s <code>tool_use</code> block is immediately trusted as the final decision, bypassing any policy or validation check. Execution proceeds directly to <code>mcp_session.call_tool</code> without confirming the user&#8217;s entitlement or context.</p><blockquote><p>The model is a reasoning engine. It is not a policy engine. These are different things, and conflating them is how you end up with agents doing things nobody authorised them to do.</p></blockquote><p></p><h4>MCP invocations still require explicit instrumentation to be traced end&#8209;to&#8209;end</h4><p>Any RPC to a separate process requires explicit instrumentation to appear in your trace - that is not unique to MCP. What makes MCP different is that the protocol is new enough that most observability platforms have no native integration for it yet. With a mature HTTP or gRPC stack, there is a reasonable chance your tracing library auto-instruments at the transport layer. With MCP, there is not. Teams adopting it now are on their own.</p><p>An MCP tool call is an RPC to a separate process. Without explicit instrumentation, that call disappears from your trace. You can see the model&#8217;s input and output. You cannot see which tool was called, with what arguments, what the server returned, how long it took, or whether it failed. This is the observability equivalent of a black box flight recorder that stops recording five minutes before the crash. You have most of the data. You&#8217;re missing exactly the part that matters.</p><p>For regulated deployments, this is an audit problem. The FCA&#8217;s Senior Managers and Certification Regime (SM&amp;CR) creates personal accountability for outcomes. 
If an agent made a decision that affected a customer - a credit flag, a document retrieval, a workflow trigger - and that decision was influenced by a tool call, you need to reconstruct exactly what the tool returned. If the tool call isn&#8217;t in your trace, you cannot reconstruct it. &#8220;The model did it&#8221; is not an explanation that satisfies a regulator.</p><p>The fix lies in instrumentation at the MCP boundary, not inside the server. This is shown in the MCPGateway pattern below.</p><p></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6d4f435e-42df-4be4-8d92-e08b88f403d4&quot;,&quot;caption&quot;:&quot;Hey everyone,&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Decision Traces: The Missing Black Box &#9992;&#65039; for AI Agents&quot;,&quot;publishedBylines&quot;:[],&quot;post_date&quot;:&quot;2026-04-18T13:31:15.874Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d180d4d2-5f06-49f5-986e-d83ffdedf651_1280x720.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://newsletter.agentbuild.ai/p/decision-traces-the-missing-black&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194595643,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:2211527,&quot;publication_name&quot;:&quot;agentbuild.ai&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!OIBg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3453368a-da00-4960-b174-e3313b941314_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><h4>Every team builds one, no 
one owns them - there is no established governance for MCP servers</h4><blockquote><p>In a large organisation, the absence of a shared standard is a vacuum that teams fill independently - each making a locally rational decision that creates a globally irrational system.</p></blockquote><p><strong>The MCP equivalent: </strong>without governance, every team designs its own simple system. The risk team builds an MCP server for their data warehouse. The finance team builds a different MCP server for the same data warehouse with different auth. The platform team builds a third server that partially overlaps with both. <strong>The agent now sees twelve tools that do variations of the same thing. </strong>The people who built them have moved on. No deprecation path exists. This is not a hypothetical - it is the same pattern that played out with internal REST APIs at many large organisations that adopted microservices without a service catalogue, and it is already starting to repeat with MCP.</p><blockquote><p>What makes the sprawl hard to unwind is <a href="https://www.laws-of-software.com/laws/hyrum/">Hyrum&#8217;s Law</a>, observed by Google engineer <a href="https://www.hyrumwright.org/">Hyrum Wright</a> and now widely referenced in software engineering: <em>&#8220;With a sufficient number of users of an API, it does not matter what you promise in the contract &#8212; all observable behaviours of your system will be depended on by somebody.&#8221;</em> <br><br>Once a team starts using your MCP server, they will depend on its undocumented behaviours. Deprecating it without governance becomes painful very quickly.</p></blockquote><p>Treating MCP servers as infrastructure from the first deployment - with an owner, a version, a changelog, and a deprecation policy - is not bureaucracy. 
It is the thing that lets you move fast in two years without digging out from under your own sprawl.</p><h4>MCP doesn&#8217;t answer the data residency question</h4><p>MCP doesn&#8217;t answer the data residency question, nor should it. But because MCP makes it trivially easy to spin up a new server anywhere, <strong>teams are creating new data residency exposure points faster than they&#8217;re tracking them</strong>.</p><p>This is the failure mode that almost never appears in ecosystem documentation and is the one most likely to cause a material incident in regulated industries.</p><p>When an agent calls an MCP tool, data flows in both directions: the arguments sent to the tool, and the response returned. In a regulated context, both can contain customer data, personally identifiable information, or material non-public information. The question of where that data flows - which process handles it, which logs capture it, which jurisdiction it transits through - is not answered by the protocol. <strong>That is not a criticism of MCP. It is simply a boundary you need to understand.</strong></p><blockquote><p>The MCP server is a process. That process can run anywhere. If it runs in a container in a region that is not approved for the data it is handling, you have a data residency violation before the tool even returns a result. If the tool call arguments are logged by intermediary infrastructure before reaching your server, you have a data handling question that needs a documented answer.</p></blockquote><p>For UK firms post-Brexit, this intersects with UK GDPR, FCA data governance expectations, and potentially the location requirements of your outsourcing arrangements. 
The compliance question is not &#8220;does MCP support data residency?&#8221; - it does not operate at that layer - but &#8220;can you trace every byte of this tool call, confirm where it went, and show it never left an approved boundary?&#8221;</p><p>That answer requires mapping the full data flow before deploying any MCP server that touches regulated data. Not after the first incident.</p><div><hr></div><h2>Do you actually need MCP?</h2><p>I see this often: a team decides to use MCP before asking whether it needs it. MCP is infrastructure, and infrastructure has a cost - operational overhead, governance burden, complexity. The Unix philosophy, as Doug McIlroy articulated it in the 1970s, puts it plainly: <em>&#8220;Write programs that do one thing and do it well.&#8221;</em> Before you introduce a protocol for interoperability, ask whether you actually need interoperability.</p><p>Here is a decision tree I like to use to help teams make that decision:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sgfI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sgfI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 424w, https://substackcdn.com/image/fetch/$s_!sgfI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 848w, 
https://substackcdn.com/image/fetch/$s_!sgfI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!sgfI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sgfI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png" width="1456" height="501" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:404630,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/198907603?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sgfI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 424w, 
https://substackcdn.com/image/fetch/$s_!sgfI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 848w, https://substackcdn.com/image/fetch/$s_!sgfI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!sgfI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323ff459-d359-4154-9b70-7911bc77e93d_3714x1278.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Decision Tree: Do you need MCP?</figcaption></figure></div><p>Let&#8217;s make each branch concrete.</p><ul><li><p><strong>When direct function calling is the right answer</strong></p><p>If you have a single agent, a small number of tools, and no requirement for reuse across teams or frameworks, native function calling in the Anthropic SDK is simpler, cheaper, and easier to observe. No additional process boundary. No MCP server to maintain. No capability discovery overhead. Adding an MCP server here is the software equivalent of installing industrial plumbing to fill a kettle.<br></p></li><li><p><strong>When a well-governed existing API beats MCP</strong></p><p>If the downstream system already has a REST or gRPC API with proper authentication, rate limiting, observability, and documentation - a Salesforce API, an internal risk platform, a data catalogue with its own REST interface - wrapping it in an MCP server often adds a layer without adding value.<br></p><p><strong>The test is simple: </strong>does the MCP layer provide something the existing API does not? If the answer is capability discovery for agent consumption, a standardised schema, or unified access across multiple systems, MCP earns its keep. If the answer is &#8220;it&#8217;s just a wrapper,&#8221; you&#8217;ve added a process boundary, a deployment artefact, and an operational dependency for no functional gain.<br></p></li><li><p><strong>When an async pattern is the right architecture</strong></p><p>A typical MCP tool call is synchronous request-response. The agent calls a tool, blocks, and waits. That is fine for fast, bounded operations. 
It is the wrong shape for long-running jobs (submit and poll, not block), audit-required fire-and-forget (a message queue with dead-letter handling gives you durability and replay that MCP cannot), and event-driven workflows where the agent should be consuming from a stream, not polling in a loop.</p></li></ul><p>MCP solves a specific problem well. The mistake is treating it as the default integration pattern for anything agent-related, rather than the right answer to a specific architectural question.</p><div><hr></div><h2>One pattern that works</h2><p>Let me show you an architecture pattern that addresses the failure modes above. First, let&#8217;s go through the few best practices that matter here. It&#8217;s a bit of a recap, but worth a refresh.</p><ol><li><p><strong>MCP servers are infrastructure, not glue code.</strong> Each server needs an owner, versioning, a changelog, and a deprecation policy. It should be registered in your internal catalogue and deployed through the same pipeline as your other services. <br><br>&#8220;Who owns this MCP server?&#8221; should have a human name attached to it.</p></li><li><p><strong>Every tool call must be observable.</strong> Instrument at the boundary - in the layer between the orchestrator and the MCP session - not inside the server. The instrumentation wrapper is shared infrastructure, not something each team spends time re-implementing.</p></li><li><p><strong>The model never touches credentials or entitlements directly.</strong> There is always an authorisation layer between the model&#8217;s invocation decision and the tool&#8217;s execution. This is not optional in a regulated environment. It is least privilege applied at the agent layer.</p></li><li><p><strong>Credentials are injected at runtime, not baked in as secrets. </strong>Retrieve credentials at server startup via your secrets manager (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault) - never at image build time, never from environment variables baked into a container. 
The credential is never in your code, never in your image layer history, and has a documented owner and rotation policy. Rotation is handled by the secrets manager; the server picks up new credentials on the next startup cycle.</p></li><li><p><strong>Define quality before deployment using Evals. </strong>Observability in production must be preceded by rigorous quality assurance during development. These are not unit tests - you need systematic measurement of the LLM's ability to use the tool correctly. Build comprehensive evaluation tasks grounded in complex, real-world scenarios to measure tool efficacy. For debugging, run these evaluations programmatically, instructing agents to output their reasoning steps alongside the tool invocation. This practice helps developers probe exactly why an LLM selects or struggles with specific tools, ensuring the quality of the non-deterministic contract before it is exposed to regulated production environments.</p><p></p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W1VZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W1VZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 424w, https://substackcdn.com/image/fetch/$s_!W1VZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 848w, 
https://substackcdn.com/image/fetch/$s_!W1VZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 1272w, https://substackcdn.com/image/fetch/$s_!W1VZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W1VZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png" width="1456" height="780" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:871903,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/198907603?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W1VZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 424w, 
https://substackcdn.com/image/fetch/$s_!W1VZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 848w, https://substackcdn.com/image/fetch/$s_!W1VZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 1272w, https://substackcdn.com/image/fetch/$s_!W1VZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a864ed-fc85-47a0-a5f8-219724b69b28_1814x972.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sequence Diagram: MCP call Orchestration Pattern</figcaption></figure></div><p>Every tool call passes through the Guardrail Layer before it gets anywhere near the MCP server. That layer validates tokens, masks PII, fetches credentials from the Secrets Manager at runtime, and writes an immutable audit entry before execution starts. The MCP server receives a clean, credentialled, traced call. The response comes back, gets sanitised, and the trace span closes. The user gets a safe payload. Nothing touches the downstream system without a paper trail.</p><div><hr></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;08263cf6-6773-45b3-8a29-0a11f3ceedc0&quot;,&quot;caption&quot;:&quot;Hello everyone,&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Do You Test AI - Practical Talk on AI Evaluation 
Approaches&quot;,&quot;publishedBylines&quot;:[],&quot;post_date&quot;:&quot;2026-03-14T14:31:07.946Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1bba58bf-642c-4eaf-8200-d48de101715b_1280x720.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://newsletter.agentbuild.ai/p/how-do-you-test-ai-practical-talk&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:190921146,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:0,&quot;publication_id&quot;:2211527,&quot;publication_name&quot;:&quot;agentbuild.ai&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!OIBg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3453368a-da00-4960-b174-e3313b941314_256x256.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h2>Conclusion</h2><p>MCP is the right abstraction at the right layer. The standardisation problem it solves is real, and the use cases where it works well are genuinely valuable enterprise problems. <strong>The protocol is not the issue.</strong></p><blockquote><p>What history tells us from USB to REST to microservices is that good protocols get adopted faster than the discipline to deploy them safely. </p></blockquote><p>That gap is where incidents come from. This is commonly described as the difference between a sharp knife and a blunt one: the sharp knife is more dangerous in the wrong hands, but it&#8217;s the right tool for someone who knows what they&#8217;re doing.</p><p>The teams that get this right usually ask the boring questions first - who owns this, where does the data go, what happens when this call fails at 2am - and build systems accordingly. <br><br><strong>That&#8217;s not caution. 
That&#8217;s just engineering.</strong></p><div><hr></div><h2>Using or building with MCP?</h2><p>Don't let speed compromise security. Apply the governance and architectural rigour outlined here to your MCP servers now. Build your Authorisation and Observability Layer first to ensure your tool calls are secure, auditable, and compliant from day one.<br><br>Tell me in the comments whether this resonates, what other challenges you are facing, and where MCP worked its magic for you. I am eager to learn from your experience, so please leave a comment or some feedback.</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><p>Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Why Agent-to-Agent Communication Fails - How to Design for Failure]]></title><description><![CDATA[This week: Agent communication is a major problem in multi-agent systems. What the common failure modes are, how to design for them, and the key lessons I have learned.]]></description><link>https://newsletter.agentbuild.ai/p/why-agent-to-agent-communication</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/why-agent-to-agent-communication</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Sat, 16 May 2026 13:02:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2aeaa59e-f607-44e2-8c59-c89e34f20306_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey everyone,<br><br>If you&#8217;ve been building AI applications recently, you&#8217;ve likely noticed a massive architectural shift. We are moving away from monolithic, &#8220;do-everything&#8221; prompts and toward multi-agent systems. 
It&#8217;s an elegant idea: instead of one massive language model struggling to write code, test it, and document it simultaneously, you spin up specialized agents - a Coder, a Tester, and a Writer - and have them collaborate.</p><p>But as teams push these systems into production, they are hitting a wall. Having five smart agents does not automatically equal one smart system.</p><p>The industry is quickly learning a hard lesson: <strong>orchestration is an understood problem, and communication reliability is the actual bottleneck.</strong> You can easily instantiate ten agents using frameworks like <a href="https://www.langchain.com/langgraph">LangGraph</a>, <a href="https://microsoft.github.io/autogen/">AutoGen</a>, or <a href="https://www.crewai.com/">CrewAI</a>. But getting them to talk to each other reliably without hallucinating payloads, dropping context, or getting stuck in infinite loops is where the real engineering happens.</p><p>Let&#8217;s break down the first principles of agent communication, look at how each one fails in production, and explore how modern production systems are solving these exact problems.</p><div><hr></div><h3>The First Principles of Agent Communication</h3><p>At its core, getting agents to collaborate requires the same fundamentals as distributed computing, but with a chaotic twist: the &#8220;nodes&#8221; in this network are <strong>non-deterministic text engines</strong>.<br></p><h4>1. Message Passing</h4><p><strong>Principle:</strong> Message passing is the transfer of information from one node to another. When Agent A finishes its job, it must hand off a payload to Agent B to trigger the next step.</p><ul><li><p><strong>Failure Mode:</strong> The Hallucinated Payload.</p></li><li><p><strong>Example:</strong> A Data Extraction Agent is told to pull a user&#8217;s ID and pass it to a Database Agent. 
Instead of passing <code>12345</code>, the agent passes, <em>&#8220;Here is the user ID you requested: 12345.&#8221;</em> The Database Agent expects an integer, receives a conversational string, and crashes.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4zk2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4zk2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 424w, https://substackcdn.com/image/fetch/$s_!4zk2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 848w, https://substackcdn.com/image/fetch/$s_!4zk2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 1272w, https://substackcdn.com/image/fetch/$s_!4zk2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4zk2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png" width="462" height="138.02146690518782" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:334,&quot;width&quot;:1118,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:50133,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/197971759?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37a08a44-d9c8-43a7-beca-849f019149df_1118x334.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4zk2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 424w, https://substackcdn.com/image/fetch/$s_!4zk2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 848w, https://substackcdn.com/image/fetch/$s_!4zk2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 1272w, https://substackcdn.com/image/fetch/$s_!4zk2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb65a643-4086-4917-a5f5-6f235b4cf274_1118x334.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Agents communicating in natural language are prone to errors</figcaption></figure></div></li><li><p><strong>Why it happens:</strong> LLMs are fine-tuned to be helpful 
conversationalists, not strict state machines. Without hard constraints, they inject pleasantries and markdown formatting into their outputs, corrupting the message payload.</p></li></ul><p></p><h4>2. Protocols and Interfaces - H2A, A2C, A2A</h4><p><strong>Principle:</strong> Protocols dictate the rules of engagement. In modern agentic systems, we categorize these interfaces into three distinct buckets:</p><ul><li><p><strong>H2A (Human-to-Agent):</strong> Conversational, unstructured, and forgiving (e.g., ChatGPT).</p></li><li><p><strong>A2C (Agent-to-Computer):</strong> Rigid and deterministic. The industry standard here is the <strong><a href="https://en.wikipedia.org/wiki/Model_Context_Protocol">Model Context Protocol (MCP)</a></strong>. Introduced by Anthropic in late 2024 and now hosted under the Linux Foundation, MCP standardizes how agents securely connect to external tools, IDEs, and databases.</p></li><li><p><strong>A2A (Agent-to-Agent):</strong> Peer-to-peer communication between two non-deterministic models. This is historically the most fragmented layer, but the industry recently coalesced around the <strong><a href="https://a2a-protocol.org/">Agent-to-Agent (A2A) Protocol</a></strong>. Originally released by Google in April 2025 and unified with IBM&#8217;s <a href="https://research.ibm.com/projects/agent-communication-protocol">Agent Communication Protocol (ACP)</a>, A2A is now a Linux Foundation open standard for cross-framework agent discovery and task delegation over HTTP and JSON-RPC.</p></li><li><p><strong>Failure Mode:</strong> Interface Confusion.</p></li><li><p><strong>Example:</strong> A developer uses MCP to perfectly connect a Research Agent to a PostgreSQL database (A2C). But when the Research Agent hands the data to a Writer Agent (A2A), the developer lets them communicate in conversational English. 
The Writer Agent misinterprets the unstructured text and hallucinates missing facts.</p></li><li><p><strong>Why it happens:</strong> Developers often treat A2A communication like H2A communication. Unless you enforce machine-readable protocols for agent-to-agent handoffs, conversational drift will inevitably break your architecture.</p><p></p></li></ul><h4>3. Shared Context and State</h4><p><strong>Principle:</strong> Agents need a shared understanding of the environment, current progress, and available data to collaborate effectively.</p><ul><li><p><strong>Failure Mode:</strong> Context Desynchronization.</p></li><li><p><strong>Real-World Example:</strong> A Researcher Agent analyzes a 50-page PDF and passes a brief outline to a Writer Agent. The Writer tries to draft the article but fabricates details because it lacks access to the source material.</p></li><li><p><strong>Why it happens:</strong> Context windows are expensive. To save tokens and latency, builders often restrict the context passed downstream. This creates asymmetric information - Agent A knows something Agent B doesn&#8217;t, leading to poor decisions.</p><p></p></li></ul><h4>4. Intent Alignment</h4><p><strong>Principle:</strong> Every agent in the chain must understand the overarching goal of the user, not just its localized sub-task, to ensure the final output is cohesive.</p><ul><li><p><strong>Failure Mode:</strong> The Telephone Game.</p></li><li><p><strong>Example:</strong> A user asks for a &#8220;brief, humorous summary of the latest AI news.&#8221; The Manager Agent passes the news to the Summarizer Agent but forgets to include the &#8220;humorous&#8221; instruction. The Summarizer writes a dry academic brief.</p></li><li><p><strong>Why it happens:</strong> Information decays across hops. 
When breaking a complex prompt into smaller agentic tasks, the nuance of the original user prompt is easily lost in translation.<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Wex!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Wex!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 424w, https://substackcdn.com/image/fetch/$s_!7Wex!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 848w, https://substackcdn.com/image/fetch/$s_!7Wex!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 1272w, https://substackcdn.com/image/fetch/$s_!7Wex!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Wex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png" width="282" height="316.25291181364395" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1348,&quot;width&quot;:1202,&quot;resizeWidth&quot;:282,&quot;bytes&quot;:322038,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/197971759?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Wex!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 424w, https://substackcdn.com/image/fetch/$s_!7Wex!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 848w, https://substackcdn.com/image/fetch/$s_!7Wex!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 1272w, https://substackcdn.com/image/fetch/$s_!7Wex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d90dcb4-e204-4f0d-9ced-08bb6055f20a_1202x1348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Reference read: https://arxiv.org/pdf/2407.04503</figcaption></figure></div></li></ul><h4>5. Memory and Feedback Loops</h4><p><strong>Principle:</strong> When Agent B rejects Agent A&#8217;s work, Agent A needs memory of the failure and the ability to correct itself without repeating the exact same mistake.</p><ul><li><p><strong>Failure Mode:</strong> The Infinite Death Spiral.</p></li><li><p><strong>Example:</strong> A Coding Agent writes a Python script. The Execution Agent runs it, encounters a <code>SyntaxError</code>, and passes the error back. The Coder apologizes, generates the <em>exact same code</em>, and sends it back. They repeat this loop 50 times until the API budget is drained.</p></li><li><p><strong>Why it happens:</strong> LLMs are highly sensitive to their immediate context. 
If the feedback isn&#8217;t explicit, or if the model&#8217;s internal weights heavily favor a flawed syntax pattern, it will keep generating the same wrong answer.</p></li></ul><div><hr></div><h3>How To Solve This - Patterns That Are Working</h3><p>Building reliable multi-agent systems requires shifting your mindset from &#8220;prompt engineering&#8221; to &#8220;protocol engineering.&#8221; Here is how top-tier engineering teams are building resilience into their agent networks.</p><h4>1. Enforce Structured Outputs (a.k.a. Strict Schemas)</h4><p>Never let agents talk to each other in free-text prose if they are exchanging data. Treat agent communication exactly like an API. Use tools like OpenAI&#8217;s <a href="https://platform.openai.com/docs/guides/structured-outputs">Structured Outputs</a>, <a href="https://pydantic.dev/docs/validation/latest/get-started/">Pydantic</a>, or Python libraries like <code>instructor</code> to enforce JSON schemas. If Agent A needs to pass an ID to Agent B, structurally guarantee that the output is <em>only</em> a valid JSON object matching your exact schema.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XcFw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XcFw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 424w, 
https://substackcdn.com/image/fetch/$s_!XcFw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 848w, https://substackcdn.com/image/fetch/$s_!XcFw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 1272w, https://substackcdn.com/image/fetch/$s_!XcFw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XcFw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png" width="494" height="256.30125523012555" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:744,&quot;width&quot;:1434,&quot;resizeWidth&quot;:494,&quot;bytes&quot;:111632,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/197971759?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea39515a-ab4b-4726-8a70-bbe2b6377e96_1434x744.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!XcFw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 424w, https://substackcdn.com/image/fetch/$s_!XcFw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 848w, https://substackcdn.com/image/fetch/$s_!XcFw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 1272w, https://substackcdn.com/image/fetch/$s_!XcFw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab90ea2b-4e01-4c49-afaa-cf811f71e138_1434x744.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example - Agents communicate in structured schemas</figcaption></figure></div><h4>2. Adopt Standardized Protocols</h4><p>Stop reinventing the wheel for tool use and communication. Implement <strong>MCP</strong> for all A2C interactions to securely connect your agents to external systems. For A2A interactions, adopt the <strong>A2A Protocol</strong> to standardize payload structures instead of injecting variables into conversational prompt templates. Leveraging these standards ensures your agents can operate reliably across different platforms and enterprise environments.</p><h4>3. Centralize State (The &#8220;Blackboard&#8221; Pattern)</h4><p>Instead of passing massive context back and forth between agents like a hot potato, use a centralized state mechanism. 
In graph-based frameworks like LangGraph, all agents read from and write to a single, shared state object or &#8220;blackboard.&#8221; This ensures no agent is operating on outdated or asymmetric information.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q2pv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q2pv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 424w, https://substackcdn.com/image/fetch/$s_!Q2pv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 848w, https://substackcdn.com/image/fetch/$s_!Q2pv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 1272w, https://substackcdn.com/image/fetch/$s_!Q2pv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q2pv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png" width="202" height="277.11875" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:640,&quot;resizeWidth&quot;:202,&quot;bytes&quot;:63160,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/197971759?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2389899-1eb9-45cf-b9eb-4180639bd055_640x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q2pv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 424w, https://substackcdn.com/image/fetch/$s_!Q2pv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 848w, https://substackcdn.com/image/fetch/$s_!Q2pv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 1272w, https://substackcdn.com/image/fetch/$s_!Q2pv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c0bdc79-6c9c-4039-9a11-37fea2385cd4_640x878.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Blackboard pattern for Agent co-ordination</figcaption></figure></div><h4>4. Implement Observability and Tracing</h4><p>When a multi-agent system fails, it fails silently and weirdly. You cannot debug these systems with standard print statements. You need dedicated LLM observability platforms like <a href="https://smith.langchain.com/">LangSmith</a>, <a href="https://langfuse.com/docs">Langfuse</a>, or <a href="https://mlflow.org/releases/3/">MLflow 3.0</a> to trace the exact input, output, and execution path of every single node. If the Telephone Game happens, you need to see exactly which agent dropped the context.</p><h4>5. Defensive Programming: Retries and HITL</h4><p>Expect agents to fail. 
Wrap inter-agent communication in standard retry logic with programmatic guardrails (like <a href="https://github.com/NVIDIA-NeMo/Guardrails">NVIDIA&#8217;s NeMo Guardrails</a>). If Agent A sends malformed data, catch the error programmatically and feed it back in a tightly scoped correction prompt. For critical workflows, such as writing to (or deleting) a production database, enforce a Human-in-the-Loop (HITL) pause. Let the system wait for human approval before executing destructive actions.</p><div><hr></div><h3>Lessons Learnt</h3><ul><li><p><strong>Treat agents like microservices:</strong> A2A communication should mirror microservice architecture. Define rigid, structured API contracts for every handoff and validate payloads before they reach the next node.</p></li><li><p><strong>Embrace MCP and emerging standards:</strong> Separate your A2C (tool use) from your A2A (agent coordination). Use standardized open-source protocols like MCP and A2A to offload the complexity of system integrations so your agents can focus on logic.</p></li><li><p><strong>Cap your feedback loops:</strong> Always implement hard limits on iterative loops. If agents go back and forth more than three times without success, throw an exception and escalate to a human or a deterministic fallback script.</p></li><li><p><strong>Persist the global goal:</strong> Inject the original user intent into the system prompt of <em>every</em> agent in the pipeline. Do not assume intent will survive passing through three different LLM nodes.</p></li></ul><p>Multi-agent architectures are the future of complex AI applications, but they require rigorous distributed systems engineering. 
By enforcing strict protocols and shared state, you can stop the silent failures and build agentic systems that actually work in the real world.</p><div><hr></div><h3>See It In Action</h3><p>I&#8217;ve talked about the common patterns and failure modes of multi-agent orchestration and how to design for them in this video. Have a look.<br></p><div id="youtube2-2czYyrTzILg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;2czYyrTzILg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/2czYyrTzILg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><br>As always your comments and feedback are welcome. Please share your experience and thoughts. <br><br>Thanks,<br>Sandi.</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. 
The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Beyond the RAG Pipeline: 3 Unspoken Truths About AI in Production]]></title><description><![CDATA[This week: The industry is building skyscrapers on top of a swamp of probability. 
Here is how world-class engineering teams are actually hardening their systems.]]></description><link>https://newsletter.agentbuild.ai/p/beyond-the-rag-pipeline-3-unspoken</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/beyond-the-rag-pipeline-3-unspoken</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Sun, 10 May 2026 09:01:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4pnh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4pnh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4pnh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4pnh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4pnh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!4pnh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4pnh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg" width="1456" height="964" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:964,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:470387,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/196997144?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4pnh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4pnh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!4pnh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4pnh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab240f26-bf6f-4df0-bb12-bd6dde4b8a6c_4928x3264.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a 
href="https://www.pexels.com/photo/three-wooden-human-like-figurines-sitting-on-the-edge-of-a-wooden-box-14606305/">Photo by Marco Bianchetti</a></figcaption></figure></div><p>If you are reading yet another think-piece on &#8220;scaling autonomous agents&#8221; or &#8220;optimizing your basic RAG pipeline,&#8221; you are observing the trailing edge of the industry. We all know the standard playbook by now: deploy an LLM-as-a-judge, set up a vector database, and run basic semantic search. That is no longer a competitive advantage; it is table stakes.</p><p>To survive in production at scale today, engineering teams must stop treating generative models like brilliant, autonomous colleagues and start treating them like chaotic, highly expensive engine components.</p><p>Here are the three architectural blind spots that standard DevOps playbooks are ignoring - and how to fix them.</p><div><hr></div><h3>1. Stop Building Agents. Build State Machines.</h3><p>The current industry obsession is giving LLMs autonomy - letting them chain tools, determine their own loops, and &#8220;think&#8221; their way out of problems. However, in an enterprise production environment, autonomy is just another word for liability.</p><p>You do not want an autonomous agent; you want a rigid, locked-down <strong>Finite State Machine (FSM)</strong>.</p><blockquote><p>Your software architecture should entirely dictate the exact path, the boundaries, and the execution graph. The LLM should <em>only</em> be utilized for the transition logic. Its sole job is to ingest messy, unstructured user input and output a deterministic decision: &#8220;Do we transition to State A or State B?&#8221;</p></blockquote><p>By stripping the model of its agency and restricting it to routing and classification, your latency drops, your reliability scales, and crucially, your system becomes highly debuggable when an edge case inevitably breaks the flow.</p><div><hr></div><h3>2. 
Eradicate the &#8220;Politeness Tax&#8221;</h3><p>If you audit your raw token logs, you will likely find that you are paying thousands of dollars a month - and sacrificing hundreds of milliseconds of latency per request - just to let your model clear its throat.</p><p>Every time a background model outputs, <em>&#8220;Certainly! I&#8217;d be happy to extract that data for you. Here is the requested JSON:&#8221;</em>, you are burning compute. At scale, politeness is an engineering flaw.</p><p>You cannot fix this with prompt engineering alone. You must enforce <strong>strict grammar constraints at the inference level</strong>. Do not politely ask the model to return JSON in the system prompt; constrain decoding so that <code>{</code> is the only valid first token. Strip out all conversational abilities from your background processing models.</p><blockquote><p><strong>You do not need a polite assistant in your backend data pipeline; you need a ruthless text calculator.</strong></p></blockquote><div><hr></div><h3>3. Neutralize &#8220;Zombie Memory&#8221; in Semantic Caching</h3><p>Semantic caching is universally recommended to reduce API costs. A user asks a question, you embed it, check whether you have recently answered a semantically similar query, and return the cached answer.</p><p>What nobody discusses is <strong>semantic cache rot</strong>. If you are caching answers about dynamic data, like your pricing tiers, live inventory, or active user permissions, the underlying reality will eventually change, but your vector cache remains static. When this happens, the cache intercepts the query and serves up a perfectly formatted, highly confident answer that is now entirely false. Your system isn&#8217;t hallucinating; it is remembering a dead reality.</p><p>To solve this, a simple Time-to-Live (TTL) expiration is insufficient. You must bind your vector cache invalidation directly to your database webhooks. 
If a product goes out of stock in your primary database, your system must aggressively and automatically flush the neighborhood of vectors in your cache that map to that specific product&#8217;s metadata.</p><div><hr></div><h3>The Takeaway</h3><p>Moving AI from a compelling local demo to a hardened production environment requires a fundamental shift in engineering mindset. It is not about finding the perfect prompt or chasing the newest foundation model.</p><p>The best AI engineers do not try to find perfect model outputs. They build perfect architectural nets to catch the model when it inevitably behaves unpredictably.</p><div><hr></div><h3>What is your production AI blind spot?</h3><p>We are all writing the playbook for production AI in real time, and the best lessons come from the trenches, not the demo environments. Hit reply and tell me about the weirdest silent failure mode you have caught in production recently, the one that no standard DevOps tool saw coming. </p><p>If this issue helped you rethink your architecture, do me a favor: forward it to the engineer on your team who is currently trying to solve a systems problem with another paragraph of prompt engineering.<br><br>Talk soon,<br>Sandi.</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. 
The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Evaluation Graph: Why Your AI Pipelines Are Lying to You]]></title><description><![CDATA[This week: the Evaluation Graph - and why the shape of your eval matters more than the score. Your system is a graph. Your evaluations are pipelines. 
That gap is where production failures live.]]></description><link>https://newsletter.agentbuild.ai/p/the-evaluation-graph-why-your-ai</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/the-evaluation-graph-why-your-ai</guid><dc:creator><![CDATA[Sandipan Bhaumik]]></dc:creator><pubDate>Sat, 02 May 2026 13:31:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/980b3675-19d5-4893-85c7-521bc9ff584a_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Here is a pattern I have seen more times than I can count.</p><p>A team deploys an AI system into production. It passes every evaluation they ran. Accuracy looked good. The stakeholder demo went well. The pilot was declared a success. Three months later, the system is quietly shelved because the outputs no longer make sense - or worse, they never did, and nobody caught it until real users started complaining.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>When I dig into what went wrong, I often find that the culprit is the shape of the evaluation, which lets low-quality agentic decisions through unnoticed.</p><blockquote><p>The teams running linear eval pipelines - input goes in, score comes out - are measuring <em>a snapshot of a moment</em>. 
They are not measuring how their system behaves as context shifts, as data drifts, as agents hand off to other agents, as the real world does what the real world always does. They are measuring a straight line. Their system is a graph.</p></blockquote><p>That mismatch is why so many AI evaluations feel thorough and turn out to be worthless.</p><div><hr></div><h2>Pipelines vs Graphs</h2><p>The word &#8216;pipeline&#8217; is everywhere in AI engineering. Data pipelines, inference pipelines, eval pipelines. We&#8217;ve adopted it as the default mental model for how AI systems work.</p><p>And for a lot of data engineering, it&#8217;s correct. Data flows in one direction. You extract, you transform, you load. A pipeline is a clean metaphor because data really does flow like water through a pipe.</p><blockquote><p>But AI systems in production - especially multi-agent systems, RAG architectures, and anything that has to maintain context across multiple turns or tool calls - do not behave like pipelines. They behave like graphs. There are loops. There are conditional branches. There are nodes that depend on the state of other nodes that were resolved two steps earlier. Context that was established at step one can poison or distort the output at step seven.</p></blockquote><p>When you evaluate a graph as if it were a pipeline, you get a false sense of confidence. You test the happy path. You test the input-output pair. You miss the edges. You miss the feedback loops. You miss the context that has been accumulating and silently corrupting your system&#8217;s reasoning.</p><p>I&#8217;ve started calling this <strong>context drift</strong> - the phenomenon where a system&#8217;s outputs degrade because the context it&#8217;s operating in has shifted in ways your evaluations weren&#8217;t designed to detect. A pipeline eval can&#8217;t catch context drift. 
Only a graph-shaped evaluation can.</p><div><hr></div><h2>What is an Evaluation Graph?</h2><p>The Evaluation Graph is not a tool or a framework you install. It&#8217;s a mental model - a different way of thinking about what you&#8217;re actually evaluating and when.</p><p>In a pipeline eval, you define a set of test cases, run your system against them, and score the outputs. Done. Repeatable. Clean.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!giWY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!giWY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 424w, https://substackcdn.com/image/fetch/$s_!giWY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 848w, https://substackcdn.com/image/fetch/$s_!giWY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 1272w, https://substackcdn.com/image/fetch/$s_!giWY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!giWY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png" 
width="1456" height="786" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4300298,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/196205889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!giWY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 424w, https://substackcdn.com/image/fetch/$s_!giWY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 848w, https://substackcdn.com/image/fetch/$s_!giWY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 1272w, https://substackcdn.com/image/fetch/$s_!giWY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F042d535b-3fc1-4947-b044-eb32d3129986_2464x1330.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Evaluation Graph Concept - Generated by Author</figcaption></figure></div><p>In an evaluation graph, you map out the nodes of your system - the points where decisions are made, where context is retrieved, where agents hand off to each other, where state is read or written - and you evaluate at each node, not just at the final output.</p><div><hr></div><h2>Here is what that changes in practice.</h2><p><strong>First</strong>, <strong>you gain localised failure detection.</strong> When a pipeline eval fails, you know something went wrong. You don&#8217;t know where. When a graph eval fails, you know exactly which node broke down - was it the retrieval? The reranker? The summarisation step? The router that decided which agent to call? 
You can fix what&#8217;s actually broken instead of rerunning the whole system hoping for different results.</p><p><strong>Second</strong>, <strong>you can evaluate context propagation. </strong>I have seen this skipped many times. It&#8217;s not enough to evaluate whether each node produces a good output given its input. You need to evaluate whether the context being passed between nodes is coherent, relevant, and not accumulating noise. I&#8217;ve seen systems where individual components all scored above 90% in isolation, but the system as a whole produced nonsense because each node was passing slightly degraded context to the next one. No pipeline eval would catch that.</p><p><strong>Third, you can evaluate decision boundaries.</strong> Multi-agent systems have routing logic - conditions that decide which agent runs next, or whether to escalate, or whether to call a tool. These decision boundaries are often the most fragile part of a production AI system, and they&#8217;re almost never tested explicitly. In an evaluation graph, they are nodes. They get evaluated just like everything else.</p><div><hr></div><h2>How to Build One</h2><p>Starting with the evaluation graph doesn&#8217;t require you to throw away your existing evals. It requires you to extend them in a specific direction.</p><p><strong>The first step is decomposition. </strong>Draw out your system - literally, on a whiteboard or in a diagram - and identify every point where a meaningful decision is made or meaningful state changes. Each of those points is a node. Each connection between nodes is an edge. What you&#8217;re drawing is the evaluation graph. Most teams are surprised by how many nodes they find that they&#8217;ve never evaluated.</p><p><strong>The second step is context mapping.</strong> For each edge in the graph, define what context is being passed from one node to the next. What does the downstream node need to function correctly? 
What could the upstream node pass that would corrupt the downstream output? These become your edge-level test cases - not just input-output pairs, but context-propagation scenarios.</p><p><strong>The third step is failure mode enumeration.</strong> For each node, ask: what does this node look like when it&#8217;s failing quietly? Not failing loudly - that&#8217;s easy to catch. Quiet failures are the dangerous ones. A retrieval node that returns plausible but wrong documents. A router that sends requests to the wrong agent 15% of the time. A summarisation step that subtly omits the most important information. These failure modes need to be in your evaluation suite explicitly. If they&#8217;re not, you won&#8217;t find them until a user does.</p><p><strong>The fourth step</strong>, and this is where graph-shaped evaluation really separates from pipeline evaluation, is <strong>composing your node-level evals into end-to-end scenarios </strong>that test the interaction effects. Not just &#8216;does node A work&#8217; and &#8216;does node B work&#8217;, but &#8216;when node A produces this class of output, does node B degrade in a predictable way&#8217;. The interactions between nodes are often where production AI systems fail.</p><div><hr></div><h2>This Needs a Shift in Mindset</h2><p>The teams that build AI systems that hold up in production are the ones building the most rigorous evaluation infrastructure. And rigorous evaluation infrastructure starts with a simple question: <strong>is my evaluation shaped like my system?</strong></p><p>If your system is a graph and your evaluations are pipelines, you have a gap. That gap is where production failures live.</p><p>The evaluation graph is not a perfect solution - no evaluation framework is. Context still drifts in ways you won&#8217;t anticipate. Failure modes you didn&#8217;t enumerate will still appear. 
But it gets you structurally closer to what&#8217;s actually happening in your system, and that&#8217;s the difference between catching problems in staging and catching them after a customer has seen them.</p><p>This is one of the core concepts I&#8217;m currently working on. If it resonates with what you&#8217;re seeing in your own work, I&#8217;d genuinely like to hear about it. Hit reply and tell me what you see. The patterns you share inform what I write next.</p><p>Talk soon,<br>Sandi</p><div><hr></div><p>&#128073; I wrote more about the Eval Graphs in my article on Atlan&#8217;s community substack.</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:193583020,&quot;url&quot;:&quot;https://metadataweekly.substack.com/p/context-graphs-as-ai-evaluation-infrastructure&quot;,&quot;publication_id&quot;:585908,&quot;publication_name&quot;:&quot;Context &amp; Chaos&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!q3WY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d54bd2c-07b0-430f-9c05-9c349d9bf3d0_300x300.png&quot;,&quot;title&quot;:&quot;Context Graphs as AI Evaluation Infrastructure&quot;,&quot;truncated_body_text&quot;:&quot;About the Author: Sandipan Bhaumik have spent almost 2 decades building Data &amp; AI foundations. 
Now, through AgentBuild Weekly, he shares how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work.&quot;,&quot;date&quot;:&quot;2026-04-09T14:05:59.147Z&quot;,&quot;like_count&quot;:11,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:193058051,&quot;name&quot;:&quot;Sandipan Bhaumik&quot;,&quot;handle&quot;:&quot;sanbhaumik&quot;,&quot;previous_name&quot;:&quot;Sandi Bhaumik&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/651c04c2-d92e-4a2e-905f-a59346e3e950_1024x1024.png&quot;,&quot;bio&quot;:&quot;I&#8217;ve spent almost 2 decades building Data &amp; AI foundations. Now, through AgentBuild Weekly, I share how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work.&quot;,&quot;profile_set_up_at&quot;:&quot;2023-12-29T14:48:55.893Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-02-15T19:29:15.030Z&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null},&quot;primaryPublicationId&quot;:2211527,&quot;primaryPublicationName&quot;:&quot;agentbuild.ai&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://newsletter.agentbuild.ai&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://metadataweekly.substack.com/p/context-graphs-as-ai-evaluation-infrastructure?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img 
class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!q3WY!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d54bd2c-07b0-430f-9c05-9c349d9bf3d0_300x300.png" loading="lazy"><span class="embedded-post-publication-name">Context &amp; Chaos</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Context Graphs as AI Evaluation Infrastructure</div></div><div class="embedded-post-body">About the Author: Sandipan Bhaumik have spent almost 2 decades building Data &amp; AI foundations. Now, through AgentBuild Weekly, he shares how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 months ago &#183; 11 likes &#183; Sandipan Bhaumik</div></a></div><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. 
The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Why Solution Architects Are the Real Force Behind Enterprise AI Transformation]]></title><description><![CDATA[The demo worked. The boardroom loved it. Six months later, someone made a call. 
It always goes to the same person.]]></description><link>https://newsletter.agentbuild.ai/p/why-solution-architects-are-the-real</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/why-solution-architects-are-the-real</guid><pubDate>Sun, 26 Apr 2026 10:43:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/141a97f5-4092-4502-80da-cb58adb9f80a_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There&#8217;s a role inside every enterprise AI programme that nobody has a clean job title for. It isn&#8217;t the VP who sponsors the initiative. It isn&#8217;t the data scientist who builds the model. It isn&#8217;t the product manager who writes the requirements.</p><p>It&#8217;s the person who gets pulled into the room when the demo worked brilliantly and the deployment didn&#8217;t. The person who has to figure out why a system that impressed everyone in the boardroom is now sitting in a security review queue with no clear owner, no evaluation criteria, and a go-live deadline nobody wants to move.</p><p><strong>That person is usually a Solutions Architect.</strong></p><p>And in this new world of AI transformation, architects lack the frameworks to match the responsibility they&#8217;ve been handed.</p><div><hr></div><h3><strong>What the role has become</strong></h3><p>I&#8217;ve spent eighteen years in enterprise data and AI. The last several watching what happens when organizations decide to take AI seriously.</p><blockquote><p>Here&#8217;s what I keep seeing: Solutions Architects are becoming the load-bearing wall of AI transformation programmes. By default.</p></blockquote><p>They&#8217;re the ones who understand both the technology and the business context. They&#8217;re trusted enough to sit in executive sessions and technical ones. They have enough credibility to push back on vendor claims and enough pragmatism to know what actually ships.</p><p>So they get handed things. 
Big things.</p><p>Define the production readiness criteria. Assess whether the data infrastructure can support this use case. Figure out who owns the outcome when the model is wrong. Translate what the VP wants into something the engineering team can build. Get security and compliance aligned before the launch date nobody will move.</p><blockquote><p>That&#8217;s not an implementation role. That&#8217;s an organizational diagnostic role. And most architects are not prepared for it.</p></blockquote><div><hr></div><h3><strong>The gap nobody names</strong></h3><p>The architects I see are not struggling because they can&#8217;t build. They can build. They&#8217;re struggling because the job has shifted <strong>from build to diagnose</strong>, and they don&#8217;t yet have the instruments for it.</p><p>When a doctor walks into a room, they&#8217;re not improvising. They have a diagnostic protocol. Repeatable questions. Known patterns. A framework that tells them what to look for and in what order, so they can tell the difference between something that needs immediate intervention and something that needs monitoring.</p><blockquote><p>Right now, most architects walking into an AI programme are improvising. Drawing on instinct built from past projects. Pattern-matching against things they&#8217;ve seen before, hoping the pattern holds.</p></blockquote><p>Sometimes it does. Often it doesn&#8217;t.</p><p>And when it doesn&#8217;t, the cost isn&#8217;t just the failed project. It&#8217;s the six months of organizational trust that went with it. The next AI initiative that&#8217;s three times harder to fund because this one didn&#8217;t ship. The architect who now has a complicated story to tell about why the thing they led didn&#8217;t work.</p><div><hr></div><h3><strong>What real preparation looks like</strong></h3><p>I&#8217;ve been thinking for a long time about what it would mean to give architects the diagnostic tools they actually need. 
Something closer to a practitioner&#8217;s handbook for the organizational side of AI deployment. Not a vendor comparison or a tutorial on which framework to use.</p><p><strong>The kind of resource that helps you walk into an early-stage AI programme and ask the right questions before anyone starts building. </strong>That gives you a structured way to identify where the real risk is - not the model risk, but the <strong>Data Debt</strong> sitting in pipelines that haven&#8217;t been touched in three years. The <strong>Decision Debt</strong> in an organization where nobody has agreed on who owns an AI error. The <strong>Evaluation Debt</strong> in a team that&#8217;s been running vibe checks and calling it validation.</p><p>The kind of resource that helps you have the conversation with the VP that reframes the whole initiative - not as a technology project, but as an organizational readiness problem that happens to have a technology solution.</p><p>That&#8217;s the conversation that changes outcomes. And most architects don&#8217;t have a framework for it yet.</p><div><hr></div><h3><strong>Why I&#8217;m spending time on this</strong></h3><p>I&#8217;ve watched enough of these programmes - close enough to see the failure modes in detail - that the patterns are starting to feel predictable. Which means the failures are preventable.</p><p>I can walk into a kickoff meeting now and have a reasonable sense of what&#8217;s going to go wrong six months later. Not because I&#8217;m smarter than anyone in the room. Because I&#8217;ve seen it before. Enough times that it&#8217;s stopped feeling like bad luck and started feeling like a diagnostic problem with a known set of causes.</p><p>What I want to do - what I&#8217;m actively working on - is make that pattern recognition transferable. 
To give architects the frameworks that took me years of seeing things go wrong to develop, so they don&#8217;t have to learn the same lessons at the same cost.</p><p>That&#8217;s the work I&#8217;m orienting around. That&#8217;s the shape I want to give this community.</p><p>If you&#8217;re an architect who&#8217;s been handed one of these programmes - or know you&#8217;re about to be - I&#8217;d genuinely like to hear what&#8217;s hard about it right now.</p><p>Talk soon,<br><strong>Sandi</strong></p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div><hr></div><p>Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Decision Traces: The Missing Black Box ✈️ for AI Agents]]></title><description><![CDATA[Decision traces explained - what they are, why every consequential AI agent needs them, and the architecture that makes reasoning auditable, defensible, and improvable.]]></description><link>https://newsletter.agentbuild.ai/p/decision-traces-the-missing-black</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/decision-traces-the-missing-black</guid><pubDate>Sat, 18 Apr 2026 13:31:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d180d4d2-5f06-49f5-986e-d83ffdedf651_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey everyone,<br><br>Hope you all are doing well. Today, I am bringing up something that is increasingly coming up in my customer discussions. 
Not everyone is giving it a name, but the requirements they define clearly point to building <em><strong>Decision Traces</strong></em>.</p><p>Let me explain.</p><p><a href="https://en.wikipedia.org/wiki/Flight_recorder">Flight data recorders (FDR)</a> are popularly known as the <strong>black box</strong> of the aircraft. Technically, the black box contains the FDR and the Cockpit Voice Recorder (CVR). The FDR turns raw sensor traces into timelines that explain crashes, reveal root causes, and drive global aviation safety improvements. Before flight data recorders were mandated, when something went wrong, investigators worked from witness accounts, wreckage patterns, and whatever instruments happened to be installed at the time. The analysis was mostly incomplete, often contradictory, and it rarely led to systemic changes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gVgt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gVgt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 424w, https://substackcdn.com/image/fetch/$s_!gVgt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 848w, https://substackcdn.com/image/fetch/$s_!gVgt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gVgt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gVgt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png" width="1456" height="818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Infographic showing five major aviation accidents and how flight data recorders enabled investigators to understand causes and improve safety.&quot;,&quot;title&quot;:&quot;Infographic showing five major aviation accidents and how flight data recorders enabled investigators to understand causes and improve safety.&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Infographic showing five major aviation accidents and how flight data recorders enabled investigators to understand causes and improve safety." title="Infographic showing five major aviation accidents and how flight data recorders enabled investigators to understand causes and improve safety." 
srcset="https://substackcdn.com/image/fetch/$s_!gVgt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 424w, https://substackcdn.com/image/fetch/$s_!gVgt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 848w, https://substackcdn.com/image/fetch/$s_!gVgt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!gVgt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88666d62-0e47-4451-bece-f33a324b4d7e_2848x1600.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The aviation industry knew flying was becoming more consequential with more routes, more passengers, more complex airspace, but the infrastructure to understand <em><strong>why things failed </strong></em>hadn&#8217;t kept pace with the deployment of the systems themselves. The black box was invented in response to the recognition that consequential systems operating at scale need a structured record of their reasoning - not just their outcomes.</p><p><strong>AI agents are at the same inflection point today. </strong>And the industry, especially in the regulated space, is recognizing that.</p><div><hr></div><h2>What&#8217;s the gap right now?</h2><p>An AI agent deployed in a production system today typically produces two things: an input log and an output. What it doesn&#8217;t produce is the reasoning chain between them in any structured, queryable, auditable way.</p><p>This gap has a name. It&#8217;s called <strong>Decision Debt</strong>, one of the three categories of debt that block AI from working in production. Decision Debt accumulates when you build and deploy AI systems before defining how decisions get made, recorded, and reviewed. It&#8217;s not a future problem. It&#8217;s accumulating now, in every agent deployment that ships without trace infrastructure.</p><p>A decision trace is the record of how an agent got from context to conclusion: what it knew, what it considered, what it weighted, what it discarded, and at what confidence level it committed to an action.</p><p><strong>IMPORTANT:</strong> It&#8217;s not a log file. Logs capture events. 
Traces capture <em>reasoning</em>.</p><p>The distinction matters because when something goes wrong - and in any system operating at scale, something will - you need to answer a different set of questions than a log can address.</p><p>Answering &#8220;what happened&#8221; is not enough; <em><strong>&#8220;why did the agent conclude that, given what it had access to?</strong></em>&#8221; matters more.</p><div><hr></div><h2>This is a historical pattern</h2><p>Aviation gets to the black box through painful iteration. Financial services gets to trade surveillance infrastructure the same way. Healthcare builds clinical decision support audit trails only after near-misses force the question.</p><p>The pattern across every regulated industry is identical: consequential system deploys, operates without adequate observability, incident occurs, retroactive audit reveals the trace infrastructure was never built, expensive fixes follow.</p><p>What&#8217;s different with AI agents is that we can see this pattern coming before the incidents accumulate. The decision trace problem is visible now, in advance, to anyone who has watched the previous cycles play out in adjacent domains.</p><p>Nuclear power operations built decision logging infrastructure into control room design before widespread deployment. And that&#8217;s not because regulators demanded it initially, but because the engineers understood that a system making consequential decisions in real time needed to be interrogable after the fact.</p><blockquote><p>The Chernobyl investigation was partially possible because <a href="https://grokipedia.com/page/investigations_into_the_chernobyl_disaster#:~:text=The%20assessment%20relied%20heavily%20on,positive%20void%20coefficient%20of%20reactivity.">operator actions were timestamped and sequenced</a>. The lessons extracted shaped reactor design globally.</p></blockquote><p>The equivalent for AI agents isn&#8217;t complicated in principle. 
</p><p><strong>It is, however, work that almost nobody has started.</strong></p><div><hr></div><h2>What does Decision Trace Infrastructure actually need?</h2><p>The architecture for a decision trace system has five functional layers, and each one has a specific job. Here&#8217;s how they fit together.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LZl-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LZl-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 424w, https://substackcdn.com/image/fetch/$s_!LZl-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 848w, https://substackcdn.com/image/fetch/$s_!LZl-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 1272w, https://substackcdn.com/image/fetch/$s_!LZl-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LZl-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png" width="1456" height="1089" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1089,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4869833,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/194595643?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LZl-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 424w, https://substackcdn.com/image/fetch/$s_!LZl-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 848w, https://substackcdn.com/image/fetch/$s_!LZl-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 1272w, https://substackcdn.com/image/fetch/$s_!LZl-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c05d96-bc2e-4d87-ae6b-2ca02d500c9a_2396x1792.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Illustration of a Decision Trace Pipeline</figcaption></figure></div><ol><li><p><strong>Input Capture Service</strong> is where the trace begins - at the moment a request enters the system. Query, user identity, session context, and request metadata are captured here, backed by a metadata store (PostgreSQL is a straightforward choice). This is the &#8220;who asked what, when, and from where&#8221; layer. Without it, you have no anchor for the rest of the trace.</p><p></p></li><li><p><strong>State Retrieval and Context Snapshot</strong> captures the world as the agent saw it at decision time: which data versions were active, which policy definitions were in force, which catalog references were resolved. This layer pulls from a prerequisites datastore - Redis for low-latency state, S3 for snapshot durability. 
It&#8217;s also the layer that makes post-incident analysis possible. When you need to understand why the agent concluded what it did three months ago, you need to know what it <em>knew</em> at that moment - not what the system knows now.</p><p></p><div class="callout-block" data-callout="true"><p>This layer is, in practical terms, where a context graph lives - even if most implementations don't call it that yet. A context graph is simply the structured representation of what the agent knew and how those things related to each other at decision time: data assets, policies, catalog nodes, versions, and their connections. The reason <a href="https://foundationcapital.com/ideas/context-graphs-ais-trillion-dollar-opportunity">"context graph" is gaining traction</a> as a term without a settled definition is precisely because this layer has been missing from most agent architectures. Once you build the snapshot layer properly, you have one.</p></div><p></p></li><li><p><strong>Reasoning Chain and Decision Engine</strong> is the core trace layer. Chain-of-thought steps, intermediate logic, intermediate outputs - all captured as structured records. It is the path the agent took to reach the final answer. Every branch, every intermediate conclusion, every tool invocation that shaped the reasoning is an addressable record here.</p><p></p></li><li><p><strong>Policy Binding Service</strong> records the guardrails, rules, and decision logic that were active during the reasoning process. This is what separates a decision trace from a debugging log. You&#8217;re not just capturing what the agent did, you&#8217;re capturing the constraints it was operating under. 
When a compliance team asks &#8220;was the agent following the policy that was in force on this date,&#8221; this layer answers that question directly.</p><p></p></li><li><p><strong>Outcome and Action Capturing</strong> records the final response, the action taken, and, critically, any redress or complaint data attached to that outcome. This closes the loop between the agent&#8217;s decision and its real-world consequence. It&#8217;s also the layer that feeds dispute resolution workflows when customers or regulators challenge an outcome.</p></li></ol><p>All five layers feed into an <strong>Immutable Audit Record</strong> - timestamped, hashed, and written to an immutable trace store (S3 with Object Lock, Delta Lake or a ledger database). The immutability is a must-have. It is the architectural guarantee that the record cannot be altered after the fact, which is what makes it defensible in a regulatory or legal context. The diagram specifies a retention period that aligns with financial services conduct requirements and is a reasonable baseline for any regulated environment.</p><p>From the trace store, a <strong>Trace Query Service and Data Lake</strong> make the records queryable at scale. The operational distinction that matters is between a trace that is merely stored and one that can be queried. A queryable trace lets you ask: &#8220;Show me every decision where the policy binding service applied rule X and the outcome was Y.&#8221; That&#8217;s the difference between evidence and insight.</p><p>The four downstream outputs from this architecture tell you exactly what it&#8217;s designed to serve: <strong>Redress and Dispute Resolution</strong> (when a decision is challenged), <strong>Audit Trail Reporting</strong> (when a regulator asks), <strong>Debugging and Root Cause Analysis</strong> (when something fails), and <strong>Improvement and ML Training</strong> (when you want to make the system better using real decision data).</p><p>No single layer here is novel in isolation. 
Input capture, immutable storage, and policy versioning already exist in adjacent systems. What doesn&#8217;t exist yet, in any standardised form for AI agents, is this stack assembled as a coherent, purpose-built trace infrastructure. That&#8217;s the gap this architecture closes.</p><div><hr></div><h2>Why enterprise architects need to move on this now</h2><p>This infrastructure is the cleanest path through regulatory scrutiny, incident response, and enterprise customer due diligence. The same principle made structured engineering logging standard practice in distributed systems. You cannot debug what you cannot observe. You cannot improve what you cannot measure. And you cannot defend in a board meeting, a regulatory inquiry, or a customer audit what you never recorded.</p><blockquote><p><strong>Decision traces are the observability layer for AI reasoning.</strong> They are the infrastructure equivalent of distributed tracing in microservices, now applied to systems that don&#8217;t just execute code, but form conclusions and take actions.</p></blockquote><p>The good news is that this is buildable now, with current tooling, before the incidents force it. The question is whether engineering organisations treat it as foundational infrastructure from the first production deployment, or discover its absence after the fact.</p><div><hr></div><h2>Call To Action</h2><p>If you are building AI agents for anything consequential, the time to design trace infrastructure is before the first production deployment. Start by mapping which decisions your agent makes that you could not currently explain, audit, or defend: that list is your build priority.</p><p>If this framing is useful, share it with the architect or engineering lead on your AI team - this is the conversation that needs to happen before the system goes live, not after.</p><p>Leave feedback or a comment. 
Share your opinion about this topic.</p><p>Thanks,<br>Sandi.</p><p><br>&#128073; You might also find my article published on Atlan&#8217;s community Substack useful: </p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:193583020,&quot;url&quot;:&quot;https://metadataweekly.substack.com/p/context-graphs-as-ai-evaluation-infrastructure&quot;,&quot;publication_id&quot;:585908,&quot;publication_name&quot;:&quot;Context &amp; Chaos&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!q3WY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d54bd2c-07b0-430f-9c05-9c349d9bf3d0_300x300.png&quot;,&quot;title&quot;:&quot;Context Graphs as AI Evaluation Infrastructure&quot;,&quot;truncated_body_text&quot;:&quot;About the Author: Sandipan Bhaumik have spent almost 2 decades building Data &amp; AI foundations. Now, through AgentBuild Weekly, he shares how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work.&quot;,&quot;date&quot;:&quot;2026-04-09T14:05:59.147Z&quot;,&quot;like_count&quot;:11,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:193058051,&quot;name&quot;:&quot;Sandipan Bhaumik&quot;,&quot;handle&quot;:&quot;sanbhaumik&quot;,&quot;previous_name&quot;:&quot;Sandi Bhaumik&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/651c04c2-d92e-4a2e-905f-a59346e3e950_1024x1024.png&quot;,&quot;bio&quot;:&quot;I&#8217;ve spent almost 2 decades building Data &amp; AI foundations. 
Now, through AgentBuild Weekly, I share how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work.&quot;,&quot;profile_set_up_at&quot;:&quot;2023-12-29T14:48:55.893Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-02-15T19:29:15.030Z&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null},&quot;primaryPublicationId&quot;:2211527,&quot;primaryPublicationName&quot;:&quot;agentbuild.ai&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://newsletter.agentbuild.ai&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://metadataweekly.substack.com/p/context-graphs-as-ai-evaluation-infrastructure?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!q3WY!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d54bd2c-07b0-430f-9c05-9c349d9bf3d0_300x300.png" loading="lazy"><span class="embedded-post-publication-name">Context &amp; Chaos</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Context Graphs as AI Evaluation Infrastructure</div></div><div class="embedded-post-body">About the Author: Sandipan Bhaumik have spent almost 2 decades building Data &amp; AI foundations. 
Now, through AgentBuild Weekly, he shares how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 months ago &#183; 11 likes &#183; Sandipan Bhaumik</div></a></div><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div><hr></div><p>Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[I was offline. Here's what happened when I came back.]]></title><description><![CDATA[This edition covers the AI engineering conference I attended, my talk on multi-agent orchestration patterns, a new article on context graphs, and a deep-dive video on AI latency]]></description><link>https://newsletter.agentbuild.ai/p/context-graphs-multi-agent-orchestration</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/context-graphs-multi-agent-orchestration</guid><pubDate>Sat, 11 Apr 2026 13:31:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tBb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone,</p><p>I owe you an explanation for going quiet last Saturday.</p><p>We took an Easter break as a family - properly offline, no laptop, in the English countryside. It was superb: the rolling green fields actually delivered on the promise, and it was sunny &#9728;&#65039; </p><p>Right. I&#8217;m back. And there&#8217;s quite a lot to catch you up on.</p><div><hr></div><h3><strong>I was at an AI engineering conference this week</strong></h3><p>And I gave a talk, which will be out soon. This was the first time it was held in Europe, and I got to meet so many smart, talented founders and engineers. It was an awesome experience. In-person events are irreplaceable. 
</p><p>I attended some fabulous sessions on cutting-edge AI work. And of course OpenClaw dominated the discussion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tBb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tBb9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 424w, https://substackcdn.com/image/fetch/$s_!tBb9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 848w, https://substackcdn.com/image/fetch/$s_!tBb9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 1272w, https://substackcdn.com/image/fetch/$s_!tBb9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tBb9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png" width="1275" height="684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1275,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:652683,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/193865270?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tBb9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 424w, https://substackcdn.com/image/fetch/$s_!tBb9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 848w, https://substackcdn.com/image/fetch/$s_!tBb9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 1272w, https://substackcdn.com/image/fetch/$s_!tBb9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02ad96e6-9ac1-4325-9965-549b75af1adf_1275x684.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Check out the conference here: https://www.ai.engineer/europe</p><div><hr></div><h3><strong>I also made it to the online track of the conference</strong></h3><p>And the topic was something I&#8217;ve been working towards for a while - <strong>Multi-Agent Orchestration Patterns for Production.</strong></p><div id="youtube2-2czYyrTzILg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;2czYyrTzILg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/2czYyrTzILg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The core argument: the 
field is moving fast, but most teams hit the same wall. They build multi-agent systems like they built single-agent systems. Same assumptions, same trust in the &#8220;it works in the demo&#8221; signal. And then production arrives, and nothing holds.</p><p>The talk walked through choreography vs orchestration, immutable state patterns, circuit breakers, and why distributed systems thinking is no longer optional if you&#8217;re building agents at any meaningful scale.<br><br>Check it out.</p><div><hr></div><h3><strong>A piece I wrote just went live </strong></h3><p>This one has been in the works for a while, and I&#8217;m genuinely proud of it.</p><p>I wrote a guest article for Context &amp; Chaos introducing two concepts I&#8217;ve been developing from my work with regulated enterprises: <strong>context drift and the evaluation graph.</strong></p><p>When an AI system gives you an answer, that answer wasn&#8217;t produced in a vacuum. It was produced against a specific version of your world - a specific definition of what &#8220;active customer&#8221; meant that week, a specific policy that was in force that month, a specific dataset that may or may not still exist.</p><p>Think of it like this: imagine a doctor&#8217;s notes. It&#8217;s not enough to record what prescription they wrote. You also need to know what guidelines were current that day, what the patient&#8217;s history showed at that point, what the lab results said. Without that context, the notes are incomplete. Enterprise AI has the same problem. We&#8217;re recording the prescription. We&#8217;re not recording everything else that informed it.</p><p>This is original IP, and I think it&#8217;s going to become a recurring theme in how regulated industries think about AI governance. 
</p><p>Here is the article:</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:193583020,&quot;url&quot;:&quot;https://metadataweekly.substack.com/p/context-graphs-as-ai-evaluation-infrastructure&quot;,&quot;publication_id&quot;:585908,&quot;publication_name&quot;:&quot;Context &amp; Chaos&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!q3WY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d54bd2c-07b0-430f-9c05-9c349d9bf3d0_300x300.png&quot;,&quot;title&quot;:&quot;Context Graphs as AI Evaluation Infrastructure&quot;,&quot;truncated_body_text&quot;:&quot;About the Author: Sandipan Bhaumik have spent almost 2 decades building Data &amp; AI foundations. Now, through AgentBuild Weekly, he shares how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work.&quot;,&quot;date&quot;:&quot;2026-04-09T14:05:59.147Z&quot;,&quot;like_count&quot;:9,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:193058051,&quot;name&quot;:&quot;Sandipan Bhaumik&quot;,&quot;handle&quot;:&quot;sanbhaumik&quot;,&quot;previous_name&quot;:&quot;Sandi Bhaumik&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/651c04c2-d92e-4a2e-905f-a59346e3e950_1024x1024.png&quot;,&quot;bio&quot;:&quot;I&#8217;ve spent almost 2 decades building Data &amp; AI foundations. 
Now, through AgentBuild Weekly, I share how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work.&quot;,&quot;profile_set_up_at&quot;:&quot;2023-12-29T14:48:55.893Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-02-15T19:29:15.030Z&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null},&quot;primaryPublicationId&quot;:2211527,&quot;primaryPublicationName&quot;:&quot;agentbuild.ai&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://newsletter.agentbuild.ai&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;,&quot;source&quot;:null}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://metadataweekly.substack.com/p/context-graphs-as-ai-evaluation-infrastructure?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!q3WY!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d54bd2c-07b0-430f-9c05-9c349d9bf3d0_300x300.png" loading="lazy"><span class="embedded-post-publication-name">Context &amp; Chaos</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Context Graphs as AI Evaluation Infrastructure</div></div><div class="embedded-post-body">About the Author: Sandipan Bhaumik have spent almost 2 decades building Data &amp; AI foundations. 
Now, through AgentBuild Weekly, he shares how builders and founders can move beyond AI hype to create Agentic systems that think, adapt, and truly work&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 months ago &#183; 9 likes &#183; Sandipan Bhaumik</div></a></div><div><hr></div><h3><strong>New YouTube video: The AI Latency Stack</strong></h3><p>While you&#8217;re in content-consumption mode this weekend, I also want to point you to a video I put out recently on AI application latency - it keeps coming up in conversations and I wanted to have something concrete to point people to.</p><div id="youtube2-fN1hxUdfkss" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;fN1hxUdfkss&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/fN1hxUdfkss?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>The short version: </strong>after your AI system ships to production, the model is almost never the problem. It&#8217;s a set of architectural decisions - streaming, database writes on the critical path, cold starts, context window bloat, prompt caching, sequential calls that should be parallel - that compound into something that makes users give up and go back to the manual process. The video walks through seven of these layers and how to address them, without swapping models or changing vendors.</p><p>Worth a watch if you&#8217;re anywhere near a production AI deployment right now.</p><div><hr></div><p>That&#8217;s it for this week. </p><p>A lot happened in a short space of time, and I wanted to share it with you directly. 
As always - reply if anything resonates, or if you&#8217;re wrestling with something I touched on.<br><br>Talk soon, <br>Sandi</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p>
]]></content:encoded></item><item><title><![CDATA[The High Agency Engineer Will Win the AI Era. Here's What I'm Seeing in the Field.]]></title><description><![CDATA[My job gives me an unusual view. I get to sit inside a lot of organisations and watch how engineering teams are actually responding to AI. Not the conference version. The real version.]]></description><link>https://newsletter.agentbuild.ai/p/the-high-agency-engineer-will-win</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/the-high-agency-engineer-will-win</guid><pubDate>Sat, 28 Mar 2026 14:31:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4548c2a9-cf59-4242-a54f-1750eec9c5e4_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;m seeing something in the field right now that is genuinely opening my eyes.</p><p>I&#8217;m lucky. My job puts me in front of a lot of engineering teams across a lot of organisations. Some are moving fast. Some are moving slow. And I get to see both. Not from a distance, up close, in the actual conversations where decisions get made.</p><p><em><strong>What I&#8217;m watching is a quiet split happening inside engineering teams. </strong></em>And I think it matters for anyone thinking about where this profession is heading.</p><div><hr></div><p>Two engineers. Same company. Same tools available. Same access to AI. Completely different outcomes.</p><p>One of them, when they hit a hard problem, opens a chat window and starts working through it out loud. They dump in the messy context. 
The half-baked question. The data that doesn&#8217;t quite make sense yet. They&#8217;re not looking for autocomplete. They&#8217;re looking for a way through.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!--i9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!--i9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!--i9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!--i9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!--i9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!--i9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9266339,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/192393790?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!--i9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!--i9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!--i9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!--i9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ab87c78-7849-4b7f-8417-9407a2bdb768_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The other one says &#8220;AI isn&#8217;t reliable enough for this.&#8221; And goes back to doing it the slow way.</p><p>I&#8217;ve watched this play out across banks, fintechs, and large regulated enterprises. And the gap between these two engineers is only getting wider.</p><div><hr></div><h2><strong>What high agency actually looks like</strong></h2><p>I was working with a team recently trying to make sense of a large pile of unstructured documents. Audit logs, policy docs, historical reports. The kind of work that normally takes weeks of someone&#8217;s time.</p><p>One engineer on the team didn&#8217;t wait to be told how. She had no prior experience with the specific tooling. But she sat down, broke the problem into pieces, and used AI to work through each one. By the end of the day she had something working. 
Not perfect. But working.</p><blockquote><p>She didn&#8217;t have a playbook. She made one.</p></blockquote><p>That&#8217;s what high agency looks like in practice. Not waiting for a process document. Not waiting for someone to say it&#8217;s approved. When they hit a wall, the first instinct is to figure out what question to ask - not explain why the wall is there.</p><div><hr></div><h2><strong>What the resistance sounds like</strong></h2><p>I want to be careful here. <strong>The engineers pushing back on AI are not lazy.</strong> Many of them are the most experienced people in the room.</p><p>But the resistance has a pattern.</p><p>&#8220;It hallucinates too much for our use case.&#8221; </p><p>&#8220;Security hasn&#8217;t signed it off yet.&#8221; </p><p>&#8220;The outputs aren&#8217;t consistent enough to trust.&#8221; </p><p>&#8220;This is hype, let it settle.&#8221;</p><p>Some of these are valid. I work in regulated environments. I understand the constraints.</p><blockquote><p>But what I notice is this. The engineers saying these things have usually not given AI their hardest problem. They&#8217;ve given it easy tasks, watched it stumble, and concluded it isn&#8217;t ready. They&#8217;re evaluating a tool they haven&#8217;t really pushed.</p></blockquote><p>The high agency engineers hit the same limitations. They just treat them as constraints to work around, not reasons to stop.</p><div><hr></div><h2><strong>There&#8217;s something underneath the resistance</strong></h2><p>I think it goes deeper than technology skepticism.</p><p>A lot of experienced engineers have built their identity around already knowing the answer. They&#8217;re the person people come to. The one who&#8217;s seen this before.</p><p>AI is uncomfortable for that identity. Because the value is shifting. It&#8217;s moving away from already knowing - toward knowing how to ask. That&#8217;s a different skill. 
And it asks you to be a beginner again, at least partially.</p><p>The engineers I see thriving have a looser grip on what they already know. They&#8217;re curious before they&#8217;re skeptical.</p><p><strong>They pick the tool up before they critique it.</strong></p><div><hr></div><h2><strong>One practical thing</strong></h2><p>Ask yourself honestly: when did you last give AI your genuinely hardest problem?</p><p>Not &#8220;summarise this document&#8221; or &#8220;tidy up this function.&#8221;</p><p>The real hard thing. The one you&#8217;ve been circling because you don&#8217;t quite know where to start.</p><p>I&#8217;ve seen engineers use AI to compress weeks of analysis into a day. I&#8217;ve seen it catch patterns in production failures that a team had been chasing for months. I&#8217;ve seen it unlock a business conversation that had been stuck for a quarter - just by helping someone structure their thinking clearly enough to explain it.</p><p>None of that happened because the technology was perfect. It happened because someone decided to figure it out.</p><blockquote><p>The job of an engineer is changing. I&#8217;m watching it happen. The ones adapting aren&#8217;t the most experienced or the most technical. They&#8217;re the ones most willing to stay curious.</p></blockquote><p>That&#8217;s the only practical advice I have.</p><p>One question before you go - <em><strong>what&#8217;s the most interesting thing you&#8217;ve seen an engineer do with AI that nobody is talking about yet?</strong></em></p><p>Hit reply and tell me.</p><p>Talk soon,<br>Sandi</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. 
The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p>
]]></content:encoded></item><item><title><![CDATA[NVIDIA GTC 2026: From GPUs to AI Factories - What Vera Rubin Really Means for Builders]]></title><description><![CDATA[GTC 2026 was the week NVIDIA stopped selling GPUs and started selling AI factories - here&#8217;s what that shift means for AI, ML, and data engineers now.]]></description><link>https://newsletter.agentbuild.ai/p/nvidia-gtc-2026-from-gpus-to-ai-factories</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/nvidia-gtc-2026-from-gpus-to-ai-factories</guid><pubDate>Sat, 21 Mar 2026 14:30:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6d93bde8-10c1-4a47-8138-f84f0d6a1d5c_2988x1736.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone,<br><br>Finally, spring is here, with a few sunny days in England (I don&#8217;t want to jinx it though). Overall I am feeling happy and trying to get back to my running habit. The one disappointment I carried last week, though, was not being able to attend NVIDIA GTC - I have too much going on to make a trip to the US right now.</p><p>Anyway, I have been following all the updates, and I have collected the top things you should know in this newsletter.<br><br>Let&#8217;s have a look.</p><div><hr></div><p>Folks, GTC 2026 was the week NVIDIA stopped selling us GPUs and started selling us AI factories - hardware, agents, and even token budgets included. Lovely stuff.</p><p>For years, GTC keynotes have been about bigger chips, more FLOPs, and eye&#8209;watering benchmarks. This year was different. Mr. 
Jensen Huang&#8217;s message was clear: <em><strong>the center of gravity is moving from individual accelerators to full&#8209;stack &#8220;AI factories&#8221; that ingest data on one end and ship intelligence on the other.</strong></em></p><h2>1. At the heart of that story is Vera Rubin.</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9X4f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9X4f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9X4f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9X4f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9X4f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9X4f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;GTC 2026 All Starts Here&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="GTC 2026 All Starts Here" title="GTC 2026 All Starts Here" srcset="https://substackcdn.com/image/fetch/$s_!9X4f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9X4f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9X4f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9X4f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ed1a818-a19d-4b0f-bfd0-7109a56f3256_2560x1440.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Vera Rubin is an integrated platform: seven specialized chips, multiple rack&#8209;scale systems, a supercomputer, orchestration software, and a roadmap to the next platform, Feynman. If Blackwell was the engine, Rubin is the entire plant. You don&#8217;t just get more TFLOPs; you get an opinionated way to build and run agentic systems at scale.</p><p><strong>That framing matters if you&#8217;re an AI, ML, or data engineer.</strong></p><p>Instead of asking &#8220;How do I get access to H100s or B100s?&#8221;, the real question becomes &#8220;Where will my AI factory live, and what will it produce?&#8221; That&#8217;s a very different conversation about architecture, data, and economics.</p><div><hr></div><h2><strong>2. The trillion&#8209;dollar AI factory build&#8209;out</strong></h2><p>Mr. 
Huang also did something subtle but important: he didn&#8217;t talk about AI as a feature; <strong>he talked about AI as infrastructure</strong>. The combined order pipeline he referenced for Blackwell and Vera Rubin runs into the trillion&#8209;dollar range over the next few years. Whether you believe in the exact number or not, the signal is unmistakable.</p><p>We&#8217;re no longer in the &#8220;let&#8217;s try a model&#8221; phase. We&#8217;re in a multi&#8209;year build&#8209;out of AI plants in the same way we once built data centers, clouds, and mobile networks. That means:</p><ul><li><p>Inference economics become a first&#8209;class design constraint.</p></li><li><p>Token budgets will be as real as laptop or SaaS budgets.</p></li><li><p>Capacity planning for AI will look more like power and networking planning than like a one&#8209;off POC.</p></li></ul><p>If you&#8217;re building products, this is your wake&#8209;up call to treat AI like infrastructure, not a sprinkle of magic dust at the end of a roadmap.</p><div><hr></div><h2><strong>3. The &#8220;agentic moment&#8221; is now official</strong></h2><p>Another clear shift: NVIDIA is leaning hard into agentic systems.</p><p><strong>NemoClaw</strong> and its surrounding tooling were positioned as core to how enterprises will build with these new platforms. 
The pattern is no longer &#8220;one giant model behind an API.&#8221; </p><p>It&#8217;s:</p><ul><li><p>Tool&#8209;using agents orchestrating calls into models and services.</p></li><li><p>Multi&#8209;step workflows that reason, plan, and act.</p></li><li><p>Customization and fine&#8209;tuning on your own data, running on your own slice of an AI factory.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mook!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mook!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Mook!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Mook!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Mook!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mook!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;GTC 2026 All Starts Here&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="GTC 2026 All Starts Here" title="GTC 2026 All Starts Here" srcset="https://substackcdn.com/image/fetch/$s_!Mook!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Mook!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Mook!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Mook!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8718f5b8-96a3-4f0d-ba47-ed8d3a1f0999_2560x1440.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Practically, that means agent orchestration, evaluation, and safety move from hacker&#8209;weekend topics to board&#8209;level concerns. It also means AI and data teams who understand tools, context, and control flows will be disproportionately valuable.</p><div><hr></div><h2><strong>4. Hardware envy and the pace problem</strong></h2><p>There&#8217;s a less comfortable undercurrent to all of this: hardware obsolescence.</p><p>If you invested heavily in last year&#8217;s &#8220;AI factory,&#8221; GTC 2026 probably gave you a twinge of regret. <strong>Rubin&#8209;class systems move the goalposts again.</strong> Throughput, efficiency, network architecture - everything just jumped.</p><p>Most teams won&#8217;t be able to rip and replace every cycle. 
So the question becomes: how do you architect for <strong>optionality</strong>?</p><p>(By the way, this applies to any production-grade AI system architecture.)</p><p>A few practical angles:</p><ul><li><p>Design around portable abstractions (containers, standard runtimes, open protocols), not vendor&#8209;specific interfaces.</p></li><li><p>Separate concerns: data platform, model platform, agent layer. You want the freedom to swap pieces as the hardware evolves.</p></li><li><p>Focus on investments that survive GPU generations: data quality, evaluation, governance, and product integration.</p></li></ul><p>The platforms will keep getting better.</p><p>Your moat will be how quickly you can adapt your stack to whatever comes next.</p><div><hr></div><h2><strong>5. Data and the physical world reclaim the spotlight</strong></h2><p>One of my favorite subplots from this GTC is that structured data, simulation, and physical AI quietly stepped into the spotlight.</p><p>DLSS 5 and the new wave of neural rendering aren&#8217;t just about prettier video games. They&#8217;re about real&#8209;time, photorealistic, physics&#8209;aware environments you can use to train and validate agents. 
Combine that with better edge hardware and you get a serious push toward robots, industrial agents, and AI systems that interact with the messy real world.</p><p>Check this out:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/live/jw_o0xr8MWU?si=IGyBvZrZmWxr-AeT&amp;t=811" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1jOM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 424w, https://substackcdn.com/image/fetch/$s_!1jOM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 848w, https://substackcdn.com/image/fetch/$s_!1jOM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 1272w, https://substackcdn.com/image/fetch/$s_!1jOM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1jOM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png" width="1456" height="885" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:885,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2746707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.youtube.com/live/jw_o0xr8MWU?si=IGyBvZrZmWxr-AeT&amp;t=811&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/191660799?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1jOM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 424w, https://substackcdn.com/image/fetch/$s_!1jOM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 848w, https://substackcdn.com/image/fetch/$s_!1jOM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 1272w, https://substackcdn.com/image/fetch/$s_!1jOM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c7bfa4d-6670-4550-a1ee-4ba331542c61_2898x1762.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For data people, the implication is simple: <strong>tables, events, and logs are still the fuel</strong>. For AI engineers, simulations and digital twins are becoming as important as datasets. 
For product teams, the bar for &#8220;realistic&#8221; behavior in AI&#8209;powered experiences just went up.</p><div><hr></div><h2><strong>Why this GTC matters for you</strong></h2><p>If you strip away the marketing, GTC 2026 is telling builders three things:</p><ol><li><p>The unit of competition is shifting from model to factory.</p></li><li><p>Agentic systems will be the default pattern for serious AI products.</p></li><li><p>The compounding advantage still comes from data, evaluation, and integration - not just chips.</p></li></ol><p>If you&#8217;re in AI, ML, or data, your edge will come from how fast you can align your architecture, practices, and skills with that reality.</p><div><hr></div><h2><strong>What you can actually do next (without a Rubin cluster)</strong></h2><p>Most of us are not spinning up NVIDIA Vera Rubin systems next quarter. The realistic move is to upgrade how you think, learn, and design.</p><p><strong>Here are four learning objectives you can pursue right now:</strong></p><ol><li><p><strong>Think in &#8220;AI factories,&#8221; not just models</strong><br>Map your current stack - data collection, feature engineering, model training, deployment, monitoring - against the AI factory idea. Where is data still manual? Where is evaluation an afterthought? Where are agents bolted on instead of designed in from the start?</p></li><li><p><strong>Get comfortable with inference and token economics</strong><br>Even if you&#8217;re using cheap or free models, start tracking tokens&#8209;per&#8209;feature and cost&#8209;per&#8209;request. A simple spreadsheet or dashboard that shows &#8220;this feature costs X per 1,000 users&#8221; will change how you design prompts, choose models, and argue for optimizations.</p></li><li><p><strong>Practice building small, robust agentic flows</strong><br>Use open&#8209;source frameworks or your favorite LLM stack to wire up basic agents: retrieval + tool calling + simple planning. 
Focus less on exotic models and more on reliability, evaluation, and clear boundaries for what the agent should and shouldn&#8217;t do.</p></li><li><p><strong>Re&#8209;center your work on data, evaluation, and simulation</strong><br>Treat your tables, logs, and events as the core asset, not an afterthought. Experiment with offline evaluation harnesses. If your domain touches the physical world, explore simple simulation or synthetic scenarios - even if you&#8217;re not using Omniverse&#8209;grade tools yet.</p></li></ol><div><hr></div><p>This is too long already, folks. I will stop here.</p><p>Thank you for reading, and please leave your comments and feedback. Get in touch and tell me what you are learning and what you would like to know more about.</p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div><hr></div><p>Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How Do You Test AI - Practical Talk on AI Evaluation Approaches]]></title><description><![CDATA[What Hamel Husain taught me about why most enterprise AI systems fail - and the one habit that fixes it. Don't miss it.]]></description><link>https://newsletter.agentbuild.ai/p/how-do-you-test-ai-practical-talk</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/how-do-you-test-ai-practical-talk</guid><pubDate>Sat, 14 Mar 2026 14:31:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1bba58bf-642c-4eaf-8200-d48de101715b_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone,</p><p>Last week, I sat down with <strong><a href="https://hamel.dev/">Hamel Husain</a></strong> for the AgentBuild podcast. Hamel is one of the most influential voices on AI evaluation in the industry right now - the kind of person that other experts quote when they&#8217;re trying to explain something difficult. 
The conversation was one of the most practically useful I&#8217;ve had this year, and I want to share the best of it with you.</p><div><hr></div><h3><strong>Who is Hamel Husain?</strong></h3><p>Hamel is a machine learning engineer with over 20 years of experience. He&#8217;s worked at Airbnb and GitHub - where his early LLM research contributed to what eventually became GitHub Copilot. He has led and contributed to popular open-source ML tools, and today he&#8217;s an independent consultant who has helped more than 35 organisations build real-world AI products that actually perform in production.</p><p>He co-teaches <strong><a href="https://maven.com/parlance-labs/evals?promoCode=google-ads&amp;utm_source=google&amp;utm_medium=cpc&amp;utm_campaign=search&amp;utm_source=google&amp;utm_medium=cpc&amp;utm_campaign=23489291219&amp;gad_source=1&amp;gad_campaignid=23489291219&amp;gbraid=0AAAAA_BlNfkr85s_R0pUiCwijzmV4k-D-&amp;gclid=CjwKCAjwjtTNBhB0EiwAuswYhuwNCH_0DDLATQvuNTZb94TCKYfvkg6d1jiH_x7tKwuXYhFDYWMhKRoC5d4QAvD_BwE">AI Evals for Engineers and PMs</a></strong> on Maven, a course with over 3,000 students from 500+ companies - including teams at OpenAI, Anthropic, and Google. He also writes one of the most substantive technical blogs in the AI space at hamel.dev, and is co-authoring an O&#8217;Reilly book on the subject - <a href="https://www.oreilly.com/library/view/evals-for-ai/9798341660717/">Evals for AI Engineers.</a></p><p>He is, in short, the person you call when your AI system&#8217;s quality is a mystery.</p><div><hr></div><h3><strong>The biggest misconception about AI evaluation</strong></h3><p>I asked Hamel what the most common mistake is when enterprises approach evaluation. He didn&#8217;t hesitate.</p><p>&#8220;The biggest misconception is that evaluation is as easy as going to a vendor that will give you off-the-shelf metrics in a dashboard. You plug it in, and poof - you&#8217;ve done eval. 
You&#8217;ve checked the box.&#8221;</p><p>He&#8217;s seen this play out more times than he can count. The team wires up a platform. They get a dashboard with coherence scores, faithfulness scores, toxicity scores. Everyone feels good for the first week. And then, slowly, people start to realise: those numbers don&#8217;t mean anything. No one knows what they&#8217;re measuring. They can&#8217;t tell if the product is getting better or worse.</p><p>Generic metrics don&#8217;t correlate to what matters for your specific application. A hallucination score doesn&#8217;t tell you if your legal AI is giving dangerous advice. A coherence score doesn&#8217;t tell you if your scheduling assistant is actually booking the right slots.</p><p>As Hamel puts it: <em>&#8220;When I see a company showing me only generic metrics, I already know they&#8217;re in trouble. There&#8217;s a direct correlation between generic dashboards and teams that feel lost.&#8221;</em></p><div><hr></div><h3><strong>Foundation model evals vs. product evals</strong></h3><p>There&#8217;s a second source of confusion worth naming. When people hear the word &#8216;evaluation&#8217;, many think of the benchmarks that model providers publish - things like MMLU, SWE-Bench, HumanEval. <strong>These are foundation model evals.</strong> They measure the general capabilities of a model at large. They have almost nothing to do with how well your AI product performs for its specific purpose.</p><blockquote><p>Hamel&#8217;s analogy is the one I&#8217;ll keep using: Foundation model evals are like a standardised test score. Product evals are like job performance. Your SAT score tells you very little about whether you&#8217;ll be a good engineer. 
The gap between the two can be enormous.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_vEl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_vEl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 424w, https://substackcdn.com/image/fetch/$s_!_vEl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 848w, https://substackcdn.com/image/fetch/$s_!_vEl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!_vEl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_vEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png" width="1456" height="823" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:823,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7557809,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/190921146?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_vEl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 424w, https://substackcdn.com/image/fetch/$s_!_vEl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 848w, https://substackcdn.com/image/fetch/$s_!_vEl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!_vEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ef8b032-8663-487f-b49c-81316914fe45_2716x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>What matters for the enterprise is the second kind. </strong>Not &#8216;how capable is GPT-5 generally?&#8217; but &#8216;is our claims-processing assistant handling edge cases correctly, and can we measure that consistently?&#8217;</p><div><hr></div><h3><strong>Why traditional QA isn&#8217;t enough</strong></h3><p>One of the most common objections I hear from customers: &#8220;We already have a QA team. Why can&#8217;t they just test the AI?&#8221;</p><p>Hamel&#8217;s answer is direct: <strong>treating AI evaluation like traditional software testing is a fundamental mistake.</strong></p><p>The reason is determinism. Traditional software has deterministic outputs. You write a unit test: given input X, expect output Y. Pass or fail. With AI, the output is stochastic by design. 
The system is explicitly built to produce varied responses. <strong>You can&#8217;t write a unit test for a stochastic system the same way.</strong></p><p>What you need instead is something that already exists - but that most AI teams have left behind: <strong>data science thinking.</strong></p><p>Data scientists have been measuring stochastic systems for decades. They know how to sample, analyse, spot patterns, and design experiments that account for variability. That entire discipline is exactly what&#8217;s needed for AI evaluation. We just need to adapt it slightly for LLMs.</p><p>Hamel summarises it simply: <strong>&#8220;Evals are essentially data science for AI.&#8221;</strong></p><div><hr></div><h3><strong>The eval loop - how it actually works</strong></h3><p>Before I go further, let me show you the structure of a proper evaluation loop. This is what we explored together on the podcast earlier this week:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DEvM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DEvM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 424w, https://substackcdn.com/image/fetch/$s_!DEvM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 848w, 
https://substackcdn.com/image/fetch/$s_!DEvM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!DEvM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DEvM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png" width="1456" height="823" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:823,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6659933,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/190921146?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DEvM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!DEvM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 848w, https://substackcdn.com/image/fetch/$s_!DEvM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!DEvM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7c4fa9-646f-40da-b9a9-3c4e196378d3_2716x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The loop is simple in principle. You feed test inputs into your model. You compare what it produces to what you wanted. You score the gap. And you use that signal to decide what to change - the prompt, the model, the data, or the test cases themselves.</p><p>The hard part is what lives inside the &#8220;evaluator&#8221; box. There are four types of evaluators, each with different trade-offs:</p><ul><li><p><strong>Exact match</strong> - Deterministic scoring. The output must match the expected answer. Fast and cheap, but brittle. Works well for classification, SQL, and structured outputs. Falls apart when the answer is right but worded differently.</p></li><li><p><strong>Heuristic</strong> - Rule-based checks. Regex patterns, keyword presence, schema validation, length constraints. Good for catching structural failures. Can&#8217;t evaluate meaning.</p></li><li><p><strong>Human review</strong> - Real people read outputs and rate them. The highest nuance. Also the slowest and most expensive. Essential for calibrating everything else, but doesn&#8217;t scale to thousands of daily outputs.</p></li><li><p><strong>LLM-as-judge</strong> - A second AI model evaluates the output against a rubric. Scales well, handles open-ended responses, captures nuance that heuristics miss. But it inherits the judge model&#8217;s biases and blind spots. Requires calibration against human labels.</p></li></ul><blockquote><p>In practice, mature teams use all four. Exact match and heuristics form the fast, cheap baseline. LLM-as-judge handles scale on open-ended outputs. 
Human review calibrates the judge periodically.</p></blockquote><div><hr></div><h3><strong>The most powerful habit in AI development</strong></h3><p>Here&#8217;s where Hamel said something that sounds counterintuitive - and that I think is the single most important insight from our conversation.</p><p>The highest-value activity you can do when building an AI product is to <strong>sit down and look at your data.</strong></p><p>Just open your traces, read actual outputs, and write down what you see.</p><p>When most people hear this, something in them resists. &#8220;In the age of AI, you&#8217;re telling me to open a spreadsheet and read individual data points? That can&#8217;t scale.&#8221;</p><p>But Hamel has done this with more than 50 companies. Every single time, people discover it&#8217;s not just useful - it&#8217;s transformative. They find unexpected failure modes. They identify bugs they didn&#8217;t know existed. They develop intuition that no automated system would have surfaced.</p><p>And there&#8217;s a second reason it matters: looking at data is how you elicit your own requirements. This is what Hamel calls <strong>criteria drift</strong>. You can write a specification about what a good product looks like. But it&#8217;s only when you see real user interactions that you understand what &#8220;good&#8221; actually means for your context. The process of reading real outputs and writing down what you observe is the process of transferring your taste to the system.</p><blockquote><p><strong>His practical starting point: </strong>aim to read at least 100 traces. 100 is not a magic number, it&#8217;s a concrete goal that gets people started. Keep reading until you reach what he calls <strong>theoretical saturation</strong> - the point where new traces aren&#8217;t revealing new failure patterns. 
In his experience, people rarely want to stop.</p></blockquote><div><hr></div><h3><strong>Who should own evaluation - and how to champion it</strong></h3><p>The other question I pressed Hamel on was organisational. In a large enterprise with 50 AI use cases in the pipeline, who owns this? Is there a central eval function? Does it sit with a team or a role?</p><p>He&#8217;s a strong advocate for <strong>bottom-up adoption</strong>, not centralised mandates. Top-down approaches - &#8220;our platform will standardise eval across the organisation&#8221; - tend to produce checkbox compliance. Teams grudgingly report metrics nobody understands.</p><blockquote><p><strong>What actually works: </strong>start needs-based. One team, one product, one clear problem they&#8217;re trying to debug. Embed the evaluation practice into the building process - not as a separate audit step, but as part of how they iterate.</p></blockquote><p>Evaluation must be owned by <strong>domain experts</strong>, not outsourced to a QA or engineering team. If it&#8217;s a legal assistant, a lawyer needs to be in the loop. If it&#8217;s a clinical tool, a clinician. The domain expert is the only person with the taste and judgment to say whether an output is genuinely good. The engineering team can build the infrastructure, but the ground truth comes from the domain.</p><p><strong>As for championing it internally? Hamel&#8217;s advice is the same I&#8217;d give for any new practice: don&#8217;t sell the methodology. Sell the results.</strong></p><blockquote><p>Don&#8217;t walk into a meeting saying &#8220;we need to do evals.&#8221; Walk in with findings. Show the error rate you found. Show the specific failure pattern you fixed. Show how you caught a regression before it went to production. 
Once people see that you consistently know more about what the product is actually doing than anyone else in the room, they&#8217;ll ask how.</p></blockquote><div><hr></div><h3><strong>What this means for your enterprise AI programme</strong></h3><p>If I were to distil everything Hamel shared into the things you can actually act on this week, here&#8217;s how I&#8217;d frame it:</p><ol><li><p><strong>Resist the pull of generic dashboards.</strong> If your only eval metrics are off-the-shelf coherence and faithfulness scores, you&#8217;re not evaluating - you&#8217;re performing evaluation. The metrics that matter are the ones you derive from looking at your own system&#8217;s failures.</p></li><li><p><strong>Spend 30 minutes this week reading traces</strong> from one of your AI systems. Don&#8217;t automate it yet. Just read. Write down what you notice. You&#8217;ll find things no algorithm would have flagged.</p></li><li><p><strong>Identify your domain expert.</strong> For every AI use case you&#8217;re building, there should be a person who has the authority and the proximity to say whether an output is good. That person needs to be in the loop on evaluation, not just the engineering team.</p></li><li><p>When you want to bring others along, <strong>don&#8217;t present a methodology deck.</strong> Present results. Show the before and after. Lead with what changed for the user or the business, and let the process speak for itself.</p></li></ol><p>If you haven&#8217;t listened to my conversation with Hamel yet, I&#8217;d encourage you to. 
It&#8217;s one of the best discussions I&#8217;ve had on what it actually takes to build AI systems that hold up in production - not just in the demo.</p><div id="youtube2-R63pasY6LJM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;R63pasY6LJM&quot;,&quot;startTime&quot;:&quot;279s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/R63pasY6LJM?start=279s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>I&#8217;d love to know what resonated. Reply to this email or leave a comment here - I look forward to hearing from you.</p><p>Talk soon,<br><strong>Sandi</strong></p><div><hr></div><p><em>P.S. If you&#8217;re new here - <strong>welcome</strong> &#127881;. AgentBuild is a community of practitioners working through the real challenges of getting AI into production inside large organisations. Every week I share practical, grounded thinking from the people doing this work at the sharp end. The goal is never theory - it&#8217;s always: what can you use Monday morning.</em></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div><hr></div><p>Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Big Companies Are About to Test Whether Their Employees Can Think Without AI. Here’s Why That Should Matter to You. ]]></title><description><![CDATA[Gartner just predicted that half of big companies will start testing if employees can think without AI. I've been watching this happen in real meetings. Here's what it means for you.]]></description><link>https://newsletter.agentbuild.ai/p/big-companies-are-about-to-test-whether</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/big-companies-are-about-to-test-whether</guid><pubDate>Sun, 08 Mar 2026 13:15:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OIBg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3453368a-da00-4960-b174-e3313b941314_256x256.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Happy International Women&#8217;s Day.</strong></p><p>The future of AI is going to be shaped by the people who build it, question it, and decide how it gets used. 
Right now, women in tech are doing all three - often with less recognition than they deserve.</p><p>The field needs more of their voices, not fewer. If you know a woman in tech who is doing great work, <strong>today is a good day to tell her.</strong></p><div><hr></div><p>Now - to this week&#8217;s newsletter.</p><p><br>Gartner recently predicted something that made me think hard.</p><p>By 2026, they say, roughly half of large organisations will introduce what they&#8217;re calling <em>&#8220;AI-free skills assessments.&#8221;</em></p><blockquote><p>In plain English: companies are going to start formally testing whether their employees can still think, write, and solve problems without any AI help at all.</p></blockquote><p><em>Not instead of AI skills. In addition to them.</em></p><p>When I first read that, I thought it was a bit extreme. But the more I thought about it, and the more I thought about what I see inside big organisations every day, the more I think they&#8217;re onto something real.</p><p>Last month I was in a meeting with a team trying to solve a difficult problem.</p><p>Someone suggested asking ChatGPT. So they did. The AI gave them a confident, well-structured answer. Everyone nodded and moved on.</p><p>I asked one of them afterwards: &#8220;Do you actually think that was the right answer?&#8221;</p><p>She paused. &#8220;Honestly? I don&#8217;t know. It sounded right.&#8221;</p><p>That&#8217;s the problem Gartner is trying to name. Not that AI gives bad answers. But that we&#8217;re losing the ability to tell <em>whether</em> the answer is good - because we&#8217;ve stopped forming our own opinion first.</p><p>It&#8217;s like becoming so reliant on GPS that you no longer have any sense of direction yourself. Fine, until the signal drops.</p><div><hr></div><h3>Here&#8217;s why this matters specifically if you&#8217;re learning AI right now.</h3><p>You&#8217;re entering a world where everyone will have access to the same AI tools. 
The tools are getting cheaper and easier every month. </p><blockquote><p>In two years, using AI competently won&#8217;t be a skill that sets you apart - it&#8217;ll just be the baseline.</p></blockquote><p>What <em>will</em> set you apart is the judgment to know when the AI is wrong. The ability to ask a better question. The confidence to push back on an answer that sounds plausible but isn&#8217;t quite right.</p><p>Those things only come from practising thinking for yourself. And that&#8217;s the muscle that quietly atrophies when every task starts with &#8220;let me ask AI first.&#8221;</p><p><em>The people who will get the most out of AI are the ones who bring their own thinking to it - not the ones who outsource their thinking to it.</em></p><div><hr></div><h3>One small habit worth building now:</h3><p>Before you open any AI tool for a problem, spend five minutes writing down what you actually think. Not a perfect answer. Just your honest first attempt - in your own words, your own logic.</p><p>Then bring in the AI. Compare. Push back where something feels off.</p><p>You&#8217;ll get dramatically better results from the tool. And you&#8217;ll keep the judgment sharp that makes those results mean something.</p><blockquote><p>The companies Gartner is talking about will be testing for exactly that judgment. The good news is it&#8217;s not hard to build - it just has to be intentional.</p></blockquote><p>I&#8217;m curious whether this resonates. Have you noticed yourself reaching for AI before you&#8217;ve really thought something through? 
</p><p>Hit reply - I read every response and it genuinely shapes what I write next.</p><p>Talk soon,<br><strong>Sandi</strong></p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div><hr></div><p>Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The 7-Step Playbook for Turning Any Business Process Agentic]]></title><description><![CDATA[Save this playbook. A framework to help you power your business processes with AI agents. Which step are most enterprise teams skipping? Step 2. 
And it's why their agents fail in production.]]></description><link>https://newsletter.agentbuild.ai/p/the-7-step-playbook-for-turning-any</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/the-7-step-playbook-for-turning-any</guid><pubDate>Mon, 02 Mar 2026 02:00:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/42697e91-4bf5-4d6e-8406-0a5aa47986c4_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Something happens in almost every meeting I&#8217;m in these days.</p><p>Someone opens a slide, or just starts talking, and within two minutes we&#8217;re deep into a conversation about models. Vendors. Which LLM is better for this use case. Whether to go with one orchestration framework or another. </p><p>Features. Availability. Pricing tiers. Token limits.</p><p>And I sit there. Listening. Waiting.</p><p>Then I ask something like: &#8220;What does success look like for this?&#8221; </p><p>Or: &#8220;How will you know in six months if this worked?&#8221;</p><p>The room usually goes a bit quiet. Sometimes people look at each other. Sometimes someone gives a vague answer about &#8220;efficiency&#8221; or &#8220;reducing manual effort.&#8221; And then, almost without fail, the tools conversation resumes.</p><p>I&#8217;ve stopped being surprised by this. But I haven&#8217;t stopped being bothered by it. Because the tool conversation feels productive. It has energy. People have opinions. There&#8217;s something to debate. 
Meanwhile, the question of what you&#8217;re actually trying to achieve - measured, specifically, in a way you could verify - just sits there unanswered.</p><p>And then teams wonder why their agentic systems don&#8217;t survive contact with production.</p><p><strong>This is the playbook I wish more of those meetings started with.</strong></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.agentbuild.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>First question: do you actually need multi-agent?</h3><p>I ask this because nobody else does. The assumption in most rooms is that &#8220;agentic&#8221; means multiple agents. </p><p>Sometimes that&#8217;s right. If decisions in your process genuinely can&#8217;t coexist - different data access, different authority, different latency requirements - then splitting makes sense. </p><p>However, multi-agent systems are just distributed systems. And distributed systems are hard. When something breaks, you&#8217;re chasing a failure across boundaries, through handoffs, through tool calls you can&#8217;t always replay. That complexity doesn&#8217;t disappear.</p><p>Start single agent. Let the constraints of the actual process push you toward multi-agent if they need to.</p><p><em><strong>Most of the time, the process doesn&#8217;t need it. The team just wanted to build it.</strong></em></p><div><hr></div><h3>Where does the AI go, and where does the human stay?</h3><p>The most common mistake I see: humans get put at the end. Final review. Rubber stamp before anything goes out. 
It feels like a safety net.</p><p>A human reviewing an output they didn&#8217;t generate, without the context that produced it, at the end of a chain they can&#8217;t fully see - that&#8217;s not oversight. That&#8217;s decoration.</p><blockquote><p>AI belongs where errors are recoverable. Humans stay where they&#8217;re not. </p></blockquote><p>A compliance violation, an irreversible action, something you&#8217;d find out about through an audit three weeks later - those stay human until the system has earned the right to handle them.</p><p>You don&#8217;t decide in a design session that the AI can handle something. You prove it. Slowly. With data. </p><blockquote><p>The human doesn&#8217;t leave the loop because the architecture says so. They leave because the evaluation says it&#8217;s safe.</p></blockquote><div><hr></div><h3>Does the whole process need to go agentic?</h3><p>Almost certainly not. But the pressure to say yes is enormous right now.</p><p>The ROI case always gets built for the full process. End-to-end automation, scale infinitely. That&#8217;s how it gets approved. And then reality shows up - the data isn&#8217;t ready, the decisions aren&#8217;t defined clearly enough, and nobody can tell whether any of it is working.</p><blockquote><p><strong>What I&#8217;ve seen work: </strong>find the one or two decisions where human time is most expensive or delay is most painful. Start there. Leave the rest human for now.</p></blockquote><p>The process you want to automate probably wasn&#8217;t that well-designed to begin with. AI will find every shortcut, every undocumented exception, every <em>&#8220;we just know&#8221;</em> that your team built into it over the years. Automating the whole thing at once means hitting all of that simultaneously. </p><p>Pick one decision. Define it properly. Prove it works. 
Then move.</p><div><hr></div><h3>The 7-step playbook</h3><p>This is the <strong><a href="https://youtu.be/3nYoe4mcJCw">Reverse Strategy Framework</a></strong> applied to process agentification. </p><p>The order matters. </p><p>Each step is a gate - if you can&#8217;t pass it, you&#8217;re not ready for the next one.</p><p></p><h4><strong>1: Map the decisions, not the steps.</strong></h4><p>Get the people who actually run the process in a room. Have them document the judgment calls. At every point where a human exercises discretion: what are they looking at, what makes them go one way versus another, what would a wrong call look like, and how quickly would you know?</p><blockquote><p>Ask: if you gave two experienced people the same input, would they make the same call? If they regularly don&#8217;t - you don&#8217;t have a process you can automate. </p></blockquote><p>You have a process you need to design first. You can&#8217;t build an agent to make a decision the organisation hasn&#8217;t agreed on.</p><p></p><h4>2: Define what &#8216;good&#8217; looks like for each decision.</h4><p>For each decision node you&#8217;re considering, pin down three things: what precision you need, what error rate is acceptable, and what the cost difference is between a false positive and a false negative. <em><strong>Quantify with numbers.</strong></em></p><p>These aren&#8217;t metrics you figure out after you build. They&#8217;re the thing that tells you whether the system is working at all. Most AI projects fail because nobody defined what capable meant. Do it here, before anything else.</p><p></p><h4>3: Check your data readiness, decision by decision.</h4><p>Agents make inferences from data. For each decision node you want to automate, check whether the data exists, whether an agent can access it in real time, and whether it is structured in a way the agent can reliably use.</p><p>Most enterprise processes run on data designed for humans. PDFs. Exports. Systems that require a login and three clicks. 
Context that only exists because someone&#8217;s been in the role long enough to know where to look.</p><blockquote><p>Check five things per node: how accessible the data is, whether the schema is clean and consistent, whether there&#8217;s enough metadata for the agent to interpret what it&#8217;s looking at, how errors and edge cases are handled, and whether there&#8217;s any observability into what the data&#8217;s doing. </p></blockquote><p>If a node&#8217;s data isn&#8217;t ready, build the data layer first. A good model on broken data is still broken.</p><p></p><h4>4: Assign each decision - AI, Human, or Hybrid.</h4><p>Using what you&#8217;ve defined in Steps 2 and 3: AI handles high-volume, well-defined decisions where errors are recoverable. Humans handle decisions where a wrong call is asymmetric and non-recoverable. Hybrid - <strong>AI proposes, human confirms</strong> - is for the middle ground, where you think it&#8217;s probably automatable but don&#8217;t yet have the data to prove it.</p><p>Write this down. It becomes the architecture contract. If someone later asks why the agent doesn&#8217;t handle a particular decision, the answer is already there.</p><p></p><h4>5: Decide single vs. multi-agent.</h4><p>You now have the decision map. Look at it. Are there nodes that genuinely require incompatible contexts - different data access, different authority, reasoning chains that need to be isolated from each other? Those are your split points. If not, stay single.</p><p>I have seen many teams start with this conversation. It actually belongs here, in Step 5, with real information in front of you.</p><p></p><h4>6: Build the evaluation before you build the agent.</h4><p>I know this feels backwards. Build the thing first, then measure it - that&#8217;s the instinct. Don&#8217;t do it. Please.</p><p>Before you write a line of agent code, collect 200+ real examples from the process. Actual inputs, paired with what a good human would have decided on each one. 
Then define how you&#8217;ll score whether the agent&#8217;s call matches that standard.</p><p>This forces a useful confrontation: can you actually define &#8220;correct&#8221; before the AI has to? Sometimes you can&#8217;t. That&#8217;s valuable to discover now rather than six months into production.</p><blockquote><p>Evaluation isn&#8217;t a final checkbox. It&#8217;s the architecture that keeps the whole thing alive.</p></blockquote><p>If you can&#8217;t build a golden dataset for a decision node, that node isn&#8217;t ready. That&#8217;s not a failure. That&#8217;s the process working.</p><p></p><h4>7: Shadow mode first. Then cut over.</h4><p>Run the agent on live inputs in parallel with the humans. Humans keep making the real decisions. You compare outputs - systematically, against your golden dataset and against the human calls on the same inputs.</p><p>The edge cases that didn&#8217;t show up in your test set will show up here. They always do. Shadow mode is where you find them safely, without consequences, while building the evidence base that earns trust.</p><p>Cut over when the error rate threshold is met. Not when the launch date arrives.</p><div><hr></div><h3>The question that tells you if you&#8217;re ready</h3><p>Before any of this starts, ask yourself one thing:</p><blockquote><p>If you ran the human process and the agentic process side by side on the same inputs for 90 days, would you have the data to prove the agent is performing at least as well?</p></blockquote><p>If yes - you&#8217;ve defined success, you have evaluation infrastructure, you&#8217;re ready.</p><p>If no - something foundational is missing. You haven&#8217;t defined success clearly enough, the data can&#8217;t support measurement, or you don&#8217;t have a golden dataset to compare against. That&#8217;s Evaluation Debt. </p><p>The process you want to turn agentic probably has a version that can work. 
Whether you get there depends on whether you&#8217;re willing to answer the hard questions before the tools conversation starts.</p><p>In most meetings I&#8217;m in, we never get there.</p><p>If you&#8217;re in the middle of this - mapping a process, arguing about scope, trying to figure out where the human stays in the loop - hit reply. Tell me where it&#8217;s stuck. I find these problems genuinely interesting and could share my insights.</p><p><br>If you enjoyed reading this, please share it with your friends. Leave feedback, and tell me what you would like to read more of.</p><p>Thanks,<br>Sandi.</p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div><hr></div><p>Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Why the best developers are writing less code than ever]]></title><description><![CDATA[Lena Hall, Sr. Director of Developer Relations at Akamai on why the most valuable developers in the AI era are the ones who've stopped typing.]]></description><link>https://newsletter.agentbuild.ai/p/why-the-best-developers-are-writing</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/why-the-best-developers-are-writing</guid><pubDate>Sun, 22 Feb 2026 21:52:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/lDn0-_Ed53Q" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>A conversation with Lena Hall, Sr. Director of Developer Relations at Akamai, ex-AWS, ex-Microsoft Research.</em></p><p>Something quietly shifted in the last 12 months.</p><p>The senior engineers moving fastest right now - the principal engineers, the architects - are spending <strong>80% of their time writing specs, not code.</strong> They&#8217;re defining inputs and outputs, mapping component contracts, eliminating ambiguity before an agent ever touches the implementation.</p><p>If that makes you uncomfortable, it should. It means the game has changed - and what made you good yesterday may not be what makes you valuable tomorrow.</p><p>I sat down with Lena Hall to talk about this. Lena has built distributed systems at Microsoft Research, led developer relations across AWS, and now drives AI infrastructure strategy at Akamai. 
She&#8217;s watched this shift happen in real time.</p><div><hr></div><blockquote><p><em>&#8220;Code is becoming like binaries. We don&#8217;t manage binaries - they&#8217;re generated. Code is heading the same way. The question is: what does that make you?&#8221;</em> </p><p>- Lena Hall</p></blockquote><div><hr></div><p><strong>The Role Is Changing. Here&#8217;s What It&#8217;s Changing Into.</strong></p><p>Developers aren&#8217;t becoming obsolete. They&#8217;re becoming architects - and the best ones are operating more like CTOs. They own the logic, the system design, the edge cases. They define what needs to be built with enough clarity that an AI agent can execute it reliably.</p><p>But here&#8217;s where Lena&#8217;s technical background adds a layer most people miss: <strong>AI agents are non-deterministic by nature.</strong> If you don&#8217;t control them structurally - with structured outputs, phased execution, and human checkpoints at high-stakes decision points - they will break in ways you can&#8217;t predict or explain.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;2f13831a-5b56-43ca-bbf6-f49a3756817b&quot;,&quot;duration&quot;:null}"></div><p>She calls this pragmatic AI: architecture matched to the stakes of the business problem. Low-stakes tasks can tolerate some ambiguity. High-stakes tasks - financial decisions, healthcare workflows, anything with legal exposure - cannot. 
The expert must be in the loop <em>before</em> the system ships, not after.</p><div><hr></div><p><strong>What You&#8217;ll Take Away From This Episode</strong></p><ul><li><p>Why the Two Generals Problem from distributed systems applies directly to every LLM call you make</p></li><li><p>The 3-tier framework for deciding how much AI control your use case actually needs</p></li><li><p>Why excluding domain experts from your AI workflow is the #1 mistake teams make</p></li><li><p>The one habit that separates developers who ship reliable AI from those who don&#8217;t: fix the spec, not the output</p></li></ul><div><hr></div><h3><strong>WATCH THE FULL EPISODE - AgentBuild Expert Exchange</strong></h3><p>Whether you&#8217;re writing code every day or managing teams that do - this one reframes how you think about where your value actually lives in the AI era.</p><div id="youtube2-lDn0-_Ed53Q" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lDn0-_Ed53Q&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lDn0-_Ed53Q?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><br>Thanks, and please leave some comments on the video.<br>-Sandi.</p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share agentbuild.ai&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://newsletter.agentbuild.ai/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share agentbuild.ai</span></a></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.agentbuild.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading agentbuild.ai! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Storytelling: The SKILL that’s quietly becoming more valuable than your technical chops]]></title><description><![CDATA[The engineers getting promoted aren't the most technical ones. I've spent years watching this pattern. Inside: the three moves that actually work - and the data that proves this isn't optional anymore]]></description><link>https://newsletter.agentbuild.ai/p/storytelling-the-skill-thats-quietly</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/storytelling-the-skill-thats-quietly</guid><pubDate>Sat, 14 Feb 2026 14:30:38 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/80214c9e-7168-40d1-b25b-7ef629eaa2ed_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I watched one of the sharpest ML engineers I&#8217;ve ever worked with get passed over for a lead role last year.</p><p>His technical work was genuinely brilliant. 
The kind of thing that makes other engineers quietly jealous.</p><p>Then he presented it to leadership.</p><p>Forty-five minutes of architecture diagrams. Precision-recall curves. Token-level breakdowns of embedding strategies. Every slide was technically correct... and completely forgettable.</p><p>The exec sponsor checked her phone twice. The VP asked one question: &#8220;So what does this actually mean for our customers?&#8221; </p><p>He stumbled. Not because he didn&#8217;t know - he absolutely did - but because he had never practised framing it as anything other than a technical achievement.</p><p>Someone else got the role. Someone less technically impressive, but who could walk into a room and make a CTO feel something about the work.</p><p>I&#8217;ve seen this pattern dozens of times now. Brilliant people, invisible impact. And it&#8217;s not because they lack skill. It&#8217;s because nobody ever told them that the story of the work matters as much as the work itself.</p><p>That gap is about to get a lot more expensive.<br><br><strong>And you need to pay attention.</strong></p><div><hr></div><h2>This shift is already here</h2><p>So here&#8217;s where it gets interesting.</p><p>The Wall Street Journal reported in December 2025 that LinkedIn job posts mentioning &#8220;storyteller&#8221; <a href="https://www.wsj.com/articles/companies-are-desperately-seeking-storytellers-7b79f54e">doubled in a single year</a>. Not grew a bit. Doubled. </p><p>Let that sit for a second.</p><p>And it&#8217;s not just hiring. <a href="https://www.marketplace.org/story/2025/12/22/corporate-firms-are-looking-for-storytellers">Executive mentions of &#8220;storytelling&#8221; on earnings calls hit 469 in 2025</a>, up from 147 in 2015. That&#8217;s not a marketing trend. That&#8217;s a boardroom concept now.</p><p>So, you may ask, why the spike? </p><p>You can probably guess. </p><p>AI made content cheap. Abundant, even. </p><p>Which made trust and human narrative scarce... 
and therefore valuable. </p><p>One communications CEO nailed it: the flood of AI-generated content created so much distrust that the <strong>brands winning right now are the ones that sound most human</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bnha!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bnha!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 424w, https://substackcdn.com/image/fetch/$s_!Bnha!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 848w, https://substackcdn.com/image/fetch/$s_!Bnha!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 1272w, https://substackcdn.com/image/fetch/$s_!Bnha!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bnha!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png" width="971" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:971,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:440100,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/187939044?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F567c4789-acc9-46d3-ab69-c85adaa0ade7_1200x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bnha!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 424w, https://substackcdn.com/image/fetch/$s_!Bnha!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 848w, https://substackcdn.com/image/fetch/$s_!Bnha!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 1272w, https://substackcdn.com/image/fetch/$s_!Bnha!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a7b7295-94f5-4282-950d-b3acf49d5b54_971x794.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><blockquote><p>Here&#8217;s the weird side-effect if you&#8217;re in data or AI: your dashboards, models, and agents aren&#8217;t the final product anymore. </p><p><em><strong>The story about them is.</strong></em></p></blockquote><div><hr></div><h2>How you actually get better at this</h2><p>You don&#8217;t need to become a novelist. You just need to change how you frame what you&#8217;re already doing.</p><ul><li><p><strong>Frame everything as a before and after.</strong> Next time you present work, try this: <em>Before</em> - here&#8217;s how decisions were made, or what was broken. <br><em>Conflict</em> - here&#8217;s the cost of staying like this. <br><em>After</em> - here&#8217;s what changes if this works. One slide. Three beats. That&#8217;s it. That&#8217;s your story. 
Most people skip the conflict part, by the way - and that&#8217;s exactly the bit that makes execs lean forward.</p></li><li><p><strong>Translate complexity into choices.</strong> Instead of &#8220;we used model X with technique Y,&#8221; try: &#8220;We chose this approach because it sacrifices a bit of accuracy for much better latency, which means customers don&#8217;t wait.&#8221; See the difference? You&#8217;re telling a story of trade-offs now. A VP can repeat that in a corridor. They can&#8217;t repeat your architecture diagram.</p></li><li><p><strong>Anchor every number to a human.</strong> &#8220;3% uplift&#8221; is forgettable. &#8220;That 3% means 8,000 fewer customers hitting this error screen every month&#8221; - that sticks. <em><strong>Whenever you&#8217;ve got a metric, ask yourself: what does this number feel like for a real person?</strong></em></p></li></ul><div><hr></div><h2>Why bother when you could just get much better at technical stuff?</h2><p>Fair question. Here&#8217;s my honest answer.</p><blockquote><p>AI is eating the production side of our work. Fast. </p><p>Code, analysis, first drafts - all getting automated. </p><p>What it can&#8217;t replace is picking the right problem, reading the room, and crafting a narrative that makes someone with budget authority say &#8220;we&#8217;re doing this.&#8221;</p></blockquote><p><strong>That&#8217;s the bit you don&#8217;t want to outsource.</strong></p><p>Here's what I do. <strong><br></strong>Every project I work on, I write a five-sentence story about it before I present anything: who it helps, what hurts today, what we're changing, how we'll know it worked, what happens next. <br><br>No fancy framework. Just five sentences. <br>It forces me to find the narrative before I open a slide deck.</p><p>Then next time you send a Slack update or a deck, check: is there a clear before and after? <strong>Is there one sentence someone could repeat to their boss?</strong></p><p>If you can answer that... 
you&#8217;re already ahead of most technical people in the room. Because you got clearer.</p><p><strong>And clarity, it turns out, is what actually moves organisations.</strong><br><br>Thanks for reading. Tell me in the comments - what do you think about this new skill employers are looking for? How are you preparing for this shift?<br><br>Thanks,<br>Sandi.</p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p>]]></content:encoded></item><item><title><![CDATA[Anthropic's Timeline vs. Your 2006 Database]]></title><description><![CDATA[Inside: AI companies are built from the ground up to excel with AI. Most companies are not, and that's the gap. 
And in this gap, I see a massive opportunity for you.]]></description><link>https://newsletter.agentbuild.ai/p/anthropics-timeline-vs-your-2006</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/anthropics-timeline-vs-your-2006</guid><pubDate>Sat, 07 Feb 2026 10:51:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!cE9r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Dario Amodei, CEO of Anthropic, says software engineers have 6-12 months left.</p><p>Meanwhile, last week I left a meeting where a $1B+ company spent 90 minutes arguing about whether &#8220;active customer&#8221; means someone who bought in the last 30 days or the last 90 days.</p><p>Different teams. </p><p>Different definitions. </p><p>Different databases.</p><p>These are not the same timeline.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cE9r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cE9r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 424w, https://substackcdn.com/image/fetch/$s_!cE9r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 848w, 
https://substackcdn.com/image/fetch/$s_!cE9r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!cE9r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cE9r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png" width="886" height="1142" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1142,&quot;width&quot;:886,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1042119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.agentbuild.ai/i/187184394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cE9r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 424w, 
https://substackcdn.com/image/fetch/$s_!cE9r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 848w, https://substackcdn.com/image/fetch/$s_!cE9r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!cE9r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493fa750-f17e-4c87-b227-4848a8ac4383_886x1142.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image Credit: artificialintelligence.co</figcaption></figure></div><div><hr></div><h3><strong>The Anthropic Reality</strong></h3><p>At Anthropic, engineers stopped writing code because their stack was built for AI from day one. Clean data. Modern architecture. No legacy anything.</p><div><hr></div><h3><strong>The Enterprise Reality</strong></h3><p>At most companies, the Head of Data is still explaining why you can&#8217;t just &#8220;put everything in a vector database&#8221; when nobody agrees on what an active customer is.</p><p>This isn&#8217;t about AI capability. It&#8217;s about infrastructure debt that&#8217;s 15 to 30 years deep.</p><div><hr></div><h3><strong>The Real Gap</strong></h3><p>Most of my time with customers these days is spent talking about AI readiness. </p><p>You know what blocks them? Not models. Not talent. Not budget.</p><p>It&#8217;s that nobody can answer basic questions:</p><ul><li><p>Where is the source of truth for customer data?</p></li><li><p>Can we access it in real time, or only in batch?</p></li><li><p>Do we have lineage? Do we have versioning?</p></li><li><p>Can we trace decisions back to their data sources?</p></li></ul><p>AI-native companies were designed from scratch for these questions. </p><p>Everyone else is retrofitting.</p><div><hr></div><h3><strong>The Opportunity in the Gap</strong></h3><p>Here&#8217;s what&#8217;s interesting about this moment.</p><p>AI companies are built for a world that doesn&#8217;t exist yet. </p><p>Enterprises are still operating in the world that does.</p><p>Someone has to bridge that gap.</p><p><strong>And right now, AI companies are realizing they can&#8217;t do it alone.</strong> Anthropic can&#8217;t retrofit your 2006 database. OpenAI can&#8217;t untangle your customer data across 12 systems.</p><p><strong>This is where the real opportunity is.</strong></p><p>Not in building better models. 
</p><p>In building the infrastructure that lets models actually work.</p><p>If you know:</p><ul><li><p>How to design cloud infrastructure</p></li><li><p>How to build data platforms</p></li><li><p>How industry-specific workflows actually operate</p></li><li><p>How to translate technical requirements into business outcomes</p></li></ul><p>You&#8217;re not behind. You&#8217;re exactly where the market needs you.</p><p><strong>But here&#8217;s what you need to learn - and learn fast:</strong></p><p>The gap between what models can do and what enterprises can actually deploy.</p><p>Because that gap is the entire business for the next 5 years.</p><p>Most people think they need to learn prompt engineering or RAG architectures.</p><p>What they actually need to learn is why a large enterprise can&#8217;t answer &#8220;what&#8217;s this customer&#8217;s balance?&#8221; without a nightly batch job.</p><p>And how to fix that before the AI even shows up.</p><p><strong>The window&#8217;s still open. But it&#8217;s closing fast.</strong></p><p>Not because AI is getting harder. Because the people who understand both worlds - AI capability AND enterprise reality - are getting snatched up.</p><p>The question isn&#8217;t whether you can catch up to Anthropic.</p><p>The question is whether you can help enterprises catch up to what AI requires.</p><p>That&#8217;s the real opportunity.</p><div><hr></div><h3><strong>Next Week</strong></h3><p>I am releasing a plan to help you prepare for this opportunity. 
</p><p>Make sure you don&#8217;t miss that.</p><div><hr></div><p><em><strong>Ask your friends to join.</strong><br>More valuable content coming your way.</em></p>]]></content:encoded></item><item><title><![CDATA[When Your AI is Quietly Failing]]></title><description><![CDATA[Inside: The checklist to rescue your AI projects that are silently failing. 
Find out the systematic process to make these failures visible and measurable - so you can improve these systems continuously.]]></description><link>https://newsletter.agentbuild.ai/p/when-your-ai-is-quietly-failing</link><guid isPermaLink="false">https://newsletter.agentbuild.ai/p/when-your-ai-is-quietly-failing</guid><pubDate>Sat, 31 Jan 2026 14:30:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/qzlXRY2BgMY" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few months ago I met a VP of Engineering at a SaaS startup building document processing solutions. He said, &#8220;We shipped our AI six months ago. My team spends most of its time firefighting issues - I am not sure we know what&#8217;s wrong with it.&#8221;</p><p>They had no measurement infrastructure. No test cases. No way to trace decisions. They&#8217;d celebrated the launch, moved the team to the next project, and now they spend most of their time resolving issues - patching and fixing. </p><p>He&#8217;s not alone. I have similar discussions with many companies that shipped their AI features under pressure - from boards, from investors, from competition. </p><div><hr></div><h2>I keep seeing this pattern</h2><p>Three times in the last quarter, I&#8217;ve worked with organizations dealing with the same crisis:</p><p><strong>Scenario 1</strong>: That SaaS company deployed document processing AI to 10 enterprise customers. Six months in, their largest customer threatened to cancel. The AI was extracting wrong data from contracts. The team had no way to see why.</p><p><strong>Scenario 2</strong>: A fintech launched an insurance documentation assistant. Their customers complained it was &#8220;getting slower&#8221; and &#8220;less accurate.&#8221; Their team couldn&#8217;t verify either claim - they&#8217;d never established baselines.</p><p><strong>Scenario 3</strong>: A retail bank deployed a chatbot handling internal claim disputes. 
Support tickets about &#8220;wrong AI answers&#8221; were climbing. Nobody could trace which policy the AI used or why it made specific decisions.</p><p>Here&#8217;s the common thread I noticed. <br><br>All three had deployed without building evaluation infrastructure. <br>They&#8217;d built the AI, but not the system to know if the AI was working.</p><div><hr></div><h2>What Actually Happens in These Meetings</h2><p>I sit in a conference room with engineering, product, and business leaders. I ask: </p><p>&#8220;What&#8217;s your accuracy?&#8221;</p><p><em>&#8220;X%&#8221;</em></p><p>&#8220;What&#8217;s it costing per query?&#8221;</p><p><em>&#8220;Infrastructure costs are $X, but we don&#8217;t track per-query.&#8221;</em></p><p>&#8220;Do you have test cases?&#8221;</p><p><em>&#8220;We tested it before launch...&#8221;</em></p><p>&#8220;Can you show me traces of what went wrong?&#8221;</p><p><em>Silence.</em></p><p>This is what I call <strong>Evaluation Debt</strong>. </p><p>You deployed a system without building the measurement infrastructure to operate it. And now you&#8217;re paying interest - in firefighting, guessing, and eroding stakeholder trust.</p><div><hr></div><h2>The Recovery Framework I Use</h2><p>Here&#8217;s what most advice gets wrong: it assumes you&#8217;re starting from scratch. But you&#8217;re not. Many AI POCs made it to production - they just were not production-ready. They need a recovery framework, not a startup guide.</p><p>I&#8217;ve developed a four-phase Recovery Pathway that rescues these systems without rebuilding from scratch:</p><ol><li><p><strong>Define what success should have been</strong> </p></li><li><p><strong>Build measurement infrastructure retroactively</strong> </p></li><li><p><strong>Diagnose with data, not guesses</strong></p></li><li><p><strong>Fix and validate systematically</strong> </p></li></ol><p>That SaaS company? 
I took them from 73% accuracy and customer threats to 96% accuracy and $1.4 million in annual savings. </p><p>Not by switching models. </p><p>By implementing evaluation infrastructure and working systematically.</p><div><hr></div><h2>Why This Matters Now</h2><p>According to recent IDC research, enterprises are collectively spending $154 billion on AI initiatives in 2024. But McKinsey data shows that only 11% of organizations have achieved significant financial returns from their AI investments.</p><p>The gap between investment and return isn&#8217;t a capability problem. </p><p>It&#8217;s a measurement problem.</p><p>You can&#8217;t fix what you can&#8217;t measure. </p><p>And you can&#8217;t defend an investment you can&#8217;t prove is working.</p><div><hr></div><h2>Watch the Recovery Pathway Framework Video</h2><p>I just recorded a complete walkthrough of the Recovery Pathway framework, including a real case study where we rescued a failing AI system.<br><br>&#128073; <em>I will be bringing more practical content like this - please subscribe to my channel if you want to stay updated. 
I share clips and short-format videos so you can learn something new every day.</em></p><div id="youtube2-qzlXRY2BgMY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;qzlXRY2BgMY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/qzlXRY2BgMY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>You&#8217;ll see:</p><ul><li><p>The exact diagnostic process that reveals where failures are happening</p></li><li><p>How to implement tracing retroactively without rebuilding</p></li><li><p>The week-by-week action plan to go from crisis to recovery</p></li><li><p>Real numbers: $4.20 per document down to $1.80, 47 complaints per month down to 3</p></li></ul><p>This isn&#8217;t theory. This is the actual process I use when organizations call me to rescue production AI systems.</p><p>If you&#8217;re dealing with an AI system that shipped but isn&#8217;t delivering the value you promised, this framework will show you the way out.</p><p>And if you know someone firefighting a struggling AI deployment, share this with them. Recovery is possible. But it requires working backward with discipline.</p><p>&#128073; BONUS: I&#8217;ve created a <a href="https://drive.google.com/file/d/1hXSIKdT3wZhHXGpXv9HauRJyPHk1YrHo/view?usp=sharing">Recovery Pathway checklist</a> with the workshop agenda, tracing implementation guide, and diagnostic framework. </p><p>Get in touch if you have questions.</p><div><hr></div><p><em>Found this useful? 
<strong>Ask your friends to join.</strong><br>We have so much planned for the community - can&#8217;t wait to share more soon.</em></p>]]></content:encoded></item></channel></rss>