This week: Agent communication is a major problem in multi-agent systems. What are the common failure modes, how do you design for them, and what key lessons have I learned?
Designing for failure between agents is the part teams skip. Most setups assume the receiving agent reads the message the way the sender meant it. Half the bugs I’ve seen come from one agent confidently passing context the next one silently misinterprets. Building the expected-failure path first changes the whole architecture.
Yeah, design fallback strategies. That's a whole different blog. Let me add it to the idea log.
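To make the "expected-failure path first" idea concrete, here is a minimal sketch of a receiving agent that rejects by default and asks for clarification instead of guessing. The field names, the intent list, and the `receive`/`handle` helpers are all hypothetical, chosen only for illustration; this is not any particular framework's API.

```python
from dataclasses import dataclass

# Hypothetical message contract between two agents (an assumption for this sketch).
REQUIRED_FIELDS = {"task_id": str, "intent": str, "payload": dict}
KNOWN_INTENTS = {"summarize", "search", "delete_records"}


@dataclass
class Handoff:
    accepted: bool
    reason: str


def receive(message: dict) -> Handoff:
    """Accept only messages the receiver can fully interpret."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in message:
            return Handoff(False, f"missing field: {name}")
        if not isinstance(message[name], expected_type):
            return Handoff(False, f"wrong type for field: {name}")
    if message["intent"] not in KNOWN_INTENTS:
        return Handoff(False, f"unknown intent: {message['intent']}")
    return Handoff(True, "ok")


def handle(message: dict) -> str:
    """The failure path comes first: bounce back to the sender rather than guess."""
    result = receive(message)
    if not result.accepted:
        return f"clarify: {result.reason}"
    return f"run: {message['intent']}"
```

With this shape, a misspelled intent such as "summarise" comes back to the sender as a clarification request instead of being silently reinterpreted.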
Designing for failure is the right framing, and it's exactly what's missing from most A2A implementations. A2A handles the "happy path" of agent communication well, but when messages get lost, agents misinterpret intent, or trust decays mid-session, there's no built-in recovery. This is the gap a collaboration layer like AACP fills — shared session state, rollback points, and human escalation triggers when agent-to-agent communication degrades. The protocol stack needs to treat failure not as an edge case but as a design constraint.
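A rough sketch of what shared session state with rollback points and a human escalation trigger could look like. This is not AACP's or A2A's actual interface; the `Session` class, its method names, and the failure threshold are assumptions made up for illustration.

```python
import copy


class Session:
    """Hypothetical shared state for a group of cooperating agents."""

    def __init__(self, max_failures: int = 2):
        self.state: dict = {}
        self._checkpoints: list = []
        self.failures = 0
        self.max_failures = max_failures
        self.escalated = False

    def checkpoint(self) -> None:
        # Save a rollback point before any agent mutates the shared state.
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self) -> None:
        # Restore the most recent rollback point, if one exists.
        if self._checkpoints:
            self.state = self._checkpoints.pop()

    def apply(self, update: dict, ok: bool) -> None:
        """Apply an agent's update; `ok` is the verdict of some downstream check."""
        self.checkpoint()
        self.state.update(update)
        if ok:
            self.failures = 0
            return
        self.rollback()
        self.failures += 1
        if self.failures >= self.max_failures:
            # Repeated failures: stop retrying and hand over to a human.
            self.escalated = True
```

The design choice here is that a failed handoff never leaves partial state behind, and consecutive failures are counted so that degradation triggers escalation rather than an endless retry loop.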
Communication failures in AI agents are a real challenge, great breakdown.
Brilliant piece. Thank you, Sandipan. Instead of communication failures - could the Telephone Game, death spiral and destructive-action HITL gate be judgement failures? Schemas are helpful but they don't address whether the action should happen at all. Seems this judgement layer is needed. Curious if you are seeing anyone building in this layer yet?
Thanks. Yes, judgement can be architected. Either: 1. use deterministic condition gates (easier to do if you force agent outputs into structured JSON); or 2. use evals to judge responses and agent actions, improving them offline and monitoring them online. There are a number of companies building judgement layers. The trick is in stitching them into the production architecture, and also keeping the "judges" aligned.
A great paper to read: "Who Validates the Validators?" by Shreya Shankar et al.
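For the first option above, a deterministic condition gate over structured JSON output might look like the sketch below. The action names, the `record_count` field, and the threshold are hypothetical; a real gate would encode whatever rules the deployment actually needs.

```python
import json

# Hypothetical set of actions that always require a human in the loop.
DESTRUCTIVE_ACTIONS = {"delete", "drop_table", "refund"}


def gate(raw_output: str, max_records: int = 100) -> str:
    """Return 'allow', 'escalate', or 'reject' for an agent's proposed action."""
    try:
        proposal = json.loads(raw_output)
    except json.JSONDecodeError:
        # Unparseable output never reaches execution.
        return "reject"
    if not isinstance(proposal, dict) or not isinstance(proposal.get("action"), str):
        return "reject"
    if proposal["action"] in DESTRUCTIVE_ACTIONS:
        return "escalate"
    count = proposal.get("record_count", 0)
    if not isinstance(count, int) or count > max_records:
        # Large or malformed blast radius: send to a human instead of running.
        return "escalate"
    return "allow"
```

Because the gate never calls a model, the same proposal always gets the same verdict, which is what makes it easy to test and audit. Whether an action should happen at all in fuzzier cases is where the second option, evals, comes in.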
Quite obvious for professionals: very well-known, standard, repeatable problems and solution patterns that have been used for decades. Nothing new.
The real main problem is the extremely low professional culture level of self-confident humans who enthusiastically took on a seemingly easy task, which they understand very poorly.
They are known patterns, but easily misunderstood in an agentic context.