
How AI Agents Communicate: Protocols, Message Passing, and Shared State

AI agent communication design determines system cost, reliability, and debuggability as much as any individual agent's capability. The two fundamental approaches — message passing and shared state — each have distinct trade-offs, and most production systems use a hybrid.

By Nora Lin · June 1, 2025 · 6 min read

When multiple AI agents collaborate, the communication layer is usually what breaks first. Agents that work perfectly in isolation start producing garbage outputs when they receive poorly structured messages from each other. **AI agent communication** is not an afterthought — it's a core architectural decision that affects cost, latency, debuggability, and failure recovery.

Quick answer

AI agents communicate via two mechanisms: message passing (agents send structured messages to each other, keeping private internal state) and shared state (all agents read/write a common data store). Message passing scales better and isolates failures; shared state is simpler to implement and gives every agent full context. Most production multi-agent systems use a hybrid: shared state for task-level data, message passing for agent-to-agent coordination signals.

Why does communication design matter so much in multi-agent systems?

A 2024 benchmark study on multi-agent coding systems found that communication overhead — the tokens spent passing context between agents — accounted for 30-50% of total API costs in pipeline-style architectures. Poor communication design doesn't just waste money; it introduces ambiguity that causes agents to misinterpret each other's outputs. The result is cascading errors that are difficult to diagnose because the root cause is buried in an inter-agent message, not in any single agent's reasoning.

Well-designed AI agent communication solves three problems simultaneously: it gives each agent exactly the context it needs (no more, no less), it makes failures attributable to a specific message exchange, and it keeps the token budget predictable.

What is message passing and how does it work between agents?

In message passing, each agent maintains its own private state and communicates exclusively by sending and receiving structured messages. An agent never directly reads another agent's internal state — it only sees what that agent chose to send. This is the communication model used by AutoGen's group chat, LangGraph's inter-node messages, and most actor-based distributed systems.

A well-structured agent message typically includes: the sender identity, the recipient identity (or broadcast indicator), the message type (task assignment, result, error, status update), the payload (the actual content), and a correlation ID that links this message to its originating request for debugging.
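
The fields listed above can be sketched as a small message envelope. This is a minimal illustration rather than any framework's actual message type: the class name `AgentMessage`, the `reply` helper, and the convention that `recipient=None` means broadcast are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional
import uuid

# Message types named in the text above.
MessageType = Literal["task_assignment", "result", "error", "status_update"]


@dataclass(frozen=True)
class AgentMessage:
    """Hypothetical envelope for one inter-agent message."""
    sender: str
    recipient: Optional[str]  # None is treated as "broadcast" in this sketch
    msg_type: MessageType
    payload: dict[str, Any]
    # A fresh ID is generated for new requests; replies reuse the original.
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def reply(self, msg_type: MessageType, payload: dict[str, Any]) -> "AgentMessage":
        """Build a response that carries the originating correlation ID."""
        return AgentMessage(
            sender=self.recipient or "unknown",
            recipient=self.sender,
            msg_type=msg_type,
            payload=payload,
            correlation_id=self.correlation_id,
        )


request = AgentMessage("planner", "researcher", "task_assignment", {"query": "A2A adoption"})
response = request.reply("result", {"summary": "Three frameworks list support."})
```

Because the reply reuses `correlation_id`, a log search for that one ID returns the whole exchange, which is what makes a failure attributable to a specific message.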

  • Advantages: each agent is independently testable by mocking its message inputs; failures are attributable to specific messages; agents can be replaced without changing others' code.
  • Disadvantages: requires explicit message contracts between every communicating pair; if Agent B needs context from Agent A's earlier message to Agent C, you must explicitly route that context.
  • Best for: large systems with many agents where isolation and testability matter more than implementation speed.

What is shared state and when should agents use it?

In a shared-state design, all agents in a system read from and write to a common data structure: a Python dict, a LangGraph `State` object, a Redis store, or a database. Any agent can inspect the full task context at any time without requesting it from another agent. In Python systems built on LangGraph, that state is typically declared as a `TypedDict` schema.

Shared state works well when agents genuinely need to react to the same evolving context. In a research task, a shared `findings` list lets every researcher agent add results and lets the synthesizer agent read them all, with no explicit message-passing choreography.
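
The research example can be sketched in plain Python as follows. It deliberately does not use LangGraph; `ResearchState`, `researcher`, and `synthesizer` are hypothetical names, and the agents are stand-in functions that run sequentially rather than real LLM calls.

```python
from typing import TypedDict


class ResearchState(TypedDict):
    """Hypothetical shared state for a small research task."""
    question: str
    findings: list[str]
    summary: str


def researcher(state: ResearchState, name: str, result: str) -> None:
    """Append one finding to the shared list, tagged with the agent's name."""
    state["findings"].append(f"[{name}] {result}")


def synthesizer(state: ResearchState) -> None:
    """Read every finding from shared state and write a combined summary."""
    state["summary"] = " | ".join(state["findings"])


state: ResearchState = {
    "question": "How do agents share context?",
    "findings": [],
    "summary": "",
}
researcher(state, "web_researcher", "Shared state gives every agent full context.")
researcher(state, "paper_researcher", "Message passing isolates failures.")
synthesizer(state)
```

No agent sends anything to another agent here; the synthesizer simply reads what the researchers left behind. If the researchers ran in parallel, the appends would need a lock or a merge step.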

  • Advantages: simpler to implement; every agent has full context; no message routing logic to maintain.
  • Disadvantages: concurrent writes can corrupt state if not carefully managed; the shared state object grows unbounded without explicit cleanup; tight coupling between agents makes individual testing harder.
  • Best for: small systems (2-4 agents) with a clear central data model and limited parallelism.

Figure: Message passing (left): agents exchange structured messages via explicit channels. Shared state (right): all agents read and write a common state object. Most production systems use elements of both.

What is the Agent-to-Agent (A2A) protocol?

Google introduced the Agent-to-Agent (A2A) protocol in April 2025 as an open standard for AI agent communication across different platforms and vendors. A2A defines how agents advertise their capabilities (via an Agent Card), how tasks are assigned and tracked, and how results are returned — even when the agents were built by different teams using different frameworks.

A2A runs over HTTP with JSON payloads (the published specification builds on JSON-RPC 2.0) and supports both synchronous (request/response) and asynchronous (task queuing with callbacks) communication patterns. An agent exposes an Agent Card, a JSON document served at a well-known URL, describing what it can do. A client (another agent or orchestrator) reads this card, sends a task, and then either polls for the result or receives a webhook notification when the task completes.
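
To make the discovery step concrete, here is a rough sketch of how a client might inspect a card it has already fetched. The card is a made-up example for an imaginary agent, and its fields are a simplified subset; the exact well-known path, required fields, and request methods should be taken from the A2A specification. `find_skill` and `delivery_mode` are hypothetical helpers, not part of any A2A SDK.

```python
from typing import Any, Optional

# Hypothetical Agent Card, shown as the dict a client would get after
# parsing the JSON document. Field names are a simplified subset.
EXAMPLE_CARD: dict[str, Any] = {
    "name": "Summarizer Agent",
    "description": "Summarizes long documents.",
    "url": "https://agents.example.com/summarizer",
    "version": "1.0.0",
    "capabilities": {"streaming": False, "pushNotifications": True},
    "skills": [
        {
            "id": "summarize_text",
            "name": "Summarize text",
            "description": "Condenses text to key points.",
        },
    ],
}


def find_skill(card: dict[str, Any], skill_id: str) -> Optional[dict[str, Any]]:
    """Return the advertised skill with this ID, or None if the agent lacks it."""
    for skill in card.get("skills", []):
        if skill.get("id") == skill_id:
            return skill
    return None


def delivery_mode(card: dict[str, Any]) -> str:
    """Decide how the client will learn that a task finished."""
    capabilities = card.get("capabilities", {})
    return "webhook" if capabilities.get("pushNotifications") else "polling"
```

A client would typically check `find_skill` before sending a task, then use `delivery_mode` to decide between registering a callback and polling.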

As of mid-2025, A2A support has been added to LangGraph, CrewAI, and AutoGen, making it the closest thing to a cross-framework interoperability standard in the agent ecosystem.

How do agents signal task completion, failure, and uncertainty?

Signaling is one of the most underspecified parts of multi-agent design. Many systems fail not because an agent produced a wrong answer, but because it produced a wrong answer that *looked* like a correct one — no error signal was raised, and the downstream agent accepted it without question.

  • Completion: agents should return a structured result object with an explicit `status: 'complete'` field, not just a text string. This allows the orchestrator to programmatically confirm completion rather than parsing free text.
  • Failure: use typed error responses — `{status: 'error', error_type: 'tool_timeout', retryable: true, message: '...'}` — so the orchestrator can decide whether to retry, route elsewhere, or escalate.
  • Uncertainty / low confidence: implement a `confidence` field or explicit hedging signal. When an agent is uncertain, it should flag this rather than guessing, so a supervisor can route to human review or request a second opinion from another agent.
  • Partial completion: for long tasks, agents should return intermediate checkpoints rather than only a final result, enabling the orchestrator to decide whether to continue or abort.
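
The four signals above can be combined into one result type plus a small decision function for the orchestrator. This is a sketch under assumptions: `AgentResult`, `next_action`, the action names, and the 0.7 confidence threshold are illustrative choices, not values from any framework.

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional

Status = Literal["complete", "partial", "error"]


@dataclass
class AgentResult:
    """Hypothetical structured result returned by every agent."""
    status: Status
    output: Optional[Any] = None
    confidence: Optional[float] = None  # self-reported, 0.0 to 1.0
    error_type: Optional[str] = None    # e.g. "tool_timeout"
    retryable: bool = False
    message: str = ""


def next_action(result: AgentResult, min_confidence: float = 0.7) -> str:
    """Map a result to an orchestrator action without parsing free text."""
    if result.status == "error":
        return "retry" if result.retryable else "escalate"
    if result.status == "partial":
        return "continue"  # intermediate checkpoint; the task is still running
    if result.confidence is not None and result.confidence < min_confidence:
        return "human_review"  # complete, but the agent flagged uncertainty
    return "accept"


timeout = AgentResult(status="error", error_type="tool_timeout", retryable=True)
unsure = AgentResult(status="complete", output="Paris", confidence=0.4)
```

Here `next_action(timeout)` yields `"retry"` and `next_action(unsure)` yields `"human_review"`, so a low-confidence answer is never accepted silently just because it looks like a finished one.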

How do you avoid chatty agents and control context costs?

"Chatty agents" refers to agents that exchange many small messages when a few large, well-structured ones would suffice. Each additional round-trip adds latency and token cost. The fix is designing coarser-grained message contracts: instead of Agent A asking Agent B five sequential questions, Agent A sends all five questions in one message and Agent B returns all five answers at once.

Context compression is equally important in long agent conversations. When two agents have exchanged 20 messages, the 21st message doesn't need the full transcript — it needs a summary. LangGraph supports checkpoint-based context summarization; custom implementations can use a summarization agent that compresses conversation history before it's passed to the next stage.

  • Batch related requests into a single agent invocation rather than chaining them.
  • Use a shared state summary field that each agent updates with its key findings, rather than passing full transcripts.
  • Set a context window budget per agent and enforce it: if an incoming message exceeds the budget, route it through a summarization step first.
  • For long-running tasks, use a Redis or database-backed state store rather than passing state through LLM context.
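
The per-agent budget rule from the list above could look roughly like this. The token estimate is a crude four-characters-per-token heuristic rather than a real tokenizer, and `truncate_summarizer` is a placeholder for what would normally be an LLM summarization call; all function names are hypothetical.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming about 4 characters per token (not a tokenizer)."""
    return max(1, len(text) // 4)


def enforce_budget(
    message: str,
    budget_tokens: int,
    summarize: Callable[[str, int], str],
) -> str:
    """Pass the message through unchanged if it fits, else summarize it first."""
    if estimate_tokens(message) <= budget_tokens:
        return message
    return summarize(message, budget_tokens)


def truncate_summarizer(text: str, budget_tokens: int) -> str:
    """Placeholder summarizer: keeps only the head of the text."""
    return text[: budget_tokens * 4]


short_msg = enforce_budget("hello", 50, truncate_summarizer)
long_msg = enforce_budget("a" * 400, 50, truncate_summarizer)
```

Putting the check at the boundary of each agent keeps the budget enforceable in one place, whichever upstream agent produced the oversized message.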

For the architectural patterns that these communication choices feed into, see Multi-Agent Systems Guide and Agent Orchestration Patterns.

Frequently asked questions

Should agents communicate using natural language or structured formats?
Use structured formats (JSON with defined schemas) for inter-agent communication, and reserve natural language for the human-facing boundaries. Natural language between agents introduces parsing ambiguity, makes output validation difficult, and inflates token counts. Define a Pydantic or JSON Schema model for every agent output that another agent will consume, and validate against it before passing the message downstream.
How do agents communicate in LangGraph specifically?
LangGraph uses a shared `State` object (typically declared as a `TypedDict`) that flows through the graph. Each node (agent) receives the current state, performs its work, and returns an update to the state. Nodes communicate by writing to named fields in the state; there's no direct agent-to-agent messaging. For conditional routing, a routing function attached to a conditional edge inspects the state and returns the name of the next node to execute.
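
The node-returns-an-update pattern can be imitated in a few lines of plain Python. This is not LangGraph's API, only a toy runner that mimics the idea; `writer`, `reviewer`, `route_after_review`, and `run` are hypothetical names, and the "approve the second revision" rule exists only to make the loop terminate.

```python
from typing import Callable, TypedDict


class State(TypedDict, total=False):
    draft: str
    approved: bool
    revisions: int


def writer(state: State) -> State:
    """Return only the fields this node changes, not the whole state."""
    count = state.get("revisions", 0) + 1
    return {"draft": f"draft v{count}", "revisions": count}


def reviewer(state: State) -> State:
    """Toy rule: approve once the draft has been revised twice."""
    return {"approved": state.get("revisions", 0) >= 2}


def route_after_review(state: State) -> str:
    """Routing function: pick the next node from the current state."""
    return "END" if state.get("approved") else "writer"


NODES: dict[str, Callable[[State], State]] = {"writer": writer, "reviewer": reviewer}


def run(state: State, max_steps: int = 10) -> State:
    current = "writer"
    for _ in range(max_steps):
        # Merge the node's partial update into the shared state.
        state = {**state, **NODES[current](state)}
        if current == "writer":
            current = "reviewer"  # fixed edge: writer always goes to reviewer
        else:
            current = route_after_review(state)  # conditional edge
            if current == "END":
                break
    return state


final_state = run({})
```

The writer and reviewer never call each other. All coordination happens through the `draft`, `revisions`, and `approved` fields, and the routing function is the only place that decides control flow.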
What's the difference between the A2A protocol and MCP?
They solve different problems. The Model Context Protocol (MCP, from Anthropic) standardizes how a single agent accesses tools and external data sources — it's about agent-to-tool communication. Google's A2A protocol standardizes how agents communicate with each other across different platforms — it's about agent-to-agent communication. The two protocols are complementary; an agent can use MCP to access tools and A2A to communicate its results to another agent.
How do you prevent agents from contradicting each other in shared state?
Use field-level ownership: each state field is owned by exactly one agent, which is the only agent authorized to write to it. Other agents read but don't write. When multiple agents must contribute to the same field (e.g., a shared `findings` list), use append-only writes with unique agent identifiers so updates can be attributed and conflicts resolved by a dedicated merge agent.
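
One way to enforce field-level ownership and append-only contributions is a thin wrapper around the state. This is a sketch: `OwnedState` and its methods are hypothetical, it is in-memory only, and it is not thread-safe as written.

```python
from typing import Any


class OwnedState:
    """Hypothetical shared state with one writer per field."""

    def __init__(self, owners: dict[str, str]) -> None:
        self._owners = owners  # field name -> ID of the only agent allowed to write
        self._fields: dict[str, Any] = {}
        self._findings: list[tuple[str, str]] = []  # (agent_id, finding), append-only

    def write(self, agent_id: str, field: str, value: Any) -> None:
        """Write a field, rejecting any agent that does not own it."""
        owner = self._owners.get(field)
        if owner != agent_id:
            raise PermissionError(f"{agent_id} may not write {field!r} (owner: {owner})")
        self._fields[field] = value

    def read(self, field: str) -> Any:
        """Any agent may read any field."""
        return self._fields.get(field)

    def append_finding(self, agent_id: str, finding: str) -> None:
        """Shared list: entries are only ever added, each tagged with its author."""
        self._findings.append((agent_id, finding))

    def findings(self) -> tuple[tuple[str, str], ...]:
        """Return an immutable snapshot so readers cannot edit history."""
        return tuple(self._findings)


shared = OwnedState({"plan": "planner", "summary": "synthesizer"})
shared.write("planner", "plan", "search, then summarize")
shared.append_finding("researcher_a", "Source X supports the claim.")
shared.append_finding("researcher_b", "Source Y disputes the claim.")
```

The two researchers disagree, but neither entry overwrites the other. A merge agent can read `shared.findings()`, see who said what, and write the reconciled result to the `summary` field it owns.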

Written by

Nora Lin

Senior AI Research Analyst & Technical Reviewer

Nora researches AI agent capabilities, safety, and practical deployment patterns. She reviews every guide on agent2agent to ensure technical accuracy and current best practices.

This article is for educational purposes only. It does not constitute professional software, legal, or financial advice.
