Most teams reach for multiple agents one task too early. Multi-agent systems can outperform a single agent on hard, parallelizable problems, but they also multiply the ways your pipeline can fail silently.
What multi-agent systems actually are
A multi-agent system is two or more LLM-driven agents, each with its own context window, tools, and instructions, working toward a shared goal. The point is not "more models." The point is division of labor: separate contexts so each agent reasons over a smaller, cleaner slice of the problem instead of drowning in one giant prompt.
That separation is the real benefit. A single agent handed a 40-tool toolbox and a 30,000-token instruction blob is far more likely to pick the wrong tool and lose the thread halfway through. Split that into a researcher, a writer, and a critic, and each one stays sharp because its context only holds what it needs.
The orchestrator/worker pattern
The dominant design for multi-agent systems is orchestrator/worker, sometimes called lead/subagent. One agent plans and delegates; several workers execute in parallel; the orchestrator synthesizes their results.
It runs in a clear loop:
- Decompose. The orchestrator breaks the goal into independent subtasks and writes an explicit objective, output format, and scope boundary for each.
- Dispatch. Each subtask spawns a worker with a fresh context window and only the tools it needs.
- Execute. Workers run concurrently. A worker searching ten sources does not block one querying a database.
- Synthesize. The orchestrator collects worker outputs and merges them into a final answer, often after a verification pass.
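The four steps above can be sketched with asyncio. This is a minimal sketch under stated assumptions, not a real agent framework: `Subtask`, `decompose`, `call_worker`, and `synthesize` are hypothetical names, and the worker is a stub standing in for an LLM call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subtask:
    objective: str       # what the worker should accomplish
    output_format: str   # what shape the answer should take
    scope: str           # boundary the worker must stay inside


def decompose(goal: str) -> list[Subtask]:
    # Hypothetical: a real orchestrator would ask an LLM to plan these.
    return [
        Subtask(f"{goal}: pricing tiers", "bullet list", "pricing page only"),
        Subtask(f"{goal}: funding history", "bullet list", "funding rounds only"),
    ]


async def call_worker(task: Subtask) -> str:
    # Stub for an LLM call made with a fresh context and a narrow tool set.
    await asyncio.sleep(0)  # yields control, as a network call would
    return f"[{task.scope}] result for {task.objective}"


def synthesize(results: list[str]) -> str:
    # Hypothetical: a real orchestrator would merge and verify with an LLM.
    return "\n".join(results)


async def orchestrate(goal: str) -> str:
    subtasks = decompose(goal)                    # Decompose
    jobs = [call_worker(t) for t in subtasks]     # Dispatch
    results = await asyncio.gather(*jobs)         # Execute concurrently
    return synthesize(list(results))              # Synthesize
```

Running `asyncio.run(orchestrate("Acme Corp"))` drives the whole loop. `asyncio.gather` returns results in the order the jobs were passed in, so the synthesis step can rely on a stable ordering even though the workers run concurrently.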
Anthropic's Research feature is a documented example of this shape, and OpenAI's Swarm explores a related, handoff-based variant in which one agent passes control to another. Frameworks like LangGraph model it as a graph with a supervisor node, while CrewAI and AutoGen expose roles and conversations directly. The vocabulary differs; the skeleton is the same.
The non-obvious part is that delegation prompts have to be detailed. "Research the competitor" produces three workers that all read the same homepage. "Find pricing tiers," "find headcount from LinkedIn," and "find funding history from Crunchbase" produce three workers with no overlap. The orchestrator's job is mostly writing good task descriptions.
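One way to keep delegation prompts specific is to build them from structured fields instead of free text. The sketch below assumes a hypothetical `Delegation` record and a deliberately crude overlap check; the field names and prompt wording are illustrative, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    objective: str      # what to find
    source: str         # where to look
    output_format: str  # what to hand back
    out_of_scope: str   # what to leave to other workers

    def to_prompt(self) -> str:
        return (
            f"Objective: {self.objective}\n"
            f"Source: {self.source}\n"
            f"Return: {self.output_format}\n"
            f"Do not cover: {self.out_of_scope}"
        )


def duplicate_scopes(tasks: list[Delegation]) -> list[Delegation]:
    # Crude check: flag any task whose objective and source repeat an earlier one.
    seen: set[tuple[str, str]] = set()
    dupes: list[Delegation] = []
    for task in tasks:
        key = (task.objective.lower(), task.source.lower())
        if key in seen:
            dupes.append(task)
        seen.add(key)
    return dupes


tasks = [
    Delegation("Find pricing tiers", "pricing page", "tier and price table", "headcount, funding"),
    Delegation("Find headcount", "LinkedIn", "single number with date", "pricing, funding"),
    Delegation("Find funding history", "Crunchbase", "list of rounds", "pricing, headcount"),
]
```

Forcing every delegation through the same four fields makes a vague task visible before a worker is spawned: "Research the competitor" has no obvious source, return format, or exclusion list to fill in.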
Other coordination shapes
Orchestrator/worker is not the only option:
- Pipeline. Agents in a fixed sequence, each transforming the previous output. Clean for stages like extract, then transform, then validate.
- Debate or critic. One agent proposes, another challenges. Useful for catching reasoning errors in math, code, and analysis.
- Blackboard. Agents read and write to shared state and act when relevant. Flexible but harder to reason about and debug.
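Of the three, the pipeline is the simplest to sketch. In the example below, plain functions stand in for the extract, transform, and validate agents; the stage logic is hypothetical and only shows how each stage consumes the previous stage's output.

```python
from typing import Any, Callable


def extract(text: str) -> dict[str, str]:
    # Stand-in for an extraction agent: pull "key: value" pairs out of raw text.
    pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def transform(record: dict[str, str]) -> dict[str, str]:
    # Stand-in for a transformation agent: normalize keys to snake_case.
    return {key.lower().replace(" ", "_"): value for key, value in record.items()}


def validate(record: dict[str, str]) -> dict[str, str]:
    # Stand-in for a validation agent: fail loudly instead of passing bad data on.
    empty = [key for key, value in record.items() if not value]
    if empty:
        raise ValueError(f"empty fields: {empty}")
    return record


def run_pipeline(data: Any, stages: list[Callable[[Any], Any]]) -> Any:
    # Each stage sees only the previous stage's output, never the full history.
    for stage in stages:
        data = stage(data)
    return data
```

Because the order is fixed, a failure points at exactly one stage, which is much of what makes this shape easier to debug than a blackboard.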
When to use multiple agents vs one
Default to a single agent. It is cheaper, easier to debug, and has no coordination overhead. Reach for multiple agents only when the problem pushes back.
Good fits for multi-agent systems:
- Parallelizable breadth. Tasks that fan out into independent subtasks, like surveying many documents or sources at once.
- Distinct skill sets. When subtasks need genuinely different tools or instructions, such as a SQL agent feeding a chart-building agent.
- Context pressure. When one agent's context would overflow, splitting the work keeps each window focused.
- Built-in review. When a separate critic catching mistakes is worth the extra cost, as in code generation with a reviewer.
Stay single-agent when the task is mostly sequential, latency matters, the budget is tight, or the steps share so much context that splitting just forces you to copy state between agents. Multi-agent systems can burn several times the tokens of a single agent because every worker re-reads context and the orchestrator pays again to synthesize. If the work does not parallelize, you are paying that tax for nothing.
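A back-of-envelope model makes the tax concrete. The sketch assumes every worker re-reads the full shared context, the work itself splits evenly, and the orchestrator reads the context plus one summary per worker. All numbers are made up for illustration, not measurements.

```python
def single_agent_tokens(context: int, work: int) -> int:
    # One agent reads the context once and does all the work.
    return context + work


def multi_agent_tokens(context: int, work: int, workers: int, summary: int) -> int:
    # Each worker re-reads the context and writes a summary; the work is split.
    worker_total = workers * context + work + workers * summary
    # The orchestrator reads the context and every worker's summary.
    orchestrator_total = context + workers * summary
    return worker_total + orchestrator_total


# Hypothetical inputs: 8,000-token context, 4,000 tokens of work,
# 4 workers, 500-token summaries.
single = single_agent_tokens(8_000, 4_000)          # 12,000
multi = multi_agent_tokens(8_000, 4_000, 4, 500)    # 48,000
ratio = multi / single                              # 4.0
```

In this toy model the context re-reads dominate: the work itself is unchanged at 4,000 tokens, but the shared context is paid for five times. If the workers do not actually run in parallel, that overhead buys nothing.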
Coordination pitfalls that quietly break things
The failure modes of multi-agent systems are rarely loud. They show up as plausible-looking wrong answers.
- Context loss at boundaries. A worker only knows what the orchestrator told it. Vague handoffs make workers guess, and guesses compound across the chain.
- Duplicated and conflicting work. Without clear scoping, two workers solve the same subtask, or worse, return contradictory facts the orchestrator has to reconcile blind.
- Error propagation. A hallucination in an early worker becomes a trusted input downstream. Without a verification step, nothing catches it.
- Runaway loops and cost. Agents that call each other can ping-pong indefinitely. Always cap iterations, set token budgets, and add timeouts.
- Lost observability. When five agents act in parallel, a single trace no longer tells the story. You need per-agent logging or you cannot debug what went wrong.
- Synthesis bottleneck. The orchestrator can become the weak link, stuffing every worker's full output into one context and losing detail. Have workers return structured summaries, not raw dumps.
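The caps recommended for runaway loops can be enforced in one small wrapper. This is a sketch under assumptions: `step` is a hypothetical async callable that runs one agent turn and returns a `(done, tokens_used)` pair, and `BudgetExceeded` is an illustrative exception name.

```python
import asyncio
from typing import Awaitable, Callable


class BudgetExceeded(RuntimeError):
    """Raised when an iteration cap or token budget is hit."""


async def run_capped(
    step: Callable[[int], Awaitable[tuple[bool, int]]],
    *,
    max_iterations: int,
    max_tokens: int,
    timeout_s: float,
) -> int:
    """Call `step` until it reports done; return the number of turns taken."""
    tokens_used = 0
    for turn in range(max_iterations):
        # wait_for raises a timeout error if a single turn takes too long.
        done, tokens = await asyncio.wait_for(step(turn), timeout=timeout_s)
        tokens_used += tokens
        if tokens_used > max_tokens:
            raise BudgetExceeded(f"token budget exceeded after {turn + 1} turns")
        if done:
            return turn + 1
    raise BudgetExceeded(f"no result after {max_iterations} iterations")
```

All three limits fail loudly with an exception, so a ping-ponging pair of agents surfaces as an error in the logs instead of as a quietly growing bill.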
Most of these trace back to one root cause: implicit assumptions about who knows what. Agents do not share memory unless you make them. Treat every handoff as an API contract with a defined input, output, and scope, and the system gets far more predictable.
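Treating a handoff as a contract can be as simple as validating a worker's output before the orchestrator trusts it. The sketch below assumes a hypothetical `WorkerResult` schema; the fields and limits are illustrative choices, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerResult:
    task_id: str
    summary: str        # structured summary, not a raw dump
    sources: list[str]  # where the claims came from
    confidence: float   # worker's own estimate, 0.0 to 1.0


REQUIRED_FIELDS = {"task_id", "summary", "sources", "confidence"}


def validate_result(raw: dict, expected_task_id: str, max_summary_chars: int = 2000) -> WorkerResult:
    """Reject a worker's output unless it satisfies the handoff contract."""
    missing = REQUIRED_FIELDS - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if raw["task_id"] != expected_task_id:
        raise ValueError("result does not match the delegated task")
    if len(raw["summary"]) > max_summary_chars:
        raise ValueError("summary too long; expected a summary, not a dump")
    if not raw["sources"]:
        raise ValueError("no sources given; claims cannot be verified")
    if not 0.0 <= raw["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return WorkerResult(
        task_id=raw["task_id"],
        summary=raw["summary"],
        sources=list(raw["sources"]),
        confidence=float(raw["confidence"]),
    )
```

A check like this does not catch a confident hallucination, but it does stop scope drift, unsourced claims, and raw dumps at the boundary, where they are cheapest to reject.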
How to start
Build the single-agent version first and find where it breaks. If it fails on breadth or context limits, introduce orchestrator/worker for that bottleneck only. Add a critic agent if correctness matters more than speed. Instrument everything, cap every loop, and keep your delegation prompts specific. Multi-agent systems reward teams that earn the complexity and punish the ones that adopt it by default.