Multi-Agent Systems & Orchestration
When one LLM isn't enough — patterns for coordinating many.
A multi-agent system has multiple LLM agents, each with a specialised role, coordinated by orchestration logic. These systems are among the most powerful, and most brittle, patterns in production AI.
Why multi-agent at all
Some tasks decompose naturally:
- Researcher fetches information → Writer drafts → Editor refines
- Planner breaks down a goal → Executor handles each step → Critic verifies
- One agent per domain (legal, finance, ops) routed by a dispatcher
The argument for: specialised system prompts and tool sets per role, parallel execution where possible, and individual agents that are easier to debug in isolation.
The argument against: cost multiplies, latency stacks, and debugging the interactions between agents is harder than debugging any single agent.
Common patterns
Pipeline — agents run in sequence, each consuming the previous one's output. Simple, debuggable, the most common pattern.
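A minimal sketch of the pipeline pattern in Python. It assumes each agent is just a function from text to text; `researcher`, `writer`, `editor` and `run_pipeline` are hypothetical names, and the stubs stand in for real LLM calls. Recording every intermediate output is what keeps this pattern easy to debug.

```python
from typing import Callable

# An "agent" is modelled as a function from input text to output text.
# These stubs are hypothetical stand-ins for LLM calls with role-specific prompts.
Agent = Callable[[str], str]


def researcher(topic: str) -> str:
    return f"notes on {topic}"


def writer(notes: str) -> str:
    return f"draft based on {notes}"


def editor(draft: str) -> str:
    return draft.capitalize() + "."


def run_pipeline(agents: list[Agent], task: str) -> tuple[str, list[tuple[str, str]]]:
    """Run agents in order; each consumes the previous agent's output.

    Returns the final output plus a trace of (agent name, output) pairs,
    so any step can be inspected when the end result looks wrong.
    """
    trace: list[tuple[str, str]] = []
    result = task
    for agent in agents:
        result = agent(result)
        trace.append((agent.__name__, result))
    return result, trace


final, trace = run_pipeline([researcher, writer, editor], "vector databases")
print(final)  # → Draft based on notes on vector databases.
```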
Hierarchy — a manager agent breaks a task into subtasks, dispatches them to worker agents, and aggregates the results. Variants appear in Anthropic's multi-agent research system, AutoGPT, and BabyAGI.
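A sketch of the manager/worker split, again assuming agents are plain functions. `manager_plan`, `worker`, `manager_aggregate` and `run_hierarchy` are hypothetical stubs for LLM calls; because the workers are independent, the sketch dispatches them concurrently with a thread pool from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def manager_plan(goal: str) -> list[str]:
    # Hypothetical stub: a real manager would ask an LLM to decompose the goal.
    return [f"{goal}: background", f"{goal}: competitors", f"{goal}: pricing"]


def worker(subtask: str) -> str:
    # Hypothetical stub: a real worker would be an LLM call with its own tools.
    return f"findings for [{subtask}]"


def manager_aggregate(goal: str, results: list[str]) -> str:
    # Hypothetical stub: a real manager would synthesise the findings with an LLM.
    return f"Report on {goal}:\n" + "\n".join(f"- {r}" for r in results)


def run_hierarchy(goal: str, max_workers: int = 3) -> str:
    """Manager plans, workers execute subtasks concurrently, manager aggregates."""
    subtasks = manager_plan(goal)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in subtask order, regardless of finish order.
        results = list(pool.map(worker, subtasks))
    return manager_aggregate(goal, results)


print(run_hierarchy("market scan"))
```

Because `ThreadPoolExecutor.map` returns results in input order, the aggregated report is deterministic even though workers may finish in any order.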
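Debate / Critic — generate an answer, have a critic agent score it, revise, and repeat until the score clears a threshold or a round limit is reached. Often improves quality, sometimes by 10-30%, though gains depend heavily on the task and on how reliable the critic is.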
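A sketch of a generate/critique/revise loop. The rubric, the 0.8 threshold and the round limit are illustrative assumptions; `generate`, `critic`, `revise` and `critic_loop` are hypothetical stubs where a real system would make differently-prompted LLM calls.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    score: float  # 0.0 (bad) to 1.0 (good)
    feedback: str


def generate(task: str) -> str:
    # Hypothetical generator agent.
    return f"answer to {task}"


def critic(answer: str) -> Critique:
    # Hypothetical rubric: the critic only approves answers that cite a source.
    if "source:" in answer:
        return Critique(score=0.9, feedback="looks good")
    return Critique(score=0.4, feedback="cite a source")


def revise(answer: str, feedback: str) -> str:
    # Hypothetical reviser; a real one would pass the feedback to an LLM.
    if feedback == "cite a source":
        return answer + " (source: docs)"
    return answer


def critic_loop(task: str, threshold: float = 0.8, max_rounds: int = 3) -> tuple[str, int]:
    """Generate, score, and revise until the critic approves or rounds run out.

    Returns the answer and the number of critic calls made. If the round
    limit is hit, the last revision is returned without a final review.
    """
    answer = generate(task)
    for round_no in range(1, max_rounds + 1):
        review = critic(answer)
        if review.score >= threshold:
            return answer, round_no
        answer = revise(answer, review.feedback)
    return answer, max_rounds


print(critic_loop("What is RAG?"))
# → ('answer to What is RAG? (source: docs)', 2)
```

The round cap matters: every extra round adds a critic call and a revision call, which is where the debate cost multiplier comes from.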
Parallel ensemble — run N agents on the same input, vote or aggregate. Expensive but reduces variance.
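A sketch of a parallel ensemble with majority voting. `make_agent` fakes ensemble members with canned answers (a hypothetical stand-in for sampling several models, seeds or temperatures); `ensemble_vote` runs them concurrently and returns the winning answer with the share of agents that agreed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_agent(canned_answer: str) -> Callable[[str], str]:
    """Hypothetical ensemble member that ignores the question and returns a fixed answer."""
    def agent(question: str) -> str:
        return canned_answer
    return agent


def ensemble_vote(agents: list[Callable[[str], str]], question: str) -> tuple[str, float]:
    """Run every agent on the same input in parallel and take a majority vote.

    Returns the winning answer and the fraction of agents that gave it;
    low agreement is a useful signal to escalate or abstain.
    """
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


agents = [make_agent("42"), make_agent("42"), make_agent("41")]
print(ensemble_vote(agents, "What is 6 * 7?"))
```

With the three agents above it returns `"42"` with two-thirds agreement. `Counter.most_common` breaks ties by first occurrence, so a real system may want an explicit tie-break rule. Other aggregation strategies, such as averaging scores or asking a judge model to merge answers, fit the same shape.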
Routing — a classifier sends each request to the appropriate specialist agent.
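A sketch of a router with a fallback. The keyword matcher is a deliberately simple, hypothetical stand-in for the classifier, which in practice is often a small model or a single LLM call that returns a label; `SPECIALISTS`, `classify`, `route` and `general` are illustrative names.

```python
from typing import Callable, Optional

Handler = Callable[[str], str]

# Hypothetical specialist agents; each would have its own prompt and tools.
SPECIALISTS: dict[str, Handler] = {
    "legal": lambda q: f"[legal agent] {q}",
    "finance": lambda q: f"[finance agent] {q}",
    "ops": lambda q: f"[ops agent] {q}",
}

# Keyword lists standing in for a trained classifier or an LLM labelling call.
KEYWORDS: dict[str, tuple[str, ...]] = {
    "legal": ("contract", "liability", "gdpr"),
    "finance": ("invoice", "budget", "revenue"),
    "ops": ("deploy", "outage", "on-call"),
}


def classify(request: str) -> Optional[str]:
    text = request.lower()
    for domain, words in KEYWORDS.items():
        if any(word in text for word in words):
            return domain
    return None  # no match: let the caller decide what to do


def route(request: str, fallback: Handler) -> str:
    """Send the request to the matching specialist, or to a fallback agent."""
    domain = classify(request)
    handler = SPECIALISTS[domain] if domain is not None else fallback
    return handler(request)


def general(q: str) -> str:
    # Hypothetical generalist agent used when no specialist matches.
    return f"[general agent] {q}"


print(route("Who carries liability under this contract?", general))
print(route("Summarise this meeting", general))
```

The fallback path matters as much as the specialists: a request the classifier can't place should still get a reasonable answer rather than an error.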
The communication problem
How do agents talk to each other? Three options:
- Plain text — Agent A's output becomes Agent B's input as a message. Lossy, free-form, debuggable.
- Structured handoffs — JSON schemas define the interface. Reliable, harder to extend.
- Shared memory / blackboard — all agents read/write to a shared state. Powerful, hard to coordinate.
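A sketch of the blackboard option: agents never call each other, they only read and write a shared dict, and a control loop fires whichever agent's inputs are ready. `BlackboardAgent` and `run_blackboard` are hypothetical names; the sketch is single-threaded and skips the locking and conflict handling that make real shared-state systems hard to coordinate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BlackboardAgent:
    name: str
    needs: list[str]  # keys that must exist on the board before this agent can run
    produces: str     # key this agent writes
    run: Callable[[dict], str]


def run_blackboard(board: dict, agents: list[BlackboardAgent]) -> list[str]:
    """Fire any agent whose inputs are on the board and whose output is not.

    Each agent writes its key at most once, so the loop always terminates.
    Returns the order in which agents ran.
    """
    order: list[str] = []
    progress = True
    while progress:
        progress = False
        for agent in agents:
            ready = all(key in board for key in agent.needs)
            if ready and agent.produces not in board:
                board[agent.produces] = agent.run(board)
                order.append(agent.name)
                progress = True
    return order


# Hypothetical agents, deliberately listed out of order: the board's state,
# not the list order, decides who runs next.
agents = [
    BlackboardAgent("critic", ["plan", "draft"], "review",
                    lambda b: "approved" if b["plan"] in b["draft"] else "revise"),
    BlackboardAgent("writer", ["plan"], "draft", lambda b: f"draft following {b['plan']}"),
    BlackboardAgent("planner", ["goal"], "plan", lambda b: f"plan for {b['goal']}"),
]

board = {"goal": "launch post"}
order = run_blackboard(board, agents)
print(order)            # → ['planner', 'writer', 'critic']
print(board["review"])  # → approved
```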
Production systems often use structured handoffs for critical interfaces and plain text for human-readable summaries.
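A sketch of a structured handoff using only the standard library: the downstream agent accepts a researcher's output only if it parses into a typed record. The `ResearchHandoff` fields and `parse_handoff` are hypothetical; a schema library such as Pydantic or a JSON Schema validator would do the same job with less code.

```python
import json
from dataclasses import dataclass


@dataclass
class ResearchHandoff:
    """Hypothetical contract between a researcher agent and a writer agent."""
    topic: str
    key_points: list[str]
    confidence: float  # 0.0 to 1.0


def parse_handoff(raw: str) -> ResearchHandoff:
    """Validate an agent's JSON output before passing it downstream.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation,
    so the orchestrator can retry the upstream agent instead of passing bad data on.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    if not isinstance(data.get("topic"), str):
        raise ValueError("topic must be a string")
    points = data.get("key_points")
    if not isinstance(points, list) or not all(isinstance(p, str) for p in points):
        raise ValueError("key_points must be a list of strings")
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return ResearchHandoff(data["topic"], points, float(confidence))


ok = parse_handoff('{"topic": "RAG", "key_points": ["retrieve", "generate"], "confidence": 0.8}')
print(ok)
# → ResearchHandoff(topic='RAG', key_points=['retrieve', 'generate'], confidence=0.8)

try:
    parse_handoff('{"topic": "RAG", "key_points": "retrieve"}')
except ValueError as err:
    print(f"rejected: {err}")  # → rejected: key_points must be a list of strings
```

Rejecting malformed output at the boundary turns a vague downstream failure into a specific error you can retry on or log.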
Frameworks
- CrewAI — role-based multi-agent setup, beginner-friendly
- LangGraph — graph-based agent orchestration, production-grade
- AutoGen (Microsoft) — research-flavoured, models a task as a conversation between agents
- OpenAI Swarm / Agents SDK — OpenAI's multi-agent frameworks (Swarm was an experimental release; the Agents SDK is its production-ready successor)
- Anthropic Claude Agent SDK — agent framework with support for subagents
Many teams that try frameworks end up writing their own thin orchestrator instead, because frameworks add complexity and make debugging harder.
Cost reality
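Multi-agent costs multiply with agents and rounds. A 3-agent pipeline makes 3× the LLM calls of a single-agent setup; a 3-agent debate over 3 rounds makes 9×. Token costs can grow faster still when each agent's context carries earlier agents' output. Without aggressive caching and small models for cheap roles, costs explode.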
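The call arithmetic above, plus a rough token model, as a sketch. The 1,000-token prompt, the 500-token outputs and the equal pricing of input and output tokens are illustrative assumptions, not real prices; `llm_calls` and `pipeline_tokens` are hypothetical helpers.

```python
def llm_calls(n_agents: int, n_rounds: int = 1) -> int:
    """Each agent runs once per round, so the call count is agents x rounds."""
    return n_agents * n_rounds


def pipeline_tokens(prompt: int, output: int, n_agents: int, carry_history: bool) -> int:
    """Total tokens (input + output) processed by a sequential pipeline.

    Simplifying assumptions: every agent emits `output` tokens, system prompts
    are ignored, and input and output tokens cost the same. With
    carry_history=True each agent reads the original prompt plus every earlier
    output; otherwise it reads only the previous agent's output.
    """
    total = 0
    context = prompt
    for _ in range(n_agents):
        total += context + output  # tokens read plus tokens written by this agent
        context = context + output if carry_history else output
    return total


single = pipeline_tokens(1000, 500, 1, carry_history=False)
print(llm_calls(3), llm_calls(3, n_rounds=3))  # → 3 9
print(round(pipeline_tokens(1000, 500, 3, carry_history=False) / single, 2))  # → 2.33
print(round(pipeline_tokens(1000, 500, 3, carry_history=True) / single, 2))   # → 4.0
```

So "3×" is a call count. Billed tokens can land below it, if each agent sees only the previous output, or above it, if every agent rereads the full history.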
When NOT to use multi-agent
Before going multi-agent, check whether a simpler setup does the job:
- One well-prompted model
- One model with many tools
- One model with chain-of-thought prompting
If any of these works, use it.
Multi-agent is for genuinely decomposable tasks where each subtask needs different prompting / tools / models. Otherwise, you're paying for orchestration complexity you don't need.
The future
Major labs are building more agentic capability directly into their models and APIs, such as computer use and long-running tasks. The boundary between "agent framework" and "model API" is blurring. Build defensively: don't let your orchestration code lock you into one framework.