LLMAtlas — The Open Ecosystem Workspace for LLMs

A multi-agent system has multiple LLMs, each with a specialised role, coordinated by orchestration logic. They're the most powerful — and most brittle — pattern in production AI.

Why multi-agent at all

Some tasks decompose naturally:

Researcher fetches information → Writer drafts → Editor refines
Planner breaks down a goal → Executor handles each step → Critic verifies
One agent per domain (legal, finance, ops) routed by a dispatcher

The argument for: specialised system prompts and tool sets per role, parallel execution where possible, easier to debug individual agents.

The argument against: cost multiplies, latency stacks, debugging the interactions is harder than debugging any one agent.

Common patterns

Pipeline — agents run in sequence, each consuming the previous one's output. Simple, debuggable, the most common pattern.

Hierarchy — a manager agent breaks a task into subtasks, dispatches to worker agents, aggregates results. Used by Anthropic's research agents, AutoGPT, BabyAGI.

Debate / Critic — generate an answer, have a critic agent score it, revise. Often improves quality 10-30%.

Parallel ensemble — run N agents on the same input, vote or aggregate. Expensive but reduces variance.

Routing — a classifier sends each request to the appropriate specialist agent.

The communication problem

How do agents talk to each other? Three options:

Plain text — Agent A's output becomes Agent B's input as a message. Lossy, free-form, debuggable.
Structured handoffs — JSON schemas define the interface. Reliable, harder to extend.
Shared memory / blackboard — all agents read/write to a shared state. Powerful, hard to coordinate.

Production systems usually use option 2 for critical handoffs and option 1 for human-readable summaries.

Frameworks

CrewAI — role-based multi-agent setup, beginner-friendly
LangGraph — graph-based agent orchestration, production-grade
AutoGen (Microsoft) — research-flavoured, generates conversation between agents
OpenAI Swarm / Agents SDK — official OpenAI multi-agent framework
Anthropic Claude SDK Agents — recently added multi-agent primitives

Most teams that try frameworks end up writing their own thin orchestrator instead — frameworks add complexity and constrain debugging.

Cost reality

Multi-agent costs scale multiplicatively. A 3-agent pipeline costs 3× a single-agent call. A 3-agent debate with 3 rounds costs 9×. Without aggressive caching and small models for cheap roles, costs explode.

When NOT to use multi-agent

If you can:

Do it with one well-prompted model: do.
Do it with one model + many tools: do.
Do it with chain-of-thought: do.

Multi-agent is for genuinely decomposable tasks where each subtask needs different prompting / tools / models. Otherwise, you're paying for orchestration complexity you don't need.

The future

Major labs are starting to bake multi-agent capabilities directly into models — Claude 4's "extended computer use," GPT's "task" mode, Gemini's planning APIs. The boundary between "agent framework" and "model API" is dissolving. Build defensively: your orchestration code shouldn't lock you into one framework.

Multi-Agent Systems & Orchestration