A single LLM agent with 10+ tools handles 80% of simple workflows. But at production scale, certain workloads — complex sales research, multi-step incident response, large-scale document processing — break this monolithic pattern. The agent's context window thrashes, the prompt becomes a 3,000-token kitchen sink, and quality drops as the agent has to context-switch between unrelated tools. Multi-agent orchestration is the answer, but it introduces its own failure modes. After deploying multi-agent systems for 12+ NKKTech clients, here are the three orchestration patterns that cover 95% of production needs — and the specific signals that tell you which one fits.
Why Single-Agent Solutions Hit a Ceiling
A single agent works well when the task needs fewer than ~8 tools, the tools serve closely related purposes, and the reasoning chain rarely exceeds 4-5 steps. Past these thresholds, three failure modes appear. First: prompt bloat. With 15+ tool definitions in the system prompt, the LLM spends ~2,000 input tokens on tool descriptions with every single request. Caching helps but doesn't eliminate the latency cost or the model's tendency to confuse similar tools. Second: context contamination. A general-purpose agent that handles both customer-support tickets AND internal analytics will mix patterns from one domain into the other, especially on edge cases. Third: eval-suite explosion. A monolithic agent needs eval cases for every combination of (input domain × tool path × edge case), so the case count grows multiplicatively with each dimension. Multi-agent orchestration addresses all three by splitting the work into specialized agents, each with a focused tool set, focused prompt, and focused eval suite.
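To make the eval-suite growth concrete, here is a small arithmetic sketch. The counts (4 domains, 12 tool paths, 5 edge cases) are hypothetical numbers chosen for illustration, not measurements from any client system, and the helper name `eval_cases` is ours.

```python
def eval_cases(domains: int, tool_paths: int, edge_cases: int) -> int:
    """Cases needed to cover every (domain x tool path x edge case) combination."""
    return domains * tool_paths * edge_cases

# Hypothetical monolith: one agent sees all 4 domains and all 12 tool paths.
monolith = eval_cases(domains=4, tool_paths=12, edge_cases=5)

# Hypothetical split: 4 specialists, each owning 1 domain and 3 tool paths.
specialists = 4 * eval_cases(domains=1, tool_paths=3, edge_cases=5)

print(f"monolith: {monolith} cases, specialists: {specialists} cases")
```

Under these assumed numbers the monolith needs 240 cases against 60 for the four specialists combined, because each specialist only has to cover combinations inside its own domain.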
Pattern A: Hub-and-Spoke (Orchestrator + Workers)
A central orchestrator agent receives the user request, classifies the task type, and routes to one of N specialized worker agents. The orchestrator has zero domain tools; its only job is routing. Workers have deep, narrow tool sets for their specialty. Example from an NKKTech client (B2B SaaS support): the orchestrator (50 prompt tokens, 1 tool: classify_intent) routes to one of four workers: billing-agent, account-config-agent, integration-agent, or refund-agent. Each worker has 8-15 domain tools. Benefits: each worker's eval suite is bounded; adding a new domain means adding a worker without touching the others; the orchestrator's classification accuracy is independently testable. Watch out for: orchestrator misrouting (the single biggest production failure mode, so invest in the intent-classification eval set), worker overlap (when two workers can both handle a request, the routing rule needs to be deterministic), and the temptation to make the orchestrator 'smarter' by adding domain tools to it (which defeats the purpose). Use hub-and-spoke when you have clear domain boundaries and 3-8 specialized agents.
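The routing shape described above can be sketched in a few lines. This is a minimal illustration, not the client's implementation: `classify_intent` is stubbed with keyword rules where a real system would call an LLM, the keyword list and the escalation fallback are assumptions, and the workers are placeholder functions. Only the four worker names come from the example above.

```python
import re
from typing import Callable, Dict, List, Tuple

# Ordered rules: the first match wins, so overlapping requests route deterministically.
# The keywords are illustrative assumptions.
RULES: List[Tuple[str, str]] = [
    ("refund", "refund-agent"),
    ("invoice", "billing-agent"),
    ("webhook", "integration-agent"),
    ("password", "account-config-agent"),
]

def classify_intent(message: str) -> str:
    """Stand-in for the orchestrator's single tool (an LLM call in practice)."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for keyword, worker in RULES:
        if keyword in words:
            return worker
    return "unknown"

def make_worker(name: str) -> Callable[[str], str]:
    """Placeholder worker; a real one would own 8-15 domain tools."""
    def handle(message: str) -> str:
        return f"[{name}] {message}"
    return handle

WORKERS: Dict[str, Callable[[str], str]] = {
    name: make_worker(name) for _, name in RULES
}

def orchestrate(message: str) -> str:
    """Route only: the orchestrator never touches domain tools itself."""
    worker = WORKERS.get(classify_intent(message))
    if worker is None:
        return "[human-escalation] no confident route"
    return worker(message)

print(orchestrate("Please refund my last invoice"))
print(orchestrate("Our webhook stopped firing"))
```

Because `classify_intent` is a plain function from message to worker name, it can be evaluated against a labeled intent set without running any worker, which is what makes the orchestrator independently testable.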
📥 Free Download: Vietnam Offshore Dev Cost Guide 2026
Real developer rates, project cost breakdowns, and a budget planning template. Used by 200+ startup founders.
Ready to build?
NKKTech delivers AI Development projects from $30K.
Fixed scope. Senior Vietnam engineers. 14-day kickoff.
Pattern B: Swarm (Peer-to-Peer Coordination)
All agents have equal authority: any agent can hand off to any other agent during a conversation. Agent A starts handling the request, recognizes mid-flow that it needs Agent B's specialty, transfers context, and Agent B picks up. OpenAI's Swarm framework popularized this pattern; it's well-suited to workflows where the task scope evolves as the conversation progresses. Example from an NKKTech client (legal research): a contract-review agent reading through a clause realizes it needs the case-law-research agent's expertise, hands off the relevant context, gets case law back, then continues the review. Three sub-patterns: trigger-handoff (the agent explicitly transfers when a condition is met), suggest-handoff (the agent recommends another agent and lets the user accept), and continuous-handoff (an LLM-driven decision each turn). Benefits: natural for evolving workflows where the relevant specialty isn't knowable upfront. Watch out for: handoff loops (agent A → B → A → B → ...; enforce a 'handoff budget' per session to prevent them), context loss across handoffs (the receiving agent doesn't have the full history, so pass a structured summary), and harder evaluation, because the agent path varies per session.
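A handoff budget and a structured summary can be combined in one small loop. The sketch below is an assumption-laden illustration, not OpenAI Swarm's API: the `Handoff` and `Session` types, the field names, and the budget of 3 are all hypothetical, and the agents are stub functions standing in for LLM-driven ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Handoff:
    target: str        # agent that should act next
    goal: str          # what the receiving agent is asked to do
    facts: List[str]   # structured summary carried across the handoff

@dataclass
class Session:
    handoff_budget: int = 3                      # hypothetical per-session limit
    trail: List[str] = field(default_factory=list)

# An agent returns a Handoff to transfer control, or None when the work is done.
AgentFn = Callable[[Handoff], Optional[Handoff]]

def run_swarm(agents: Dict[str, AgentFn], first: Handoff, session: Session) -> str:
    current = first
    while True:
        session.trail.append(current.target)
        nxt = agents[current.target](current)
        if nxt is None:
            return "completed"
        if session.handoff_budget == 0:
            return "escalated: handoff budget exhausted"
        session.handoff_budget -= 1
        current = nxt

def contract_review(h: Handoff) -> Optional[Handoff]:
    if any(f.startswith("case-law:") for f in h.facts):
        return None  # precedent is available, so the review can finish
    return Handoff("case-law-research", "find precedent", h.facts + ["clause: indemnification"])

def case_law_research(h: Handoff) -> Optional[Handoff]:
    return Handoff("contract-review", "resume review", h.facts + ["case-law: placeholder result"])

def looping_review(h: Handoff) -> Optional[Handoff]:
    return Handoff("case-law-research", "find precedent", h.facts)  # never satisfied

start = Handoff("contract-review", "review contract", [])

ok = Session()
ok_status = run_swarm(
    {"contract-review": contract_review, "case-law-research": case_law_research}, start, ok
)

stuck = Session()
stuck_status = run_swarm(
    {"contract-review": looping_review, "case-law-research": case_law_research}, start, stuck
)
print(ok_status, ok.trail)
print(stuck_status, stuck.trail)
```

The well-behaved pair finishes after two handoffs, while the looping pair is cut off once the budget is spent. The recorded `trail` also gives evals a concrete agent path to assert on, which helps with the path variability noted above.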
Pattern C: Hierarchical (Multi-Tier Delegation)
A tree structure: top-level supervisor agents delegate to mid-level domain agents, which delegate to leaf workers. Best for workflows with multiple levels of abstraction. Example from an NKKTech client (enterprise procurement automation): the top-level 'purchase-request-coordinator' receives the request and delegates to 'vendor-research-coordinator' (which delegates to leaf agents for vendor-lookup, vendor-comparison, risk-assessment) AND 'budget-coordinator' (which delegates to leaf agents for budget-check, approval-routing, finance-notification). The top-level coordinator stitches together the results from both mid-level coordinators. Three tiers is the sweet spot; four tiers become unwieldy. Benefits: each tier has a clear responsibility, evaluation is tier-by-tier, and parallelism is natural (the top level can fan out to multiple mid-level agents in parallel). Watch out for: latency stacking (each tier adds ~1-2s of LLM call overhead, so 3 sequential tiers add roughly 3-6 seconds of pure orchestration latency), token costs (every tier serializes the previous tier's output back through an LLM, multiplying token costs), and over-engineering (most workflows don't actually need three tiers, so start with hub-and-spoke and only escalate when it's clearly necessary).
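The fan-out and the latency arithmetic can both be sketched with `asyncio`. This is a simplified illustration under stated assumptions: the agent names come from the procurement example, but the leaf calls are stubs (a short sleep in place of an LLM or tool call), and running every leaf in parallel is our simplification, since in a real flow a step like approval-routing would likely wait for budget-check.

```python
import asyncio
from typing import Dict, List, Tuple

TREE: Dict[str, List[str]] = {
    "vendor-research-coordinator": ["vendor-lookup", "vendor-comparison", "risk-assessment"],
    "budget-coordinator": ["budget-check", "approval-routing", "finance-notification"],
}

async def leaf(name: str, request: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an LLM or tool call
    return f"{name}: ok"

async def mid_coordinator(name: str, leaves: List[str], request: str) -> Dict[str, List[str]]:
    # gather preserves argument order, so results line up with the leaf list
    results = await asyncio.gather(*(leaf(l, request) for l in leaves))
    return {name: list(results)}

async def purchase_request_coordinator(request: str) -> Dict[str, List[str]]:
    # Top tier fans out to both mid-level coordinators in parallel, then merges.
    parts = await asyncio.gather(
        *(mid_coordinator(name, leaves, request) for name, leaves in TREE.items())
    )
    merged: Dict[str, List[str]] = {}
    for part in parts:
        merged.update(part)
    return merged

def orchestration_latency_range(tiers: int, low: float = 1.0, high: float = 2.0) -> Tuple[float, float]:
    """Sequential tiers stack: each adds roughly `low` to `high` seconds."""
    return tiers * low, tiers * high

result = asyncio.run(purchase_request_coordinator("20 laptops"))
print(result)
print(orchestration_latency_range(3))
```

Parallel fan-out keeps sibling agents from adding to each other's latency, but the tiers themselves still run in sequence, which is why depth rather than breadth drives the orchestration overhead.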
Picking the Right Pattern for Your Workload
This is the decision tree we use in scoping. (1) Does the workload have clear domain boundaries that a classifier can route on? → Hub-and-spoke. This is 60-70% of cases. (2) Does the workload need to evolve dynamically based on intermediate findings, with the relevant specialty not knowable upfront? → Swarm. ~15-20% of cases. (3) Does the workload have multiple abstraction layers (a strategy decision that triggers tactical decisions that trigger leaf operations)? → Hierarchical. ~10-15% of cases. Default to hub-and-spoke unless you have a clear reason not to. The biggest mistake we see: teams pick hierarchical because it sounds 'enterprise-grade,' then drown in token costs and orchestration complexity for a workload that hub-and-spoke handles in a quarter of the code. Start simple, and escalate only when eval results show the simpler pattern is hitting a ceiling. For the broader production architecture decisions (memory, retrieval, tool design, evaluation), see our AI Agents in Production Architecture Guide.
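The three scoping questions translate directly into a small function. The `Workload` fields and the `pick_pattern` name are hypothetical labels for the questions above; the check order and the hub-and-spoke default follow the text.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    clear_domain_boundaries: bool      # question (1)
    scope_evolves_mid_task: bool       # question (2)
    multiple_abstraction_layers: bool  # question (3)

def pick_pattern(w: Workload) -> str:
    """Apply the questions in order; the first 'yes' decides."""
    if w.clear_domain_boundaries:
        return "hub-and-spoke"
    if w.scope_evolves_mid_task:
        return "swarm"
    if w.multiple_abstraction_layers:
        return "hierarchical"
    return "hub-and-spoke"  # default when nothing argues for more complexity

print(pick_pattern(Workload(True, False, True)))
print(pick_pattern(Workload(False, True, False)))
```

Checking domain boundaries first encodes the 'default to hub-and-spoke' rule: a workload that is routable by a classifier gets the simplest pattern even if it also has layered structure.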
Production Failure Modes and How to Diagnose Them
These are the three multi-agent failures we see most often in production audits. Routing accuracy degradation: the orchestrator's intent-classification accuracy drops from 92% in eval to 78% in production after a few weeks. Cause: the real user traffic distribution differs from the eval set distribution. Fix: review misrouted requests weekly, fold them back into the eval set, and retrain the classifier on the corrected distribution. Eval coverage asymmetry: the worker agents have eval coverage but the orchestrator doesn't, so a routing regression goes undetected for weeks. Fix: build the orchestrator eval suite FIRST, before any worker, and use it as the canary for the whole system. Context-handoff degradation: in swarm patterns, the receiving agent has less context than it needs because the handing-off agent's summary missed something. Cause: the handoff summary is generated by the LLM, and the LLM doesn't always know what the receiving agent will need. Fix: structured handoff schemas (the handing-off agent must populate explicit fields), plus rejection-back-handoff (the receiving agent can request more context if the handoff is insufficient).
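A structured handoff schema with rejection-back-handoff might look like the sketch below. The field names in `HandoffPacket` are illustrative assumptions about what a receiving agent could require, and the validation is deliberately simple: any empty field causes a rejection that names what is missing.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

@dataclass
class HandoffPacket:
    # Hypothetical required fields; a real schema would be agreed per agent pair.
    user_goal: str
    work_done: str
    open_question: str
    source_refs: List[str]

def missing_fields(packet: HandoffPacket) -> List[str]:
    """Names of fields the handing-off agent left empty."""
    return [f.name for f in fields(packet) if not getattr(packet, f.name)]

def receive(packet: HandoffPacket) -> Dict[str, object]:
    """Receiving agent accepts, or rejects back with the fields it still needs."""
    missing = missing_fields(packet)
    if missing:
        return {"status": "rejected", "need": missing}
    return {"status": "accepted", "need": []}

incomplete = HandoffPacket(
    user_goal="assess indemnification clause",
    work_done="clauses 1-7 reviewed",
    open_question="is the liability cap enforceable?",
    source_refs=[],  # the summary forgot to cite the clause text
)
first_try = receive(incomplete)

incomplete.source_refs = ["contract section 8.2"]
second_try = receive(incomplete)
print(first_try, second_try)
```

The rejection carries the list of missing fields back to the sender, so the repair is a targeted request rather than a second free-form summary.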

10+ years building AI systems for Toyota, Sony, and Rakuten in Japan. Founded NKKTech in 2018 with a senior-only engineering model.
Want to build this with NKKTech?
Building or running a multi-agent system and want an architecture review? Book a free 30-minute call. We'll look at your orchestration pattern and routing accuracy, identify likely production failure modes, and suggest the highest-ROI restructuring.