Your AI Agents Are a Panel, Not a Band

Mihir Wagle 5 min read
jugalbandi, CQRS, agentic-ai, multi-agent-systems, architecture

Most multi-agent AI systems work like this: run several agents independently on the same problem, then have a separate agent merge their outputs. The literature calls this a "council" or "mixture of agents." A recording engineer would call it something else: five musicians in separate studios, mixed by a producer who never heard them play together.

This is an ensemble. Ensembles are good at coverage. They explore more of the solution space than a single agent. But they are not reasoning together. They are reasoning separately and being averaged.

The thing missing is interaction. Not interaction as in "Agent B sees Agent A's output." Interaction as in: Agent B's response is shaped by the specific thing Agent A just said, and Agent A's next move is shaped by that response. Stateful, bidirectional conditioning with iterative updates. The output should be something neither agent would have produced alone.

Jugalbandi as Counter-Image

Hindustani classical music has a performance format called jugalbandi: two soloists sharing a stage, improvising in real time. It is not a Western duet where parts are composed in advance. It is structured improvisation between peers.

Several properties make it structurally interesting as a contrast to the council pattern.

Shared constraints. Both musicians operate within the same raga, a melodic framework that specifies permitted notes, characteristic phrases, rules of progression, and which notes are dominant. The raga is agreed before performance begins. Neither musician invents their own rules. This shared constraint structure is what makes improvisation legible to both players. Without it, two people improvising simultaneously is noise.

Progressive commitment. A performance typically begins with slow, open-ended exploration before moving into full rhythmic engagement. You do not start with virtuosic sparring. You feel out the space first. Some agent systems do implement planning passes or scratchpads, but exploration is almost never jointly constructed. There is no phase where agents build a shared understanding of the problem space before committing to a direction.

Call-and-response. During the performance, musicians trade phrases: sawal (question) and jawab (answer). Each phrase responds to the specific thing the other musician just played. The goal is not to win. It is to play something that forces the other musician to stretch, and in stretching, to elevate the performance beyond what either would produce solo.

Convergence at sam. The tala cycle has a beat called sam where musicians converge, however far they have diverged within the cycle. This is not "merge at the end." It is a recurring alignment constraint. Diverge, explore, push each other, then prove you are still in the same universe. Then diverge again. This is an architectural pattern that multi-agent systems lack. Agents either run fully independently or are chained sequentially. The jugalbandi pattern is neither: it is parallel improvisation with periodic convergence.
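One way to sketch the sam pattern: agents explore in parallel for a few steps, then a recurring checkpoint forces them to reconcile before diverging again. The step and alignment logic here are placeholder scaffolding, not a real execution engine:

```python
def parallel_with_checkpoints(agents, steps=12, cycle=4):
    """Parallel improvisation with periodic convergence: every `cycle`
    steps the agents re-align, rather than merging only at the end."""
    states = {a: [] for a in agents}
    for t in range(1, steps + 1):
        for a in agents:
            states[a].append(f"{a}:step{t}")  # independent exploration
        if t % cycle == 0:  # the recurring 'sam' checkpoint
            shared = " | ".join(states[a][-1] for a in agents)
            for a in agents:
                # Each agent must reconcile with a shared view before diverging again.
                states[a].append(f"{a}:aligned[{shared}]")
    return states


out = parallel_with_checkpoints(["A", "B"])
```

The checkpoint interval plays the role of the tala cycle: divergence is bounded, so the eventual merge never has to bridge arbitrarily distant states.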

Closure from within. A tihai is a thrice-repeated rhythmic or melodic phrase with calculated spacing such that the final articulation resolves exactly on sam. The spacing is not accidental. The musician works backward from where they need to land, engineering the gaps between repetitions so convergence is precise. Contrast this with the typical agent loop: iterate N times, or until a judge says stop. In the jugalbandi pattern, local convergence is engineered within the interaction and designed to resolve at a known point of shared alignment. The convergence is structured, not hoped for.

The failure mode is worth naming: sycophantic convergence. Agents that agree too early, before genuine exploration, will trigger false closure. This is the equivalent of a musician playing a tihai in the opening minutes because they ran out of ideas, not because the performance reached resolution. In music, this failure is immediately audible because it violates aesthetic timing expectations. In agent systems, detection is harder. It requires explicit diversity or disagreement metrics, not just checking whether agents agree.
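Detecting sycophantic convergence requires an explicit disagreement signal. A crude sketch using token-level Jaccard similarity; a real system would use embeddings or a judge model, and the threshold here is illustrative:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two agent outputs, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def flag_false_closure(turns, threshold=0.8, min_turns=4):
    """Flag convergence that arrives before genuine exploration:
    high agreement in the opening turns is suspicious, not reassuring."""
    early = turns[:min_turns]
    return any(jaccard(x, y) >= threshold for x, y in zip(early, early[1:]))
```

A transcript where the agents echo each other from turn one would be flagged; a transcript whose opening turns genuinely diverge would not.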

A friend of mine does something like this manually: he sends the same problem to Claude and ChatGPT, feeds each model's output to the other, and iterates until the quality ratchets up. It is a handmade jugalbandi. It works because each model responds to the other's specific output, not to the original prompt in isolation.

What This Means for Agent Design

The analogy is an analogy. Hindustani classical music is a performing art refined over centuries with a depth and internal logic that dwarfs anything in agentic AI. I am borrowing its structure to illustrate a design gap, not claiming the correspondence is exact.

The gap: most multi-agent architectures have no equivalent of the shared raga, sawal-jawab, convergence at sam, or closure from within. They produce independent artifacts that a judge merges. Specifically, this suggests:

Shared constraint envelopes, not shared prompts. Agents working together need a structure defined before execution that makes their outputs mutually legible. What terms mean. What moves are out of bounds. What counts as progress. The raga is not the prompt. It is the rules of engagement.
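What a shared constraint envelope might look like as data, defined before execution and separate from any prompt. The field names are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintEnvelope:
    """The 'raga' of the collaboration: agreed before execution,
    identical for every agent, and checkable against any move."""
    vocabulary: dict          # what terms mean, shared by all agents
    forbidden_moves: frozenset  # what moves are out of bounds
    progress_criteria: tuple    # what counts as progress

    def permits(self, move: str) -> bool:
        return move not in self.forbidden_moves


envelope = ConstraintEnvelope(
    vocabulary={"merge": "reconcile divergent drafts, not overwrite one"},
    forbidden_moves=frozenset({"rewrite_from_scratch"}),
    progress_criteria=("narrows open questions", "builds on the other agent's last move"),
)
```

The point of making it a frozen object rather than prompt text: every agent checks moves against the same envelope, and no agent can quietly redefine the rules mid-performance.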

Interaction over headcount. Two agents in genuine call-and-response will outperform five agents in parallel on tasks that require depth over breadth. Coverage is the ensemble's strength. Depth requires agents that build on each other's specific outputs.

Periodic convergence, not terminal merging. The sam pattern, where agents re-align at checkpoints mid-execution rather than only at the end, is an architecture choice that barely exists in current systems. It lets agents diverge far enough to be useful without diverging so far that merging becomes lossy.

Engineered convergence, not iteration counts. Rather than "run for N iterations" or "until the judge is satisfied," design the interaction to resolve at a known alignment point. The tihai is not "repeat until you happen to agree." It is local convergence engineered backward from where you need to land. Agent systems could define resolution criteria upfront and structure the interaction so it narrows toward them, rather than looping until an external judge calls time.
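A sketch of engineered convergence: resolution criteria fixed upfront, with the loop ending when they are all met rather than when a counter runs out. The criteria and `propose` function are placeholders for whatever the task actually requires:

```python
def converge(propose, criteria, max_turns=20):
    """Work toward a known landing point: stop when every agreed
    criterion holds, not when an iteration budget is exhausted."""
    draft = ""
    for turn in range(max_turns):
        draft = propose(draft, turn)
        if all(c(draft) for c in criteria):
            return draft, turn  # resolved at the agreed landing point
    raise RuntimeError("failed to land on the agreed resolution criteria")


# Illustrative criteria, defined before the interaction begins.
def has_decision(d):
    return "DECISION:" in d


def has_tradeoff(d):
    return "tradeoff" in d


draft, turns = converge(
    propose=lambda d, t: d + (" tradeoff noted." if t == 0 else " DECISION: ship it."),
    criteria=[has_decision, has_tradeoff],
)
```

The inversion matters: the stopping condition is part of the shared agreement, like the tihai's landing point, instead of an external judge deciding after the fact that the output looks done.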

Humans choose constraints, not merge outputs. The human's role is upstream: selecting the right constraint structure for this problem, then evaluating the result. Not standing between agents synthesizing their phrases. The interaction itself produces the synthesis.

None of this is proven. The multi-agent field is too young for strong empirical claims about interaction architectures versus ensembles. But if your agents never hear each other, you should at least know what you are leaving on the table.


The musical descriptions in this post were reviewed for accuracy, and for overstretched analogies, by aficionados of Hindustani classical music. Any remaining imprecision in the analogy is mine. The argument about agent architectures stands on its own terms.
