
Demystifying Prompt Series 38. Multi-Agent Routing Policies

2024-09-19

There are several common types of multi-agent frameworks: collaboration modes such as ChatDev and CAMEL, in which agents communicate with each other to accomplish a task together; routing modes, in which each agent is responsible for one type of task and the most suitable agent is selected for each incoming task; and of course more complex interaction modes in which multiple agents share a memory layer. In this chapter we focus on agent routing, that is, selecting the most suitable agent to handle a task, and look at what options are available.

In the previous chapter we discussed the decision problem of when to use RAG. If we zoom out and treat the RAG pipeline as one agent and the base LLM as another, the RAG decision problem is in fact a microcosm of the multi-agent routing problem. So what other kinds of agent routing appear in practical application scenarios?

  • Agents with different roles, for example the funniest I have seen were fortune-telling bots of different schools
  • Agents with different tool mounts, e.g. access to different knowledge bases and different domain tools
  • Agents with different ways of thinking, e.g. CoT reasoning, step-back reasoning, outline-first reasoning
  • Agents with different workflows, e.g. routing among no-RAG, single-step RAG, and multi-step RAG agents
  • And fusions of the above: integrated routing across different roles, tools, ways of thinking, and workflows

Here we look at several external routing strategies, i.e., schemes that can be layered directly on top of the existing agents to route among them.

Capability- and domain-based agent routing

  • One Agent To Rule Them All: Towards Multi-agent Conversational AI
  • /ChrisIsKing/black-box-multi-agent-integation

MARS actually predates the advent of large models, but it is one of the foundational papers on multi-agent routing. It focuses on selecting among agents from different domains (with different capabilities), and the ideas are very clear. The paper starts by defining the multi-agent selection problem, which consists of the following components

  • query: the user's question
  • agent skill: a description of the agent's capabilities, which can also be sample queries
  • agent response: the agent's answer to the user's question
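
As a rough mental model (not something defined in the paper), these components can be written down as simple data structures; the field names below are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    name: str
    skill_description: str                          # free-text description of what the agent can do
    sample_queries: list[str] = field(default_factory=list)   # queries it is known to have answered

@dataclass
class RoutingExample:
    query: str                                      # the user question to be routed
    responses: dict[str, str] = field(default_factory=dict)   # agent name -> that agent's answer
```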

Then, naturally, there are two options for choosing an agent: selecting directly based on the query (Query-Pairing), or selecting based on the agents' responses (Response-Pairing). Current multi-agent routing still follows these two general directions; the former is faster but limited in accuracy, the latter slower but more effective. Below are the details of both schemes, because in practice you will find that each has its own difficulties.

image

Query Pairing

The problem with judging from the query alone is how to describe what an agent can do. The paper points out that the capability boundaries of agents are poorly defined and even harder to describe.

One option given in the paper is to use query samples. Although we do not know a model's global capability, we can know which queries it can answer from users' usage history; for example, "locate me some good places in Kentucky that serve sushi" is a question that can be answered by Alexa and Google. We can then train a multi-label classification model on the collected query samples that predicts, for each query, which agents can answer it. Strictly speaking this scheme also uses responses, except that it relies on historical agent answers.
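
A minimal sketch of such a query-sample classifier, assuming a handful of hypothetical usage logs; TF-IDF plus one-vs-rest logistic regression is only a stand-in for whatever encoder is actually used:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# hypothetical training data collected from usage history: query -> agents that answered it
train_queries = [
    "locate me some good places in Kentucky that serve sushi",
    "set an alarm for 7am tomorrow",
    "what is the weather in Seattle this weekend",
]
train_agents = [["Alexa", "Google"], ["Alexa"], ["Google"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(train_agents)          # multi-hot label matrix, one column per agent

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(train_queries, y)

# predict candidate agents for a new query
probs = clf.predict_proba(["find a ramen shop near the airport"])[0]
candidates = dict(zip(mlb.classes_, probs))
print(sorted(candidates.items(), key=lambda kv: -kv[1]))
```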

In addition to query classification, the paper also uses similarity. It collects descriptions of agents' capabilities from publicly available websites, such as "Our productivity bot helps you stay productive and organized. From sleep timers and alarms to reminders, calendar management, and email ....", and then uses the textual similarity between the agent description and the query to decide whether the agent can answer the question. Here the paper tried BM25, USE, and a fine-tuned RoBERTa for vector encoding. We have also tried a similar KNN-style scheme before, but one problem with it is that text similarity can measure domain differences, such as a math agent versus a finance agent, yet it cannot distinguish task complexity, so it does not transfer well to agent routing scenarios that are not split by domain.
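
A rough sketch of the description-similarity variant, using an off-the-shelf sentence encoder as a stand-in for the paper's BM25 / USE / fine-tuned RoBERTa; the agent descriptions below are made up:

```python
from sentence_transformers import SentenceTransformer, util

agent_descriptions = {
    "productivity_bot": "Helps you stay productive and organized: sleep timers, "
                        "alarms, reminders, calendar management, and email.",
    "finance_bot": "Answers questions about stock prices, exchange rates and budgeting.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")   # any reasonable sentence encoder works here
names = list(agent_descriptions)
desc_emb = model.encode(list(agent_descriptions.values()), convert_to_tensor=True)

def route_by_similarity(query: str) -> str:
    q_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, desc_emb)[0]     # cosine similarity to each agent description
    return names[int(scores.argmax())]

print(route_by_similarity("remind me to call mom at 6pm"))   # expected: productivity_bot
```

As noted above, a score like this captures domain overlap reasonably well, but says nothing about how hard the task is.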

Response Pairing

The core difficulty of routing on live model responses is how to judge response quality. The paper points out that earlier work mostly judged by the similarity between the response and the query, which is not enough: accuracy must also be judged, so the paper trains a query-response ranking model with a cross-encoder. In the two years since large models appeared, however, more comprehensive evaluation criteria for response quality have emerged, such as OpenAI's 3H (Helpful, Harmless, Honest), DeepMind's focus on 2H (helpful, harmless), and many more Reward and Judge models, along with more schemes for training them; interested readers can refer to the earlier post Alignment RLHF: OpenAI-DeepMind-Anthropic Comparative Analysis.
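
A sketch of response pairing: every agent answers, a pairwise scorer rates each (query, response) pair, and the top-scoring agent wins. The off-the-shelf MS MARCO cross-encoder below is only a stand-in for the ranking / reward model the paper actually trained:

```python
from sentence_transformers import CrossEncoder

scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def route_by_response(query: str, agent_responses: dict[str, str]) -> str:
    pairs = [(query, resp) for resp in agent_responses.values()]
    scores = scorer.predict(pairs)                       # one relevance score per (query, response) pair
    best_agent, _ = max(zip(agent_responses, scores), key=lambda kv: kv[1])
    return best_agent

answers = {
    "Alexa": "Here are three sushi restaurants in Louisville, Kentucky ...",
    "Google": "Sorry, I can't help with that.",
}
print(route_by_response("locate me some good places in Kentucky that serve sushi", answers))
```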

We will not go into the details of the paper's scheme here; let's look directly at the results. The paper was evaluated in 2022 on the four major agents of the time (Alexa, Google, Houndify, Adasa). The Response-ranking scheme performed best, although the Query-sample classification scheme was not far behind.

image

Agent routing based on question complexity

  • Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

The earlier MARS divides agents mainly at the domain level, e.g. a bank agent, a weather agent, a transport agent. For RAG problems, however, domain differences mostly only affect database routing, i.e. which retrievers are used and which data is searched. A more important difference comes from question complexity. A similar scheme is SELF-RAG, but it fuses routing into the model's inference process, so the overall complexity is too high and the usability somewhat low. So let's look at Adaptive-RAG's scheme for external routing.

Adaptive-RAG proposes classifying query complexity with a classifier and, based on the result, choosing among a direct LLM answer, simple single-step RAG, or complex multi-step RAG (the paper chose interleaved-retrieval CoT), as follows
image
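
The routing step itself is just a three-way dispatch. Here is a skeleton under the assumption that you already have a complexity classifier and the three answering pipelines as callables (all names below are placeholders):

```python
from typing import Callable

def route_query(
    query: str,
    classify: Callable[[str], str],          # -> "A" (no retrieval), "B" (single-step), "C" (multi-step)
    llm_answer: Callable[[str], str],
    single_step_rag: Callable[[str], str],
    multi_step_rag: Callable[[str], str],
) -> str:
    label = classify(query)
    if label == "A":
        return llm_answer(query)             # answer from parametric knowledge only
    if label == "B":
        return single_step_rag(query)        # retrieve once, then answer
    return multi_step_rag(query)             # interleaved retrieval + reasoning for complex questions
```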

Then how do we judge the complexity of a query? The idea is actually similar to the query multi-label classification model in MARS's query pairing above. It also uses the query, with the success or failure of the three answering patterns as labels, to train a classification model; a listwise ranking model would also work. The paper uses QA datasets with gold answers, so judging the correctness of each mode's answer is easy. The three answering modes are also prioritized: the default label is the simplest scheme whenever a simpler pipeline can already answer correctly. For the query classifier, the paper trains T5-Large; the training sample is only 400 queries, together with each question's answer results on the three pipelines.
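
A sketch of how those priority-based labels could be constructed on a QA set with gold answers, assuming a crude containment check as the correctness criterion (the paper's actual matching metric and fallback rules may differ):

```python
def make_label(gold: str, ans_no_rag: str, ans_single: str, ans_multi: str) -> str | None:
    """Keep the simplest pipeline that still answers correctly (A < B < C in complexity)."""
    def correct(pred: str) -> bool:
        return gold.lower() in pred.lower()   # crude containment check, stand-in for EM/F1
    if correct(ans_no_rag):
        return "A"        # the LLM alone suffices
    if correct(ans_single):
        return "B"        # single-step retrieval is needed
    if correct(ans_multi):
        return "C"        # multi-step retrieval is needed
    return None           # none correct: assign by some dataset-level heuristic, or drop the sample
```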

Collecting feedback from RAG samples in a real-world scenario is much more complex: you would need to train a reward model on labeled samples to score response quality, and then use the reward model to score the responses from the different pipelines in order to derive the classification labels.

If you have more RAG pipeline choices and a more complex priority order, you may want to use a multi-label model to get multiple candidate agents, and then, based on the priority among them, choose the agent with the lowest complexity, or the highest priority for that task, to answer.
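
One simple selection policy under that setup, with illustrative agent names and threshold:

```python
PRIORITY = {"no_rag": 0, "single_step_rag": 1, "multi_step_rag": 2}   # lower = cheaper / higher priority

def pick_agent(agent_probs: dict[str, float], threshold: float = 0.5) -> str:
    feasible = [a for a, p in agent_probs.items() if p >= threshold]
    if not feasible:                                    # nothing confident: fall back to the strongest pipeline
        return max(PRIORITY, key=PRIORITY.get)
    return min(feasible, key=lambda a: PRIORITY[a])     # cheapest feasible agent wins

print(pick_agent({"no_rag": 0.2, "single_step_rag": 0.7, "multi_step_rag": 0.9}))
# -> single_step_rag
```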

For effectiveness, the paper ran validation on single-step and multi-hop QA datasets respectively; Adaptive used less time and fewer steps to complete the task while achieving better results (Oracle is the ceiling where the classifier is always correct).

image

Agent routing based on user preference

  • Zooter: Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

The third paper chooses the most suitable agent based on users' answer preferences; in effect it is choosing the optimal base model. Base-model ensembling and routing can also be regarded as an independent direction within agent routing, including routing between large and small models to balance quality against cost and latency, as well as routing among multiple models of comparable ability so that their strengths and weaknesses complement each other. Personally, I think routing among base models is harder than routing among domain agents or RAG pipelines, because the differences between base models are dispersed across their textual outputs and are difficult to abstract into categories. These differences may come from differences in pre-training data distribution, instruction-dataset style, RLHF annotation rules, and so on.

Precisely because the differences are hard to characterize, base-model routing needs more and richer training data if query-pairing is to reach effects and generalization similar to response-pairing. Zooter gives a distillation scheme: train a reward model to score multiple models' responses, then use those scores as labels to train the query routing model. It works as follows

image

For the distillation part, the paper draws on the distillation loss. To retain more information from the reward model, instead of collapsing the reward scores of the multiple models into a top-answer multi-class label, it normalizes the reward scores and uses KL divergence directly, so the router learns to fit the relative strengths and weaknesses among the models' responses. At the same time, considering the noise in the reward model itself, the paper also applies label smoothing during distillation to reduce noise and improve confidence in the targets. In fact, the entropy of the multi-model reward scores can also be used for sample filtering.
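
A sketch of what such a distillation objective can look like in PyTorch: softmax-normalize the reward scores into a soft target, apply label smoothing, and fit the router with KL divergence. Shapes, temperature and smoothing values are illustrative, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def zooter_style_distill_loss(router_logits: torch.Tensor,   # [batch, K] router scores over K models
                              reward_scores: torch.Tensor,    # [batch, K] RM scores of each model's response
                              tau: float = 1.0,
                              smoothing: float = 0.1) -> torch.Tensor:
    target = F.softmax(reward_scores / tau, dim=-1)            # normalize rewards into a distribution
    k = target.size(-1)
    target = (1 - smoothing) * target + smoothing / k          # label smoothing against RM noise
    log_pred = F.log_softmax(router_logits, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")   # KL(target || router prediction)

# toy usage
logits = torch.randn(4, 6, requires_grad=True)   # router over 6 candidate LLMs
rewards = torch.randn(4, 6)                      # reward-model scores (stand-in values)
loss = zooter_style_distill_loss(logits, rewards)
loss.backward()
```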

For the reward model, the paper uses QwenRM; 47,986 query samples are constructed from a mix of multiple datasets, and mdeberta-v3-base is distilled as the router.

For effectiveness, the paper compares six single base models, the distilled query-routing model (ours), response routing with different Reward models, as well as the then-SOTA GPT-3.5 and GPT-4:

  • The effects of different Reward models vary considerably; Qwen and Ultra are significantly better on the four task sets evaluated
  • The Zooter model distilled for query routing is basically comparable to response routing with an RM, achieving similar results at roughly 1/6 of the inference cost

image

More agent routing related work

More papers on RAG routing, agent routing, and base-model routing/ensembles for those who are interested.

  • Agent Routing
    • One Agent To Rule Them All: Towards Multi-agent Conversational AI
    • A Multi-Agent Conversational Recommender System
  • Base Model Routing & Ensemble
    • Large Language Model Routing with Benchmark Datasets
    • LLM-BLENDER: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
    • RouteLLM: Learning to Route LLMs with Preference Data
    • More Agents Is All You Need
    • Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
  • Dynamic RAG (When to Search & Search Plan)
    • SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection ⭐
    • Self-Knowledge Guided Retrieval Augmentation for Large Language Models
    • Self-DC: When to retrieve and When to generate Self Divide-and-Conquer for Compositional Unknown Questions
    • Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs
    • Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
    • REAPER: Reasoning based Retrieval Planning for Complex RAG Systems
    • When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
    • PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

For a fuller compendium of papers on large models, fine-tuning, pre-training data and frameworks, and AIGC applications, head over to Github >> DecryPrompt