We've established a unified tool call API for LLMs that works across models. With it, you can run the same code against Mistral, Cohere, NousResearch, Llama and other models, with little or no model-specific changes to your tool-calling code. In addition, we've added a few new utility interfaces to transformers to make tool calls smoother, along with full documentation and an end-to-end tool use example. We will keep adding support for more models.
Introduction
Tool use is an interesting feature of LLMs - everyone agrees it's great, but most people have never actually tried it. The concept is simple: you give the LLM some tools (i.e., callable functions), and it can decide to call them on its own while responding to user queries. For example, you might give it a calculator so it doesn't have to rely on its own unreliable arithmetic, let it search the web or check your calendar, or grant it (read-only!) access to your company's database so it can pull up relevant information or search technical documentation.
Tool calls allow LLMs to break through many of their core limitations. Many LLMs are articulate and talkative, but often imprecise with calculations and facts, and hazy on the details of niche topics. They also know nothing that happened after their training cutoff. They are generalists, but at the start of a chat they know nothing about you or its context beyond what you supply in the system message. Tools give them access to structured, specialized, relevant and up-to-date information that can help them become genuinely useful partners rather than just fascinating novelties.
The problems begin, however, when you actually start experimenting with tools! Documentation is sparse, inconsistent and even contradictory - and that's true for closed-source APIs as well as open models. Although tool use is simple in theory, it's frequently a nightmare in practice: how do you pass the tools to the model? How do you make sure the tool prompt matches the format the model was trained on? How do you merge the tool's result back into the chat once the model calls it? If you've ever tried implementing tool use yourself, you've probably found these questions surprisingly tricky, and the documentation often incomplete and sometimes even actively misleading.
To make matters worse, different models can implement tool use in wildly different ways. Even for the most basic matter of defining the available tools, some vendors expect JSON schemas while others expect Python function headers. Even among those that use JSON schemas, the details often differ, creating big API incompatibilities. The result is a lot of friction, and users left thoroughly confused.
What can we do about it?
Chat Templates
Loyal fans of the Hugging Face Cinematic Universe will recall that the open-source community once faced a similar challenge with chat models. Chat models use control tokens such as <|start_of_user_turn|> or <|end_of_message|> to tell the model what's going on in a chat, but different models were trained with completely different control tokens, so users had to write model-specific formatting code for every model they used. This was a huge headache at the time.
The solution was chat templates: each model ships with a small Jinja template that renders chats with the correct format and control tokens for that model. Chat templates mean users can write chats in a universal, model-agnostic format and trust the Jinja template to handle the model-specific formatting.
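For illustration, here is a minimal sketch of applying a chat template to a plain chat (no tools yet). The checkpoint name is only an example - any model that ships a chat template works the same way:

from transformers import AutoTokenizer

# Any chat model with a chat template will do; this checkpoint is just an illustration.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")

chat = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

# The Jinja template inserts this model's own control tokens for us.
formatted = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(formatted)

The same chat dicts, rendered with a different model's tokenizer, would produce a prompt with that model's control tokens instead - no formatting code needs to change.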
Given that, an obvious way to support tool use was to extend chat templates to support tools as well. That's exactly what we did, but tools created many new challenges for the templating system. Let's walk through those challenges and how we solved them. Hopefully, along the way, you'll gain a deeper understanding of how the system works and how to get the most out of it.
Passing Tools to Chat Templates
The first requirement when designing a tool use API is that defining tools and passing them to the chat template should be intuitive. We found that most users' workflow is to write the tool function first, then figure out how to generate a tool definition from it and pass that to the model. A natural thought: wouldn't it be nice if users could simply pass the function directly to the chat template and have the tool definition generated for them?
But here's the thing: how you "pass a function" is highly language-dependent, and many people interact with chat models through JavaScript or Rust rather than Python. So we settled on a compromise that we think gives the best of both worlds: chat templates define tools as JSON schemas, but if you pass Python functions to the template, they are automatically converted to JSON schemas for you. This results in a nice, clean API:
def get_current_temperature(location: str):
    """
    Gets the temperature at a given location.

    Args:
        location: The location to get the temperature for
    """
    return 22.0  # bug: Sometimes the temperature is not 22. low priority

tools = [get_current_temperature]

chat = [
    {"role": "user", "content": "Hey, what's the weather like in Paris right now?"}
]

tool_prompt = tokenizer.apply_chat_template(
    chat,
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt"
)
Inside apply_chat_template, the get_current_temperature function is converted to a complete JSON schema. If you want to see the generated schema, you can call the get_json_schema interface:
>>> from transformers.utils import get_json_schema

>>> get_json_schema(get_current_temperature)
{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}
If you prefer manual control, or if you're coding in a language other than Python, you can write the tool definitions as JSON schemas yourself and pass them directly to the template. When you're working in Python, though, you never need to handle the JSON schema directly: just define your tool functions with clear function names, accurate type hints and complete docstrings, including docstrings for each argument, and all of this will be used to generate the JSON schema the template needs. These requirements are already Python best practices that you should be following anyway, and if you are, no extra work is needed - your functions are already ready to be used as tools!
Remember: whether generated from docstrings and type hints or written by hand, the accuracy of the JSON schema is critical to the model understanding how to use the tool. The model never sees the code that implements the function, only the JSON schema, so the clearer and more accurate the schema, the better!
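For example, a hand-written schema equivalent to the generated one above could be passed like this. This is a sketch that assumes the same tokenizer and chat variables from the earlier snippet are in scope; passing the dict produces the same prompt as passing the Python function:

# A hand-written JSON schema for the same tool. Passing this list as `tools=`
# is equivalent to passing [get_current_temperature] directly.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Gets the temperature at a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the temperature for"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

tool_prompt = tokenizer.apply_chat_template(
    chat,
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt"
)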
Calling Tools in Chat
A detail that's often overlooked by users (and model documentation 😬) is that when a model calls a tool, two messages actually need to be added to the chat history. The first is the model's tool call message, and the second is the tool response, i.e., the output of the called function.

Both the tool call and the tool response are necessary - remember that the model only knows what's in the chat history, and it can't make sense of a tool response if it can't see the call it made and the arguments it passed. "22" on its own doesn't convey much, but it's very helpful if the model knows that the message preceding it was get_current_temperature("Paris, France").
Different model vendors handle this very differently; we have standardized tool calls as a field in chat messages, as shown below.
message = {
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "get_current_temperature",
                "arguments": {
                    "location": "Paris, France"
                }
            }
        }
    ]
}
chat.append(message)
Adding Tool Responses to Chat
Tool responses are much simpler, especially if the tool only returns a single string or number.
message = {
    "role": "tool",
    "name": "get_current_temperature",
    "content": "22.0"
}
chat.append(message)
Hands-On Example
We've strung the code above together into a complete tool use example. If you want to use tools in your own projects, we recommend playing with this code - run it yourself, add or remove tools, swap models and tweak the details to get a feel for the whole system. That familiarity will make things much easier when it comes time to implement tool use in your own software! To make it even easier, we've also provided this example as a notebook.
The first step is to set up the model. We use Hermes-2-Pro-Llama-3-8B because it is small, capable, freely licensed and supports tool calls. Keep in mind, though, that larger models may give better results on complex tasks!
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")
Next, we set up the tool and the chat messages we want to use. We'll keep using get_current_temperature from above:
def get_current_temperature(location: str):
    """
    Gets the temperature at a given location.

    Args:
        location: The location to get the temperature for, in the format "city, country"
    """
    return 22.0  # bug: Sometimes the temperature is not 22. low priority to fix tho

tools = [get_current_temperature]

chat = [
    {"role": "user", "content": "Hey, what's the weather like in Paris right now?"}
]
tool_prompt = tokenizer.apply_chat_template(
    chat,
    tools=tools,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)
tool_prompt = tool_prompt.to(model.device)
Now that the model's available tools are set up, we need the model to generate a response to the user's query:
out = model.generate(**tool_prompt, max_new_tokens=128)
generated_text = out[0, tool_prompt['input_ids'].shape[1]:]
print(tokenizer.decode(generated_text))
We get:
<tool_call>
{"arguments": {"location": "Paris, France"}, "name": "get_current_temperature"}
</tool_call><|im_end|>
The model has requested a tool! Note that it correctly inferred that it should pass the argument "Paris, France" rather than just "Paris", because it followed the format recommended by the function's docstring.
But the model doesn't actually call the tool programmatically; like all language models, it just generates text. As the programmer, you have to pick up the model's request and call the function yourself. First, we add the model's tool request to the chat.
Note that this step may require a little manual processing - although you should always add the request to the chat in the format below, the text of the model's tool call request (e.g. the <tool_call> tags) may vary between models. It's usually quite intuitive, but keep in mind that when trying this in your own code, you may need a little model-specific json.loads() or re.match()!
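As a rough illustration, here is a minimal sketch of parsing the Hermes-style <tool_call> output shown earlier into the standard cross-model format. The helper name parse_hermes_tool_call is ours, not part of transformers, and other models' output formats will need slightly different parsing:

import json
import re

def parse_hermes_tool_call(decoded: str) -> dict:
    # Pull the JSON payload out of the <tool_call>...</tool_call> tags.
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", decoded, re.DOTALL)
    if match is None:
        raise ValueError("No tool call found in the model output")
    call = json.loads(match.group(1))
    # Repackage it in the standard format used by chat templates.
    return {
        "type": "function",
        "function": {"name": call["name"], "arguments": call["arguments"]},
    }

The parsed call can then be added to the chat in the standard format: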
message = {
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "get_current_temperature",
                "arguments": {"location": "Paris, France"}
            }
        }
    ]
}
chat.append(message)
Now, let's actually call the tool in our Python code and add its response to the chat:
message = {
    "role": "tool",
    "name": "get_current_temperature",
    "content": "22.0"
}
chat.append(message)
Then, just like before, we format the updated chat and pass it to the model, so that it can use the tool response in the conversation:
tool_prompt = tokenizer.apply_chat_template(
    chat,
    tools=tools,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)
tool_prompt = tool_prompt.to(model.device)

out = model.generate(**tool_prompt, max_new_tokens=128)
generated_text = out[0, tool_prompt['input_ids'].shape[1]:]
print(tokenizer.decode(generated_text))
Finally, we get the model's final response to the user, built on the information obtained in the intermediate tool call step:
The current temperature in Paris is 22.0 degrees Celsius. Enjoy your day!<|im_end|>
The Regrettable Lack of Unity in Response Formats
As you may have noticed in the example above, although chat templates hide the differences between models in chat formatting and tool definition formats, one piece is still missing. When a model emits a tool call request, it still does so in its own format, so you have to parse it manually before adding it to the chat in the universal format. Thankfully, most formats are quite intuitive, so this should only take a couple of lines of json.loads(), or at worst a simple re.match(), to build the tool call dict you need.
Still, this is the last remaining bit of "disunity". We have some ideas for how to fix it, but they're not mature yet, so for now let's roll up our sleeves and get to work!
Conclusion
Although there are still a few loose ends, we think this is a big improvement over the previous situation, where tool use was scattered, confusing and under-documented. We hope our unification efforts make it easier for open-source developers to use tools in their projects, augmenting powerful LLMs with an amazing range of new capabilities. From smaller models such as Hermes-2-Pro-8B to state-of-the-art giants such as Mistral-Large, Command-R-Plus and Llama-3.1-405B, more and more cutting-edge LLMs already support tool use. We think tools will be an integral part of the next wave of LLM products, and we hope these improvements make it easier for you to use them in your own projects. Good luck!
Original English post: /blog/unified-tool-use
Original author: Matthew Carrigan
Translator: Matrix Yao (Yao Weifeng), Deep Learning Engineer at Intel, working on applying transformer-family models to data of various modalities and on training and inference of large-scale models.