
Deep analysis of Spring AI: the core logic of the request and response mechanism

Published: 2024-10-10 09:27:44

In the previous two chapters we took a comprehensive look at the new changes in Spring Boot 3, so that we could avoid potential problems before digging into Spring AI. Today we can finally get to the point: how Spring AI initiates requests and returns information to the user.

In what follows, we'll focus on this process, and we can explore streaming answers and function callbacks in more detail in our next presentation.

Start parsing

First of all, if you don't have a project yet, make sure to add the required POM dependencies. Note that Spring AI requires JDK 17, which you can easily download and configure in IDEA.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.1</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>demo</name>
    <description>Demo project for Spring Boot</description>
    <url/>
    <licenses>
        <license/>
    </licenses>
    <developers>
        <developer/>
    </developers>
    <scm>
        <connection/>
        <developerConnection/>
        <tag/>
        <url/>
    </scm>
    <properties>
        <java.version>17</java.version>
<!--        <spring-ai.version>1.1.0</spring-ai.version>-->
        <spring-ai.version>1.0.0-M2</spring-ai.version>
    </properties>
    <dependencies>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>com.github.xiaoymin</groupId>
            <artifactId>knife4j-openapi3-jakarta-spring-boot-starter</artifactId>
            <version>4.1.0</version>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>javax.servlet-api</artifactId>
            <version>4.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <plugins>
            <plugin>
                <groupId>org.graalvm.buildtools</groupId>
                <artifactId>native-maven-plugin</artifactId>
                <configuration>
                    <!-- imageName is used to set the name of the generated binary -->
                    <imageName>${project.artifactId}</imageName>
                    <!-- mainClass specifies the class containing the main method -->
                    <mainClass>com.example.demo.DemoApplication</mainClass>
                    <buildArgs>
                        --no-fallback
                    </buildArgs>
                </configuration>
                <executions>
                    <execution>
                        <id>build-native</id>
                        <goals>
                            <goal>compile-no-fork</goal>
                        </goals>
                        <phase>package</phase>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <repositories>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

</project>

The basic usage was covered in previous lectures, so we won't repeat it in detail here. To better understand the concepts, we will demonstrate with two concrete examples.

The first example shows the implementation of blocking answers, while the second adds contextual memory. Both will help us gain a deeper understanding of how these techniques can be flexibly applied in real-world applications.

basic usage

An example usage of a blocking answer will be provided here to better understand its application scenario and specific implementation.

@PostMapping("/ai")
ChatDataPO generationByText(@RequestParam("userInput") String userInput) {
    String content = this.chatClient.prompt()
            .user(userInput)
            .call()
            .content();
    log.info("content: {}", content);
    // ChatDataPO / ChildData are the demo's own response POJOs
    ChatDataPO chatDataPO = ChatDataPO.builder().code("text")
            .data(ChildData.builder().text(content).build()).build();
    return chatDataPO;
}

In this example, we show a mechanism that waits for the AI to complete its answer and returns the result directly to the interface caller. The process is quite simple: just pass the question to the user parameter. Next, we'll do some source code analysis.

In order to save time, we will not analyze the code of the intermediate processes in detail, line by line, as this may seem lengthy and complex. Instead, we will focus directly on the key source code to understand its core logic and implementation details more efficiently.

Source Code Analysis - Build Requests

We'll now go straight to an in-depth analysis of the content method. In the preceding steps, all the method calls mainly build up an object in preparation for subsequent operations; the real core calling logic sits inside the content method.

private ChatResponse doGetChatResponse(DefaultChatClientRequestSpec inputRequest, String formatParam) {

    Map<String, Object> context = new ConcurrentHashMap<>();
    context.putAll(inputRequest.getAdvisorParams());
    DefaultChatClientRequestSpec advisedRequest = DefaultChatClientRequestSpec.adviseOnRequest(inputRequest,
            context);

    var processedUserText = StringUtils.hasText(formatParam)
            ? advisedRequest.getUserText() + System.lineSeparator() + "{spring_ai_soc_format}"
            : advisedRequest.getUserText();

    Map<String, Object> userParams = new HashMap<>(advisedRequest.getUserParams());
    if (StringUtils.hasText(formatParam)) {
        userParams.put("spring_ai_soc_format", formatParam);
    }

    var messages = new ArrayList<Message>(advisedRequest.getMessages());
    var textsAreValid = (StringUtils.hasText(processedUserText)
            || StringUtils.hasText(advisedRequest.getSystemText()));
    if (textsAreValid) {
        if (StringUtils.hasText(advisedRequest.getSystemText())
                || !advisedRequest.getSystemParams().isEmpty()) {
            var systemMessage = new SystemMessage(
                    new PromptTemplate(advisedRequest.getSystemText(), advisedRequest.getSystemParams())
                        .render());
            messages.add(systemMessage);
        }
        UserMessage userMessage = null;
        if (!CollectionUtils.isEmpty(userParams)) {
            userMessage = new UserMessage(new PromptTemplate(processedUserText, userParams).render(),
                    advisedRequest.getMedia());
        }
        else {
            userMessage = new UserMessage(processedUserText, advisedRequest.getMedia());
        }
        messages.add(userMessage);
    }

    if (advisedRequest.getChatOptions() instanceof FunctionCallingOptions functionCallingOptions) {
        if (!advisedRequest.getFunctionNames().isEmpty()) {
            functionCallingOptions.setFunctions(new HashSet<>(advisedRequest.getFunctionNames()));
        }
        if (!advisedRequest.getFunctionCallbacks().isEmpty()) {
            functionCallingOptions.setFunctionCallbacks(advisedRequest.getFunctionCallbacks());
        }
    }
    var prompt = new Prompt(messages, advisedRequest.getChatOptions());
    var chatResponse = this.chatModel.call(prompt);

    ChatResponse advisedResponse = chatResponse;
    // apply the advisors on response
    if (!CollectionUtils.isEmpty(advisedRequest.getAdvisors())) {
        var currentAdvisors = new ArrayList<>(advisedRequest.getAdvisors());
        for (RequestResponseAdvisor advisor : currentAdvisors) {
            advisedResponse = advisor.adviseResponse(advisedResponse, context);
        }
    }

    return advisedResponse;
}

The complete lack of comments in this code is surprising, and it says something about how Spring source is written: more for developers to use than for humans to read. The core idea is that being able to use it effectively is enough. Although the code looks clean and simple, its importance shouldn't be overlooked. The implementation is concise, with no redundant code, so rather than trimming it down I've presented it in its entirety.

To help you better understand the logic and structure, I will use pseudo-code to explain it.

Initialization Context: Create an empty context.

Advise the request: the request is adjusted dynamically based on the context. First, we check whether the request object has advisors configured; if so, we return a request object that has been wrapped by those advisors.

Below is the relevant source code implementation that shows the specifics of this logic:

public static DefaultChatClientRequestSpec adviseOnRequest(DefaultChatClientRequestSpec inputRequest,
        Map<String, Object> context) {

    DefaultChatClientRequestSpec advisedRequest = inputRequest;

    // ... a bunch of code omitted here: when advisors are configured, an
    // AdvisedRequest (adviseRequest) is first built from the input request
    if (!inputRequest.getAdvisors().isEmpty()) {
        var currentAdvisors = new ArrayList<>(inputRequest.getAdvisors());
        for (RequestResponseAdvisor advisor : currentAdvisors) {
            adviseRequest = advisor.adviseRequest(adviseRequest, context);
        }
        advisedRequest = new DefaultChatClientRequestSpec(adviseRequest.chatModel(), adviseRequest.userText(),
                adviseRequest.userParams(), adviseRequest.systemText(), adviseRequest.systemParams(),
                adviseRequest.functionCallbacks(), adviseRequest.messages(), adviseRequest.functionNames(),
                adviseRequest.media(), adviseRequest.chatOptions(), adviseRequest.advisors(),
                adviseRequest.advisorParams(), adviseRequest.observationRegistry(),
                adviseRequest.customObservationConvention());
    }

    return advisedRequest;
}

Here, I would like to explain the adviseRequest(adviseRequest, context) method in detail. This method becomes especially critical once we have configured an enhancement class, for example to introduce a chat-memory feature. Specifically, it is responsible for augmenting incoming requests to meet specific business requirements.

It's worth noting that this request-enhancement method is the counterpart of the response-enhancement method; they usually appear in pairs. First, recall how such an advisor is configured:

String content = this.chatClient.prompt()
        .advisors(new MessageChatMemoryAdvisor(chatMemory))
        .user(userInput)
        .call()
        .content();

We configured the MessageChatMemoryAdvisor class, whose core method stores each received message into chat memory. This way, the next time a request is processed, the relevant history can be pulled straight from the chat memory.

public AdvisedRequest adviseRequest(AdvisedRequest request, Map<String, Object> context) {

    // ... a bunch of code omitted here
    // 4. Add the new user input to the conversation memory.
    UserMessage userMessage = new UserMessage(request.userText(), request.media());
    this.getChatMemoryStore().add(this.doGetConversationId(context), userMessage);

    return advisedRequest;
}
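To make the memory mechanism concrete, here is a minimal stdlib-only sketch of the idea: messages are appended under a conversation id and replayed on the next request. The class and method names here are hypothetical stand-ins, not Spring AI's actual ChatMemory API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory chat store; Spring AI's real API differs in detail.
public class InMemoryChatMemory {

    private final Map<String, List<String>> store = new HashMap<>();

    // append a message under a conversation id (what adviseRequest does on input)
    public void add(String conversationId, String message) {
        store.computeIfAbsent(conversationId, id -> new ArrayList<>()).add(message);
    }

    // replay stored messages for the next request
    public List<String> get(String conversationId) {
        return List.copyOf(store.getOrDefault(conversationId, List.of()));
    }
}
```

The same store is written to twice per round trip: once by the request advisor (user message) and once by the response advisor (assistant message), which is why the two methods appear in pairs.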

Process user text and build user parameters: the user input is processed according to the formatParam. Specifically, this step not only formats the user text but also updates the corresponding user parameters.

Next, we will show concrete examples of implementations for a clearer understanding of the operational details of this process:

.user(u -> u.text("""
                Generate the filmography for a random actor.
                {format}
              """)
            .param("format", converter.getFormat()))

The above snippet replaces the {format} placeholder with the actual format instruction. Besides user-supplied parameters, system messages can likewise contain parameters that need rendering, and these must also be passed in correctly during processing.
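The substitution itself can be pictured with a small stdlib-only sketch; `render` here is a naive stand-in for what PromptTemplate's rendering does, not its actual implementation:

```java
import java.util.Map;

public class PromptTemplateSketch {

    // naive stand-in for template rendering: substitute each {key} occurrence
    static String render(String template, Map<String, Object> params) {
        String result = template;
        for (Map.Entry<String, Object> e : params.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", String.valueOf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String rendered = render("Generate the filmography for a random actor.\n{format}",
                Map.of("format", "Respond with a JSON object."));
        System.out.println(rendered);
    }
}
```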

Build the message list: depending on whether the system text and the user text are valid, both are folded into the message being constructed. All valid messages are added to a List collection for subsequent processing, so they can be easily accessed and managed when needed.
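The assembly step described above can be sketched as follows; the `Message` record and `assemble` method are illustrative stand-ins for Spring AI's richer message hierarchy:

```java
import java.util.ArrayList;
import java.util.List;

public class MessageAssembly {

    // minimal message record; Spring AI's Message types carry far more state
    record Message(String role, String text) {}

    static List<Message> assemble(String systemText, String userText) {
        List<Message> messages = new ArrayList<>();
        if (systemText != null && !systemText.isBlank()) {
            messages.add(new Message("system", systemText)); // only added when valid
        }
        if (userText != null && !userText.isBlank()) {
            messages.add(new Message("user", userText));
        }
        return messages;
    }
}
```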

Check for function callbacks: if any are configured, set up the corresponding functions. (More on this in the next article.)

Generate the chat prompt: create a new Prompt() object and call the chat model API to get the response.

Apply response enhancements: if the current request object is configured with advisors, the corresponding response-enhancement methods are called. For the chat-memory advisor, this is also where the answer is stored into the message history:

public ChatResponse adviseResponse(ChatResponse chatResponse, Map<String, Object> context) {

    List<Message> assistantMessages = chatResponse.getResults().stream()
            .map(g -> (Message) g.getOutput()).toList();

    this.getChatMemoryStore().add(this.doGetConversationId(context), assistantMessages);

    return chatResponse;
}

Return results: Returns the final chat response.

Source Code Analysis - Requesting OpenAI

Next, we will explore in detail how the OpenAI interface is invoked via the request object, using the OpenAI implementation as the basis for our analysis. If you are using another AI product, the process will differ, and the call will be dispatched to that product's own ChatModel implementation.

We will provide a comprehensive analysis of OpenAI's request invocation process to gain a deeper understanding of the mechanisms and implementation details behind it:

public ChatResponse call(Prompt prompt) {

    ChatCompletionRequest request = createRequest(prompt, false);

    ChatModelObservationContext observationContext = ChatModelObservationContext.builder()
        .prompt(prompt)
        .provider(OpenAiApiConstants.PROVIDER_NAME)
        .requestOptions(buildRequestOptions(request))
        .build();

    ChatResponse response = ChatModelObservationDocumentation.CHAT_MODEL_OPERATION
        .observation(this.observationConvention, DEFAULT_OBSERVATION_CONVENTION, () -> observationContext,
                this.observationRegistry)
        .observe(() -> {

            ResponseEntity<ChatCompletion> completionEntity = this.retryTemplate
                .execute(ctx -> this.openAiApi.chatCompletionEntity(request, getAdditionalHttpHeaders(prompt)));

            var chatCompletion = completionEntity.getBody();

            if (chatCompletion == null) {
                logger.warn("No chat completion returned for prompt: {}", prompt);
                return new ChatResponse(List.of());
            }

            List<Choice> choices = chatCompletion.choices();
            if (choices == null) {
                logger.warn("No choices returned for prompt: {}", prompt);
                return new ChatResponse(List.of());
            }

            List<Generation> generations = choices.stream().map(choice -> {
        // @formatter:off
                Map<String, Object> metadata = Map.of(
                        "id", chatCompletion.id() != null ? chatCompletion.id() : "",
                        "role", choice.message().role() != null ? choice.message().role().name() : "",
                        "index", choice.index(),
                        "finishReason", choice.finishReason() != null ? choice.finishReason().name() : "",
                        "refusal", StringUtils.hasText(choice.message().refusal()) ? choice.message().refusal() : "");
                // @formatter:on
                return buildGeneration(choice, metadata);
            }).toList();

            // Non function calling.
            RateLimit rateLimit = OpenAiResponseHeaderExtractor.extractAiResponseHeaders(completionEntity);

            ChatResponse chatResponse = new ChatResponse(generations, from(completionEntity.getBody(), rateLimit));

            observationContext.setResponse(chatResponse);

            return chatResponse;

        });

    if (response != null && isToolCall(response, Set.of(OpenAiApi.ChatCompletionFinishReason.TOOL_CALLS.name(),
            OpenAiApi.ChatCompletionFinishReason.STOP.name()))) {
        var toolCallConversation = handleToolCalls(prompt, response);
        // Recursively call the call method with the tool call message
        // conversation that contains the call responses.
        return this.call(new Prompt(toolCallConversation, prompt.getOptions()));
    }

    return response;
}

All of these lines are valuable and deleting any of them is not a good option, but given the lack of comments we need to analyze them carefully. Let's work through the logic and main points step by step.

The main purpose of createRequest is to build the request object needed for the actual API call. Since different service providers design their interfaces differently, each implementation assembles this request according to the provider's specific API specification. For example, when calling OpenAI's interface, we need to build OpenAI's specific parameter structure, a process that should already be familiar to you.
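To make that parameter structure concrete, here is an illustrative sketch using plain records. The field set mirrors OpenAI's public chat-completions API (model, messages, temperature, stream), but these are stand-in types, not Spring AI's actual ChatCompletionRequest:

```java
import java.util.List;

public class RequestSketch {

    // plain records mirroring the shape of an OpenAI chat-completions body;
    // not Spring AI's actual ChatCompletionRequest type
    record ChatMessage(String role, String content) {}
    record ChatCompletionRequest(String model, List<ChatMessage> messages,
                                 Double temperature, Boolean stream) {}

    static ChatCompletionRequest create(String userText, boolean stream) {
        return new ChatCompletionRequest("gpt-4o",
                List.of(new ChatMessage("user", userText)), 0.7, stream);
    }
}
```

Note the `stream` flag: the same structure serves both the blocking call analyzed here (`false`) and the streaming variant we'll cover next time.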

ChatModelObservationContext is primarily used to record the options and constraints associated with the request. This includes several key parameters, such as the maximum number of tokens for the request, the specific OpenAI model used, and the frequency penalty. As shown in the code:

private ChatOptions buildRequestOptions(OpenAiApi.ChatCompletionRequest request) {
    return ChatOptionsBuilder.builder()
        .withModel(request.model())
        .withFrequencyPenalty(request.frequencyPenalty())
        .withMaxTokens(request.maxTokens())
        .withPresencePenalty(request.presencePenalty())
        .withStopSequences(request.stop())
        .withTemperature(request.temperature())
        .withTopP(request.topP())
        .build();
}

The remaining bulk of the call method is responsible for actually executing the API request and processing the response into a ChatResponse. There are a few key details worth noting in this process.

The request goes through retryTemplate, a request tool with a built-in retry mechanism. It is designed to enhance the reliability of requests, especially in the face of transient failures or network problems, and can automatically retry to improve the success rate.

retryTemplate is also flexible: users can adjust the number of retries, the backoff interval, and other related parameters according to actual needs, all of which can be customized through properties under the spring.ai.retry prefix. You can see this in the auto-configuration class:

@AutoConfiguration
@ConditionalOnClass(RetryUtils.class)
@EnableConfigurationProperties({ SpringAiRetryProperties.class })
public class SpringAiRetryAutoConfiguration {
  // ... a bunch of code omitted here
}
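The retry idea behind retryTemplate can be pictured with a stdlib-only sketch (Spring Retry's real RetryTemplate exposes the same knobs, maxAttempts and backoff, declaratively):

```java
import java.util.function.Supplier;

public class SimpleRetry {

    // retry a call up to maxAttempts times with exponential backoff between tries
    static <T> T withRetry(Supplier<T> call, int maxAttempts, long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            }
            catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // give up after the final attempt
                }
                try {
                    Thread.sleep(backoff);
                }
                catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(ie);
                }
                backoff *= 2; // double the wait after each failure
            }
        }
        throw last;
    }
}
```

A transient failure (say, a dropped connection to OpenAI) is simply retried with a growing pause, which is exactly the behavior the spring.ai.retry properties tune.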

Then, if OpenAI's interface returns a response normally, the system begins formatting the answer. Several key fields are involved in this process, all of which should be quite familiar to programmers, especially those with prior API-integration experience.

Map<String, Object> metadata = Map.of(
                            "id", chatCompletion.id() != null ? chatCompletion.id() : "",
                            "role", choice.message().role() != null ? choice.message().role().name() : "",
                            "index", choice.index(),
                            "finishReason", choice.finishReason() != null ? choice.finishReason().name() : "",
                            "refusal", StringUtils.hasText(choice.message().refusal()) ? choice.message().refusal() : "");

Then, after receiving all the returned fields, the system consolidates them into the response object. At this stage there is one more important check, isToolCall, which involves the function-callback mechanism. That part of the logic is critical, but we won't go into the details today and will leave it for next time.

At this point, the entire calling process has completed successfully, and the interface returns the processed information to the caller, ensuring an efficient response to the user's request.

Summary

In this exploration, we focus on how Spring AI effectively initiates requests and delivers response information to the user. This process not only bridges the gap between developers and AI interactions, but is also key to optimizing the user experience. With a well-defined request structure and response mechanism, Spring AI is able to flexibly handle a variety of user inputs and adjust the answering strategy according to the context.

We then analyzed the core of this mechanism in depth, focusing on the specific implementation and business logic. Along the way, we demonstrated through examples how blocking answers and answers with contextual memory work in real-world applications. This hands-on approach not only helps us better understand how Spring AI works, but also lays the groundwork for future in-depth discussions of streaming answers and function callbacks.

Understanding the logic behind this process will provide strong support for applying Spring AI in our daily development. As technology continues to advance, so do the challenges developers face, but with this clear request and response architecture, we can deal with complexity more comfortably and achieve smarter solutions.


I'm Rain, a Java server-side developer exploring the mysteries of AI technology. I love technical exchange and sharing, and I'm passionate about the open-source community. I'm also a Tencent Cloud Creative Star, an Alibaba Cloud Expert Blogger, a Huawei Cloud Sharing Expert, and a Juejin Excellent Author.

💡 I won't be shy about sharing my personal explorations and experiences on the path of technology, in the hope that I can bring some inspiration and help to your learning and growth.

🌟 Welcome to the effortless drizzle! 🌟