With the widespread adoption of large language models, calling them efficiently has become a key concern. In the traditional request-response mode, a client that asks a large model for a long text receives nothing until generation has finished, which means high latency and a poor user experience. Streaming is an important way to solve this problem.
This article introduces a streaming-call specification for large models based on the Server-Sent Events (SSE) protocol and, using Spring Boot, gives complete server-side and client-side examples.
1. Why choose SSE?
When a large model answers, it usually generates content token by token. With a traditional HTTP request, the client must wait until the model has generated everything before it gets a response, which results in high latency. With SSE, the server can push content while it is still being generated, greatly improving interactivity and user experience.
Advantages of SSE:
- One-way connection: the server actively pushes data and the client receives it automatically;
- Runs over ordinary HTTP and is natively supported by browsers;
- Simple to implement and well suited to streaming text output.
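On the wire, an SSE response is an ordinary, long-lived HTTP response with Content-Type text/event-stream, in which each event is one or more "data:" lines followed by a blank line. The minimal sketch below shows that framing with the JDK alone; the class and method names (SseFrames, frame) are illustrative, and the JSON payloads are hypothetical examples of the format described in section 2.

```java
import java.util.stream.Collectors;

class SseFrames {

    // Encodes one SSE event: every line of the payload gets its own "data: " field,
    // and the trailing blank line ("\n\n") terminates the event.
    static String frame(String data) {
        return data.lines()
                .map(line -> "data: " + line)
                .collect(Collectors.joining("\n", "", "\n\n"));
    }

    public static void main(String[] args) {
        // A hypothetical two-event stream followed by the end marker used in this article.
        System.out.print(frame("{\"content\":\"Hello\"}"));
        System.out.print(frame("{\"content\":\", world\"}"));
        System.out.print(frame("[DONE]"));
    }
}
```

Because the framing is plain text over a normal HTTP response, no protocol upgrade is needed, which is what makes SSE simpler to deploy than a bidirectional channel.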
2. Streaming call interface specification (based on SSE)
Request method
- Method: POST
- Content-Type: application/json
- Accept: text/event-stream
Request Example
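The original request example is not recoverable. As a sketch, assuming the body carries the two fields that the client example in section 4 sends (prompt and stream), a request that follows the rules above can be built with the JDK's java.net.http client; the class name StreamRequestExample and its escape helper are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class StreamRequestExample {

    // Endpoint taken from the client example in this article.
    static final URI ENDPOINT = URI.create("http://localhost:8080/chat/stream");

    // Builds (but does not send) a streaming request: POST, JSON body,
    // and Accept: text/event-stream so the server answers with an SSE stream.
    static HttpRequest build(String prompt) {
        String json = "{\"prompt\":\"" + escape(prompt) + "\",\"stream\":true}";
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "application/json")
                .header("Accept", "text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Minimal JSON string escaping: backslashes, double quotes and newlines only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        HttpRequest req = build("Introduce the Romance of the Three Kingdoms");
        System.out.println(req.method() + " " + req.uri());
        System.out.println(req.headers().map());
    }
}
```

The resulting body is {"prompt":"Introduce the Romance of the Three Kingdoms","stream":true}. A production client would normally serialize the body with a JSON library instead of the hand-written escaping above.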
Response Format (SSE Stream)
- Each line starts with "data:" followed by a JSON string;
- The last line, "data: [DONE]", indicates that the stream has ended;
- The client must parse the content field of each received JSON payload and display it.
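The parsing rules above can be sketched as a small client-side loop. This is a minimal illustration, assuming each payload looks like {"content":"..."}; the class SseContentParser, its regex-based field extraction, and the onFragment callback are hypothetical simplifications, and a real client should use a proper JSON library.

```java
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SseContentParser {

    // Naive match for a string-valued "content" field (handles escaped quotes inside it).
    private static final Pattern CONTENT =
            Pattern.compile("\"content\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Reads SSE lines in order, hands every content fragment to onFragment as soon as
    // it is parsed (real-time display), splices the fragments, and stops at [DONE].
    static String splice(Iterable<String> lines, Consumer<String> onFragment) {
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            if (!line.startsWith("data:")) {
                continue; // blank separators, comments and other fields carry no content
            }
            String payload = line.substring(5);
            if (payload.startsWith(" ")) {
                payload = payload.substring(1); // SSE drops one leading space after the colon
            }
            if (payload.equals("[DONE]")) {
                break;
            }
            Matcher m = CONTENT.matcher(payload);
            if (m.find()) {
                String fragment = unescape(m.group(1));
                onFragment.accept(fragment);
                text.append(fragment);
            }
        }
        return text.toString();
    }

    // Decodes \" \\ and \n; any other escape is reduced to its bare character.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                out.append(next == 'n' ? '\n' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With the JDK's java.net.http client, the lines of a live response could be fed in as they arrive, for example through HttpResponse.BodyHandlers.ofLines() and stream::iterator, with onFragment printing each piece immediately. The parser accepts both "data: x" and "data:x", since servers differ on whether they write the optional space.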
3. Spring Boot server example
Below is an example implementation of an SSE streaming interface based on Spring Boot.
1. Controller layer
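The original controller code is not available. The following is a minimal sketch, assuming Spring MVC (spring-boot-starter-web) and its SseEmitter type; the class name ChatController and the helper fakeModelTokens are hypothetical, and the helper merely stands in for a real model client. It maps POST /chat/stream (the URL used by the client example), pushes each fragment as a JSON "data:" event, and finishes with data: [DONE]. For brevity the JSON body is bound to a Map rather than to a dedicated request class.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
class ChatController {

    // Background threads that write to the emitters (never shut down in this sketch).
    private final ExecutorService executor = Executors.newCachedThreadPool();

    @PostMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter stream(@RequestBody Map<String, Object> body) {
        String prompt = String.valueOf(body.getOrDefault("prompt", ""));
        SseEmitter emitter = new SseEmitter(60_000L); // async timeout in milliseconds

        // Return the emitter immediately; generate and push from another thread.
        executor.execute(() -> {
            try {
                for (String token : fakeModelTokens(prompt)) {
                    // Serialized by Jackson, e.g. data:{"content":"You "}
                    emitter.send(SseEmitter.event()
                            .data(Map.of("content", token), MediaType.APPLICATION_JSON));
                }
                emitter.send(SseEmitter.event().data("[DONE]"));
                emitter.complete();
            } catch (IOException e) {
                // Typically the client disconnected; end the async request with the error.
                emitter.completeWithError(e);
            }
        });
        return emitter;
    }

    // Hypothetical stand-in for a real model: splits a canned reply into fragments,
    // keeping trailing spaces so the fragments splice back together exactly.
    static List<String> fakeModelTokens(String prompt) {
        return List.of(("You asked: " + prompt).split("(?<= )"));
    }
}
```

Returning the emitter before generation starts is what lets the servlet thread go back to the pool while fragments are still being produced; the catch block covers the exception and resource-release concerns listed in the summary.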
2. Request class definition
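The original request class is also missing. A minimal sketch, assuming the body carries exactly the two fields the client example sends (prompt and stream): the name ChatRequest and the blank-prompt check are illustrative additions, not part of the original specification. It uses a Java 16+ record, which Jackson 2.12 and later can bind through the canonical constructor.

```java
// Hypothetical request body type for JSON such as {"prompt":"...","stream":true}.
record ChatRequest(String prompt, boolean stream) {

    // Compact constructor: fails fast on a missing or blank prompt.
    ChatRequest {
        if (prompt == null || prompt.isBlank()) {
            throw new IllegalArgumentException("prompt must not be blank");
        }
    }
}
```

A record keeps the request immutable and gives the accessors prompt() and stream() for free, which is usually all a streaming endpoint needs.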
4. Client call example (Java)
Client streaming with Spring WebFlux:
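import java.util.Map;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

WebClient client = WebClient.create();
client.post()
        .uri("http://localhost:8080/chat/stream")
        .header(HttpHeaders.ACCEPT, MediaType.TEXT_EVENT_STREAM_VALUE)
        .bodyValue(Map.of("prompt", "Introduce the Romance of the Three Kingdoms", "stream", true))
        .retrieve()
        .bodyToFlux(String.class) // one element per SSE "data:" payload
        .doOnNext(System.out::println)
        .blockLast();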
5. Summary and Suggestions
SSE-based streaming calls to large models can significantly shorten the time before users see the first output and thus improve the user experience. When using them, keep the following in mind:
- SSE is suited to text output; for audio, images, or other content, WebSocket is recommended;
- On the server, handle exceptions and release resources properly;
- Clients must be able to process and splice the streamed fragments in real time.