
Streaming DeepSeek responses to the front end via API


To stream model output to the front end through an API, you can adopt one of the following technical solutions (using a Python back end and a JavaScript front end as examples):

Solution 1: Use Server-Sent Events (SSE)

This is a streaming solution natively supported by browsers and is the recommended first choice.

# Flask example
from flask import Flask, Response, stream_with_context
from openai import OpenAI  # the DeepSeek API is OpenAI-compatible
import json

app = Flask(__name__)
client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.deepseek.com")

@app.route('/stream')
def stream_data():
    def generate():
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,  # conversation history assembled elsewhere
            stream=True
        )

        for chunk in response:
            if chunk.choices:
                content = chunk.choices[0].delta.content or ""
                # SSE format requires a "data: " prefix and a double newline after each event
                yield f"data: {json.dumps({'content': content})}\n\n"

    return Response(stream_with_context(generate()), mimetype='text/event-stream')
// Front-end JavaScript
const eventSource = new EventSource('/stream');

eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);
    document.getElementById('output').innerHTML += data.content;
};

eventSource.onerror = (err) => {
    console.error('EventSource failed:', err);
    eventSource.close();
};
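For quick testing without a browser, the same SSE endpoint can also be consumed from Python. Below is a minimal sketch using the requests library, assuming the Flask app above is running on localhost:5000 (host and port are assumptions):

# Minimal SSE test client (host/port are assumptions)
import json
import requests

with requests.get("http://localhost:5000/stream", stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):  # SSE events carry a "data: " prefix
            payload = json.loads(line[len("data: "):])
            print(payload["content"], end="", flush=True)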

Solution 2: Use Streaming HTTP Response (NDJSON)

A more general streaming solution, suitable for non-browser clients as well.

# FastAPI example
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI  # async client; the DeepSeek API is OpenAI-compatible
import json

router = APIRouter()
client = AsyncOpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.deepseek.com")

@router.get("/stream")
async def stream_data():
    async def generate():
        response = await client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,  # conversation history assembled elsewhere
            stream=True
        )

        async for chunk in response:
            if chunk.choices:
                content = chunk.choices[0].delta.content or ""
                yield json.dumps({"content": content}) + "\n"  # NDJSON: one JSON object per line

    return StreamingResponse(generate(), media_type='application/x-ndjson')
// Front-end JavaScript using the Fetch API
async function streamData() {
    const response = await fetch('/stream');
    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // NOTE: a network chunk may contain several (or partial) NDJSON lines;
        // see the buffered version under "Front-end processing suggestions" below
        const chunk = decoder.decode(value);
        const data = JSON.parse(chunk);
        document.getElementById('output').innerHTML += data.content;
    }
}
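The NDJSON endpoint is equally easy to consume from a script. Below is a minimal sketch using httpx's streaming API, again assuming the service runs locally (host and port are assumptions):

# Minimal async NDJSON consumer (host/port are assumptions)
import asyncio
import json
import httpx

async def consume():
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("GET", "http://localhost:8000/stream") as resp:
            async for line in resp.aiter_lines():
                if line:  # skip blank lines
                    payload = json.loads(line)
                    print(payload["content"], end="", flush=True)

asyncio.run(consume())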

Key configuration notes

  1. Response header settings

    # Flask: pass extra headers to the streaming Response
    headers = {
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive'
    }
    # e.g. Response(stream_with_context(generate()), mimetype='text/event-stream', headers=headers)

    # FastAPI sets suitable streaming headers automatically
  2. Data format selection

    • SSE (text/event-stream): natively supported by browsers, with automatic reconnection
    • NDJSON (application/x-ndjson): a more general streaming JSON format
    • Plain-text stream: simple, but weakly structured
    (See the wire-format sketch after this list for what each format puts on the wire.)
  3. Front-end processing suggestions

    // More robust reading (handles chunks that split mid-line)
    let buffer = '';

    async function processChunk(chunk) {
        buffer += chunk;
        while (buffer.includes('\n')) {
            const lineEnd = buffer.indexOf('\n');
            const line = buffer.slice(0, lineEnd);
            buffer = buffer.slice(lineEnd + 1);

            if (!line.trim()) continue;  // skip blank lines

            try {
                const data = JSON.parse(line);
                // Process data...
            } catch (e) {
                console.error('Parse error:', e);
            }
        }
    }
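For reference, the sketch below shows what each framing from point 2 looks like on the wire for the same payload; the payload value is illustrative only.

# Wire-format comparison for the same payload (illustrative values)
sse_event   = 'data: {"content": "Hello"}\n\n'  # SSE: "data:" prefix, blank line ends the event
ndjson_line = '{"content": "Hello"}\n'          # NDJSON: one JSON object per line
plain_text  = 'Hello'                           # plain text: no framing at all

for name, frame in [("SSE", sse_event), ("NDJSON", ndjson_line), ("plain", plain_text)]:
    print(name, repr(frame))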

Complete Workflow Example (FastAPI + React)

Back end

# FastAPI back end
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
import json

app = FastAPI()
client = AsyncOpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.deepseek.com")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/chat")
async def chat_stream(prompt: str):
    async def generate():
        response = await client.chat.completions.create(
            model="deepseek-chat",
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )

        async for chunk in response:
            if content := chunk.choices[0].delta.content:
                yield json.dumps({"content": content}) + "\n"  # newline-delimited for NDJSON

    return StreamingResponse(generate(), media_type="application/x-ndjson")
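During development the service can be started programmatically with uvicorn; the host and port below are assumptions:

# Development entry point (host/port are assumptions)
import uvicorn

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)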

Front-end React component

// Front-end React component
import { useState } from 'react';

export default function ChatComponent() {
    const [output, setOutput] = useState('');

    const startStream = async () => {
        const response = await fetch('http://api/chat?prompt=Hello');
        const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let buffer = '';

        while (true) {
            const { done, value } = await reader.read();
            if (done) break;

            buffer += decoder.decode(value);
            // NDJSON: split the buffer on newlines; a partial line stays buffered
            while (buffer.includes('\n')) {
                const lineEnd = buffer.indexOf('\n');
                const line = buffer.slice(0, lineEnd);
                buffer = buffer.slice(lineEnd + 1);
                if (!line.trim()) continue;

                try {
                    const data = JSON.parse(line);
                    setOutput(prev => prev + data.content);
                } catch (e) {
                    console.error('Parse error:', e);
                }
            }
        }
    };

    return (
        <div>
            <button onClick={startStream}>Start the conversation</button>
            <div>{output}</div>
        </div>
    );
}

Things to note

  1. Connection Management

    • Set a reasonable timeout (usually 30-60 seconds)
    • Handle clients that disconnect early
    # FastAPI example (the endpoint must take a `request: Request` parameter)
    try:
        async for chunk in response:
            # ...process data
            if await request.is_disconnected():
                break
    finally:
        await response.close()  # clean up the upstream stream
  2. Performance optimization

    • Use an asynchronous framework (FastAPI performs better than Flask here)
    • Enable response compression
    # Starlette's GZipMiddleware handles compression correctly;
    # note that compression may buffer small chunks and delay streamed output
    from fastapi.middleware.gzip import GZipMiddleware

    app = FastAPI()
    app.add_middleware(GZipMiddleware, minimum_size=1000)
    
  3. Safety considerations

    • Limit the maximum number of concurrent connections
    • Implement rate limiting
    from fastapi import Request
    from slowapi import Limiter, _rate_limit_exceeded_handler
    from slowapi.util import get_remote_address
    from slowapi.errors import RateLimitExceeded

    limiter = Limiter(key_func=get_remote_address)
    app.state.limiter = limiter
    app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

    @app.get("/chat")
    @limiter.limit("10/minute")
    async def chat_stream(request: Request):
        # ...
    
  4. Error handling enhancement

    async def generate():
        response = None
        try:
            response = await client.chat.completions.create(...)
            async for chunk in response:
                # ...process data
                ...
        except Exception as e:
            yield json.dumps({"error": str(e)}) + "\n"
        finally:
            if response is not None:
                await response.close()  # make sure resources are released

These solutions can be combined according to specific needs. SSE is recommended for its good browser compatibility; WebSocket can be considered for more complex bidirectional scenarios, but it costs more to implement. A minimal WebSocket sketch follows.
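For completeness, here is a minimal sketch of the WebSocket alternative using FastAPI's native WebSocket support; the route path and message shape are assumptions:

# Minimal WebSocket alternative (route path and message shape are assumptions)
from fastapi import FastAPI, WebSocket
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.deepseek.com")

@app.websocket("/ws/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    prompt = await websocket.receive_text()  # one prompt per message
    response = await client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    async for chunk in response:
        if content := chunk.choices[0].delta.content:
            await websocket.send_json({"content": content})
    await websocket.close()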