Streaming

Learn how to use Server-Sent Events (SSE) to stream chat completions in real-time, providing a better user experience for interactive applications.

What is Streaming?

Streaming allows you to receive the model's response incrementally as it's generated, rather than waiting for the entire response to complete. This creates a more responsive user experience, especially for longer responses.

Benefits of Streaming

  • Better UX: Users see responses appear in real-time, similar to typing
  • Faster Perceived Response: Users get feedback immediately, not after completion
  • Interruptible: Users can stop generation early if they have enough information
  • Lower Memory: Process chunks as they arrive instead of buffering the entire response

Information

Streaming uses Server-Sent Events (SSE), a standard protocol for server-to-client streaming over HTTP. The response is sent as a series of data: events.

Basic Streaming

Enable streaming by setting stream=true in your request (stream=True in the Python SDK). The response is delivered as a series of chunks.

Basic Streaming Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

# Create a streaming request
stream = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "user", "content": "Explain machine learning in simple terms."}
    ],
    stream=True
)

# Process each chunk as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line at the end

Stream Response Format

Each chunk in the stream is a JSON object prefixed with data:. The stream ends with a data: [DONE] message.

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"selam-turbo","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"selam-turbo","choices":[{"index":0,"delta":{"content":"Machine"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"selam-turbo","choices":[{"index":0,"delta":{"content":" learning"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"selam-turbo","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"selam-turbo","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Information

The first chunk typically contains the role field. Subsequent chunks contain content deltas. The final chunk has a finish_reason.
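
If you are not using an SDK, you can parse these events directly from the HTTP response. The sketch below uses the requests library and assumes the standard /chat/completions endpoint with Bearer authentication:

Raw SSE Parsing Example

import json
import requests

url = "https://api.selamgpt.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer your-api-key-here",  # assumes Bearer auth
    "Content-Type": "application/json"
}
payload = {
    "model": "selam-turbo",
    "messages": [{"role": "user", "content": "Explain machine learning in simple terms."}],
    "stream": True
}

with requests.post(url, headers=headers, json=payload, stream=True, timeout=30) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)

print()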

Handling Stream Chunks

Process each chunk to extract the content delta and handle completion signals.

Advanced Stream Handling

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

stream = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "user", "content": "Write a short story."}
    ],
    stream=True
)

full_response = ""
finish_reason = None

for chunk in stream:
    # Extract delta content
    delta = chunk.choices[0].delta

    # Check for content
    if delta.content:
        content = delta.content
        full_response += content
        print(content, end="", flush=True)

    # Check for finish reason
    if chunk.choices[0].finish_reason:
        finish_reason = chunk.choices[0].finish_reason

print(f"\n\nFinish reason: {finish_reason}")
print(f"Approximate word count: {len(full_response.split())}")  # words, not tokens

Error Handling in Streams

Implement robust error handling to manage network issues, rate limits, and other errors during streaming.

Error Handling Example

from openai import OpenAI, APIError, RateLimitError, APIConnectionError
import time

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

def stream_with_retry(messages, max_retries=3):
    """Stream with automatic retry on transient errors.

    Note: if the stream fails after some content has been yielded,
    a retry restarts the generation from the beginning.
    """
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="selam-turbo",
                messages=messages,
                stream=True
            )

            for chunk in stream:
                if chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content

            return  # Success, exit the generator

        except RateLimitError:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"\nRate limit hit. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

        except APIConnectionError:
            if attempt < max_retries - 1:
                print("\nConnection error. Retrying...")
                time.sleep(1)
            else:
                raise

        except APIError as e:
            print(f"\nAPI error: {e}")
            raise

# Usage
try:
    for content in stream_with_retry([
        {"role": "user", "content": "Tell me a joke."}
    ]):
        print(content, end="", flush=True)
except Exception as e:
    print(f"\nFailed after retries: {e}")

Warning

Important: Always implement timeout handling for streams. A stuck connection can hang indefinitely. Set reasonable timeouts and implement retry logic with exponential backoff.
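
The openai Python SDK (v1+) accepts a timeout when constructing the client and supports per-request overrides via with_options; a minimal sketch:

Timeout Configuration Example

from openai import OpenAI

# Client-wide timeout in seconds (applies to all requests)
client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1",
    timeout=30.0
)

# Per-request override for an expected long generation
stream = client.with_options(timeout=120.0).chat.completions.create(
    model="selam-turbo",
    messages=[{"role": "user", "content": "Write a long essay."}],
    stream=True
)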

Frontend Integration

Here's how to integrate streaming into a React application for a chat interface.

React Streaming Example

import { useState } from 'react';
import OpenAI from 'openai';

function ChatComponent() {
    const [messages, setMessages] = useState([]);
    const [input, setInput] = useState('');
    const [isStreaming, setIsStreaming] = useState(false);

    const client = new OpenAI({
        apiKey: process.env.NEXT_PUBLIC_SELAM_API_KEY,
        baseURL: "https://api.selamgpt.com/v1",
        dangerouslyAllowBrowser: true  // Only for demo -- see the security note below
    });

    const sendMessage = async () => {
        if (!input.trim() || isStreaming) return;

        const userMessage = { role: 'user', content: input };
        setMessages(prev => [...prev, userMessage]);
        setInput('');
        setIsStreaming(true);

        // Create assistant message placeholder
        const assistantMessage = { role: 'assistant', content: '' };
        setMessages(prev => [...prev, assistantMessage]);

        try {
            const stream = await client.chat.completions.create({
                model: 'selam-turbo',
                messages: [...messages, userMessage],
                stream: true
            });

            for await (const chunk of stream) {
                const content = chunk.choices[0]?.delta?.content;
                if (content) {
                    // Replace the last message with a new object rather than
                    // mutating state in place
                    setMessages(prev => {
                        const updated = [...prev];
                        const last = updated[updated.length - 1];
                        updated[updated.length - 1] = { ...last, content: last.content + content };
                        return updated;
                    });
                }
            }
        } catch (error) {
            console.error('Streaming error:', error);
            setMessages(prev => {
                const updated = [...prev];
                const last = updated[updated.length - 1];
                updated[updated.length - 1] = { ...last, content: 'Error: Failed to get response' };
                return updated;
            });
        } finally {
            setIsStreaming(false);
        }
    };

    return (
        <div className="chat-container">
            <div className="messages">
                {messages.map((msg, idx) => (
                    <div key={idx} className={`message ${msg.role}`}>
                        {msg.content}
                    </div>
                ))}
            </div>
            <div className="input-area">
                <input
                    value={input}
                    onChange={(e) => setInput(e.target.value)}
                    onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
                    disabled={isStreaming}
                    placeholder="Type a message..."
                />
                <button onClick={sendMessage} disabled={isStreaming}>
                    {isStreaming ? 'Sending...' : 'Send'}
                </button>
            </div>
        </div>
    );
}

Warning

Security Note: Never expose your API key in client-side code in production. Use a backend proxy to make API calls and keep your key secure on the server.
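
One way to do this in Python is a small FastAPI endpoint that holds the key server-side and relays the streamed text to the browser. This is a sketch under assumptions: FastAPI, the /api/chat route, and the SELAM_API_KEY environment variable are illustrative choices, not part of the SelamGPT API.

Backend Proxy Example

import os

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()

client = OpenAI(
    api_key=os.environ["SELAM_API_KEY"],  # key stays on the server
    base_url="https://api.selamgpt.com/v1"
)

@app.post("/api/chat")
def chat(payload: dict):
    def generate():
        stream = client.chat.completions.create(
            model="selam-turbo",
            messages=payload["messages"],
            stream=True
        )
        for chunk in stream:
            content = chunk.choices[0].delta.content
            if content:
                yield content

    # Relay plain text; the browser consumes it with the fetch streaming API
    return StreamingResponse(generate(), media_type="text/plain")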

Best Practices

Use Streaming for Interactive UIs

Streaming is ideal for chatbots, writing assistants, and any application where users expect real-time feedback. It makes your application feel more responsive and engaging.

Implement Proper Error Handling

Network issues can interrupt streams. Always implement retry logic with exponential backoff and provide clear error messages to users.

Handle Timeouts

Set reasonable timeouts for streaming requests. A stuck connection can hang indefinitely without proper timeout handling.

Buffer Partial Content

For UI updates, consider buffering chunks and updating the display at regular intervals (e.g., every 50ms) to avoid excessive re-renders.
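
In Python terms the pattern looks like the sketch below; update_display stands in for a hypothetical UI hook, and stream is created as in the earlier examples:

Buffered Update Example

import time

FLUSH_INTERVAL = 0.05  # flush to the UI at most every 50 ms
buffer = ""
last_flush = time.monotonic()

for chunk in stream:
    buffer += chunk.choices[0].delta.content or ""
    now = time.monotonic()
    if buffer and now - last_flush >= FLUSH_INTERVAL:
        update_display(buffer)  # hypothetical UI update hook
        buffer = ""
        last_flush = now

if buffer:
    update_display(buffer)  # flush whatever remains at the end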

Allow Interruption

Give users the ability to stop generation early. This is especially important for long responses where users may get the information they need before completion.
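
In Python this can be as simple as breaking out of the loop; closing the stream first releases the HTTP connection (a sketch assuming openai-python v1, whose stream objects expose close()):

Early Stop Example

collected = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        collected += content
        print(content, end="", flush=True)
    if len(collected) > 400:  # arbitrary stop condition for illustration
        stream.close()  # stop consuming and release the connection
        break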

Monitor Performance

Track metrics like time-to-first-token and tokens-per-second to ensure good performance. Slow streaming can negatively impact user experience.
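
A rough sketch of measuring both, assuming stream is created as in the earlier examples. Note that chunk counts only approximate tokens, since most chunks carry a single token:

Performance Measurement Example

import time

start = time.monotonic()
first_token_at = None
n_chunks = 0

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        if first_token_at is None:
            first_token_at = time.monotonic()
        n_chunks += 1
        print(content, end="", flush=True)

elapsed = time.monotonic() - start
if first_token_at is not None:
    print(f"\nTime to first token: {first_token_at - start:.2f}s")
print(f"Chunks per second: {n_chunks / max(elapsed, 1e-9):.1f}")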

