Chat Completions
Learn how to use the chat completions API to build conversational AI applications with streaming, multi-turn conversations, and advanced parameters.
Basic Chat Completion
The chat completions endpoint allows you to send a conversation history and receive an AI-generated response. Each message has a role (system, user, or assistant) and content.
Basic Example
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

Information
The system message sets the behavior of the assistant, while user messages represent input from the end user. The model responds with an assistant message.
Streaming Responses
Enable streaming to receive the response incrementally as it's generated, providing a better user experience for real-time applications.
Streaming Example
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

stream = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "user", "content": "Write a short poem about Ethiopia."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tip
Streaming is ideal for chatbots and interactive applications where you want to display responses as they're generated. See the Streaming Guide for more details.
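If you also need the complete text after streaming finishes (for example, to append it to a conversation history), accumulate the deltas as they arrive. A minimal sketch, reusing the stream from the example above:

# Print each delta and collect the full response as it streams
full_response = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
        full_response += delta

# full_response now holds the complete assistant message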
Multi-turn Conversations
Build conversational experiences by including the full conversation history in your requests. The model uses this context to generate relevant responses.
Conversation Example
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

# Start with conversation history
messages = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "What are the best places to visit in Ethiopia?"},
    {"role": "assistant", "content": "Ethiopia has many amazing places! Some highlights include Lalibela's rock-hewn churches, the Simien Mountains, Lake Tana, and the historic city of Axum."},
    {"role": "user", "content": "Tell me more about Lalibela."}
]

response = client.chat.completions.create(
    model="selam-turbo",
    messages=messages
)

print(response.choices[0].message.content)

# Add the response to conversation history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})

Warning
Token Limits: Each model has a maximum context window. Long conversations may exceed this limit. Consider truncating older messages or summarizing the conversation to stay within limits.
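One simple mitigation is a sliding window that always keeps the system message and drops the oldest user/assistant turns first. A minimal sketch (the max_messages threshold is arbitrary; tune it to your model's context window):

def truncate_history(messages, max_messages=10):
    """Keep the system message plus the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

messages = truncate_history(messages)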
System Prompts
System prompts define the assistant's behavior, personality, and capabilities. They're processed before user messages and help guide the model's responses.
# Example 1: Technical Expert
response = client.chat.completions.create(
    model="selam-coder",
    messages=[
        {
            "role": "system",
            "content": "You are an expert Python developer. Provide clear, well-documented code examples with explanations."
        },
        {"role": "user", "content": "How do I read a CSV file in Python?"}
    ]
)

# Example 2: Creative Writer
response = client.chat.completions.create(
    model="selam-plus",
    messages=[
        {
            "role": "system",
            "content": "You are a creative storyteller. Write engaging narratives with vivid descriptions and compelling characters."
        },
        {"role": "user", "content": "Write a short story about a coffee ceremony in Ethiopia."}
    ]
)

# Example 3: Concise Assistant
response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a concise assistant. Provide brief, direct answers without unnecessary elaboration."
        },
        {"role": "user", "content": "What is the capital of Ethiopia?"}
    ]
)

Tip
Best Practices: Be specific about the assistant's role, tone, and constraints. Include examples of desired behavior when possible. Test different system prompts to find what works best.
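One common way to include examples of desired behavior is few-shot prompting: seed the message list with sample user/assistant turns before the real input. A minimal sketch:

# Few-shot prompting: the example turns demonstrate the expected output format
response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "system", "content": "You classify customer feedback as positive, negative, or neutral. Reply with the label only."},
        {"role": "user", "content": "The delivery was fast and the packaging was great."},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "The app keeps crashing when I open it."}
    ]
)

print(response.choices[0].message.content)  # expected: negative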
Temperature and Parameters
Control the randomness and creativity of responses using various parameters.
Key Parameters
temperature (0-2)
Controls randomness. Lower values (e.g., 0.2) make output more focused and deterministic; higher values (e.g., 1.5) make it more creative and varied.
Default: 1.0
max_tokens
Maximum number of tokens to generate in the response. Use this to control response length.
Default: Model's maximum
top_p (0-1)
Nucleus sampling. The model considers only the tokens that make up the top_p probability mass. Lower values make output more focused.
Default: 1.0
presence_penalty (-2 to 2)
Positive values penalize tokens that have already appeared in the text so far, encouraging the model to introduce new topics.
Default: 0
frequency_penalty (-2 to 2)
Positive values penalize tokens in proportion to their frequency in the text so far, reducing repetition.
Default: 0
Using Parameters
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

# Creative writing with high temperature
response = client.chat.completions.create(
    model="selam-plus",
    messages=[
        {"role": "user", "content": "Write a creative story about a robot."}
    ],
    temperature=1.5,
    max_tokens=500,
    presence_penalty=0.6
)

# Factual response with low temperature
response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "user", "content": "What is photosynthesis?"}
    ],
    temperature=0.3,
    max_tokens=200
)

Best Practices
Choose the Right Model
Use selam-turbo for fast, general-purpose tasks. Use selam-plus for complex reasoning. Use selam-coder for programming tasks. Use selam-thinking for deep analysis.
Manage Context Length
Monitor token usage to avoid exceeding context limits. For long conversations, consider summarizing older messages or removing less relevant context.
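If the API returns the standard OpenAI-style usage object (an assumption worth verifying against the provider's docs), you can track token counts directly from each response. A sketch:

response = client.chat.completions.create(
    model="selam-turbo",
    messages=messages
)

# usage reports token counts for this request
usage = response.usage
print(f"prompt: {usage.prompt_tokens}, completion: {usage.completion_tokens}, total: {usage.total_tokens}")

TOKEN_BUDGET = 6000  # hypothetical threshold; set it below your model's context window
if usage.total_tokens > TOKEN_BUDGET:
    messages = truncate_history(messages)  # see the sketch under Token Limits above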
Handle Errors Gracefully
Implement retry logic for rate limits and transient errors. Validate user input before sending requests. See the Error Reference for details.
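A minimal sketch of retry with exponential backoff, assuming the standard OpenAI Python SDK exception types (openai.RateLimitError, openai.APIConnectionError):

import time

import openai

def create_with_retry(client, max_retries=3, **kwargs):
    """Retry on rate limits and transient connection errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except (openai.RateLimitError, openai.APIConnectionError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...

response = create_with_retry(
    client,
    model="selam-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)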
Use Streaming for Better UX
Enable streaming for interactive applications to show responses as they're generated. This provides a better user experience and makes your application feel more responsive.
Optimize System Prompts
Test different system prompts to find what works best for your use case. Be specific about desired behavior and include examples when possible.
Monitor Usage and Costs
Track your API usage and stay within rate limits. Consider caching responses for repeated queries to reduce costs and improve performance.
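A simple in-memory cache keyed on the full request is one way to avoid paying twice for identical queries; it only helps for exact repeats and works best with deterministic settings such as temperature=0. A sketch:

import hashlib
import json

_cache = {}

def cached_completion(client, **kwargs):
    """Return a cached response for an identical request, if one exists."""
    key = hashlib.sha256(json.dumps(kwargs, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = client.chat.completions.create(**kwargs)
    return _cache[key]

response = cached_completion(
    client,
    model="selam-turbo",
    messages=[{"role": "user", "content": "What is photosynthesis?"}],
    temperature=0
)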