Chat Completions

Learn how to use the chat completions API to build conversational AI applications with streaming, multi-turn conversations, and advanced parameters.

Basic Chat Completion

The chat completions endpoint allows you to send a conversation history and receive an AI-generated response. Each message has a role (system, user, or assistant) and content.

Basic Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

Information

The system message sets the behavior of the assistant, while user messages represent input from the end user. The model responds with an assistant message.

Streaming Responses

Enable streaming to receive the response incrementally as it's generated, providing a better user experience for real-time applications.

Streaming Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

stream = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "user", "content": "Write a short poem about Ethiopia."}
    ],
    stream=True
)

# Print each piece of the response as soon as it arrives
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tip

Streaming is ideal for chatbots and interactive applications where you want to display responses as they're generated. See the Streaming Guide for more details.

Multi-turn Conversations

Build conversational experiences by including the full conversation history in your requests. The model uses this context to generate relevant responses.

Conversation Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

# Start with conversation history
messages = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "What are the best places to visit in Ethiopia?"},
    {"role": "assistant", "content": "Ethiopia has many amazing places! Some highlights include Lalibela's rock-hewn churches, the Simien Mountains, Lake Tana, and the historic city of Axum."},
    {"role": "user", "content": "Tell me more about Lalibela."}
]

response = client.chat.completions.create(
    model="selam-turbo",
    messages=messages
)

print(response.choices[0].message.content)

# Add the response to conversation history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})
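
To keep the exchange going, repeat this append-and-send pattern on every turn. The sketch below continues the example above as a simple interactive loop; the input() prompt and the "quit" exit keyword are illustrative choices, not part of the API.

# Continue the conversation in a loop, carrying the full history each turn.
# (Sketch: the "quit" exit keyword is an arbitrary choice.)
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "quit":
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="selam-turbo",
        messages=messages
    )
    reply = response.choices[0].message.content
    print("Assistant:", reply)
    messages.append({"role": "assistant", "content": reply})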

Warning

Token Limits: Each model has a maximum context window. Long conversations may exceed this limit. Consider truncating older messages or summarizing the conversation to stay within limits.
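
One simple mitigation, sketched below, is to keep the system message and drop the oldest turns once the history grows past a budget. The character budget here is a rough stand-in for a real token count, which depends on the model's tokenizer; treat the 12,000-character limit as an assumption to tune.

# Sketch: trim the oldest turns while preserving the system message.
# max_chars is a rough character budget standing in for a true token
# count, which depends on the model's tokenizer.
def trim_history(messages, max_chars=12000):
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and sum(len(m["content"]) for m in turns) > max_chars:
        turns.pop(0)  # drop the oldest turn first
    return system + turns

messages = trim_history(messages)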

System Prompts

System prompts define the assistant's behavior, personality, and capabilities. They're processed before user messages and help guide the model's responses.

# Example 1: Technical Expert
response = client.chat.completions.create(
    model="selam-coder",
    messages=[
        {
            "role": "system",
            "content": "You are an expert Python developer. Provide clear, well-documented code examples with explanations."
        },
        {"role": "user", "content": "How do I read a CSV file in Python?"}
    ]
)

# Example 2: Creative Writer
response = client.chat.completions.create(
    model="selam-plus",
    messages=[
        {
            "role": "system",
            "content": "You are a creative storyteller. Write engaging narratives with vivid descriptions and compelling characters."
        },
        {"role": "user", "content": "Write a short story about a coffee ceremony in Ethiopia."}
    ]
)

# Example 3: Concise Assistant
response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {
            "role": "system",
            "content": "You are a concise assistant. Provide brief, direct answers without unnecessary elaboration."
        },
        {"role": "user", "content": "What is the capital of Ethiopia?"}
    ]
)

Tip

Best Practices: Be specific about the assistant's role, tone, and constraints. Include examples of desired behavior when possible. Test different system prompts to find what works best.

Temperature and Parameters

Control the randomness and creativity of responses using various parameters.

Key Parameters

temperature (0-2)

Controls randomness. Lower values (e.g., 0.2) make output more focused and deterministic, while higher values (e.g., 1.5) make it more creative and varied.

Default: 1.0

max_tokens

Maximum number of tokens to generate in the response. Use this to control response length.

Default: Model's maximum

top_p (0-1)

Nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Lower values make output more focused. It's generally recommended to adjust temperature or top_p, but not both.

Default: 1.0

presence_penalty (-2 to 2)

Penalizes tokens that have appeared in the text so far, encouraging the model to talk about new topics.

Default: 0

frequency_penalty (-2 to 2)

Penalizes tokens based on their frequency in the text so far, reducing repetition.

Default: 0

Using Parameters

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

# Creative writing with high temperature
response = client.chat.completions.create(
    model="selam-plus",
    messages=[
        {"role": "user", "content": "Write a creative story about a robot."}
    ],
    temperature=1.5,
    max_tokens=500,
    presence_penalty=0.6
)

# Factual response with low temperature
response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "user", "content": "What is photosynthesis?"}
    ],
    temperature=0.3,
    max_tokens=200
)
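
The remaining parameters follow the same pattern. The sketch below pairs nucleus sampling with a frequency penalty; the values shown are illustrative starting points, not recommendations.

# Focused sampling with reduced repetition (values are illustrative)
response = client.chat.completions.create(
    model="selam-turbo",
    messages=[
        {"role": "user", "content": "Suggest five names for a coffee shop."}
    ],
    top_p=0.9,             # sample only from the top 90% of probability mass
    frequency_penalty=0.5  # discourage repeating the same tokens
)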

Best Practices

Choose the Right Model

Use selam-turbo for fast, general-purpose tasks. Use selam-plus for complex reasoning. Use selam-coder for programming tasks. Use selam-thinking for deep analysis.

Manage Context Length

Monitor token usage to avoid exceeding context limits. For long conversations, consider summarizing older messages or removing less relevant context.
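
One way to do the summarizing, sketched below, is to ask the model itself to condense older turns into a single system note. The prompt wording and the choice to keep the last four messages verbatim are assumptions, not API requirements.

# Sketch: condense older turns into a short summary message.
# keep_last=4 is an arbitrary choice; tune it for your application.
def compact_history(client, messages, keep_last=4):
    old, recent = messages[:-keep_last], messages[-keep_last:]
    if not old:
        return messages
    summary = client.chat.completions.create(
        model="selam-turbo",
        messages=[
            {"role": "system", "content": "Summarize this conversation in a few sentences."},
            {"role": "user", "content": "\n".join(f"{m['role']}: {m['content']}" for m in old)}
        ]
    ).choices[0].message.content
    return [{"role": "system", "content": "Conversation so far: " + summary}] + recent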

Handle Errors Gracefully

Implement retry logic for rate limits and transient errors. Validate user input before sending requests. See the Error Reference for details.
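
A minimal sketch of retry logic with exponential backoff, using the OpenAI SDK's standard exception types; the retry count and delays are assumptions to tune for your workload.

import time
import openai

# Sketch: retry on rate limits and transient connection errors, waiting
# longer between attempts. max_attempts=3 is an arbitrary choice.
def create_with_retry(client, max_attempts=3, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except (openai.RateLimitError, openai.APIConnectionError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # back off exponentially

response = create_with_retry(
    client,
    model="selam-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)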

Use Streaming for Better UX

Enable streaming for interactive applications to show responses as they're generated. This provides a better user experience and makes your application feel more responsive.

Optimize System Prompts

Test different system prompts to find what works best for your use case. Be specific about desired behavior and include examples when possible.

Monitor Usage and Costs

Track your API usage and stay within rate limits. Consider caching responses for repeated queries to reduce costs and improve performance.
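
Each response includes a usage field you can log, and a small in-memory cache can skip re-sending identical requests. A minimal sketch, assuming a plain dict cache and temperature=0 for repeatable answers; a production setup would likely want a persistent cache with expiry.

# Sketch: log token usage and cache repeated prompts in memory.
# A plain dict cache and temperature=0 are simplifying assumptions.
cache = {}

def cached_completion(client, model, prompt):
    key = (model, prompt)
    if key not in cache:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0  # near-deterministic output makes caching sensible
        )
        print("Tokens used:", response.usage.total_tokens)
        cache[key] = response.choices[0].message.content
    return cache[key]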
