Vision

Learn how to use vision-capable models to understand and analyze images, enabling multimodal AI applications.

What is Vision?

Vision capabilities allow AI models to understand and analyze images alongside text. You can ask questions about images, extract information, describe scenes, and more.

Vision-Capable Models

  • selam-plus: Advanced vision understanding with detailed analysis
  • selam-turbo: Fast vision processing for quick tasks

Information

Vision models can analyze images provided as URLs or base64-encoded data. They support common formats like JPEG, PNG, GIF, and WebP.

Using Image URLs

The simplest way to provide images is via publicly accessible URLs.

Image URL Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

response = client.chat.completions.create(
    model="selam-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

Using Base64-Encoded Images

For local images or when URLs aren't available, encode images as base64 strings.

Base64 Image Example

from openai import OpenAI
import base64

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

# Read and encode image
with open("image.jpg", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="selam-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
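
The example above hard-codes image/jpeg in the data URL, so a PNG, GIF, or WebP file would be labelled with the wrong type. Here is a minimal sketch of a helper that chooses the MIME type from the file extension. to_data_url and the MIME_TYPES table are illustrative names, not part of the SelamGPT API or the OpenAI SDK.

```python
import base64
from pathlib import Path

# Extensions for the formats this page lists as supported (JPEG, PNG, GIF, WebP).
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_data_url(path: str) -> str:
    """Read a local image and return a base64 data URL whose MIME type matches its extension."""
    suffix = Path(path).suffix.lower()
    mime = MIME_TYPES.get(suffix)
    if mime is None:
        raise ValueError(f"Unsupported image extension: {suffix or '(none)'}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Pass the result as the url value of an image_url part, for example {"type": "image_url", "image_url": {"url": to_data_url("chart.png")}}. The helper trusts the file extension and does not inspect the file contents.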

Warning

Size Limits: Base64 encoding increases payload size by about 33%, because every 3 bytes of image data become 4 characters of text. Keep images under 5MB to avoid issues, and consider resizing large images before encoding.
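
The overhead follows directly from the encoding: n input bytes become 4 * ceil(n / 3) base64 characters. The sketch below uses that formula to check a file size against the limit before encoding anything. It assumes, hypothetically, that 5MB means 5 * 1024 * 1024 characters of encoded data; adjust LIMIT_BYTES if your limit is defined differently.

```python
# Assumption: the 5MB guideline is read as 5 * 1024 * 1024 bytes of *encoded* data.
LIMIT_BYTES = 5 * 1024 * 1024


def base64_length(raw_bytes: int) -> int:
    """Characters produced by standard base64 (with padding) for raw_bytes of input."""
    # Each started group of 3 input bytes becomes 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


def fits_limit(raw_bytes: int, limit: int = LIMIT_BYTES) -> bool:
    """True if an image of raw_bytes stays within limit once base64-encoded."""
    return base64_length(raw_bytes) <= limit
```

Under this assumption, a 4 MiB file (4,194,304 bytes) encodes to 5,592,408 characters and exceeds the limit, while files up to 3,932,160 bytes (3.75 MiB) fit. os.path.getsize(path) gives the raw size to pass in.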

Multiple Images

You can include multiple images in a single request for comparison or combined analysis.

Multiple Images Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

response = client.chat.completions.create(
    model="selam-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images. What are the differences?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image1.jpg"}
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image2.jpg"}
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
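
When the number of images varies, the content list can be built in a loop. A minimal sketch: build_image_message is a hypothetical helper that produces the same message shape as the example above (one text part, then one image_url part per image). The URLs can be ordinary https URLs or base64 data URLs.

```python
from typing import Iterable


def build_image_message(prompt: str, image_urls: Iterable[str]) -> dict:
    """Build a user message: one text part followed by one image_url part per URL."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

Passing messages=[build_image_message("Compare these two images. What are the differences?", ["https://example.com/image1.jpg", "https://example.com/image2.jpg"])] produces the same request body as the example above.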

Vision in Conversations

Combine vision with multi-turn conversations to have interactive discussions about images.

Conversational Vision Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.selamgpt.com/v1"
)

# First message with image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/coffee.jpg"}
            }
        ]
    }
]

response = client.chat.completions.create(
    model="selam-plus",
    messages=messages
)

print(response.choices[0].message.content)

# Add response to conversation
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})

# Follow-up question: the image is still in the message history, so it doesn't need to be attached again
messages.append({
    "role": "user",
    "content": "What type of coffee beans might be used?"
})

response = client.chat.completions.create(
    model="selam-plus",
    messages=messages
)

print(response.choices[0].message.content)

Tip

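You don't need to attach the image again in follow-up messages. The first message, which contains the image, stays in the messages list, and the whole list is sent with every request, so the model can still refer to the image. This also means the image adds to the size of every request in the conversation.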
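
Because the full history is sent on each call, it can help to keep the message list in one place. The class below is a hypothetical convenience wrapper, not part of the OpenAI SDK. It builds the same message dictionaries as the conversational example above, and image_count() reports how many images travel with every request.

```python
class VisionConversation:
    """Hypothetical helper that keeps the running message history for a chat about images."""

    def __init__(self) -> None:
        self.messages: list[dict] = []

    def add_user(self, text: str, image_urls: tuple[str, ...] = ()) -> None:
        """Append a user turn; attach images as image_url parts only when given."""
        if image_urls:
            content = [{"type": "text", "text": text}]
            content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
            self.messages.append({"role": "user", "content": content})
        else:
            # Text-only follow-ups use a plain string, as in the example above.
            self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        """Append the model's reply so the next request includes it."""
        self.messages.append({"role": "assistant", "content": text})

    def image_count(self) -> int:
        """Number of image parts in the history, i.e. images sent with every request."""
        return sum(
            1
            for message in self.messages
            if isinstance(message["content"], list)
            for part in message["content"]
            if part.get("type") == "image_url"
        )
```

With a client set up as above, each turn is convo.add_user(...), then response = client.chat.completions.create(model="selam-plus", messages=convo.messages), then convo.add_assistant(response.choices[0].message.content).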

Common Use Cases

Image Description

Generate detailed descriptions of images for accessibility, cataloging, or content creation.

Document Analysis

Extract text and information from documents, receipts, forms, and screenshots.

Product Recognition

Identify products, brands, and items in images for e-commerce and inventory management.

Code Understanding

Analyze code screenshots, diagrams, and technical documentation.

Educational Content

Help students understand diagrams, solve math problems, and analyze visual content.

Image Comparison

Compare multiple images to identify differences, similarities, or changes over time.
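
For document analysis, one common approach is to ask for a fixed set of fields as JSON and parse the reply in code. A minimal sketch, assuming your prompt asks for a single JSON object (for example "Return the merchant, date, and total from this receipt as a JSON object"). Models sometimes wrap JSON in a Markdown code fence, so the hypothetical parse_json_reply helper strips one if present. It relies only on the prompt, not on any structured-output option.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the pattern doesn't clash with surrounding fences
# Matches a reply wrapped in a Markdown code fence, optionally tagged "json".
FENCED_REPLY = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL | re.IGNORECASE
)


def parse_json_reply(reply: str) -> dict:
    """Parse a model reply that is expected to contain one JSON object."""
    text = reply.strip()
    match = FENCED_REPLY.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object in the reply")
    return data
```

A ValueError here is a reasonable signal to ask the model again or flag the document for manual review.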

Best Practices

Use High-Quality Images

Clear, well-lit images produce better results. Avoid blurry, low-resolution, or heavily compressed images.

Be Specific in Prompts

Instead of "What's in this image?", try "List all the objects visible in this image" or "Describe the architectural style of this building."

Optimize Image Size

Resize large images before sending to reduce latency and costs. Most use cases work well with images under 2MB.
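
Resizing mostly comes down to choosing new dimensions that keep the aspect ratio. A minimal sketch of that calculation; the 2048-pixel default for the longer side is an arbitrary example, not a documented SelamGPT limit.

```python
def fit_within(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side, keeping aspect ratio.

    Images already within the bound are returned unchanged; nothing is upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # round() keeps the result close to the true ratio; max(1, ...) avoids zero-size sides.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

If you use Pillow, Image.thumbnail((2048, 2048)) performs a similar aspect-preserving downscale in place and, like this helper, never enlarges the image.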

Handle Errors Gracefully

Implement error handling for invalid images, unsupported formats, or network issues. Provide clear feedback to users.
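
Validating a local file before sending it lets you show a specific message instead of a generic request failure. The sketch below checks that the file exists, its size, and its leading bytes against the standard signatures for JPEG, PNG, GIF, and WebP, so a renamed or corrupt file is caught even if its extension looks right. check_image and sniff_format are hypothetical helpers, and the 5MB default applies to the file on disk as an example only. Errors from the API itself still need handling: the OpenAI Python SDK (v1 and later) raises exceptions such as openai.BadRequestError and openai.APIConnectionError, which you can catch around client.chat.completions.create.

```python
from pathlib import Path
from typing import Optional


def sniff_format(header: bytes) -> Optional[str]:
    """Return the MIME type implied by a file's first bytes ("magic bytes"), or None."""
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None


def check_image(path: str, max_bytes: int = 5 * 1024 * 1024) -> str:
    """Validate a local image before upload; raise ValueError with a user-facing message."""
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"{p.name}: file not found")
    size = p.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{p.name}: {size} bytes exceeds the {max_bytes}-byte limit")
    with p.open("rb") as f:
        mime = sniff_format(f.read(12))  # 12 bytes covers the longest signature (WebP)
    if mime is None:
        raise ValueError(f"{p.name}: not a JPEG, PNG, GIF, or WebP image")
    return mime
```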

Consider Privacy

Be mindful of sensitive information in images. Don't send images containing personal data, passwords, or confidential information.

Use Appropriate Models

Use selam-plus for detailed analysis and complex tasks. Use selam-turbo for quick, simple vision tasks.

Supported Image Formats

  • JPEG: .jpg, .jpeg
  • PNG: .png
  • GIF: .gif
  • WebP: .webp
