Vision
Learn how to use vision-capable models to understand and analyze images, enabling multimodal AI applications.
What is Vision?
Vision capabilities allow AI models to understand and analyze images alongside text. You can ask questions about images, extract information, describe scenes, and more.
Vision-Capable Models
- selam-plus: Advanced vision understanding with detailed analysis
- selam-turbo: Fast vision processing for quick tasks
Information
Vision models can analyze images provided as URLs or base64-encoded data. They support common formats like JPEG, PNG, GIF, and WebP.
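Since only certain formats are supported, it can help to check a file's type before sending it. The sketch below identifies the four formats named above by their leading bytes. `sniff_image_format` is a hypothetical helper for illustration, not part of the API or the SDK, and the API may accept formats that this check does not recognize.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return the MIME type for JPEG, PNG, GIF, or WebP data, else None.

    Hypothetical helper: it only inspects the file signature and does not
    verify that the rest of the file is a valid image.
    """
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

A return value of `None` is a reasonable signal to stop and tell the user the file type is not recognized, rather than sending the request.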
Using Image URLs
The simplest way to provide images is via publicly accessible URLs.
Image URL Example
    from openai import OpenAI

    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.selamgpt.com/v1"
    )

    response = client.chat.completions.create(
        model="selam-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://example.com/image.jpg"
                        }
                    }
                ]
            }
        ]
    )

    print(response.choices[0].message.content)

Using Base64-Encoded Images
For local images or when URLs aren't available, encode images as base64 strings.
Base64 Image Example
    from openai import OpenAI
    import base64

    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.selamgpt.com/v1"
    )

    # Read and encode image
    with open("image.jpg", "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="selam-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in detail."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }
        ]
    )

    print(response.choices[0].message.content)

Warning
Size Limits: Base64-encoded images increase payload size by ~33%. Keep images under 5MB to avoid issues. Consider resizing large images before encoding.
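The ~33% overhead comes from base64 turning every 3 bytes of input into 4 output characters. The sketch below computes the encoded size and refuses oversized images before building the data URL. `encoded_size` and `to_data_url` are hypothetical helpers, and applying the 5MB guideline to the raw file size (rather than the encoded size) is an assumption.

```python
import base64

# Assumption: the 5 MB guideline is applied to the raw image bytes.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def encoded_size(raw_size: int) -> int:
    """Length of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def to_data_url(data: bytes, mime_type: str = "image/jpeg",
                max_bytes: int = MAX_IMAGE_BYTES) -> str:
    """Build a data URL for an image, rejecting images over max_bytes."""
    if len(data) > max_bytes:
        raise ValueError(
            f"image is {len(data)} bytes; limit is {max_bytes} bytes"
        )
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, a 3,000,000-byte image encodes to `encoded_size(3_000_000)` = 4,000,000 characters, which is the payload growth the warning describes.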
Multiple Images
You can include multiple images in a single request for comparison or combined analysis.
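When the number of images varies, the content list can be built programmatically instead of written out by hand. This is a minimal sketch: `build_vision_message` is a hypothetical helper, not part of the SDK, and it produces the same message shape used by the examples in this guide.

```python
from typing import Any, Dict, List


def build_vision_message(prompt: str, image_urls: List[str]) -> Dict[str, Any]:
    """Build one user message containing a text prompt and any number of images.

    Each entry in image_urls may be a public URL or a base64 data URL.
    """
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

The returned dictionary can be passed as an element of the `messages` list in `client.chat.completions.create`.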
Multiple Images Example
    from openai import OpenAI

    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.selamgpt.com/v1"
    )

    response = client.chat.completions.create(
        model="selam-plus",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Compare these two images. What are the differences?"},
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/image1.jpg"}
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/image2.jpg"}
                    }
                ]
            }
        ]
    )

    print(response.choices[0].message.content)

Vision in Conversations
Combine vision with multi-turn conversations to have interactive discussions about images.
Conversational Vision Example
    from openai import OpenAI

    client = OpenAI(
        api_key="your-api-key-here",
        base_url="https://api.selamgpt.com/v1"
    )

    # First message with image
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/coffee.jpg"}
                }
            ]
        }
    ]

    response = client.chat.completions.create(
        model="selam-plus",
        messages=messages
    )

    print(response.choices[0].message.content)

    # Add response to conversation
    messages.append({
        "role": "assistant",
        "content": response.choices[0].message.content
    })

    # Follow-up question (no need to resend image)
    messages.append({
        "role": "user",
        "content": "What type of coffee beans might be used?"
    })

    response = client.chat.completions.create(
        model="selam-plus",
        messages=messages
    )

    print(response.choices[0].message.content)

Tip
Pro tip: You don't need to attach the image again in follow-up messages. As long as the original message containing the image stays in the messages list you send, the model can refer to it in later turns.
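The follow-up pattern from the example above can be wrapped in a small helper. This is a sketch under assumptions: `ask_followup` is a hypothetical function, and `client` is any object exposing the OpenAI-compatible `chat.completions.create` call used throughout this guide. Note that it sends the whole `messages` list on every call, including the first message that carries the image.

```python
from typing import Any, Dict, List


def ask_followup(client: Any, messages: List[Dict[str, Any]], question: str,
                 model: str = "selam-plus") -> str:
    """Append a user question, send the full history, and record the reply.

    The messages list is modified in place so the next call sees both
    the question and the assistant's answer.
    """
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model=model, messages=messages)
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer
```

Mutating the list in place keeps the calling code short, at the cost of the history growing with every turn; long conversations may need trimming, as long as the message containing the image is kept.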
Common Use Cases
Image Description
Generate detailed descriptions of images for accessibility, cataloging, or content creation.
Document Analysis
Extract text and information from documents, receipts, forms, and screenshots.
Product Recognition
Identify products, brands, and items in images for e-commerce and inventory management.
Code Understanding
Analyze code screenshots, diagrams, and technical documentation.
Educational Content
Help students understand diagrams, solve math problems, and analyze visual content.
Image Comparison
Compare multiple images to identify differences, similarities, or changes over time.
Best Practices
Use High-Quality Images
Clear, well-lit images produce better results. Avoid blurry, low-resolution, or heavily compressed images.
Be Specific in Prompts
Instead of "What's in this image?", try "List all the objects visible in this image" or "Describe the architectural style of this building."
Optimize Image Size
Resize large images before sending to reduce latency and costs. Most use cases work well with images under 2MB.
Handle Errors Gracefully
Implement error handling for invalid images, unsupported formats, or network issues. Provide clear feedback to users.
Consider Privacy
Be mindful of sensitive information in images. Don't send images containing personal data, passwords, or confidential information.
Use Appropriate Models
Use selam-plus for detailed analysis and complex tasks. Use selam-turbo for quick, simple vision tasks.
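For the error-handling practice above, one approach is to wrap each vision request so that failures become clear messages instead of stack traces. The sketch below uses only built-in exception types; `run_safely` is a hypothetical helper. With the openai Python SDK (v1 and later) you would typically also catch its own exception classes, such as `openai.APIConnectionError` and `openai.BadRequestError`, which here fall through to the generic branch. Which errors the API returns for invalid images is not specified in this guide.

```python
from typing import Callable, Optional, Tuple


def run_safely(request: Callable[[], str]) -> Tuple[Optional[str], Optional[str]]:
    """Run a vision request and return (result, error_message).

    Exactly one of the two values is None.
    """
    try:
        return request(), None
    except FileNotFoundError:
        return None, "The image file could not be found."
    except ValueError as exc:
        # For example, an unsupported format or an oversized image.
        return None, f"The image was rejected: {exc}"
    except (ConnectionError, TimeoutError):
        return None, "Network problem while contacting the API. Please try again."
    except Exception as exc:
        # SDK-specific errors end up here in this sketch.
        return None, f"Unexpected error: {exc}"
```

The request is passed as a zero-argument callable (for example, a lambda around `client.chat.completions.create(...)`) so that file reading, encoding, and the API call are all covered by the same handler.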