Batch Processing
Learn how to use the batch endpoint to process multiple chat completion requests in a single call. It is ideal for bulk operations and cost optimization.
What is Batch Processing?
The batch endpoint (/v1/chat/completions/batch) allows you to send multiple chat completion requests in a single API call. This is more efficient than making individual requests and can help optimize costs and throughput.
Benefits
- Efficiency: Process multiple requests with a single API call
- Cost Optimization: Reduce overhead and network costs
- Simplified Code: Handle bulk operations with less code
- Better Throughput: Process large datasets faster
Information
Batch processing is ideal for scenarios like data analysis, content generation at scale, automated testing, and bulk classification tasks.
JSONL Request Format
Batch requests use JSONL (JSON Lines) format, where each line is a separate JSON object representing one chat completion request.
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "selam-turbo", "messages": [{"role": "user", "content": "What is AI?"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "selam-turbo", "messages": [{"role": "user", "content": "Explain quantum computing."}]}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "selam-turbo", "messages": [{"role": "user", "content": "What is blockchain?"}]}}JSONL Fields
- custom_id: Unique identifier for each request (helps match responses)
- method: HTTP method (always "POST" for chat completions)
- url: Endpoint path (always "/v1/chat/completions")
- body: The chat completion request parameters (model, messages, etc.)
Basic Batch Request
Send a batch of requests by providing JSONL content to the batch endpoint.
Basic Batch Example
import requests
import json

# Prepare batch requests
batch_requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": "What is AI?"}]
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": "Explain quantum computing."}]
        }
    },
    {
        "custom_id": "request-3",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": "What is blockchain?"}]
        }
    }
]

# Convert to JSONL format
jsonl_content = "\n".join([json.dumps(req) for req in batch_requests])

# Send batch request
response = requests.post(
    "https://api.selamgpt.com/v1/chat/completions/batch",
    headers={
        "Authorization": "Bearer your-api-key-here",
        "Content-Type": "application/jsonl"
    },
    data=jsonl_content
)

# Parse responses
results = response.text.strip().split("\n")
for result_line in results:
    result = json.loads(result_line)
    custom_id = result["custom_id"]
    content = result["response"]["body"]["choices"][0]["message"]["content"]
    print(f"{custom_id}: {content}\n")

Batch Response Format
The response is also in JSONL format, with one line per request. Each line contains the custom_id and the corresponding response.
{"custom_id": "request-1", "response": {"status_code": 200, "body": {"id": "chatcmpl-123", "object": "chat.completion", "created": 1694268190, "model": "selam-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "AI stands for Artificial Intelligence..."}, "finish_reason": "stop"}]}}}
{"custom_id": "request-2", "response": {"status_code": 200, "body": {"id": "chatcmpl-124", "object": "chat.completion", "created": 1694268191, "model": "selam-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Quantum computing is..."}, "finish_reason": "stop"}]}}}
{"custom_id": "request-3", "response": {"status_code": 200, "body": {"id": "chatcmpl-125", "object": "chat.completion", "created": 1694268192, "model": "selam-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Blockchain is..."}, "finish_reason": "stop"}]}}}Tip
Pro tip: Use the custom_id field to match responses with your original requests. This is especially useful when processing results asynchronously.
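The matching pattern is straightforward: parse each response line and index it into a dictionary keyed by custom_id. A minimal sketch (the sample response line below is abbreviated for illustration):

```python
import json

def index_by_custom_id(jsonl_text):
    """Map each custom_id to its response body from a JSONL batch response."""
    results = {}
    for line in jsonl_text.strip().split("\n"):
        record = json.loads(line)
        results[record["custom_id"]] = record["response"]["body"]
    return results

# Abbreviated sample response line for illustration
sample = '{"custom_id": "request-1", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "AI stands for Artificial Intelligence..."}}]}}}'

indexed = index_by_custom_id(sample)
print(indexed["request-1"]["choices"][0]["message"]["content"])
```

Because the dictionary is keyed by your own identifiers, lookups stay correct even if the server returns lines in a different order than you sent them.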
Common Use Cases
Data Analysis
Analyze large datasets by processing multiple data points in parallel. Extract insights, classify content, or generate summaries at scale.
Content Generation
Generate multiple pieces of content simultaneously. Create product descriptions, social media posts, or marketing copy in bulk.
Automated Testing
Test your AI application with multiple test cases in a single batch. Validate responses across different scenarios efficiently.
Classification
Classify large volumes of text, images, or documents. Categorize customer feedback, moderate content, or organize data.
Translation
Translate multiple texts or documents in one batch. Process multilingual content efficiently for localization projects.
Summarization
Generate summaries for multiple documents, articles, or reports simultaneously. Process large volumes of text quickly.
Advanced Batch Processing
Here's a more advanced example that processes a CSV file and generates responses for each row.
CSV Processing Example
import csv
import json
import requests

# Read data from CSV
products = []
with open('products.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        products.append(row)

# Create batch requests
batch_requests = []
for i, product in enumerate(products):
    batch_requests.append({
        "custom_id": f"product-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [
                {
                    "role": "system",
                    "content": "You are a marketing copywriter. Write compelling product descriptions."
                },
                {
                    "role": "user",
                    "content": f"Write a product description for: {product['name']}. Features: {product['features']}"
                }
            ],
            "max_tokens": 150
        }
    })

# Convert to JSONL
jsonl_content = "\n".join([json.dumps(req) for req in batch_requests])

# Send batch request
response = requests.post(
    "https://api.selamgpt.com/v1/chat/completions/batch",
    headers={
        "Authorization": "Bearer your-api-key-here",
        "Content-Type": "application/jsonl"
    },
    data=jsonl_content
)

# Process results, skipping any failed requests
results = {}
for result_line in response.text.strip().split("\n"):
    result = json.loads(result_line)
    if result["response"]["status_code"] != 200:
        continue  # failed requests have no choices; retry or log as needed
    custom_id = result["custom_id"]
    description = result["response"]["body"]["choices"][0]["message"]["content"]
    results[custom_id] = description

# Write results to CSV
with open('products_with_descriptions.csv', 'w', newline='') as f:
    fieldnames = ['name', 'features', 'description']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()

    for i, product in enumerate(products):
        writer.writerow({
            'name': product['name'],
            'features': product['features'],
            'description': results.get(f"product-{i}", "")
        })

print(f"Processed {len(products)} products successfully!")

Rate Limits and Quotas
Batch requests are subject to the same rate limits as regular requests, but they're counted based on the number of individual requests in the batch.
Important Notes
- Each request in the batch counts toward your rate limit
- Maximum batch size may vary by tier (check your account limits)
- Failed requests in a batch don't affect successful ones
- Batch processing may take longer than individual requests
Warning
Rate Limit Tip: If you hit rate limits, consider splitting your batch into smaller chunks and processing them with delays between batches.
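Splitting a batch into chunks only takes a small wrapper around the request loop. A minimal sketch; the chunk size of 50 and the 2-second delay are illustrative values to tune against your own tier's limits:

```python
import json
import time
import requests

def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def send_batches(batch_requests, api_key, chunk_size=50, delay_seconds=2.0):
    """Send a large batch as smaller chunks, pausing between them."""
    result_lines = []
    for chunk in chunked(batch_requests, chunk_size):
        jsonl_content = "\n".join(json.dumps(req) for req in chunk)
        response = requests.post(
            "https://api.selamgpt.com/v1/chat/completions/batch",
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/jsonl"
            },
            data=jsonl_content
        )
        result_lines.extend(response.text.strip().split("\n"))
        time.sleep(delay_seconds)  # back off before the next chunk
    return result_lines

# chunked() is pure, so it is easy to sanity-check without sending anything:
print([len(c) for c in chunked(list(range(120)), 50)])
```

Because each chunk is an independent batch, a rate-limit error on one chunk only forces you to re-send that chunk, not the whole workload.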
Best Practices
Use Meaningful Custom IDs
Use descriptive custom_id values that help you identify and match responses. Include relevant identifiers from your data (e.g., "user-123", "product-456").
Optimize Batch Size
Find the right balance between batch size and processing time. Very large batches may take longer to process. Test different sizes to find what works best for your use case.
Handle Errors Gracefully
Check the status_code in each response. Some requests may fail while others succeed. Implement retry logic for failed requests.
Validate JSONL Format
Ensure each line is valid JSON and properly formatted. Invalid JSONL will cause the entire batch to fail. Test with small batches first.
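A pre-flight validator can catch malformed lines before they sink the whole batch. A minimal sketch that checks each line for valid JSON, the four required fields, and duplicate custom_id values:

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")

def validate_jsonl(jsonl_content):
    """Return a list of (line_number, error) pairs; an empty list means valid."""
    errors = []
    seen_ids = set()
    for line_number, line in enumerate(jsonl_content.strip().split("\n"), start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((line_number, f"invalid JSON: {exc}"))
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                errors.append((line_number, f"missing field: {field}"))
        custom_id = record.get("custom_id")
        if custom_id in seen_ids:
            errors.append((line_number, f"duplicate custom_id: {custom_id}"))
        seen_ids.add(custom_id)
    return errors

good = '{"custom_id": "a", "method": "POST", "url": "/v1/chat/completions", "body": {}}'
bad = '{"custom_id": "a", "method": "POST"'
print(validate_jsonl(good))
print(validate_jsonl(bad))
```

Run the validator before every send; fixing a reported line number locally is much cheaper than diagnosing a rejected batch.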
Monitor Progress
For large batches, implement progress tracking. Log successful and failed requests to monitor the batch processing status.
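One way to track a batch is to bucket each response line by its status code as you parse it. A minimal sketch over abbreviated sample lines:

```python
import json

def summarize_results(result_lines):
    """Bucket custom_ids from a JSONL batch response into succeeded/failed."""
    summary = {"succeeded": [], "failed": []}
    for line in result_lines:
        record = json.loads(line)
        ok = record["response"]["status_code"] == 200
        summary["succeeded" if ok else "failed"].append(record["custom_id"])
    return summary

# Abbreviated sample response lines for illustration
sample = [
    '{"custom_id": "a", "response": {"status_code": 200, "body": {}}}',
    '{"custom_id": "b", "response": {"status_code": 429, "body": {}}}',
]
summary = summarize_results(sample)
print(f"{len(summary['succeeded'])} succeeded, {len(summary['failed'])} failed")
```

The failed list doubles as the input for a retry pass, since it holds the custom_ids you need to look up in your original requests.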
Consider Timeouts
Set appropriate timeouts for batch requests. Large batches may take several minutes to process. Adjust your HTTP client timeout accordingly.
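With requests, you can pass a (connect, read) tuple so connection failures surface quickly while the read waits out batch processing. A minimal sketch; the 600-second read timeout is an illustrative default, not a documented server limit:

```python
import requests

def send_batch(jsonl_content, api_key, timeout_seconds=600):
    """Send a batch with an explicit timeout; large batches can take minutes."""
    return requests.post(
        "https://api.selamgpt.com/v1/chat/completions/batch",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/jsonl"
        },
        data=jsonl_content,
        # (connect timeout, read timeout): fail fast on connection problems,
        # but allow a long read while the batch is processed
        timeout=(10, timeout_seconds),
    )
```

Without an explicit timeout, requests will wait indefinitely, which can leave a batch job hanging on a dropped connection.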
Use Consistent Parameters
When possible, use the same model and parameters across all requests in a batch. This can improve processing efficiency and consistency.
Error Handling
Implement robust error handling to manage partial failures and retry failed requests.
Error Handling Example
import json
import requests
import time

def process_batch_with_retry(batch_requests, max_retries=3):
    """Process batch with automatic retry for failed requests."""

    # Convert to JSONL
    jsonl_content = "\n".join([json.dumps(req) for req in batch_requests])

    # Send batch request
    response = requests.post(
        "https://api.selamgpt.com/v1/chat/completions/batch",
        headers={
            "Authorization": "Bearer your-api-key-here",
            "Content-Type": "application/jsonl"
        },
        data=jsonl_content
    )

    # Parse results
    successful = []
    failed = []

    for result_line in response.text.strip().split("\n"):
        result = json.loads(result_line)

        if result["response"]["status_code"] == 200:
            successful.append(result)
        else:
            # Look up the original request so it can be retried
            custom_id = result["custom_id"]
            original_request = next(
                req for req in batch_requests
                if req["custom_id"] == custom_id
            )
            failed.append(original_request)

    # Retry failed requests
    if failed and max_retries > 0:
        print(f"Retrying {len(failed)} failed requests...")
        time.sleep(2)  # Wait before retry
        retry_results = process_batch_with_retry(failed, max_retries - 1)
        successful.extend(retry_results)

    return successful

# Usage
batch_requests = [...]  # Your batch requests
results = process_batch_with_retry(batch_requests)
print(f"Successfully processed {len(results)} requests")