Batch Processing
Learn how to use the batch endpoint to process multiple chat completion requests in a single call. It is ideal for bulk operations and cost optimization.
What is Batch Processing?
The batch endpoint (/v1/chat/completions/batch) allows you to send multiple chat completion requests in a single API call. This is more efficient than making individual requests and can help optimize costs and throughput.
Benefits
- Efficiency: Process multiple requests with a single API call
- Cost Optimization: Reduce overhead and network costs
- Simplified Code: Handle bulk operations with less code
- Better Throughput: Process large datasets faster
Information
Batch processing is ideal for scenarios like data analysis, content generation at scale, automated testing, and bulk classification tasks.
JSONL Request Format
Batch requests use JSONL (JSON Lines) format, where each line is a separate JSON object representing one chat completion request.
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "selam-turbo", "messages": [{"role": "user", "content": "What is AI?"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "selam-turbo", "messages": [{"role": "user", "content": "Explain quantum computing."}]}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "selam-turbo", "messages": [{"role": "user", "content": "What is blockchain?"}]}}JSONL Fields
- custom_id: Unique identifier for each request (helps match responses)
- method: HTTP method (always "POST" for chat completions)
- url: Endpoint path (always "/v1/chat/completions")
- body: The chat completion request parameters (model, messages, etc.)
Basic Batch Request
Send a batch of requests by providing JSONL content to the batch endpoint.
Basic Batch Example
import requests
import json

# Prepare batch requests
batch_requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": "What is AI?"}]
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": "Explain quantum computing."}]
        }
    },
    {
        "custom_id": "request-3",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": "What is blockchain?"}]
        }
    }
]

# Convert to JSONL format
jsonl_content = "\n".join([json.dumps(req) for req in batch_requests])

# Send batch request
response = requests.post(
    "https://api.selamgpt.com/v1/chat/completions/batch",
    headers={
        "Authorization": "Bearer your-api-key-here",
        "Content-Type": "application/jsonl"
    },
    data=jsonl_content
)

# Parse responses
results = response.text.strip().split("\n")
for result_line in results:
    result = json.loads(result_line)
    custom_id = result["custom_id"]
    content = result["response"]["body"]["choices"][0]["message"]["content"]
    print(f"{custom_id}: {content}\n")

Batch Response Format
The response is also in JSONL format, with one line per request. Each line contains the custom_id and the corresponding response.
{"custom_id": "request-1", "response": {"status_code": 200, "body": {"id": "chatcmpl-123", "object": "chat.completion", "created": 1694268190, "model": "selam-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "AI stands for Artificial Intelligence..."}, "finish_reason": "stop"}]}}}
{"custom_id": "request-2", "response": {"status_code": 200, "body": {"id": "chatcmpl-124", "object": "chat.completion", "created": 1694268191, "model": "selam-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Quantum computing is..."}, "finish_reason": "stop"}]}}}
{"custom_id": "request-3", "response": {"status_code": 200, "body": {"id": "chatcmpl-125", "object": "chat.completion", "created": 1694268192, "model": "selam-turbo", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Blockchain is..."}, "finish_reason": "stop"}]}}}Tip
Pro tip: Use the custom_id field to match responses with your original requests. This is especially useful when processing results asynchronously.
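The matching pattern is straightforward: parse each response line and index it into a dictionary keyed by custom_id. A minimal sketch (the sample response line below is abbreviated for illustration):

```python
import json

def index_by_custom_id(jsonl_text):
    """Map each custom_id to its response body from a JSONL batch response."""
    results = {}
    for line in jsonl_text.strip().split("\n"):
        record = json.loads(line)
        results[record["custom_id"]] = record["response"]["body"]
    return results

# Abbreviated sample response line for illustration
sample = '{"custom_id": "request-1", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "AI stands for Artificial Intelligence..."}}]}}}'

indexed = index_by_custom_id(sample)
print(indexed["request-1"]["choices"][0]["message"]["content"])
```

Because the dictionary is keyed by your own identifiers, lookups stay correct even if the server returns lines in a different order than you sent them.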
Common Use Cases
Data Analysis
Analyze large datasets by processing multiple data points in parallel. Extract insights, classify content, or generate summaries at scale.
Content Generation
Generate multiple pieces of content simultaneously. Create product descriptions, social media posts, or marketing copy in bulk.
Automated Testing
Test your AI application with multiple test cases in a single batch. Validate responses across different scenarios efficiently.
Classification
Classify large volumes of text, images, or documents. Categorize customer feedback, moderate content, or organize data.
Translation
Translate multiple texts or documents in one batch. Process multilingual content efficiently for localization projects.
Summarization
Generate summaries for multiple documents, articles, or reports simultaneously. Process large volumes of text quickly.
Advanced Batch Processing
Here's a more advanced example that processes a CSV file and generates responses for each row.
CSV Processing Example
import csv
import json
import requests

# Read data from CSV
products = []
with open('products.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        products.append(row)

# Create batch requests
batch_requests = []
for i, product in enumerate(products):
    batch_requests.append({
        "custom_id": f"product-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "selam-turbo",
            "messages": [
                {
                    "role": "system",
                    "content": "You are a marketing copywriter. Write compelling product descriptions."
                },
                {
                    "role": "user",
                    "content": f"Write a product description for: {product['name']}. Features: {product['features']}"
                }
            ],
            "max_tokens": 150
        }
    })

# Convert to JSONL
jsonl_content = "\n".join([json.dumps(req) for req in batch_requests])

# Send batch request
response = requests.post(
    "https://api.selamgpt.com/v1/chat/completions/batch",
    headers={
        "Authorization": "Bearer your-api-key-here",
        "Content-Type": "application/jsonl"
    },
    data=jsonl_content
)

# Process results, skipping any failed requests
results = {}
for result_line in response.text.strip().split("\n"):
    result = json.loads(result_line)
    if result["response"]["status_code"] != 200:
        continue  # failed requests have no choices; retry or log as needed
    custom_id = result["custom_id"]
    description = result["response"]["body"]["choices"][0]["message"]["content"]
    results[custom_id] = description

# Write results to CSV
with open('products_with_descriptions.csv', 'w', newline='') as f:
    fieldnames = ['name', 'features', 'description']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()

    for i, product in enumerate(products):
        writer.writerow({
            'name': product['name'],
            'features': product['features'],
            'description': results.get(f"product-{i}", "")
        })

print(f"Processed {len(products)} products successfully!")

Rate Limits and Quotas
Batch requests are subject to the same rate limits as regular requests, but they're counted based on the number of individual requests in the batch.
Important Notes
- Each request in the batch counts toward your rate limit
- Maximum batch size may vary by tier (check your account limits)
- Failed requests in a batch don't affect successful ones
- Batch processing may take longer than individual requests
Warning
Rate Limit Tip: If you hit rate limits, consider splitting your batch into smaller chunks and processing them with delays between batches.
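Splitting a batch into chunks only takes a small wrapper around the request loop. A minimal sketch; the chunk size of 50 and the 2-second delay are illustrative values to tune against your own tier's limits:

```python
import json
import time
import requests

def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def send_batches(batch_requests, api_key, chunk_size=50, delay_seconds=2.0):
    """Send a large batch as smaller chunks, pausing between them."""
    result_lines = []
    for chunk in chunked(batch_requests, chunk_size):
        jsonl_content = "\n".join(json.dumps(req) for req in chunk)
        response = requests.post(
            "https://api.selamgpt.com/v1/chat/completions/batch",
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/jsonl"
            },
            data=jsonl_content
        )
        result_lines.extend(response.text.strip().split("\n"))
        time.sleep(delay_seconds)  # back off before the next chunk
    return result_lines

# chunked() is pure, so it is easy to sanity-check without sending anything:
print([len(c) for c in chunked(list(range(120)), 50)])
```

Because each chunk is an independent batch, a rate-limit error on one chunk only forces you to re-send that chunk, not the whole workload.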
Best Practices
Use Meaningful Custom IDs
Use descriptive custom_id values that help you identify and match responses. Include relevant identifiers from your data (e.g., "user-123", "product-456").
Optimize Batch Size
Find the right balance between batch size and processing time. Very large batches may take longer to process. Test different sizes to find what works best for your use case.
Handle Errors Gracefully
Check the status_code in each response. Some requests may fail while others succeed. Implement retry logic for failed requests.
Validate JSONL Format
Ensure each line is valid JSON and properly formatted. Invalid JSONL will cause the entire batch to fail. Test with small batches first.
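A pre-flight validator can catch malformed lines before they sink the whole batch. A minimal sketch that checks each line for valid JSON, the four required fields, and duplicate custom_id values:

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")

def validate_jsonl(jsonl_content):
    """Return a list of (line_number, error) pairs; an empty list means valid."""
    errors = []
    seen_ids = set()
    for line_number, line in enumerate(jsonl_content.strip().split("\n"), start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((line_number, f"invalid JSON: {exc}"))
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                errors.append((line_number, f"missing field: {field}"))
        custom_id = record.get("custom_id")
        if custom_id in seen_ids:
            errors.append((line_number, f"duplicate custom_id: {custom_id}"))
        seen_ids.add(custom_id)
    return errors

good = '{"custom_id": "a", "method": "POST", "url": "/v1/chat/completions", "body": {}}'
bad = '{"custom_id": "a", "method": "POST"'
print(validate_jsonl(good))
print(validate_jsonl(bad))
```

Run the validator before every send; fixing a reported line number locally is much cheaper than diagnosing a rejected batch.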
Monitor Progress
For large batches, implement progress tracking. Log successful and failed requests to monitor the batch processing status.
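One way to track a batch is to bucket each response line by its status code as you parse it. A minimal sketch over abbreviated sample lines:

```python
import json

def summarize_results(result_lines):
    """Bucket custom_ids from a JSONL batch response into succeeded/failed."""
    summary = {"succeeded": [], "failed": []}
    for line in result_lines:
        record = json.loads(line)
        ok = record["response"]["status_code"] == 200
        summary["succeeded" if ok else "failed"].append(record["custom_id"])
    return summary

# Abbreviated sample response lines for illustration
sample = [
    '{"custom_id": "a", "response": {"status_code": 200, "body": {}}}',
    '{"custom_id": "b", "response": {"status_code": 429, "body": {}}}',
]
summary = summarize_results(sample)
print(f"{len(summary['succeeded'])} succeeded, {len(summary['failed'])} failed")
```

The failed list doubles as the input for a retry pass, since it holds the custom_ids you need to look up in your original requests.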
Consider Timeouts
Set appropriate timeouts for batch requests. Large batches may take several minutes to process. Adjust your HTTP client timeout accordingly.
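With requests, you can pass a (connect, read) tuple so connection failures surface quickly while the read waits out batch processing. A minimal sketch; the 600-second read timeout is an illustrative default, not a documented server limit:

```python
import requests

def send_batch(jsonl_content, api_key, timeout_seconds=600):
    """Send a batch with an explicit timeout; large batches can take minutes."""
    return requests.post(
        "https://api.selamgpt.com/v1/chat/completions/batch",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/jsonl"
        },
        data=jsonl_content,
        # (connect timeout, read timeout): fail fast on connection problems,
        # but allow a long read while the batch is processed
        timeout=(10, timeout_seconds),
    )
```

Without an explicit timeout, requests will wait indefinitely, which can leave a batch job hanging on a dropped connection.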
Use Consistent Parameters
When possible, use the same model and parameters across all requests in a batch. This can improve processing efficiency and consistency.
Error Handling
Implement robust error handling to manage partial failures and retry failed requests.
Error Handling Example
import json
import requests
import time

def process_batch_with_retry(batch_requests, max_retries=3):
    """Process batch with automatic retry for failed requests."""

    # Convert to JSONL
    jsonl_content = "\n".join([json.dumps(req) for req in batch_requests])

    # Send batch request
    response = requests.post(
        "https://api.selamgpt.com/v1/chat/completions/batch",
        headers={
            "Authorization": "Bearer your-api-key-here",
            "Content-Type": "application/jsonl"
        },
        data=jsonl_content
    )

    # Parse results
    successful = []
    failed = []

    for result_line in response.text.strip().split("\n"):
        result = json.loads(result_line)

        if result["response"]["status_code"] == 200:
            successful.append(result)
        else:
            # Look up the original request so it can be retried
            custom_id = result["custom_id"]
            original_request = next(
                req for req in batch_requests
                if req["custom_id"] == custom_id
            )
            failed.append(original_request)

    # Retry failed requests
    if failed and max_retries > 0:
        print(f"Retrying {len(failed)} failed requests...")
        time.sleep(2)  # Wait before retry
        retry_results = process_batch_with_retry(failed, max_retries - 1)
        successful.extend(retry_results)

    return successful

# Usage
batch_requests = [...]  # Your batch requests
results = process_batch_with_retry(batch_requests)
print(f"Successfully processed {len(results)} requests")