Best Practices
Production-ready patterns, security guidelines, and optimization strategies for building robust applications with the Selam API.
API Key Security
Protecting your API keys is critical to prevent unauthorized access and unexpected charges.
Never Expose Keys in Client-Side Code
API keys should never be embedded in frontend JavaScript, mobile apps, or public repositories.
- Use environment variables on the server side
- Create a backend proxy to handle API calls
- Never commit API keys to version control
- Add .env files to .gitignore
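The backend-proxy pattern above can be sketched as follows: the client sends only the prompt, and the server attaches the key it reads from its own environment. The payload shape shown is illustrative, not an exact Selam request schema.

```python
import os

def get_api_key() -> str:
    """Read the Selam API key from the server's environment.

    The key never appears in client-side code: browsers and mobile
    apps call your backend, and only the backend talks to the API.
    """
    key = os.environ.get("SELAM_API_KEY")
    if not key:
        raise RuntimeError("SELAM_API_KEY is not set")
    return key

def handle_chat_request(user_prompt: str) -> dict:
    """Backend proxy handler: builds the outgoing request with the
    key attached server-side (hypothetical payload shape)."""
    return {
        "headers": {"Authorization": f"Bearer {get_api_key()}"},
        "json": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": user_prompt}],
        },
    }
```

Because the key lives only in the server process, rotating it is a deployment-time change and never requires shipping a new client build.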
Use Environment Variables
Store API keys in environment variables and never commit them to version control.
```shell
# .env file (add to .gitignore)
SELAM_API_KEY=your-api-key-here
SELAM_BASE_URL=https://api.selamgpt.com/v1
```
Rotate Keys Regularly
Generate new API keys periodically and revoke old ones. Monitor usage patterns for anomalies that might indicate compromised keys.
Warning
If you suspect your API key has been compromised, revoke it immediately from your dashboard and generate a new one.
Rate Limit Optimization
Efficiently manage rate limits to maximize throughput while staying within your tier's quotas.
Implement Exponential Backoff
When you hit rate limits, wait progressively longer between retries.
- Start with a short delay (e.g., 1 second)
- Double the delay after each retry
- Add random jitter to prevent thundering herd
- Set a maximum number of retries
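The four steps above can be combined into a small helper. This is a minimal sketch using "full jitter" (each delay is drawn uniformly between zero and the exponential ceiling); the `request` callable and `is_retryable` predicate are placeholders for your own API call and error classification.

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays with full jitter.

    base=1.0 starts at up to ~1s; each retry doubles the ceiling,
    and the random draw spreads clients out over the window.
    """
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_backoff(request, is_retryable=lambda exc: True,
                      max_retries: int = 5, base: float = 1.0):
    """Run `request()`, retrying transient failures with backoff."""
    for delay in backoff_delays(max_retries, base=base):
        try:
            return request()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            time.sleep(delay)
    return request()  # final attempt: let any remaining error propagate
```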
Check Rate Limit Headers
Monitor response headers to track your usage and avoid hitting limits. Look for X-RateLimit-Remaining and X-RateLimit-Reset headers.
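A small helper can turn those headers into a proactive throttling decision. This is a sketch: the exact header names follow the convention quoted above, and the `threshold` of 5 remaining requests is an arbitrary example value.

```python
def rate_limit_status(headers: dict) -> dict:
    """Extract rate-limit info from response headers.

    Returns -1 for fields that are absent, so callers can
    distinguish "unknown" from a real remaining count of 0.
    """
    return {
        "remaining": int(headers.get("X-RateLimit-Remaining", -1)),
        "reset": int(headers.get("X-RateLimit-Reset", -1)),
    }

def should_throttle(headers: dict, threshold: int = 5) -> bool:
    """Back off proactively when few requests remain in the window."""
    remaining = rate_limit_status(headers)["remaining"]
    return 0 <= remaining <= threshold
```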
Use Batch Processing
For bulk operations, use the batch endpoint to process multiple requests efficiently and reduce overhead.
Tip
Upgrade to a higher tier (BETA or PRO) for increased rate limits. See the rate limits documentation for details.
Error Handling Strategies
Robust error handling keeps your application stable and provides a good user experience.
Handle Different Error Types
Different errors require different handling strategies:
- 401 Unauthorized: Invalid API key - don't retry
- 429 Rate Limit: Retry with exponential backoff
- 500 Server Error: Transient issue - retry
- 400 Bad Request: Invalid input - don't retry
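The mapping above reduces to a small dispatch helper. A minimal sketch, treating any unrecognized status as non-retryable so failures surface quickly instead of looping:

```python
# Status sets mirror the list above.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def classify_error(status_code: int) -> str:
    """Return 'retry' for transient errors, 'fail' otherwise."""
    return "retry" if status_code in RETRYABLE_STATUSES else "fail"
```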
Provide User-Friendly Messages
Don't expose raw error messages to end users. Provide helpful, actionable feedback that guides them on what to do next.
Log Errors for Debugging
Log detailed error information for debugging, but sanitize logs to avoid exposing sensitive data like API keys or user information.
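Both points can live in one error-reporting path: log the sanitized detail, return a friendly message. A sketch, where the masking regex and the message wording are illustrative examples, not an exhaustive redaction scheme:

```python
import logging
import re

logger = logging.getLogger("selam")

def sanitize(message: str) -> str:
    """Mask anything that looks like a bearer token or API key
    before it reaches the logs (pattern is illustrative only)."""
    return re.sub(r"(Bearer\s+|sk-)[A-Za-z0-9_\-]+", r"\1***", message)

def report_error(status_code: int, detail: str) -> str:
    """Log full (sanitized) detail; return a user-facing message."""
    logger.error("API error %s: %s", status_code, sanitize(detail))
    user_messages = {
        429: "We're handling a lot of requests right now. Please retry shortly.",
        500: "Something went wrong on our side. Please try again.",
    }
    return user_messages.get(status_code, "Request failed. Please try again later.")
```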
Retry Logic
Implement smart retry logic to handle transient failures without overwhelming the API.
Retry Only Transient Errors
Not all errors should trigger retries. Only retry for transient issues:
- DO retry: 429 (rate limit), 500, 502, 503, 504 (server errors), network timeouts
- DON'T retry: 400 (bad request), 401 (unauthorized), 403 (forbidden), 404 (not found)
Set Maximum Retry Attempts
Limit retries to prevent infinite loops and excessive delays. A good default is 3-5 retries with exponential backoff.
Add Jitter to Backoff
Add random jitter to retry delays to prevent thundering herd problems when many clients retry simultaneously.
Information
The OpenAI Python SDK includes built-in retry logic with exponential backoff. You can configure it with the max_retries parameter.
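With the OpenAI Python SDK pointed at the Selam endpoint, that retry behavior is set at client construction. A configuration sketch; the key shown is a placeholder and should come from the environment in practice:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",               # placeholder
    base_url="https://api.selamgpt.com/v1",
    max_retries=5,   # built-in exponential backoff on transient errors
    timeout=30.0,    # seconds before a request is abandoned
)
```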
Caching Strategies
Cache responses to reduce API calls, improve response times, and lower costs.
Cache Deterministic Responses
For identical requests with temperature=0, cache responses to avoid redundant API calls. Use a hash of the request parameters as the cache key.
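The hash-keyed cache described above can be sketched like this, using an in-process dict for simplicity and only caching when `temperature=0`:

```python
import hashlib
import json

_cache: dict = {}

def cache_key(model: str, messages: list, **params) -> str:
    """Hash the full request so identical requests share one key.

    sort_keys=True makes the serialization stable, so logically
    equal requests always produce the same digest.
    """
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(call_api, model, messages, temperature=0.0):
    """Cache only deterministic (temperature=0) requests."""
    if temperature != 0.0:
        return call_api(model, messages, temperature)
    key = cache_key(model, messages, temperature=temperature)
    if key not in _cache:
        _cache[key] = call_api(model, messages, temperature)
    return _cache[key]
```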
Use Redis for Distributed Caching
For production applications, use Redis or similar for persistent, distributed caching across multiple servers.
Set Appropriate TTL
Set cache expiration times based on your use case. Short TTL (minutes) for dynamic content, longer TTL (hours/days) for static content.
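A minimal in-memory sketch of per-entry expiry; in production, Redis provides the same semantics across servers via `SETEX` (or `SET` with `EX`):

```python
import time

class TTLCache:
    """Tiny in-memory cache where each entry carries its own expiry."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds: float):
        # monotonic clock: immune to wall-clock adjustments
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction on read
            return default
        return value
```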
Warning
Don't cache responses with high temperature values or when using features like web search, as these are designed to provide varied or current information.
Cost Optimization
Optimize your API usage to reduce costs while maintaining quality.
Choose the Right Model
Use the most cost-effective model for your use case:
- selam-turbo: Fast, cost-effective for simple tasks
- selam-plus: Balanced performance for complex tasks
- selam-thinking: Advanced reasoning for complex problems
- selam-coder: Specialized for code generation
Manage Context Length
Trim conversation history to reduce token usage while maintaining context. Keep system prompts and recent messages, summarize or remove middle messages.
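A simple sketch of that trimming strategy, counting messages rather than tokens; a production version would measure tokens and might summarize the dropped middle instead of discarding it:

```python
def trim_history(messages: list, max_messages: int = 10) -> list:
    """Keep the system prompt(s) plus the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```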
Use Batch Processing
Process multiple requests in batches to reduce overhead and potentially lower costs.
Monitor Usage
Track your API usage patterns to identify optimization opportunities and prevent unexpected costs.
Performance Tips
Optimize your application's performance for better user experience.
Use Streaming for Real-Time UX
Enable streaming to display responses as they're generated, reducing perceived latency and improving user experience.
Parallel Processing
Process independent requests in parallel to reduce total execution time. Use async/await patterns in Python or JavaScript.
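The pattern looks like this with `asyncio.gather`; `fetch_completion` here is a stand-in that only simulates latency, where a real version would await an async client call:

```python
import asyncio

async def fetch_completion(prompt: str) -> str:
    """Stand-in for an async API call; sleeps to simulate latency."""
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def fetch_all(prompts):
    """Run independent requests concurrently: total time is roughly
    the slowest single call, not the sum of all calls."""
    return await asyncio.gather(*(fetch_completion(p) for p in prompts))

results = asyncio.run(fetch_all(["a", "b", "c"]))
```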
Connection Pooling
Reuse HTTP connections to reduce latency. The OpenAI SDK handles this automatically, but ensure you're reusing the client instance.
Set Appropriate Timeouts
Configure timeouts to prevent hanging requests. Balance between allowing enough time for complex requests and failing fast for better UX.
Production Deployment
Best practices for deploying Selam API integrations to production.
Use Environment-Specific Keys
Use separate API keys for development, staging, and production environments. This allows you to track usage per environment and revoke keys without affecting other environments.
Implement Health Checks
Monitor API availability and your application's ability to connect. Create health check endpoints that verify API connectivity.
Implement Logging and Monitoring
Log API requests, responses, errors, and performance metrics. Use monitoring tools to track usage patterns, error rates, and response times.
Implement Rate Limiting on Your Side
Add rate limiting to your API endpoints to prevent abuse and control costs, especially for user-facing applications.
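A classic way to do this is a token bucket per user or API key. A single-process sketch (a shared store such as Redis would be needed across multiple servers):

```python
import time

class TokenBucket:
    """Token-bucket limiter: `rate` tokens replenish per second,
    up to `capacity`; each allowed request spends one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```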
Use Load Balancing
For high-traffic applications, distribute requests across multiple instances and implement circuit breakers to handle failures gracefully.
Implement Graceful Degradation
Have fallback strategies when the API is unavailable. This could include cached responses, simplified functionality, or user-friendly error messages.
Test Thoroughly
Test your integration with various inputs, error scenarios, and edge cases. Include load testing to ensure your application can handle expected traffic.
Tip
Consider implementing a staging environment that mirrors production to test changes before deploying to production.