Best Practices
Production-ready patterns, security guidelines, and optimization strategies for building robust applications with the Selam API.
API Key Security
Protecting your API keys is critical to prevent unauthorized access and unexpected charges.
Never Expose Keys in Client-Side Code
API keys should never be embedded in frontend JavaScript, mobile apps, or public repositories.
- Use environment variables on the server side
- Create a backend proxy to handle API calls
- Never commit API keys to version control
- Add .env files to .gitignore
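The backend-proxy pattern above can be sketched as follows: the client sends only the prompt, and the server attaches the key it reads from its own environment. The payload shape shown is illustrative, not an exact Selam request schema.

```python
import os

def get_api_key() -> str:
    """Read the Selam API key from the server's environment.

    The key never appears in client-side code: browsers and mobile
    apps call your backend, and only the backend talks to the API.
    """
    key = os.environ.get("SELAM_API_KEY")
    if not key:
        raise RuntimeError("SELAM_API_KEY is not set")
    return key

def handle_chat_request(user_prompt: str) -> dict:
    """Backend proxy handler: builds the outgoing request with the
    key attached server-side (hypothetical payload shape)."""
    return {
        "headers": {"Authorization": f"Bearer {get_api_key()}"},
        "json": {
            "model": "selam-turbo",
            "messages": [{"role": "user", "content": user_prompt}],
        },
    }
```

Because the key lives only in the server process, rotating it is a deployment-time change and never requires shipping a new client build.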
Use Environment Variables
Store API keys in environment variables and never commit them to version control.
```shell
# .env file (add to .gitignore)
SELAM_API_KEY=your-api-key-here
SELAM_BASE_URL=https://api.selamgpt.com/v1
```
Rotate Keys Regularly
Generate new API keys periodically and revoke old ones. Monitor usage patterns for anomalies that might indicate compromised keys.
Warning
If you suspect your API key has been compromised, revoke it immediately from your dashboard and generate a new one.
Rate Limit Optimization
Efficiently manage rate limits to maximize throughput while staying within your tier's quotas.
Implement Exponential Backoff
When you hit rate limits, wait progressively longer between retries.
- Start with a short delay (e.g., 1 second)
- Double the delay after each retry
- Add random jitter to prevent thundering herd
- Set a maximum number of retries
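The four steps above can be combined into a small helper. This is a minimal sketch using "full jitter" (each delay is drawn uniformly between zero and the exponential ceiling); the `request` callable and `is_retryable` predicate are placeholders for your own API call and error classification.

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays with full jitter.

    base=1.0 starts at up to ~1s; each retry doubles the ceiling,
    and the random draw spreads clients out over the window.
    """
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_backoff(request, is_retryable=lambda exc: True,
                      max_retries: int = 5, base: float = 1.0):
    """Run `request()`, retrying transient failures with backoff."""
    for delay in backoff_delays(max_retries, base=base):
        try:
            return request()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            time.sleep(delay)
    return request()  # final attempt: let any remaining error propagate
```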
Check Rate Limit Headers
Monitor response headers to track your usage and avoid hitting limits. Look for X-RateLimit-Remaining and X-RateLimit-Reset headers.
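A small helper can turn those headers into a proactive throttling decision. This is a sketch: the exact header names follow the convention quoted above, and the `threshold` of 5 remaining requests is an arbitrary example value.

```python
def rate_limit_status(headers: dict) -> dict:
    """Extract rate-limit info from response headers.

    Returns -1 for fields that are absent, so callers can
    distinguish "unknown" from a real remaining count of 0.
    """
    return {
        "remaining": int(headers.get("X-RateLimit-Remaining", -1)),
        "reset": int(headers.get("X-RateLimit-Reset", -1)),
    }

def should_throttle(headers: dict, threshold: int = 5) -> bool:
    """Back off proactively when few requests remain in the window."""
    remaining = rate_limit_status(headers)["remaining"]
    return 0 <= remaining <= threshold
```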
Use Batch Processing
For bulk operations, use the batch endpoint to process multiple requests efficiently and reduce overhead.
Tip
Upgrade to a higher tier (BETA or PRO) for increased rate limits. See the rate limits documentation for details.
Error Handling Strategies
Robust error handling keeps your application stable and provides a good user experience.
Handle Different Error Types
Different errors require different handling strategies:
- 401 Unauthorized: Invalid API key - don't retry
- 429 Rate Limit: Retry with exponential backoff
- 500 Server Error: Transient issue - retry
- 400 Bad Request: Invalid input - don't retry
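The mapping above reduces to a small dispatch helper. A minimal sketch, treating any unrecognized status as non-retryable so failures surface quickly instead of looping:

```python
# Status sets mirror the list above.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def classify_error(status_code: int) -> str:
    """Return 'retry' for transient errors, 'fail' otherwise."""
    return "retry" if status_code in RETRYABLE_STATUSES else "fail"
```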
Provide User-Friendly Messages
Don't expose raw error messages to end users. Provide helpful, actionable feedback that guides them on what to do next.
Log Errors for Debugging
Log detailed error information for debugging, but sanitize logs to avoid exposing sensitive data like API keys or user information.
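Both points can live in one error-reporting path: log the sanitized detail, return a friendly message. A sketch, where the masking regex and the message wording are illustrative examples, not an exhaustive redaction scheme:

```python
import logging
import re

logger = logging.getLogger("selam")

def sanitize(message: str) -> str:
    """Mask anything that looks like a bearer token or API key
    before it reaches the logs (pattern is illustrative only)."""
    return re.sub(r"(Bearer\s+|sk-)[A-Za-z0-9_\-]+", r"\1***", message)

def report_error(status_code: int, detail: str) -> str:
    """Log full (sanitized) detail; return a user-facing message."""
    logger.error("API error %s: %s", status_code, sanitize(detail))
    user_messages = {
        429: "We're handling a lot of requests right now. Please retry shortly.",
        500: "Something went wrong on our side. Please try again.",
    }
    return user_messages.get(status_code, "Request failed. Please try again later.")
```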
Retry Logic
Implement smart retry logic to handle transient failures without overwhelming the API.
Retry Only Transient Errors
Not all errors should trigger retries. Only retry for transient issues:
- DO retry: 429 (rate limit), 500, 502, 503, 504 (server errors), network timeouts
- DON'T retry: 400 (bad request), 401 (unauthorized), 403 (forbidden), 404 (not found)
Set Maximum Retry Attempts
Limit retries to prevent infinite loops and excessive delays. A good default is 3-5 retries with exponential backoff.
Add Jitter to Backoff
Add random jitter to retry delays to prevent thundering herd problems when many clients retry simultaneously.
Information
The OpenAI Python SDK includes built-in retry logic with exponential backoff. You can configure it with the max_retries parameter.
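With the OpenAI Python SDK pointed at the Selam endpoint, that retry behavior is set at client construction. A configuration sketch; the key shown is a placeholder and should come from the environment in practice:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",               # placeholder
    base_url="https://api.selamgpt.com/v1",
    max_retries=5,   # built-in exponential backoff on transient errors
    timeout=30.0,    # seconds before a request is abandoned
)
```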
Caching Strategies
Cache responses to reduce API calls, improve response times, and lower costs.
Cache Deterministic Responses
For identical requests with temperature=0, cache responses to avoid redundant API calls. Use a hash of the request parameters as the cache key.
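The hash-keyed cache described above can be sketched like this, using an in-process dict for simplicity and only caching when `temperature=0`:

```python
import hashlib
import json

_cache: dict = {}

def cache_key(model: str, messages: list, **params) -> str:
    """Hash the full request so identical requests share one key.

    sort_keys=True makes the serialization stable, so logically
    equal requests always produce the same digest.
    """
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(call_api, model, messages, temperature=0.0):
    """Cache only deterministic (temperature=0) requests."""
    if temperature != 0.0:
        return call_api(model, messages, temperature)
    key = cache_key(model, messages, temperature=temperature)
    if key not in _cache:
        _cache[key] = call_api(model, messages, temperature)
    return _cache[key]
```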
Use Redis for Distributed Caching
For production applications, use Redis or similar for persistent, distributed caching across multiple servers.
Set Appropriate TTL
Set cache expiration times based on your use case. Short TTL (minutes) for dynamic content, longer TTL (hours/days) for static content.
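A minimal in-memory sketch of per-entry expiry; in production, Redis provides the same semantics across servers via `SETEX` (or `SET` with `EX`):

```python
import time

class TTLCache:
    """Tiny in-memory cache where each entry carries its own expiry."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds: float):
        # monotonic clock: immune to wall-clock adjustments
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction on read
            return default
        return value
```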
Warning
Don't cache responses with high temperature values or when using features like web search, as these are designed to provide varied or current information.
Cost Optimization
Optimize your API usage to reduce costs while maintaining quality.
Choose the Right Model
Use the most cost-effective model for your use case:
- selam-turbo: Fast, cost-effective for simple tasks
- selam-plus: Balanced performance for complex tasks
- selam-thinking: Advanced reasoning for complex problems
- selam-coder: Specialized for code generation
Manage Context Length
Trim conversation history to reduce token usage while maintaining context. Keep system prompts and recent messages, summarize or remove middle messages.
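A simple sketch of that trimming strategy, counting messages rather than tokens; a production version would measure tokens and might summarize the dropped middle instead of discarding it:

```python
def trim_history(messages: list, max_messages: int = 10) -> list:
    """Keep the system prompt(s) plus the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```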
Use Batch Processing
Process multiple requests in batches to reduce overhead and potentially lower costs.
Monitor Usage
Track your API usage patterns to identify optimization opportunities and prevent unexpected costs.
Performance Tips
Optimize your application's performance for better user experience.
Use Streaming for Real-Time UX
Enable streaming to display responses as they're generated, reducing perceived latency and improving user experience.
Parallel Processing
Process independent requests in parallel to reduce total execution time. Use async/await patterns in Python or JavaScript.
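The pattern looks like this with `asyncio.gather`; `fetch_completion` here is a stand-in that only simulates latency, where a real version would await an async client call:

```python
import asyncio

async def fetch_completion(prompt: str) -> str:
    """Stand-in for an async API call; sleeps to simulate latency."""
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def fetch_all(prompts):
    """Run independent requests concurrently: total time is roughly
    the slowest single call, not the sum of all calls."""
    return await asyncio.gather(*(fetch_completion(p) for p in prompts))

results = asyncio.run(fetch_all(["a", "b", "c"]))
```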
Connection Pooling
Reuse HTTP connections to reduce latency. The OpenAI SDK handles this automatically, but ensure you're reusing the client instance.
Set Appropriate Timeouts
Configure timeouts to prevent hanging requests. Balance between allowing enough time for complex requests and failing fast for better UX.
Production Deployment
Best practices for deploying Selam API integrations to production.
Use Environment-Specific Keys
Use separate API keys for development, staging, and production environments. This allows you to track usage per environment and revoke keys without affecting other environments.
Implement Health Checks
Monitor API availability and your application's ability to connect. Create health check endpoints that verify API connectivity.
Implement Logging and Monitoring
Log API requests, responses, errors, and performance metrics. Use monitoring tools to track usage patterns, error rates, and response times.
Implement Rate Limiting on Your Side
Add rate limiting to your API endpoints to prevent abuse and control costs, especially for user-facing applications.
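A classic way to do this is a token bucket per user or API key. A single-process sketch (a shared store such as Redis would be needed across multiple servers):

```python
import time

class TokenBucket:
    """Token-bucket limiter: `rate` tokens replenish per second,
    up to `capacity`; each allowed request spends one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```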
Use Load Balancing
For high-traffic applications, distribute requests across multiple instances and implement circuit breakers to handle failures gracefully.
Implement Graceful Degradation
Have fallback strategies when the API is unavailable. This could include cached responses, simplified functionality, or user-friendly error messages.
Test Thoroughly
Test your integration with various inputs, error scenarios, and edge cases. Include load testing to ensure your application can handle expected traffic.
Tip
Consider implementing a staging environment that mirrors production to test changes before deploying to production.