I've been building more and more tools that integrate with Large Language Models lately. From automating git commits using AI to creating a voice assistant using ChatGPT, I found myself writing the same integration code over and over. Each time I needed robust error handling, retries, and proper connection management. After the third or fourth implementation, I decided to build a proper package that would handle all of this out of the box.
Core Architecture and Design Philosophy
The package is built around a few key principles that I've found essential when working with LLMs in production:
- Make integration dead simple
- Support multiple LLM providers out of the box
- Include production-ready features by default
- Provide clear cost visibility
- Handle failures gracefully
Here's what a basic implementation looks like:
client, err := llm.NewClient(
    os.Getenv("OPENAI_API_KEY"),
    llm.WithProvider("openai"),
    llm.WithModel("gpt-4"),
    llm.WithTimeout(30 * time.Second),
)
if err != nil {
    log.Fatalf("failed to create client: %v", err)
}

resp, err := client.Chat(context.Background(), &types.ChatRequest{
    Messages: []types.Message{
        {
            Role:    types.RoleUser,
            Content: "What is the capital of France?",
        },
    },
})
if err != nil {
    log.Fatalf("chat request failed: %v", err)
}
fmt.Println(resp.Message.Content)
Simple on the surface, but there's a lot happening underneath. Let's dive into the key components that make this production-ready.
Connection Management: Beyond Basic HTTP Clients
When building services that interact with LLMs, connection management becomes crucial. Opening a new connection for every request is wasteful and can lead to resource exhaustion. The connection pooling system is built to handle this efficiently:
type PoolConfig struct {
    MaxSize       int           // Maximum number of connections
    IdleTimeout   time.Duration // How long to keep idle connections
    CleanupPeriod time.Duration // How often to clean up idle connections
}
The pool manages connections through several key mechanisms:
Connection Lifecycle Management
The pool tracks both active and idle connections, implementing a cleanup routine that runs periodically:
func (p *ConnectionPool) cleanup() {
    ticker := time.NewTicker(p.config.CleanupPeriod)
    defer ticker.Stop()

    for range ticker.C {
        p.mu.Lock()
        now := time.Now()
        // Remove idle connections that have timed out
        // Keep track of active connections
        p.mu.Unlock()
    }
}
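The removal logic itself is elided in the snippet above, so here is a minimal sketch of how the idle sweep could work. The pooledConn wrapper, the idle slice, and the released channel are fields I'm assuming for illustration; they build on the PoolConfig shown earlier and aren't necessarily the package's real internals.

// pooledConn wraps an underlying connection with the time it was last used.
type pooledConn struct {
    client   *http.Client
    lastUsed time.Time
}

// ConnectionPool is a simplified stand-in for the package's pool type.
type ConnectionPool struct {
    mu       sync.Mutex
    config   PoolConfig
    idle     []*pooledConn
    active   int
    released chan *pooledConn // signalled when an active connection is returned
}

// sweepIdle drops idle entries whose age exceeds IdleTimeout. The cleanup
// goroutine shown above would call this while holding p.mu.
func (p *ConnectionPool) sweepIdle(now time.Time) {
    kept := p.idle[:0]
    for _, pc := range p.idle {
        if now.Sub(pc.lastUsed) < p.config.IdleTimeout {
            kept = append(kept, pc)
        }
    }
    p.idle = kept
}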
Smart Connection Distribution
When a client requests a connection, the pool follows a specific hierarchy:
- Try to reuse an existing idle connection
- Create a new connection if under the max limit
- Wait for a connection to become available if at capacity
This prevents both resource wastage and connection starvation.
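Building on the simplified ConnectionPool sketch from the previous section, the acquisition path could look roughly like the following. The Get signature, the released channel, and the use of net/http clients are assumptions for illustration rather than the package's actual implementation.

// Get follows the hierarchy above: reuse an idle connection, create a new one
// if under MaxSize, otherwise wait until a connection is released.
func (p *ConnectionPool) Get(ctx context.Context) (*pooledConn, error) {
    p.mu.Lock()

    // 1. Reuse an existing idle connection if one is available.
    if n := len(p.idle); n > 0 {
        pc := p.idle[n-1]
        p.idle = p.idle[:n-1]
        p.active++
        p.mu.Unlock()
        return pc, nil
    }

    // 2. Create a new connection while under the limit.
    if p.active < p.config.MaxSize {
        p.active++
        p.mu.Unlock()
        return &pooledConn{client: &http.Client{}, lastUsed: time.Now()}, nil
    }
    p.mu.Unlock()

    // 3. At capacity: block until another caller releases a connection,
    //    or give up if the caller's context is cancelled.
    select {
    case pc := <-p.released:
        return pc, nil
    case <-ctx.Done():
        return nil, ctx.Err()
    }
}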
Robust Error Handling and Retries
LLM APIs can be unreliable. They might rate limit you, have temporary outages, or just be slow to respond. The retry system is designed to handle these cases gracefully:
type RetryConfig struct {
    MaxRetries      int
    InitialInterval time.Duration
    MaxInterval     time.Duration
    Multiplier      float64
}
The retry system implements exponential backoff with jitter to prevent thundering herd problems. Here's how it works:
- The initial attempt fails
- Wait for InitialInterval before the first retry
- For each subsequent retry:
  - Add random jitter to prevent synchronized retries across clients
  - Multiply the wait time by Multiplier
  - Cap the wait at MaxInterval to prevent excessive delays
This means your application can handle various types of failures:
- Rate limiting (429 responses)
- Temporary service outages (5xx responses)
- Network timeouts
- Connection reset errors
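To make the backoff schedule and those failure classes concrete, here is a minimal sketch using the RetryConfig shown above and the standard errors, math/rand, net, and net/http packages. The function names, the equal-jitter strategy, and the retryability checks are my own assumptions rather than the package's internals.

// backoff returns how long to wait before retry attempt n (0-based): start at
// InitialInterval, multiply on each retry, cap at MaxInterval, then add jitter.
func backoff(cfg RetryConfig, attempt int) time.Duration {
    wait := float64(cfg.InitialInterval)
    for i := 0; i < attempt; i++ {
        wait *= cfg.Multiplier
    }
    if max := float64(cfg.MaxInterval); wait > max {
        wait = max
    }
    // Equal jitter: keep half the wait, randomize the other half so that
    // many clients retrying at once don't synchronize.
    half := wait / 2
    return time.Duration(half + rand.Float64()*half)
}

// retryable reports whether a failed request is worth retrying: rate limits,
// 5xx responses, and transient network errors such as timeouts or resets.
func retryable(statusCode int, err error) bool {
    if statusCode == http.StatusTooManyRequests || statusCode >= 500 {
        return true
    }
    var netErr net.Error
    if errors.As(err, &netErr) {
        return true
    }
    return false
}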
Cost Tracking and Budget Management
One of the most requested features was cost tracking. If you're building services on top of LLMs, you need to know exactly how much each request costs. The cost tracking system provides:
Per-Request Cost Tracking
type Usage struct {
    PromptTokens     int
    CompletionTokens int
    TotalTokens      int
    Cost             float64
}

func (ct *CostTracker) TrackUsage(provider, model string, usage Usage) error {
    cost := calculateCost(provider, model, usage)
    if cost > ct.config.MaxCostPerRequest {
        return ErrCostLimitExceeded
    }
    // Track costs and usage
    return nil
}
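calculateCost is elided above; a minimal version could look like the sketch below. The map layout is my assumption, and the per-1K-token prices are purely illustrative placeholders, not real provider pricing.

// rate holds illustrative per-1K-token prices in USD (placeholder numbers).
type rate struct {
    Prompt     float64
    Completion float64
}

var pricing = map[string]rate{
    "openai/gpt-4":     {Prompt: 0.03, Completion: 0.06},
    "anthropic/claude": {Prompt: 0.01, Completion: 0.03},
}

func calculateCost(provider, model string, usage Usage) float64 {
    r, ok := pricing[provider+"/"+model]
    if !ok {
        return 0 // unknown provider/model: no price data, handle upstream
    }
    return float64(usage.PromptTokens)/1000*r.Prompt +
        float64(usage.CompletionTokens)/1000*r.Completion
}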
Budget Management
The system allows you to set various budget controls:
- Per-request cost limits
- Daily/monthly budget caps
- Usage alerts at configurable thresholds
- Cost breakdown by model and provider
This becomes critical when you're running at scale. I've seen services rack up surprising bills because they didn't have proper cost monitoring in place. With this system, you can:
- Monitor costs in real-time
- Set hard limits to prevent runaway spending
- Get alerts before hitting budget thresholds
- Track costs per customer or feature
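To make those controls concrete, here is one way a daily cap and an alert threshold could be enforced. The BudgetConfig shape, the dailySpend and alerted fields, the onAlert callback, and ErrBudgetExceeded are all assumptions for illustration, not the package's actual API.

// BudgetConfig captures the controls listed above (hypothetical shape).
type BudgetConfig struct {
    MaxCostPerRequest float64
    DailyBudget       float64
    AlertThreshold    float64 // fraction of DailyBudget, e.g. 0.8 for 80%
}

// checkBudget rejects a request that would blow the daily cap and fires an
// alert callback the first time spending crosses the threshold.
func (ct *CostTracker) checkBudget(cost float64) error {
    ct.mu.Lock()
    defer ct.mu.Unlock()

    if ct.dailySpend+cost > ct.config.DailyBudget {
        return ErrBudgetExceeded
    }
    ct.dailySpend += cost

    if !ct.alerted && ct.dailySpend >= ct.config.DailyBudget*ct.config.AlertThreshold {
        ct.alerted = true
        ct.onAlert(ct.dailySpend, ct.config.DailyBudget) // e.g. log, metric, or webhook
    }
    return nil
}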
Streaming Support: Real-time Responses
Modern LLM applications often need streaming responses for a better user experience, and the package supports this out of the box:
streamChan, err := client.StreamChat(ctx, req)
if err != nil {
    return err
}

for resp := range streamChan {
    if resp.Error != nil {
        return resp.Error
    }
    fmt.Print(resp.Message.Content)
}
The streaming implementation handles several complex cases:
- Graceful connection termination
- Partial message handling
- Error propagation
- Context cancellation
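Context cancellation in particular is worth showing explicitly. This usage sketch builds on the StreamChat call above and stops reading as soon as the context is done instead of blocking on the channel:

ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()

streamChan, err := client.StreamChat(ctx, req)
if err != nil {
    return err
}

for {
    select {
    case resp, ok := <-streamChan:
        if !ok {
            return nil // stream closed cleanly
        }
        if resp.Error != nil {
            return resp.Error // error propagated mid-stream
        }
        fmt.Print(resp.Message.Content)
    case <-ctx.Done():
        return ctx.Err() // caller cancelled or the timeout fired
    }
}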
Performance Metrics and Monitoring
Understanding how your LLM integration performs is crucial. The package includes comprehensive metrics:
Request Metrics
- Request latency
- Token usage
- Error rates
- Retry counts
Connection Pool Metrics
- Active connections
- Idle connections
- Wait time for connections
- Connection errors
Cost Metrics
- Cost per request
- Running totals
- Budget utilization
- Cost per model/provider
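I find it easiest to reason about these as one periodic snapshot that can be logged or exported. The grouping below is hypothetical; the package may well expose metrics differently, for example through callbacks or a Prometheus registry.

// MetricsSnapshot groups the request, pool, and cost metrics listed above
// into a single struct that could be logged or scraped on a schedule.
type MetricsSnapshot struct {
    // Request metrics
    RequestCount int64
    ErrorCount   int64
    RetryCount   int64
    AvgLatency   time.Duration
    TotalTokens  int64

    // Connection pool metrics
    ActiveConns int
    IdleConns   int
    AvgWait     time.Duration
    ConnErrors  int64

    // Cost metrics
    TotalCost     float64
    BudgetUsedPct float64
    CostByModel   map[string]float64
}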
Provider Management
The package currently supports multiple LLM providers:
OpenAI
- GPT-3.5
- GPT-4
- Text completion models
Anthropic
- Claude
- Claude Instant
Each provider implementation handles its specific quirks while presenting a unified interface to your application.
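In practice, a unified abstraction like that usually boils down to a small interface that every backend implements. The sketch below shows one possible shape; the types.ChatResponse and types.StreamResponse names are assumptions, and the package's real interface may differ.

// Provider is a hypothetical abstraction each backend (OpenAI, Anthropic, ...)
// implements so the client can treat them interchangeably.
type Provider interface {
    // Name identifies the provider, e.g. "openai" or "anthropic".
    Name() string

    // Chat sends a complete request and returns the full response.
    Chat(ctx context.Context, req *types.ChatRequest) (*types.ChatResponse, error)

    // StreamChat returns a channel of incremental responses.
    StreamChat(ctx context.Context, req *types.ChatRequest) (<-chan *types.StreamResponse, error)
}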
Real-World Applications
I've used this package in several production applications:
Automated Content Generation
A system generating thousands of product descriptions daily. Key features used:
- Connection pooling for high throughput
- Cost tracking for billing
- Retries for reliability
Interactive Chat Applications
Real-time chat applications requiring:
- Streaming responses
- Low latency
- Error resilience
Batch Processing Systems
Large-scale document processing using:
- Multiple providers
- Budget management
- Detailed usage tracking
What's Next
While the package is already being used in production, there's more to come:
Short Term
- Enhanced cost tracking across different pricing tiers
- Better model handling and automatic selection
- Support for more LLM providers
- Improved metrics and monitoring
Long Term
- Automatic provider failover
- Smart request routing
- Advanced budget controls
- Performance optimization tools
Best Practices and Tips
From my experience using this package in production, here are some recommendations:
- Start with conservative retry settings
- Monitor your token usage closely
- Set up budget alerts well below your actual limits
- Use streaming for interactive applications
- Implement proper error handling in your application
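As a starting point for the retry and budget recommendations above, these are values I would consider conservative. The shapes mirror the configs shown earlier (BudgetConfig being the hypothetical one sketched in the budget section), and the exact numbers are suggestions rather than package defaults.

retryCfg := RetryConfig{
    MaxRetries:      3,                      // enough to ride out brief rate limits
    InitialInterval: 500 * time.Millisecond, // don't hammer the API immediately
    MaxInterval:     10 * time.Second,       // cap the added latency per request
    Multiplier:      2.0,                    // standard exponential backoff
}

budgetCfg := BudgetConfig{
    MaxCostPerRequest: 0.50, // USD; fail fast on runaway prompts
    DailyBudget:       25.0, // hard daily cap
    AlertThreshold:    0.8,  // alert at 80% so there's time to react
}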
Conclusion
Building this package has significantly simplified my LLM integrations. Instead of rewriting the same boilerplate code for each project, I can focus on building the actual features I need. If you're working with LLMs in Go, feel free to check out the package and contribute.
Like my approach to CI/CD deployment, this is open source and available for anyone to use and improve. The more we can standardize these patterns, the better our LLM integrations will become.
The future of LLM integration is about making these powerful tools more accessible and reliable. With proper abstractions and production-ready features, we can focus on building innovative applications instead of worrying about the underlying infrastructure.