Rate Limiting
Protect your backend APIs from abuse, excessive load, and DDoS attacks with KnoxCall’s intelligent rate limiting system.
What is Rate Limiting?
Rate limiting controls how many requests a client can make within a specific time window. It prevents:
- ❌ API abuse - Malicious users making excessive requests
- ❌ Accidental overload - Buggy code creating infinite loops
- ❌ DDoS attacks - Distributed denial of service attempts
- ❌ Cost overruns - Runaway usage driving up API costs
How Rate Limiting Works
Client makes request
↓
Check request count for this client
↓
Within limit? → ✅ Forward request → Increment counter
↓
Exceeded limit? → ❌ Return 429 Too Many Requests
Rate limit counters reset based on your configured window:
- Per minute: Resets every 60 seconds
- Per hour: Resets every hour
- Per day: Resets at midnight UTC
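A counter that resets per window, as described above, can be sketched as a fixed-window counter. This is an illustrative model only, not KnoxCall's implementation; the class name and injectable clock are my own choices:

```javascript
// Minimal fixed-window rate limit counter (illustrative sketch).
class FixedWindowCounter {
  constructor(limit, windowMs, now = Date.now) {
    this.limit = limit;        // max requests per window
    this.windowMs = windowMs;  // e.g. 60000 for "per minute"
    this.now = now;            // injectable clock for testing
    this.windowStart = now();
    this.count = 0;
  }

  allow() {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      // Window elapsed: reset the counter
      this.windowStart = t;
      this.count = 0;
    }
    return ++this.count <= this.limit;
  }
}
```

Once the window elapses, the counter resets and previously blocked clients can send requests again.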
Configuration Levels
KnoxCall supports rate limiting at multiple levels (applied in order):
1. Tenant-Level Limits (Account-Wide)
First checkpoint: Total requests across entire tenant
Tenant: Acme Corp (Pro Plan)
Limit: 5,000 requests/minute (all routes + clients combined)
ALL traffic through KnoxCall cannot exceed 5k req/min
How it works:
- Token bucket algorithm (allows bursts)
- Refills continuously based on your plan tier
- Applied before route or client limits
- Automatic based on subscription plan
Plan Tiers:
Free: 100 requests/minute
Standard: 1,000 requests/minute
Pro: 5,000 requests/minute
Enterprise: 10,000 requests/minute (customizable)
Use case: Platform-wide protection, billing enforcement, abuse prevention
Token bucket behavior:
Bucket capacity: 5,000 tokens (requests)
Refill rate: 5,000 tokens/minute
Example:
- 0:00:00 → 5,000 tokens available
- 0:00:05 → Receive 2,000 requests (3,000 remaining)
- 0:00:10 → 416 tokens refilled (3,416 remaining)
- 0:00:15 → Receive burst of 4,000 requests
❌ Only 3,416 available → 584 requests rejected (429)
Benefits:
- ✅ Allows traffic bursts (better UX than strict limits)
- ✅ Still protects against sustained abuse
- ✅ Industry standard for API gateways
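The refill arithmetic in the example above (5,000 tokens refilling at 5,000/minute, i.e. ~83.3 tokens/second) can be sketched as a small token bucket. This is an illustrative model, not KnoxCall's internal code; the class name and injectable clock are assumptions:

```javascript
// Minimal token bucket sketch mirroring the Pro-plan example above.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;    // bucket starts full
    this.now = now;            // injectable clock for testing
    this.lastRefill = now();
  }

  refill() {
    const elapsedSec = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = this.now();
  }

  // Try to take `count` tokens; returns how many requests were accepted.
  take(count) {
    this.refill();
    const accepted = Math.min(count, Math.floor(this.tokens));
    this.tokens -= accepted;
    return accepted;
  }
}
```

Replaying the timeline above: a 2,000-request burst at 0:00:05 is fully accepted, and the 4,000-request burst at 0:00:15 is capped at the 3,416 tokens available, rejecting 584 requests.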
2. Route-Level Limits
Second checkpoint: Limits per route
Route: stripe-webhooks
Limit: 10,000 requests/hour
ALL clients using this route combined cannot exceed 10k req/hour
Use case: Protect specific backend APIs from overload
3. Client-Level Limits
Third checkpoint: Limits per client
Client: mobile-app-ios
Limit: 1,000 requests/hour
This specific client cannot exceed 1k req/hour across all routes
Use case: Fair usage per client/application, prevent single client from monopolizing resources
4. Method-Specific Limits
Fourth checkpoint: Different limits per HTTP method
GET requests: 5,000/hour
POST requests: 1,000/hour
DELETE requests: 100/hour
Use case: Restrict write operations more than reads
How Limits Stack
Request passes through ALL levels in order:
1. Tenant limit: 5,000/min → ✅ Pass (3,000 used)
↓
2. Route limit: 1,000/hour → ✅ Pass (500 used)
↓
3. Client limit: 500/hour → ✅ Pass (200 used)
↓
4. Method limit (POST): 100/hour → ✅ Pass (50 used)
↓
5. Request forwarded to backend API
If ANY limit exceeded → 429 Too Many Requests
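The stacked evaluation above can be sketched as a short-circuiting chain of checks. The `limiters` list and `allow()` method here are hypothetical names for illustration, not KnoxCall's actual API:

```javascript
// Evaluate stacked limits in order: tenant → route → client → method.
// The first exceeded level short-circuits with a 429.
function checkRequest(limiters) {
  for (const limiter of limiters) {
    if (!limiter.allow()) {
      return { status: 429, exceededAt: limiter.name };
    }
  }
  return { status: 200 }; // all levels passed; forward to backend
}
```

Note that in this sketch, earlier levels still consume quota even when a later level rejects the request.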
Setting Up Rate Limits
Route-Level Rate Limiting
- Navigate to Routes → Select your route
- Scroll to Rate Limiting section
- Toggle Enable Rate Limiting to ON
- Configure limits:
Request Limit:
Time Window:
Options: minute, hour, day
Burst Allowance (Optional):
Allows temporary spikes above the base limit.
- Click Save
Client-Level Rate Limiting
- Navigate to Clients → Select your client
- Scroll to Rate Limiting section
- Configure limits:
Per-Client Limit:
Applies to: All routes this client can access
Method-Specific Rate Limiting
- Edit your route
- Go to Method Configurations tab
- For each HTTP method, set individual limits:
GET:
Rate Limit: 5000/hour
POST:
Rate Limit: 1000/hour
DELETE:
Rate Limit: 100/hour
Rate Limit Response
When a client exceeds the limit, they receive:
HTTP Status: 429 Too Many Requests
Response Headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000000
Retry-After: 3600
Response Body:
{
"error": "Rate limit exceeded",
"message": "You have exceeded your rate limit of 1000 requests per hour",
"limit": 1000,
"remaining": 0,
"reset_at": "2025-01-20T15:00:00Z",
"retry_after_seconds": 3600
}
Checking Rate Limit Status
Clients can check their current status via response headers on every request:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1640000000
Headers:
X-RateLimit-Limit: Total requests allowed in window
X-RateLimit-Remaining: Requests remaining in current window
X-RateLimit-Reset: Unix timestamp when limit resets
Monitoring Tenant-Level Limits
Dashboard view:
Navigate to Account → Usage to see your tenant-wide rate limit status:
Your Plan: Pro (5,000 requests/minute)
Current Usage:
┌────────────────────────────────┐
│██████████░░░░░░░░░░░░░░░░░░░░│ 32% (1,600/5,000)
└────────────────────────────────┘
Recent Activity (last 60 seconds):
0:00 - 0:10: 400 requests
0:10 - 0:20: 350 requests
0:20 - 0:30: 420 requests
0:30 - 0:40: 230 requests
0:40 - 0:50: 150 requests
0:50 - 1:00: 50 requests
Token Bucket Status:
Available tokens: 3,400 / 5,000
Refill rate: 83.3 tokens/second
Time to full: 19.2 seconds
API endpoint:
GET /admin/tenant/rate-limit-status
Response:
{
"plan": "pro",
"limit": 5000,
"refill_rate": 5000,
"current_tokens": 3400,
"last_request_at": "2025-01-20T15:30:45Z",
"reset_at": "2025-01-20T15:31:04Z"
}
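A client-side helper can turn that status payload into an upgrade signal. This is a sketch: the field names come from the example response above, but the function name and the 20% warning threshold are arbitrary choices of mine:

```javascript
// Decide whether to alert, given the JSON returned by
// GET /admin/tenant/rate-limit-status (shape taken from the example above).
function tenantHeadroom(status, warnBelowRatio = 0.2) {
  const ratio = status.current_tokens / status.limit;
  return { ratio, shouldWarn: ratio < warnBelowRatio };
}
```

With the example response above, `tenantHeadroom` reports a headroom ratio of 0.68 and no warning.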
When to upgrade:
- ⚠️ Consistently using >80% of tokens
- ⚠️ Seeing 429 errors in logs
- ⚠️ Traffic growing month-over-month
- ⚠️ Planning marketing campaign or launch
Upgrade options:
Free → Standard: 10x increase (100 → 1,000 req/min)
Standard → Pro: 5x increase (1,000 → 5,000 req/min)
Pro → Enterprise: 2x increase + custom limits (5,000 → 10,000+)
Burst Protection
Handle temporary traffic spikes without blocking legitimate users:
Configuration:
Base Limit: 1,000 requests/hour
Burst Allowance: 200 requests
How it works:
- Client can make up to 1,200 requests in a short burst
- After burst, limited to 1,000 requests/hour average
- Prevents legitimate spikes from being blocked
Example:
Minute 1: 200 requests ✅ (burst)
Minute 2: 200 requests ✅ (burst)
Minute 3: 200 requests ✅ (burst)
Minute 4: 200 requests ✅ (burst)
Minute 5: 200 requests ✅ (burst)
Minute 6: 200 requests ✅ (burst capacity of 1,200 reached)
Minute 7: 200 requests ❌ (burst depleted)
Remaining hour: ~17 requests/minute average (the 1,000/hour base rate)
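The burst behavior above maps naturally onto token bucket parameters: capacity is base + burst, and the refill rate is the base limit. A tiny illustrative helper (the function and field names are my own):

```javascript
// Translate a base limit + burst allowance into token bucket parameters.
function burstBucketParams(baseLimitPerHour, burstAllowance) {
  return {
    capacity: baseLimitPerHour + burstAllowance, // 1,200 for this example
    refillPerMinute: baseLimitPerHour / 60,      // ~16.7, the "~17/minute" average
  };
}
```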
Advanced Strategies
Per-User Rate Limiting
Use different limits based on user tiers:
Free Tier Client:
Limit: 100 requests/hour
Pro Tier Client:
Limit: 1,000 requests/hour
Enterprise Client:
Limit: 10,000 requests/hour
Create separate clients for each tier.
Geographic Rate Limiting
Combine with IP whitelisting:
US Region: 5,000 requests/hour
EU Region: 3,000 requests/hour
APAC Region: 2,000 requests/hour
Create region-specific clients with different limits.
Time-Based Rate Limiting
Different limits for peak vs off-peak:
Peak Hours (9 AM - 5 PM):
Limit:
Off-Peak:
Limit: 2,000 requests/hour
This requires creating separate routes or using API-based dynamic configuration.
Rate Limit Monitoring
View Rate Limit Events
- Navigate to Logs → API Logs
- Filter by status code:
429
- See which clients are hitting limits
Set Up Alerts
Get notified when clients hit rate limits:
- Navigate to Alerts → Add Alert
- Select Rate Limit Exceeded
- Configure:
Alert Type: Rate Limit Exceeded
Threshold: 10 violations/hour
Channels: Email, Slack
Analytics Dashboard
Monitor rate limit metrics:
- Hit rate: % of requests that are rate-limited
- Top offenders: Clients hitting limits most often
- Trend analysis: Rate limit violations over time
Best Practices
1. Start Conservative
Begin with strict limits and relax based on usage:
Initial: 100 requests/hour
After monitoring: 500 requests/hour
Production stable: 1,000 requests/hour
2. Use Tiered Limits
Different limits for different client types:
Public API: 100/hour
Partner API: 1,000/hour
Internal Services: 10,000/hour
3. Enable Burst Protection
Allow temporary spikes:
Base: 1,000/hour
Burst: +20% (1,200 total)
4. Monitor and Adjust
- Check rate limit logs weekly
- Adjust limits based on legitimate usage
- Set alerts for unusual patterns
5. Communicate Limits
Document your rate limits for API consumers:
## Rate Limits
- **Free Tier**: 100 requests/hour
- **Pro Tier**: 1,000 requests/hour
- **Enterprise**: Custom limits
Rate limit headers are included in every response.
Common Configurations
Webhook Endpoint
Limit: 10,000 requests/hour
Burst: 500 requests
Reason: Webhooks can spike during events
Public API
Limit: 100 requests/hour per API key
Burst: 20 requests
Reason: Prevent abuse of public endpoints
Internal Microservices
Limit: 50,000 requests/hour
Burst: 5,000 requests
Reason: High-traffic internal communication
Payment Processing
POST /payments: 10 requests/minute
GET /payments: 100 requests/minute
Reason: Prevent duplicate payment charges
Handling Rate Limits (Client-Side)
Exponential Backoff
When receiving 429, implement retry logic:
async function makeRequestWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url);
    if (response.status === 429) {
      // Prefer the server's Retry-After hint; fall back to exponential backoff
      const retryAfter = response.headers.get('Retry-After');
      const waitTime = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, i) * 1000;
      console.log(`Rate limited. Waiting ${waitTime}ms before retry...`);
      await new Promise(resolve => setTimeout(resolve, waitTime));
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
Monitor Rate Limit Headers
Proactively track remaining capacity on every response:
const response = await fetch(url);
const limit = parseInt(response.headers.get('X-RateLimit-Limit'), 10);
const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
const reset = parseInt(response.headers.get('X-RateLimit-Reset'), 10);
if (remaining < 10) {
  console.warn(`Low on rate limit: ${remaining}/${limit} remaining`);
}
Request Queuing
Prevent hitting limits by queuing requests:
class RateLimitedQueue {
  constructor(maxRequestsPerHour) {
    this.maxRequests = maxRequestsPerHour;
    this.requestTimestamps = [];
  }

  async enqueue(requestFn) {
    // Remove timestamps older than 1 hour
    const oneHourAgo = Date.now() - 3600000;
    this.requestTimestamps = this.requestTimestamps.filter(t => t > oneHourAgo);

    // Wait until the oldest request ages out of the window
    while (this.requestTimestamps.length >= this.maxRequests) {
      const oldestRequest = this.requestTimestamps[0];
      const waitTime = oldestRequest + 3600000 - Date.now();
      await new Promise(resolve => setTimeout(resolve, waitTime));
      this.requestTimestamps.shift();
    }

    this.requestTimestamps.push(Date.now());
    return await requestFn();
  }
}
Troubleshooting
High False Positive Rate
Problem: Legitimate users hitting limits
Solutions:
- Increase burst allowance
- Raise base limits
- Use per-user instead of per-IP limits
DDoS Still Getting Through
Problem: Rate limits not preventing attacks
Solutions:
- Lower limits for unknown clients
- Enable request signing
- Use IP-based blocking
- Contact support for enterprise DDoS protection
Inconsistent Limit Enforcement
Problem: Some requests bypass rate limits
Check:
- Rate limits enabled on all routes
- No conflicting client configurations
- Limits applied at correct level (route vs client)
Next Steps
Request Signing
Add cryptographic signatures for extra security
Alerts
Get notified of rate limit violations
Analytics
Monitor rate limit metrics
Client Management
Set up per-client limits
📊 Statistics
- Level: intermediate
- Time: 15 minutes
🏷️ Tags
rate-limiting, security, ddos, api-protection