Rate Limiting overview
KnoxCall enforces rate limits at the proxy edge, before a request ever reaches your upstream API. A misbehaving client gets a 429 from KnoxCall, your upstream sees nothing, and your daily Stripe, OpenAI, or Anthropic quota stays intact.
Three independent limit tiers are checked on every request. If any one of them trips, KnoxCall returns a 429:
- Per-route — a single route’s burst + sustained + daily ceiling.
- Per-client — fairness within a route, so one customer can’t starve the others.
- Per-tenant — a global ceiling for your tenant, useful for trial limits or as a hard kill-switch.
A request proceeds only if all three tiers allow it. When one trips, the 429 response reports which limit failed and when it resets, and includes a Retry-After header.
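The "any tier trips" rule can be sketched as follows. This is a minimal illustration, not KnoxCall's code: `TierResult` and `pick_rejection` are hypothetical names, and reporting the rejecting tier with the longest wait is an assumption (this page does not say which tier is reported when several trip at once). The reasoning is that retrying before the slowest tier frees up would only earn another 429.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TierResult:
    layer: str          # hypothetical labels: "route", "client", "tenant"
    allowed: bool
    retry_after_s: int  # seconds until this tier would admit the request


def pick_rejection(results: List[TierResult]) -> Optional[TierResult]:
    """Return None if every tier allows the request, else the tier to report in the 429.

    Assumption: when several tiers trip at once, report the one with the
    longest wait, since retrying earlier would just hit that tier again.
    """
    rejected = [r for r in results if not r.allowed]
    if not rejected:
        return None  # all tiers passed: the request is proxied upstream
    return max(rejected, key=lambda r: r.retry_after_s)


checks = [
    TierResult("route", allowed=True, retry_after_s=0),
    TierResult("client", allowed=False, retry_after_s=3),
    TierResult("tenant", allowed=False, retry_after_s=40),
]
rejection = pick_rejection(checks)
print(rejection.layer if rejection else "allowed")  # prints "tenant"
```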
Why use it
| Problem | Without rate limiting | With KnoxCall |
|---|---|---|
| Runaway script blows a daily Stripe quota at 2am | Your team finds out from Stripe at 9am | Hits 429 at burst threshold; never reaches Stripe |
| One noisy customer monopolises the API | Legitimate traffic queues behind their bursts | Per-client limits enforce fairness without per-customer code |
| Free-tier abuse | Manual ban + chase | Tenant-level cap; abuse self-throttles |
| Upstream provider’s own rate limits trigger | Cascading 429s, retries multiply load | Upstream quota headers (Anthropic, OpenAI) parsed and respected automatically |
The three tiers
Each tier has the same shape: a configurable request count over a sliding window, plus an optional burst allowance for short spikes.
Per-route
Configured per route. Typical settings:
| Limit | Use for |
|---|---|
| Burst (RPS) | Spike protection. Sliding 1-second window. |
| Sustained (RPM / RPS) | Steady-state ceiling. 1-minute or 5-minute window. |
| Daily quota | Cost control. Resets at midnight UTC or in your configured time zone. |
A route’s burst is usually 2-3× sustained. KnoxCall uses a sliding window, not a fixed bucket, so traffic doesn’t pile up at window boundaries.
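When a daily quota runs out, the wait depends on which time zone its midnight is anchored to. A minimal sketch of computing the seconds until the next reset, assuming the reset happens at local midnight in the configured zone. `seconds_until_daily_reset` is a hypothetical helper, not part of KnoxCall's API.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def seconds_until_daily_reset(now: datetime, tz: tzinfo = timezone.utc) -> int:
    """Seconds from `now` (timezone-aware) until the next midnight in `tz`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local_now = now.astimezone(tz)
    # Midnight at the start of the next local calendar day.
    next_midnight = (local_now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Subtract in UTC so a DST change in `tz` is reflected in the result.
    delta = next_midnight.astimezone(timezone.utc) - local_now.astimezone(timezone.utc)
    return int(delta.total_seconds())


now = datetime(2026, 5, 4, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_daily_reset(now))                                 # 43200 (12 h to UTC midnight)
print(seconds_until_daily_reset(now, timezone(timedelta(hours=-5))))  # 61200 (17 h to UTC-5 midnight)
```

For a named zone, pass something like `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+) instead of a fixed offset.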
Per-client
When a route is fronted by Clients (each client represents one of your customers, SDKs, or integrations), you can set per-client RPS and daily caps. KnoxCall identifies the client from its API key on the inbound request.
This is the lever for fairness: 100 RPS total on a route, with each client capped at 10 RPS, means no single noisy client can starve the others.
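To see why the per-client cap matters, here is a toy simulation of one second of traffic against a 100 RPS route cap, with and without a 10 RPS per-client cap. It uses a plain per-second counter rather than KnoxCall's sliding window, and `admit_one_second` is a hypothetical name. It only illustrates the fairness arithmetic.

```python
from collections import Counter
from typing import List


def admit_one_second(arrivals: List[str], route_cap: int, client_cap: int) -> Counter:
    """Count admitted requests per client for one second of arrivals, in arrival order.

    A request is admitted only if both the route counter and that client's
    counter are below their caps; otherwise it would receive a 429.
    """
    route_count = 0
    admitted: Counter = Counter()
    for client in arrivals:
        if route_count >= route_cap or admitted[client] >= client_cap:
            continue  # rejected: one of the two tiers is exhausted
        route_count += 1
        admitted[client] += 1
    return admitted


# A noisy client fires 500 requests first; five quiet clients then send 5 each.
arrivals = ["noisy"] * 500 + [f"quiet-{i}" for i in range(5) for _ in range(5)]

without_cap = admit_one_second(arrivals, route_cap=100, client_cap=100)
with_cap = admit_one_second(arrivals, route_cap=100, client_cap=10)
print(without_cap["noisy"], without_cap["quiet-0"])  # 100 0
print(with_cap["noisy"], with_cap["quiet-0"])        # 10 5
```

Without the per-client cap the noisy client consumes the entire route budget and every quiet client is rejected; with it, the noisy client is held to 10 and everyone else gets through.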
Per-tenant
A tenant-wide ceiling across all routes. Useful for:
- Trial limits — cap free-tier tenants at N requests/day across all routes.
- Hard kill-switch — set a low ceiling during incident response to limit blast radius.
- Predictable cost — your monthly egress / upstream-API cost is bounded by the tenant ceiling.
Sliding window semantics
KnoxCall uses Redis sorted-set sliding windows, not fixed buckets. Every request is recorded in a sorted set scored by its timestamp; answering "how many requests in the last 60 seconds?" means counting the members whose timestamps fall inside that window.
Practical effect: with a 60-per-minute limit, a client that sends 60 requests just before a minute boundary and another 60 just after it gets all 120 through under a fixed-bucket implementation, because the counter resets at the boundary. This is the classic burst-at-window-boundary attack. With a sliding window the second batch is rejected: the first 60 requests are still inside the trailing 60 seconds, and capacity frees up only as each of them ages out of the window.
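The boundary behaviour can be reproduced with a small in-memory model. This is a sketch, not KnoxCall's implementation: a `deque` of timestamps stands in for the Redis sorted set, `SlidingWindowLog` and `FixedWindowCounter` are hypothetical names, only admitted requests are recorded, and timestamps are plain seconds supplied by the caller.

```python
from collections import deque


class SlidingWindowLog:
    """Admit at most `limit` requests in any trailing `window_s`-second window."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.stamps = deque()  # admitted request timestamps, oldest first

    def _prune(self, now: float) -> None:
        # Drop timestamps that have aged out of the window (now - window_s, now].
        while self.stamps and self.stamps[0] <= now - self.window_s:
            self.stamps.popleft()

    def allow(self, now: float) -> bool:
        self._prune(now)
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True

    def retry_after(self, now: float) -> float:
        """Seconds until the oldest counted request leaves the window (0 if not full)."""
        self._prune(now)
        if len(self.stamps) < self.limit:
            return 0.0
        return self.stamps[0] + self.window_s - now


class FixedWindowCounter:
    """Admit at most `limit` requests per aligned window; the count resets at each boundary."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.window_start = None
        self.count = 0

    def allow(self, now: float) -> bool:
        start = (now // self.window_s) * self.window_s
        if start != self.window_start:
            self.window_start, self.count = start, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


# 60 requests just before the minute boundary, 60 just after it.
times = [59.0 + i * 0.01 for i in range(60)] + [60.0 + i * 0.01 for i in range(60)]
fixed = FixedWindowCounter(limit=60, window_s=60)
sliding = SlidingWindowLog(limit=60, window_s=60)
print(sum(fixed.allow(t) for t in times))    # 120
print(sum(sliding.allow(t) for t in times))  # 60
```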
If Redis is unavailable, KnoxCall falls back to an in-memory counter per process. This is less accurate in a multi-replica deployment, because each replica counts only its own traffic, but it fails open: the proxy keeps serving traffic.
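A minimal sketch of the fail-open fallback, assuming the shared (Redis-backed) check raises Python's built-in `ConnectionError` when Redis is unreachable; a real Redis client may raise its own exception type instead. `FailOpenLimiter` and the two callables are hypothetical, and any limiter with an `allow(key, now)` shape would fit.

```python
from typing import Callable

AllowFn = Callable[[str, float], bool]  # (limit key, timestamp) -> admitted?


class FailOpenLimiter:
    """Prefer the shared limiter; fall back to a per-process one if it is unreachable."""

    def __init__(self, shared_allow: AllowFn, local_allow: AllowFn):
        self.shared_allow = shared_allow
        self.local_allow = local_allow

    def allow(self, key: str, now: float) -> bool:
        try:
            return self.shared_allow(key, now)
        except ConnectionError:
            # Shared store is down: keep serving, limited only by this process's own counts.
            return self.local_allow(key, now)


def redis_down(key: str, now: float) -> bool:
    raise ConnectionError("redis unreachable")


limiter = FailOpenLimiter(shared_allow=redis_down, local_allow=lambda key, now: True)
print(limiter.allow("route:stripe-charges", 0.0))  # True
```

The accuracy trade-off follows directly: if each of three replicas enforces a 100 RPS route limit from its own counter, evenly balanced traffic could reach roughly 300 RPS in total before any replica returns a 429.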
429 response shape
When any tier trips:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 23
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 2026-05-04T12:00:23Z
X-RateLimit-Layer: route
Content-Type: application/json

{
  "error": "Too Many Requests",
  "message": "Route rate limit exceeded. Try again in 23 seconds.",
  "limit": 100,
  "remaining": 0,
  "reset": "2026-05-04T12:00:23Z",
  "layer": "route"
}
```
Your client SDK should read Retry-After and wait that many seconds before retrying. The X-RateLimit-Layer header tells you which limit tripped ("route" or "tenant"), which is useful when debugging "is this me being noisy, or a tenant-wide ceiling?".
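A minimal client-side sketch of honouring Retry-After. `send_with_retry` and `retry_after_seconds` are hypothetical helpers; the transport is injected as a plain callable so the logic can be shown without a real HTTP client, and the 1-second fallback when the header is missing or unparseable is an arbitrary choice. KnoxCall sends delta-seconds; the HTTP-date branch covers the other form the Retry-After header is allowed to take.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def retry_after_seconds(headers: Mapping[str, str], now: datetime, default: float = 1.0) -> float:
    """Parse Retry-After as delta-seconds or an HTTP-date; fall back to `default`."""
    value = headers.get("Retry-After")  # plain dicts are case-sensitive; most HTTP clients are not
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def send_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, waiting out Retry-After on each 429, for at most `max_attempts` calls."""
    status, headers = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        sleep(retry_after_seconds(headers, datetime.now(timezone.utc)))
        status, headers = send()
    return status, headers


# Usage with a stubbed transport: first a 429, then success.
responses = iter([(429, {"Retry-After": "23", "X-RateLimit-Layer": "route"}), (200, {})])
waits = []
print(send_with_retry(lambda: next(responses), sleep=waits.append)[0], waits)  # 200 [23.0]
```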
KnoxCall parses upstream rate-limit signals and surfaces them in your API Logs automatically:
| Provider | Headers parsed |
|---|---|
| Anthropic | anthropic-ratelimit-{requests,tokens,input-tokens,output-tokens}-{limit,remaining,reset} |
| OpenAI | x-ratelimit-{limit,remaining,reset}-{requests,tokens} |
| Stripe | Stripe-Should-Retry, Retry-After |
| GitHub | X-RateLimit-{Limit,Remaining,Reset} |
These signals power smart_ai alerts: KnoxCall sends you a Slack notification when an upstream quota crosses 80%, so you can react before it hits 100%.
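The 80% check itself is simple arithmetic on a limit header and its matching remaining header. This sketch is not KnoxCall's alerting code: `quota_used_fraction` and `crosses_threshold` are hypothetical names, and it only handles header pairs that carry numeric limit and remaining counts (the OpenAI, Anthropic, and GitHub rows above), not Stripe's retry hints.

```python
from typing import Mapping, Optional


def quota_used_fraction(headers: Mapping[str, str], limit_header: str,
                        remaining_header: str) -> Optional[float]:
    """Fraction of an upstream quota already consumed, or None if the headers are absent."""
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    try:
        limit = float(lowered[limit_header.lower()])
        remaining = float(lowered[remaining_header.lower()])
    except (KeyError, ValueError):
        return None
    if limit <= 0:
        return None
    return 1.0 - remaining / limit


def crosses_threshold(fraction: Optional[float], threshold: float = 0.8) -> bool:
    """True once the consumed fraction reaches the alert threshold."""
    return fraction is not None and fraction >= threshold


openai_headers = {"x-ratelimit-limit-requests": "5000", "x-ratelimit-remaining-requests": "900"}
used = quota_used_fraction(openai_headers, "x-ratelimit-limit-requests", "x-ratelimit-remaining-requests")
print(round(used, 2), crosses_threshold(used))  # 0.82 True
```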
Quick start (UI)
- Go to Routes → [your route] → Rate Limiting tab.
- Set burst (e.g. 200 RPS), sustained (100 RPS), and daily quota (1M/day).
- If the route uses Clients, set per-client caps in the Clients tab.
- For tenant-wide ceilings, go to Settings → Billing → Quotas.
Quick start (API)
```bash
# Set rate limits on an existing route (environment config PATCH)
curl -X PATCH https://api.knoxcall.com/v1/routes/stripe-charges \
  -H "Authorization: Bearer $KC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limit_enabled": true,
    "rate_limit_requests": 100,
    "rate_limit_window_sec": 1,
    "rate_limit_burst": 200
  }'
```
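The same call from Python, as a standard-library sketch. The endpoint and payload fields are copied from the curl example; `build_rate_limit_patch` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_rate_limit_patch(route: str, api_key: str, requests: int,
                           window_sec: int, burst: int) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that sets a route's rate limits."""
    body = json.dumps({
        "rate_limit_enabled": True,
        "rate_limit_requests": requests,      # e.g. 100 requests...
        "rate_limit_window_sec": window_sec,  # ...per 1-second window
        "rate_limit_burst": burst,
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://api.knoxcall.com/v1/routes/{route}",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="PATCH",
    )


req = build_rate_limit_patch("stripe-charges", "test-key", requests=100, window_sec=1, burst=200)
print(req.get_method(), req.full_url)  # PATCH https://api.knoxcall.com/v1/routes/stripe-charges
```

To send it, pass the request to `urllib.request.urlopen(req)` using your real key, for example read from the `KC_API_KEY` environment variable.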
Plan limits
| Tier | Per-route limits | Per-client limits | Per-tenant ceiling |
|---|---|---|---|
| Free | basic burst | not available | 100 calls/day |
| Starter | burst + sustained | not available | 10K calls/month |
| Pro | full (burst + sustained + daily) | available | 1M calls/month |
| Enterprise | full | available | unlimited |
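Full rate-limit configuration is available on Pro and above. Free and Starter plans get the per-tenant ceiling automatically as part of the plan, plus the limited per-route controls listed in the table; full per-route customisation (including daily quotas) and per-client limits unlock at Pro.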
Next steps