Rate Limiting overview

KnoxCall enforces rate limits at the proxy edge, before a request ever reaches your upstream API. A misbehaving client gets a 429 from KnoxCall, your upstream sees nothing, and your daily Stripe / OpenAI / Anthropic quota stays intact. Three independent limit tiers run on every request, and any one of them tripping returns a 429:

Per-route — a single route’s burst + sustained + daily ceiling.
Per-client — fairness within a route, so one customer can’t starve the others.
Per-tenant — a global ceiling for your tenant, useful for trial limits or as a hard kill-switch.

A request proceeds only if all three tiers allow it. When one trips, the 429 response states which limit failed and when its window resets, and includes a Retry-After header.
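The evaluation rule above can be sketched in a few lines of Python. This is a minimal illustration, not KnoxCall's implementation: the `Tier` type, the `first_tripped_tier` function, the tier labels, and the route, client, tenant check order are all hypothetical, and each tier's check is reduced to a yes/no callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tier:
    name: str                       # hypothetical label, e.g. "per-route"
    allows: Callable[[dict], bool]  # hypothetical check: True while the request is under this tier's limit


def first_tripped_tier(request: dict, tiers: list[Tier]) -> Optional[str]:
    """Return the name of the first tier that rejects the request, or None if all allow it."""
    for tier in tiers:
        if not tier.allows(request):
            return tier.name  # the proxy would answer 429 here; later tiers are not consulted
    return None  # every tier allowed it: forward to the upstream API


tiers = [
    Tier("per-route", lambda req: True),
    Tier("per-client", lambda req: req["client"] != "noisy-client"),
    Tier("per-tenant", lambda req: True),
]
print(first_tripped_tier({"client": "acme"}, tiers))          # None
print(first_tripped_tier({"client": "noisy-client"}, tiers))  # per-client
```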

Why use it

Problem	Without rate limiting	With KnoxCall
Runaway script blows a daily Stripe quota at 2am	Your team finds out from Stripe at 9am	Hits 429 at burst threshold; never reaches Stripe
One noisy customer monopolises the API	Legitimate traffic queues behind their bursts	Per-client limits enforce fairness without per-customer code
Free-tier abuse	Manual ban + chase	Tenant-level cap; abuse self-throttles
Upstream provider’s own rate limits trigger	Cascading 429s, retries multiply load	Upstream quota headers (Anthropic, OpenAI) parsed and respected automatically

The three tiers

Each tier has the same shape: a configurable request count over a sliding window, plus an optional burst allowance for short spikes.

Per-route

Configured per route. Typical settings:

Window	Use for
Burst (RPS)	Spike protection. Sliding 1-second window.
Sustained (RPM / RPS)	Steady-state ceiling. 1-minute or 5-minute window.
Daily quota	Cost control. Resets at midnight UTC or in your configured time zone.

A route’s burst limit is usually set to 2-3× its sustained rate. KnoxCall uses a sliding window, not a fixed bucket, so traffic doesn’t pile up at window boundaries.
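For the daily quota, the next reset instant is the coming midnight in UTC or in your configured time zone. A minimal sketch, assuming the reset happens at local midnight of whatever `tzinfo` you pass; `next_daily_reset` is a hypothetical helper, and for a named zone you could pass `zoneinfo.ZoneInfo("Europe/London")` instead of a fixed offset.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def next_daily_reset(now: datetime, tz: tzinfo = timezone.utc) -> datetime:
    """Return the next midnight in `tz` strictly after `now` (which must be timezone-aware)."""
    local = now.astimezone(tz)
    midnight_today = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # Adding one day to an aware datetime keeps the same wall-clock time in `tz`.
    return midnight_today + timedelta(days=1)


now = datetime(2026, 5, 4, 12, 0, 23, tzinfo=timezone.utc)
print(next_daily_reset(now))  # 2026-05-05 00:00:00+00:00
```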

Per-client

When a route is fronted by Clients (one client = one of your customers / SDKs / integrations), you can set per-client RPS and per-day caps. KnoxCall identifies the client from its API key on the inbound request. This is the lever for fairness: 100 RPS total on a route, with each client capped at 10 RPS, means no single noisy client can starve the others.
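The fairness arithmetic can be illustrated with a one-second snapshot. This is a simplified sketch, not how KnoxCall schedules traffic: `admitted_per_second` is a hypothetical helper, and it serves clients in dictionary order, whereas real requests interleave in time.

```python
def admitted_per_second(demand: dict[str, int], route_rps: int, client_rps: int) -> dict[str, int]:
    """Admit each client's requests up to client_rps until the route's route_rps is used up.

    Clients are served in dict order, an arbitrary simplification for illustration.
    """
    admitted: dict[str, int] = {}
    remaining = route_rps
    for client, wanted in demand.items():
        take = min(wanted, client_rps, remaining)
        admitted[client] = take
        remaining -= take
    return admitted


demand = {"noisy": 95, "acme": 8, "globex": 5}
# client_rps equal to route_rps means "no per-client cap".
print(admitted_per_second(demand, route_rps=100, client_rps=100))  # {'noisy': 95, 'acme': 5, 'globex': 0}
print(admitted_per_second(demand, route_rps=100, client_rps=10))   # {'noisy': 10, 'acme': 8, 'globex': 5}
```

Without per-client caps, the noisy client takes 95 of the route's 100 RPS and globex gets nothing; with a 10 RPS cap per client, every other client's demand fits.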

Per-tenant

A tenant-wide ceiling across all routes. Useful for:

Trial limits — cap free-tier tenants at N requests/day across all routes.
Hard kill-switch — set a low ceiling during incident response to limit blast radius.
Predictable cost — your monthly egress / upstream-API cost is bounded by the tenant ceiling.

Sliding window semantics

KnoxCall uses Redis sorted-set sliding windows, not fixed buckets. Every request is stored with its timestamp, and counting “how many requests in the last 60 seconds” means counting the members of the sorted set whose timestamps fall inside that window. Practical effect: with a limit of 60 requests per minute, a client that sends 60 requests in the last second of one minute cannot send another 60 in the first second of the next, because the earlier 60 still count until they age out 60 seconds later. A fixed-bucket implementation resets its counter at the boundary and would admit 120 requests in about two seconds, the classic burst-at-window-boundary attack.

If Redis is disabled or not configured (e.g. local dev), KnoxCall falls back to an in-memory counter per process. Limits are still enforced, but each replica counts independently, so in a multi-replica deployment a client whose traffic is spread across N replicas can get up to roughly N times the configured limit.

If a configured Redis becomes unreachable, proxy rate limits fail open instead: the request is allowed with full remaining quota, so a control-plane Redis outage doesn’t block tenant traffic. Auth-sensitive limits fail closed (the request is denied) rather than falling back.
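The sorted-set approach can be mirrored in memory with one timestamp log per key. A minimal single-process sketch, closer to the in-memory fallback than to the Redis path; `SlidingWindowLimiter` and its methods are hypothetical, and the comments note which Redis sorted-set command each step stands in for. The injectable clock lets the demo reproduce the window-boundary case.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical sliding-window-log limiter: at most `limit` (>= 1) requests per `window_sec` per key."""

    def __init__(self, limit: int, window_sec: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock
        # key -> timestamps of admitted requests, oldest first (stands in for one sorted set per key)
        self.logs: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, key: str, now: float) -> Deque[float]:
        # Drop timestamps that have aged out of the window (the ZREMRANGEBYSCORE step).
        log = self.logs[key]
        while log and log[0] <= now - self.window_sec:
            log.popleft()
        return log

    def allow(self, key: str) -> bool:
        now = self.clock()
        log = self._prune(key, now)
        # Count what is left (ZCARD) and record the request only if it fits (ZADD).
        if len(log) < self.limit:
            log.append(now)
            return True
        return False

    def retry_after(self, key: str) -> float:
        """Seconds until the oldest request leaves the window; 0.0 if there is room now."""
        now = self.clock()
        log = self._prune(key, now)
        if len(log) < self.limit:
            return 0.0
        return log[0] + self.window_sec - now


fake_now = [59.5]
limiter = SlidingWindowLimiter(limit=60, window_sec=60.0, clock=lambda: fake_now[0])
print(all(limiter.allow("client-a") for _ in range(60)))  # True: 60 requests at t=59.5
fake_now[0] = 60.5
print(limiter.allow("client-a"))        # False: a fixed bucket would have reset at t=60
print(limiter.retry_after("client-a"))  # 59.0
```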

429 response shape

When any tier trips:

HTTP/1.1 429 Too Many Requests
Retry-After: 23
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 2026-05-04T12:00:23Z
X-RateLimit-Layer: route
Content-Type: application/json

{
  "error": "Too Many Requests",
  "message": "Route rate limit exceeded. Try again in 23 seconds.",
  "limit": 100,
  "remaining": 0,
  "reset": "2026-05-04T12:00:23Z",
  "layer": "route"
}

Your client should read Retry-After and wait that many seconds before retrying. The X-RateLimit-Layer header tells you which limit tripped (for example "route" or "tenant"), which is handy when debugging “is this me being noisy, or a tenant-wide ceiling?”
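On the client side, the wait can be derived from the headers shown above. A minimal sketch, assuming only the header names in the example response; `retry_delay_seconds` is a hypothetical helper, not part of any KnoxCall SDK. It accepts Retry-After in either of its HTTP forms (delta-seconds or an HTTP-date) and falls back to X-RateLimit-Reset.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay_seconds(headers: Mapping[str, str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait before retrying a 429, or None if the headers give no hint."""
    now = now or datetime.now(timezone.utc)
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v.strip() for k, v in headers.items()}

    retry_after = h.get("retry-after")
    if retry_after:
        if retry_after.isdigit():  # delta-seconds form, e.g. "23"
            return float(retry_after)
        when = parsedate_to_datetime(retry_after)  # HTTP-date form, e.g. "Mon, 04 May 2026 12:00:30 GMT"
        return max(0.0, (when - now).total_seconds())

    reset = h.get("x-ratelimit-reset")
    if reset:  # ISO 8601, e.g. "2026-05-04T12:00:23Z"; "Z" swapped for older Python versions
        when = datetime.fromisoformat(reset.replace("Z", "+00:00"))
        return max(0.0, (when - now).total_seconds())

    return None
```

Returning None when neither header is present leaves the fallback (for example exponential backoff) to the caller.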

Upstream quota header passthrough

KnoxCall parses upstream rate-limit signals and surfaces them in your API Logs automatically:

Provider	Headers parsed
Anthropic	`anthropic-ratelimit-{requests,tokens,input-tokens,output-tokens}-{limit,remaining,reset}`
OpenAI	`x-ratelimit-{limit,remaining,reset}-{requests,tokens}`
Stripe	`Stripe-Should-Retry`, `Retry-After`
GitHub	`X-RateLimit-{Limit,Remaining,Reset}`

These power smart_ai alerts — KnoxCall fires you a Slack ping when an upstream quota crosses 80% so you can react before it hits 100%.
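The 80% check itself is simple arithmetic on the limit and remaining headers. A sketch, assuming the OpenAI and Anthropic header patterns from the table; `quota_usage`, `quotas_over`, and the quota labels are hypothetical and are not KnoxCall's alerting code.

```python
from typing import Mapping

# (limit header, remaining header) pairs, following the naming patterns in the table above.
QUOTA_HEADERS = {
    "openai:requests": ("x-ratelimit-limit-requests", "x-ratelimit-remaining-requests"),
    "openai:tokens": ("x-ratelimit-limit-tokens", "x-ratelimit-remaining-tokens"),
    "anthropic:requests": ("anthropic-ratelimit-requests-limit", "anthropic-ratelimit-requests-remaining"),
    "anthropic:tokens": ("anthropic-ratelimit-tokens-limit", "anthropic-ratelimit-tokens-remaining"),
}


def quota_usage(headers: Mapping[str, str]) -> dict[str, float]:
    """Fraction (0.0-1.0) of each upstream quota already used, for quotas present in the headers."""
    h = {k.lower(): v for k, v in headers.items()}
    usage: dict[str, float] = {}
    for name, (limit_key, remaining_key) in QUOTA_HEADERS.items():
        if limit_key in h and remaining_key in h:
            limit = float(h[limit_key])
            if limit > 0:
                usage[name] = 1.0 - float(h[remaining_key]) / limit
    return usage


def quotas_over(headers: Mapping[str, str], threshold: float = 0.8) -> list[str]:
    """Names of quotas whose usage has reached `threshold` (80% by default)."""
    return sorted(name for name, used in quota_usage(headers).items() if used >= threshold)


print(quotas_over({"x-ratelimit-limit-requests": "500", "x-ratelimit-remaining-requests": "50"}))
# ['openai:requests']
```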

Quick start (UI)

Go to Routes → [your route] → Rate Limiting tab.
Set burst (e.g. 200 RPS), sustained (100 RPS), and daily quota (1M / day).
If the route uses Clients, set per-client caps in the Clients tab.
For tenant-wide ceilings, go to Settings → Billing → Quotas.

Quick start (API)

# Set rate limits on an existing route (environment config PATCH)
curl -X PATCH https://api.knoxcall.com/v1/routes/stripe-charges \
  -H "Authorization: Bearer $KC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limit_enabled": true,
    "rate_limit_requests": 100,
    "rate_limit_window_sec": 1,
    "rate_limit_burst": 200
  }'

Plan limits

Plan	Per-route limits	Per-client limits	Per-tenant ceiling
Free	basic burst	not available	100 calls/day
Starter	burst + sustained	not available	10K calls/month
Pro	full (burst + sustained + daily)	available	1M calls/month
Enterprise	full	available	unlimited

Full per-route customisation (burst + sustained + daily) and per-client limits unlock at Pro. Free and Starter tenants get the per-tenant ceiling automatically as part of the plan, plus the reduced per-route limits shown in the table.

Next steps

Routes overview →
Clients overview → — for per-client fairness
Audit Logs → — every 429 lands here
