Rate Limiting overview

KnoxCall enforces rate limits at the proxy edge, before the request ever reaches your upstream API. A misbehaving client gets a 429 from KnoxCall, your upstream sees nothing, and your daily Stripe / OpenAI / Anthropic quota stays intact.

Three independent limit tiers run on every request. If any one of them trips, KnoxCall returns 429:
  • Per-route — a single route’s burst + sustained + daily ceiling.
  • Per-client — fairness within a route, so one customer can’t starve the others.
  • Per-tenant — a global ceiling for your tenant, useful for trial limits or as a hard kill-switch.
A request allowed by all three tiers proceeds to your upstream. A rejected request receives a 429 that names the precise window that failed and carries a Retry-After header.

Why use it

| Problem | Without rate limiting | With KnoxCall |
| --- | --- | --- |
| Runaway script blows a daily Stripe quota at 2am | Your team finds out from Stripe at 9am | Hits 429 at burst threshold; never reaches Stripe |
| One noisy customer monopolises the API | Legitimate traffic queues behind their bursts | Per-client limits enforce fairness without per-customer code |
| Free-tier abuse | Manual ban + chase | Tenant-level cap; abuse self-throttles |
| Upstream provider's own rate limits trigger | Cascading 429s, retries multiply load | Upstream quota headers (Anthropic, OpenAI) parsed and respected automatically |

The three tiers

All three tiers share the same shape: a configurable request count over a sliding window, plus an optional burst allowance for short spikes.

Per-route

Configured per route. Typical settings:
| Window | Use for |
| --- | --- |
| Burst (RPS) | Spike protection. Sliding 1-second window. |
| Sustained (RPM / RPS) | Steady-state ceiling. 1-minute or 5-minute window. |
| Daily quota | Cost control. Resets at UTC midnight or your configured tz. |
A route’s burst is usually 2-3× sustained. KnoxCall uses a sliding window, not a fixed bucket, so traffic doesn’t pile up at window boundaries.

Per-client

When a route is fronted by Clients (one client = one of your customers / SDKs / integrations), you can set per-client RPS and per-day caps. KnoxCall identifies the client from its API key on the inbound request. This is the lever for fairness: 100 RPS total on a route, with each client capped at 10 RPS, means no single noisy client can starve the others.

Per-tenant

A tenant-wide ceiling across all routes. Useful for:
  • Trial limits — cap free-tier tenants at N requests/day across all routes.
  • Hard kill-switch — set a low ceiling during incident response to limit blast radius.
  • Predictable cost — your monthly egress / upstream-API cost is bounded by the tenant ceiling.
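
How the three tiers combine can be sketched in a few lines. This is an illustration only, not KnoxCall's implementation: `TierState`, `first_tripped_tier`, the counts, and the route, client, tenant evaluation order are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierState:
    """Hypothetical snapshot of one tier: requests counted in its window vs. its limit."""
    name: str
    used: int
    limit: int

    def allows_one_more(self) -> bool:
        return self.used < self.limit


def first_tripped_tier(tiers: list[TierState]) -> Optional[str]:
    """Return the name of the first tier that would reject the request, or None."""
    for tier in tiers:
        if not tier.allows_one_more():
            return tier.name
    return None


# The route has headroom (40 of 100), but this client has used its full 10 RPS share.
tiers = [
    TierState("route", used=40, limit=100),
    TierState("client", used=10, limit=10),
    TierState("tenant", used=500, limit=10_000),
]
tripped = first_tripped_tier(tiers)
status = 429 if tripped else 200
print(status, tripped)
```

The request is rejected even though the route and tenant tiers both have capacity, which is the fairness property the per-client tier exists to provide.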

Sliding window semantics

KnoxCall uses Redis sorted-set sliding windows, not fixed buckets. Every request is recorded with its timestamp; answering "how many requests in the last 60 seconds" means counting the sorted-set members whose timestamps fall inside the window.

Practical effect: a client that sends 60 requests in the first second of a minute regains capacity gradually, as each of those requests ages out of the 60-second window. With a fixed-bucket implementation the counter resets at the boundary, so a client can send 60 requests at the end of one minute and another 60 immediately after it: the classic burst-at-window-boundary attack.

If Redis is unavailable, KnoxCall falls back to an in-memory counter per process. This is less accurate in a multi-replica deployment, because each replica counts only its own traffic, but it fails open: the proxy keeps serving traffic.
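
The sliding-window behaviour can be illustrated with a small in-memory sketch. It uses a Python deque as a stand-in for the Redis sorted set, so it shows the counting semantics rather than KnoxCall's actual code; `SlidingWindowLimiter` and its parameters are hypothetical names.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory sliding-window log: at most `limit` requests per `window_sec`."""

    def __init__(self, limit: int, window_sec: float) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self.stamps: deque[float] = deque()  # request timestamps, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have slid out of the window.
        while self.stamps and self.stamps[0] <= now - self.window_sec:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=60, window_sec=60)

# 60 requests late in one minute all fit within the limit.
burst = [limiter.allow(59.0 + i * 0.01) for i in range(60)]

# Just past the minute boundary a fixed bucket would have reset; the sliding window has not.
after_boundary = limiter.allow(60.5)

# Capacity returns only once the oldest request is a full window old.
after_window = limiter.allow(119.0)

print(all(burst), after_boundary, after_window)
```

Timestamps are passed in explicitly so the example is deterministic; a real limiter would read the clock itself.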

429 response shape

When any tier trips:
HTTP/1.1 429 Too Many Requests
Retry-After: 23
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 2026-05-04T12:00:23Z
X-RateLimit-Layer: route
Content-Type: application/json

{
  "error": "Too Many Requests",
  "message": "Route rate limit exceeded. Try again in 23 seconds.",
  "limit": 100,
  "remaining": 0,
  "reset": "2026-05-04T12:00:23Z",
  "layer": "route"
}
Your client SDK should read Retry-After and wait that many seconds before retrying. The X-RateLimit-Layer header tells you which limit tripped ("route" or "tenant"), which helps answer "is this me being noisy, or a tenant-wide ceiling?".
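
On the client side, handling this response might look like the sketch below. The function name `retry_delay_seconds` and the 1-second fallback are assumptions for illustration; the header and field names are taken from the sample response above.

```python
import json


def retry_delay_seconds(status: int, headers: dict[str, str], body: str) -> tuple[float, str]:
    """Return (seconds to wait, layer that tripped); (0.0, "") if the response is not a 429."""
    if status != 429:
        return 0.0, ""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    payload = json.loads(body) if body else {}
    layer = lowered.get("x-ratelimit-layer") or payload.get("layer", "")
    try:
        delay = float(lowered.get("retry-after", ""))
    except ValueError:
        delay = 1.0  # assumed fallback when Retry-After is missing or not a number
    return delay, layer


headers = {"Retry-After": "23", "X-RateLimit-Layer": "route"}
body = '{"error": "Too Many Requests", "layer": "route", "reset": "2026-05-04T12:00:23Z"}'
delay, layer = retry_delay_seconds(429, headers, body)
print(f"wait {delay:.0f}s, limit tripped at layer: {layer}")
```

A caller would sleep for `delay` seconds before retrying, and could log `layer` to tell a noisy route apart from a tenant-wide ceiling.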

Upstream quota header passthrough

KnoxCall parses upstream rate-limit signals and surfaces them in your API Logs automatically:
| Provider | Headers parsed |
| --- | --- |
| Anthropic | anthropic-ratelimit-{requests,tokens,input-tokens,output-tokens}-{limit,remaining,reset} |
| OpenAI | x-ratelimit-{limit,remaining,reset}-{requests,tokens} |
| Stripe | Stripe-Should-Retry, Retry-After |
| GitHub | X-RateLimit-{Limit,Remaining,Reset} |
These headers power smart_ai alerts: KnoxCall sends a Slack alert when usage of an upstream quota crosses 80%, so you can react before it hits 100%.
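
The 80% check can be approximated from a limit/remaining header pair. The sketch below is an assumption about how such a calculation could work, not KnoxCall's alerting logic; `quota_used_fraction`, `ALERT_THRESHOLD`, and the sample values are hypothetical.

```python
from typing import Optional

ALERT_THRESHOLD = 0.80  # the 80% mark used for upstream quota alerts


def quota_used_fraction(
    headers: dict[str, str], limit_key: str, remaining_key: str
) -> Optional[float]:
    """Fraction of an upstream quota already consumed, or None if the headers are unusable."""
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        limit = int(lowered[limit_key])
        remaining = int(lowered[remaining_key])
    except (KeyError, ValueError):
        return None
    if limit <= 0:
        return None
    return (limit - remaining) / limit


# Example values; the header names follow the Anthropic pattern in the table.
upstream_headers = {
    "anthropic-ratelimit-requests-limit": "1000",
    "anthropic-ratelimit-requests-remaining": "150",
}
used = quota_used_fraction(
    upstream_headers,
    "anthropic-ratelimit-requests-limit",
    "anthropic-ratelimit-requests-remaining",
)
should_alert = used is not None and used >= ALERT_THRESHOLD
print(used, should_alert)
```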

Quick start (UI)

  1. Go to Routes → [your route] → Rate Limiting tab.
  2. Set burst (e.g. 200 RPS), sustained (100 RPS), and daily quota (1M / day).
  3. If the route uses Clients, set per-client caps in the Clients tab.
  4. For tenant-wide ceilings, go to Settings → Billing → Quotas.

Quick start (API)

# Set rate limits on an existing route (environment config PATCH)
curl -X PATCH https://api.knoxcall.com/v1/routes/stripe-charges \
  -H "Authorization: Bearer $KC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limit_enabled": true,
    "rate_limit_requests": 100,
    "rate_limit_window_sec": 1,
    "rate_limit_burst": 200
  }'
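
The same call can be made from Python's standard library. This sketch mirrors the curl example field for field; the helper name `build_rate_limit_request` and the placeholder key are assumptions, and the request is only built here, not sent.

```python
import json
import os
import urllib.request


def build_rate_limit_request(route: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request from the curl example."""
    payload = {
        "rate_limit_enabled": True,
        "rate_limit_requests": 100,
        "rate_limit_window_sec": 1,
        "rate_limit_burst": 200,
    }
    return urllib.request.Request(
        url=f"https://api.knoxcall.com/v1/routes/{route}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Falls back to a placeholder key so the example runs without credentials.
req = build_rate_limit_request("stripe-charges", os.environ.get("KC_API_KEY", "example-key"))
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```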

Plan limits

| Tier | Per-route limits | Per-client limits | Per-tenant ceiling |
| --- | --- | --- | --- |
| Free | basic burst | not available | 100 calls/day |
| Starter | burst + sustained | not available | 10K calls/month |
| Pro | full (burst + sustained + daily) | available | 1M calls/month |
| Enterprise | full | available | unlimited |
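Full rate limiting (burst + sustained + daily per-route limits, plus per-client limits) is available on Pro and above. Free and Starter get the per-tenant ceiling automatically as part of the plan, with the reduced per-route options shown in the table; full per-route customisation unlocks at Pro.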

Next steps