
Agent Security Model

The KnoxCall agent (both the Go Client Agent and the self-hosted knoxcall/proxy container) handles your credentials. Our job is to make every theft scenario harder than it looks — not just the obvious ones. This page enumerates the threats we defend against and the layers that mitigate them.

Threat Model

Threat | Mitigation
--- | ---
Attacker copies the agent binary to a new machine | Machine-fingerprint binding refuses sessions from foreign hosts
Attacker modifies the agent binary | Tamper seal fails signature verification; the control plane logs a tamper event and, in strict mode, refuses sessions
Attacker captures the TLS-decrypted session request | Nonce + timestamp makes it single-use
Attacker steals the disk cache | Encrypted persistence without the session key inside
Attacker dumps agent memory | Session rotation limits usable secret lifetime to the current hour
Attacker compromises agent credentials | Admin revocation with ≤5-minute propagation
Rogue administrator on a stolen host | Single-tenant guard prevents cross-tenant access

Credential Hierarchy

Three tiers of key material, each with a different blast radius:
MASTER_KEY_B64          ← stays in KnoxCall cloud (never leaves)
   └─ wraps CEKs for each secret (envelope encryption)
         └─ session_key  ← derived per agent + per hour (HKDF)
               │         ← re-encrypts CEKs when shipped in a bundle
               │         ← never written to disk
               └─ lives in agent memory for ≤ 1 hour
Key properties:
  • MASTER_KEY_B64 never leaves the control plane. Stealing an agent gives an attacker no progress toward recovering it.
  • Session keys are HKDF-derived per (agent_id, window_hour). The derivation is stateless — the control plane re-computes on each fetch without storing the key anywhere.
  • Window rotation means a stolen session key from 14:00 UTC is worthless at 15:00 UTC, even without revocation.
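The stateless derivation described above can be sketched as follows. This is a minimal illustration, not KnoxCall's implementation: the HKDF routine follows RFC 5869, but the `info` layout, the empty salt, and the helper names `window_hour` and `derive_session_key` are assumptions made for the example.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def window_hour(now: datetime) -> int:
    """Whole hours since the Unix epoch (pass a timezone-aware datetime)."""
    return int(now.timestamp()) // 3600


def derive_session_key(master_key: bytes, agent_id: str, now: datetime) -> bytes:
    # Hypothetical context string: the real info/salt layout is not documented here.
    info = f"session|{agent_id}|{window_hour(now)}".encode()
    return hkdf_sha256(master_key, salt=b"", info=info)


if __name__ == "__main__":
    key = derive_session_key(b"\x01" * 32, "agent-1", datetime.now(timezone.utc))
    print(len(key), "byte session key for the current window")
```

Because the only inputs are the master key, the agent ID, and the hour window, the control plane can recompute the same key on every fetch within a window and gets an unrelated key once the hour rolls over.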

Machine-Fingerprint Binding

Every /agent/v1/session request includes a machine_fingerprint — a SHA-256 of stable host signals:
  • /etc/machine-id (systemd) or /var/lib/dbus/machine-id (fallback)
  • Primary non-loopback MAC address
  • OS hostname
  • A random first-run salt persisted alongside the cache
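As a rough sketch of how the four signals could be combined into one SHA-256 value: the field order, the length-prefix framing, and the function names below are assumptions for illustration, since the agent's exact encoding is not specified on this page.

```python
import hashlib
from pathlib import Path


def read_machine_id(paths=("/etc/machine-id", "/var/lib/dbus/machine-id")) -> str:
    """Return the first readable machine-id, trying the systemd path before the D-Bus fallback."""
    for candidate in paths:
        try:
            return Path(candidate).read_text().strip()
        except OSError:
            continue
    return ""


def machine_fingerprint(machine_id: str, mac: str, hostname: str, salt: bytes) -> str:
    """SHA-256 over the host signals, hex-encoded."""
    digest = hashlib.sha256()
    for part in (machine_id.encode(), mac.lower().encode(), hostname.encode(), salt):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.hexdigest()
```

The first-run salt matters here: two hosts cloned from the same image share a machine-id and possibly a hostname, but each generates its own salt, so their fingerprints still differ.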
On first successful fetch, the control plane stores the fingerprint and sets machine_fingerprint_locked=true. Every subsequent fetch must send the same fingerprint — otherwise the control plane:
  1. Refuses the session with 403 fingerprint_mismatch
  2. Logs a tamper event visible on the agent detail page
  3. Keeps the agent’s existing binding intact (the thief’s fingerprint is not recorded)
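The bind-on-first-fetch behaviour above might look like this on the control-plane side. The `AgentRecord` class and its in-memory tamper log are hypothetical stand-ins for whatever storage the control plane actually uses; only the field name machine_fingerprint_locked and the 403 fingerprint_mismatch response come from the description.

```python
import hmac
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AgentRecord:
    machine_fingerprint: Optional[str] = None
    machine_fingerprint_locked: bool = False
    tamper_events: List[str] = field(default_factory=list)


def check_fingerprint(agent: AgentRecord, presented: str) -> Tuple[int, str]:
    """Bind on the first fetch; afterwards require an exact match."""
    if not agent.machine_fingerprint_locked:
        agent.machine_fingerprint = presented
        agent.machine_fingerprint_locked = True
        return 200, "bound"
    if hmac.compare_digest(agent.machine_fingerprint or "", presented):
        return 200, "ok"
    # Mismatch: log the event but leave the stored binding untouched.
    agent.tamper_events.append("fingerprint_mismatch")
    return 403, "fingerprint_mismatch"


def reset_fingerprint(agent: AgentRecord) -> None:
    """Admin reset: clear the binding so the next fetch rebinds."""
    agent.machine_fingerprint = None
    agent.machine_fingerprint_locked = False
```

Note that the mismatch branch never writes the presented value anywhere on the record, which is what keeps a thief's fingerprint from displacing the legitimate one.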
An admin who legitimately moves an agent to a new host can reset the binding via POST /admin/agents/:id/reset-fingerprint. The next fetch rebinds.
Override the fingerprint with KNOXCALL_MACHINE_FINGERPRINT=<custom> when the agent legitimately runs on ephemeral instances that share an identity (e.g. auto-scaling groups behind a warm-pool image).

Tamper Seal

Every agent binary carries a build_sig = HMAC-SHA256(AGENT_BUILD_SECRET, hash:version). Two-pass build flow:
  1. Compile with placeholder sig → hash the compiled artefact
  2. HMAC with the CI-only secret → re-compile with real sig baked in
On every session fetch the control plane looks up the sig in agent_versions:
  • Known sig → session issued normally.
  • Unknown sig → tamper event logged. Default mode continues with a warning; strict mode (KNOXCALL_STRICT_TAMPER_CHECK=true) denies the session.
The signing key lives only in CI/CD — never shipped in any binary. A compromised agent can’t generate a valid sig for a modified version.
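A minimal sketch of the seal and the lookup, under stated assumptions: the artefact hash is taken to be a hex SHA-256, the message is the literal string "hash:version", and `session_decision` with its return values is a hypothetical name for the control plane's check against agent_versions.

```python
import hashlib
import hmac
from typing import Set


def build_sig(build_secret: bytes, artefact: bytes, version: str) -> str:
    """HMAC-SHA256 over "<artefact hash>:<version>", keyed with the CI-only secret."""
    artefact_hash = hashlib.sha256(artefact).hexdigest()
    message = f"{artefact_hash}:{version}".encode()
    return hmac.new(build_secret, message, hashlib.sha256).hexdigest()


def session_decision(sig: str, known_sigs: Set[str], strict: bool) -> str:
    """Known sig: issue. Unknown sig: warn by default, deny in strict mode."""
    if sig in known_sigs:
        return "issue"
    return "deny" if strict else "issue_with_warning"


if __name__ == "__main__":
    sig = build_sig(b"ci-only-secret", b"<compiled artefact bytes>", "1.4.2")
    print(session_decision(sig, {sig}, strict=True))
```

Since the HMAC key never ships with the agent, changing even one byte of the artefact produces a hash for which no one outside CI can compute a matching sig.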

Replay Protection

Session fetches carry:
  • nonce — 16 random bytes, base64-hex
  • timestamp — seconds since epoch
The control plane:
  1. Rejects requests with timestamp drift > 5 min (clock-sync check)
  2. Hashes the nonce and inserts into agent_session_nonces with a unique index on (agent_id, nonce_hash) → a replay hits the unique constraint and is refused
  3. Cleans up nonces older than 1 hour via the log-cleanup cron
Effect: a captured session request is single-use. Even if TLS is broken (via a rogue CA, for example) the attacker gets one fetch at most.
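The drift check and the unique-index trick can be demonstrated with an in-memory SQLite table. This is a sketch only: the table and column names mirror the description above, but the schema, the SHA-256 nonce hash, and the function names are assumptions, and the production database is not necessarily SQLite.

```python
import hashlib
import sqlite3
import time
from typing import Optional

MAX_DRIFT_SECONDS = 5 * 60
NONCE_RETENTION_SECONDS = 60 * 60


def open_nonce_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE agent_session_nonces (agent_id TEXT, nonce_hash TEXT, seen_at INTEGER)"
    )
    db.execute(
        "CREATE UNIQUE INDEX uq_agent_nonce ON agent_session_nonces (agent_id, nonce_hash)"
    )
    return db


def accept_request(db: sqlite3.Connection, agent_id: str, nonce: str,
                   timestamp: int, now: Optional[int] = None) -> bool:
    """Return True only for a fresh, in-window request."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_DRIFT_SECONDS:
        return False  # clock drift too large
    nonce_hash = hashlib.sha256(nonce.encode()).hexdigest()
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO agent_session_nonces VALUES (?, ?, ?)",
                (agent_id, nonce_hash, now),
            )
    except sqlite3.IntegrityError:
        return False  # replay: the unique index rejected the duplicate
    return True


def cleanup_nonces(db: sqlite3.Connection, now: int) -> None:
    with db:
        db.execute(
            "DELETE FROM agent_session_nonces WHERE seen_at < ?",
            (now - NONCE_RETENTION_SECONDS,),
        )
```

Letting the database enforce uniqueness avoids a check-then-insert race between concurrent fetches. The one-hour retention is also longer than the five-minute drift window, so a nonce that has been purged can no longer pass the timestamp check.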

Encrypted Disk Cache

The proxy persists a disk cache at /var/lib/knoxcall/session.json so a container restart can serve traffic during the first sync. Security properties:
  • Never contains the session key. session_key_b64 is scrubbed before write.
  • Encrypted at rest with AES-256-GCM, key derived from MASTER_KEY_B64 via HKDF-SHA256 with a per-write random salt.
  • File mode 0600 (owner read/write only).
  • Magic-prefix marks the cache format version; old plaintext caches are discarded on upgrade.
  • On load without a live session key, the proxy serves routes + environments + api_keys from the cache but returns 503 for requests that need secret injection until the next live sync repopulates decryptedSecrets.
An attacker who steals just the cache file gets the re-encrypted secret blobs. Without the session key (which rotates hourly and is never on disk), the blobs are indistinguishable from random. Disable persistence entirely with KNOXCALL_PROXY_BUNDLE_CACHE_PATH=off.
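The scrub-before-write and file-mode properties can be illustrated in isolation. This sketch deliberately leaves out the AES-256-GCM encryption and the magic prefix, because Python's standard library has no AES-GCM primitive; the function name `write_cache` and the bundle shape are assumptions, and the mode check assumes a POSIX filesystem.

```python
import json
import os


def write_cache(path: str, bundle: dict) -> None:
    """Persist the bundle without the session key, readable by the owner only."""
    # Drop the session key so it never reaches disk.
    scrubbed = {k: v for k, v in bundle.items() if k != "session_key_b64"}
    # Create the file with 0600 from the start rather than tightening it later.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(scrubbed, handle)
    # A pre-existing file keeps its old mode, so enforce 0600 explicitly.
    os.chmod(path, 0o600)
```

In a full implementation the JSON bytes would be encrypted before the write; the point here is only that the scrub happens on a copy, so the live in-memory bundle keeps its session key while the persisted form never has one.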

Session Rotation

  • Session duration: 1 hour
  • Renewal trigger: 55 minutes (5-minute grace before expiry)
  • Stale-session deadman: at expires_at, the proxy flips /healthz to 503 and refuses new requests until a successful refresh
Even without revocation, a stolen session key stops working in ≤ 1 hour.
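The timing rules above reduce to two comparisons. The `Session` class below is a hypothetical model of them, using the stated one-hour duration and 55-minute renewal trigger; the method names are illustrative.

```python
from dataclasses import dataclass

SESSION_SECONDS = 60 * 60      # session duration: 1 hour
RENEW_AFTER_SECONDS = 55 * 60  # renewal trigger: 55 minutes after issue


@dataclass
class Session:
    issued_at: int  # seconds since epoch

    @property
    def expires_at(self) -> int:
        return self.issued_at + SESSION_SECONDS

    def should_renew(self, now: int) -> bool:
        """True once the 5-minute grace period before expiry has started."""
        return now >= self.issued_at + RENEW_AFTER_SECONDS

    def healthz_status(self, now: int) -> int:
        """Deadman: report 503 from expires_at until a refresh replaces the session."""
        return 503 if now >= self.expires_at else 200
```

For a session issued at t = 0, `should_renew` flips to true at t = 3300 and `healthz_status` flips to 503 at t = 3600, which leaves five minutes of retries before traffic is refused.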

Memory Hygiene

On graceful shutdown (SIGTERM, SIGINT):
  • Every entry in the decrypted-secrets map is overwritten with empty string before the map is dropped
  • The in-memory session key is zeroed
  • The disk cache is updated one last time (without the session key)
This reduces the window in which a post-mortem heap dump could recover plaintext. JavaScript strings are immutable, so we can’t guarantee full secure-wipe — that requires an OS-level memory-scrub primitive. What we do guarantee: no references held, no plaintext in on-disk cache.
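The difference between immutable strings and wipeable buffers can be shown with a Python analogue. This is not the proxy's code: it assumes secrets are held in mutable `bytearray` buffers, which can be overwritten in place, whereas an immutable string can only be dereferenced and left for the garbage collector.

```python
from typing import Dict


def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer in place with zero bytes."""
    for i in range(len(buf)):
        buf[i] = 0


def shutdown(decrypted_secrets: Dict[str, bytearray], session_key: bytearray) -> None:
    """Best-effort scrub on SIGTERM/SIGINT: wipe every secret, drop the map, zero the key."""
    for name in list(decrypted_secrets):
        wipe(decrypted_secrets[name])
    decrypted_secrets.clear()
    wipe(session_key)
```

Even with mutable buffers this stays best-effort: copies made earlier by the runtime, by serialization, or by swap are outside the process's control, which is why a full secure wipe needs an OS-level primitive.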

Single-Tenant Enforcement

When DEPLOYMENT_MODE=self_hosted, every request goes through a tenant guard in the proxy router. If the resolved tenant ID differs from KNOXCALL_TENANT_ID:
  • Request returns 404 (not 403 — we don’t reveal whether the other tenant exists)
  • Control plane refuses to issue session bundles containing any other tenant’s data
Even if an attacker stole credentials from one customer’s self-hosted install and repurposed them to proxy a different tenant’s domain, the tenant guard rejects the request at the proxy and the control plane refuses to issue the session bundle.
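A minimal sketch of the guard's decision, assuming the tenant has already been resolved from the request. The function name and the use of a plain mapping for configuration are illustrative; only the environment variable names and the 404 response come from the description above.

```python
import os
from typing import Mapping


def tenant_guard(resolved_tenant_id: str, env: Mapping[str, str] = os.environ) -> int:
    """Return the HTTP status the router should use for this request."""
    if env.get("DEPLOYMENT_MODE") != "self_hosted":
        return 200  # the guard only applies to self-hosted deployments
    if resolved_tenant_id != env.get("KNOXCALL_TENANT_ID"):
        # 404 rather than 403, so the response does not confirm the other tenant exists.
        return 404
    return 200
```

If KNOXCALL_TENANT_ID is unset in self-hosted mode, the comparison fails for every tenant and the sketch returns 404 for all requests, which is the safer failure direction.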

Revocation

Admin UI → Automation → Agents → select agent → Revoke.
  • Sets agent_registrations.status = 'revoked' immediately
  • Existing in-memory session remains valid until expires_at (≤ 1 hour, matching the session duration)
  • Next renewal attempt fails with 403 agent_revoked
For faster kill: also rotate MASTER_KEY_B64 — the session key derivation depends on it, so a key rotation invalidates every agent session system-wide within minutes.

What We Don’t Protect Against

Being honest about the limits:
  • Full root on the agent host — if the attacker owns the kernel, they can read the live session key out of process memory. This is true of every secrets-management tool. Mitigate with host hardening (SELinux, AppArmor, minimal-surface containers).
  • Compromised control plane — if knoxcall.com itself is compromised, the attacker has the AGENT_BUILD_SECRET and MASTER_KEY_B64 and can forge anything. Our defence is that the control plane lives behind our security perimeter with a small attack surface and audited operator access.
  • Insider with database access — a KnoxCall operator with prod-DB access can read any tenant’s re-encrypted secrets. Mitigated by break-glass audit logging on all prod access; true air-gap requires the Phase-1B fully self-hosted architecture.
If your threat model requires defending against the last two, reach out about the fully self-hosted roadmap.