
Alerts & Notifications

Get notified immediately when your routes experience failures, high latency, unauthorized access attempts, or rate limit issues.

What Are Alerts?

Alerts monitor your routes in real time and send notifications when specific conditions are met. Example:
Route: stripe-payments
Condition: 5+ failed requests (5xx errors) within 5 minutes
Action: Send email to [email protected] + Slack #incidents channel
Use cases:
  • 🚨 Detect API integration failures instantly
  • ⏱️ Monitor response time degradation
  • 🔒 Track unauthorized access attempts
  • 📊 Catch rate limit violations
  • 🛠️ Proactive issue resolution before customers complain

Alert Types

KnoxCall supports four types of alerts:

1. Request Failures

What it monitors: Failed requests (5xx status codes)
Common scenarios:
  • Backend API is down (502, 503, 504)
  • Internal server errors (500)
  • Backend timeout issues
Example condition:
{
  "threshold": 5,
  "window_minutes": 5,
  "include_status_codes": [500, 502, 503, 504]
}
Triggers when: 5 or more 5xx errors occur within 5 minutes
Use case:
Your Stripe integration suddenly starts returning 503 errors.
Alert triggers after 5 failures → Email sent → Team investigates immediately
→ Discover Stripe API outage → Switch to backup payment processor
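Under the hood this is a count-over-window check. A minimal sketch of the evaluation logic, in Python for illustration only (the function and field names are assumptions, not KnoxCall internals):
from datetime import datetime, timedelta, timezone

def failures_condition_met(events, threshold=5, window_minutes=5,
                           include_status_codes=(500, 502, 503, 504)):
    """events: list of (timestamp, status_code) pairs for recent requests."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    recent_failures = [code for ts, code in events
                       if ts >= cutoff and code in include_status_codes]
    return len(recent_failures) >= threshold

# Five 5xx responses in the last minute meets the example condition
now = datetime.now(timezone.utc)
events = [(now - timedelta(seconds=i * 10), code)
          for i, code in enumerate([503, 502, 503, 502, 500])]
print(failures_condition_met(events))  # True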

2. High Latency

What it monitors: Request response times at a chosen percentile (P50, P95, or P99)
Common scenarios:
  • Backend database slow queries
  • Network issues
  • Service degradation
Example condition:
{
  "threshold_ms": 2000,
  "percentile": 95,
  "window_minutes": 5,
  "min_requests": 10
}
Triggers when: P95 latency exceeds 2 seconds (2000ms) over 5 minutes, with at least 10 requests
Use case:
User complaints about slow checkout page.
Alert triggers when P95 latency hits 2.5s → Team investigates → Find database index missing
→ Add index → Latency drops to 300ms
Percentiles explained:
  • P50 (median): 50% of requests faster than this
  • P95: 95% of requests faster than this (catches slowest 5%)
  • P99: 99% of requests faster than this (extreme outliers)
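To make these concrete, here is a small illustrative Python sketch that computes nearest-rank percentiles from a list of response times (KnoxCall additionally waits for min_requests samples before evaluating, which this sketch omits):
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the value that p% of samples fall at or below."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# 94 fast requests plus a handful of slow outliers (all values in ms)
samples = [120] * 94 + [2500, 2600, 2700, 2800, 3000, 5200]
print(percentile(samples, 50))  # 120  -> the typical request is fine
print(percentile(samples, 95))  # 2500 -> the slowest 5% would breach a 2000 ms threshold
print(percentile(samples, 99))  # 3000 -> extreme outliers only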

3. Rate Limit Exceeded

What it monitors: Requests hitting your configured rate limits
Common scenarios:
  • Client sending too many requests
  • Runaway script or bot
  • DDoS attempt
Example condition:
{
  "threshold": 3,
  "window_minutes": 10
}
Triggers when: Rate limit exceeded 3 times within 10 minutes
Use case:
Client accidentally deploys infinite loop calling your API.
Alert triggers → Email sent → You contact client → They fix the bug
→ Prevents unnecessary API costs

4. Unauthorized Client

What it monitors: Requests from non-whitelisted IP addresses
Common scenarios:
  • Security breach attempt
  • Client using wrong IP after server migration
  • Misconfigured firewall
Example condition:
{
  "immediate": true,
  "threshold": 1,
  "window_minutes": 5
}
Triggers when: Unauthorized IP tries to access route (immediate alert)
Use case:
Random IP address attempts to access your internal API.
Alert triggers immediately → Security team reviews → Block malicious IP
→ Prevent potential breach
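Conceptually, the unauthorized-client check is an allowlist lookup against the route's whitelisted IPs. A minimal Python sketch (the allowlist values are made up for illustration):
from ipaddress import ip_address, ip_network

# Hypothetical allowlist: individual IPs and CIDR ranges
ALLOWED_NETWORKS = [ip_network("203.0.113.10/32"), ip_network("10.0.0.0/8")]

def is_authorized(client_ip: str) -> bool:
    addr = ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)

print(is_authorized("10.1.2.3"))      # True  (inside 10.0.0.0/8)
print(is_authorized("198.51.100.7"))  # False (would raise this alert immediately)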

Notification Channels

Alerts can be sent through multiple channels:

Email Notifications

Configuration: one or more recipient email addresses.
Format:
Subject: [CRITICAL] stripe-payments: Integration Failure
Body:
Alert: Integration Failure
Route: stripe-payments
Severity: Critical
Time: 2025-01-15 10:30:45 UTC

Condition Met: 5+ failures within 5 minutes

Recent Failures:
- 10:30:42 UTC - 503 Service Unavailable
- 10:30:40 UTC - 502 Bad Gateway
- 10:30:38 UTC - 502 Bad Gateway
- 10:30:35 UTC - 503 Service Unavailable
- 10:30:33 UTC - 500 Internal Server Error

View Route: https://admin.knoxcall.com/routes/route_abc123
View Alert: https://admin.knoxcall.com/alerts/alert_def456
Best for: Primary notification channel, detailed information

SMS Notifications

Configuration:
Phone Numbers: +15551234567, +15559876543
Format:
[KnoxCall CRITICAL] stripe-payments: 5 failures in 5min. Check email for details.
Best for: Critical alerts, on-call engineers, immediate attention

Slack Notifications

Configuration:
Webhook URL: https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXX
Format:
🚨 CRITICAL ALERT: Integration Failure

Route: stripe-payments
Severity: Critical
Time: 2025-01-15 10:30:45 UTC

5 failures detected within 5 minutes

Recent errors:
• 503 Service Unavailable (3)
• 502 Bad Gateway (2)

[View Route] [View Alert] [Acknowledge]
Best for: Team collaboration, incident response, threaded discussions
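If you want to see what a webhook delivery looks like outside KnoxCall, Slack incoming webhooks accept a simple JSON payload. A short sketch using Python's standard library (the webhook URL is a placeholder):
import json
from urllib import request

WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"  # placeholder

payload = {
    "text": "🚨 CRITICAL ALERT: Integration Failure\n"
            "Route: stripe-payments\n"
            "5 failures detected within 5 minutes"
}

req = request.Request(
    WEBHOOK_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    print(resp.status)  # Slack replies 200 with body "ok" on success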

Creating Your First Alert

Step 1: Navigate to Alerts

  1. Click Monitoring in sidebar
  2. Select Alerts
  3. Click + Create Alert

Step 2: Choose a Template

KnoxCall provides pre-configured templates:
Integration Failure (Recommended)
  • Alert on any 5xx error
  • Severity: High
  • Cooldown: 15 minutes
  • Great for catching backend issues immediately
Critical Failures
  • Alert on 5+ failures within 5 minutes
  • Severity: Critical
  • Cooldown: 30 minutes
  • Best for production environments
High Latency
  • Alert when P95 latency exceeds 2 seconds
  • Severity: Medium
  • Cooldown: 20 minutes
  • Good for performance monitoring
Unauthorized Access Attempt
  • Immediate alert on unauthorized IP
  • Severity: High
  • Cooldown: 60 minutes
  • Essential for security
Or click Custom Alert to configure from scratch.

Step 3: Configure Alert

Route: Select the route to monitor
stripe-payments
Alert Name:
Stripe Integration Failure
Description (optional):
Alert when Stripe API returns 5xx errors
Alert Type:
Request Failures
Severity:
  • Low: Informational, review later
  • Medium: Important, check within hours
  • High: Urgent, check within 30 minutes
  • Critical: Emergency, check immediately
Alert Conditions (JSON):
{
  "threshold": 3,
  "window_minutes": 5,
  "include_status_codes": [500, 502, 503, 504]
}
Cooldown (minutes):
15
Time before alert can trigger again (prevents spam)
Aggregation (minutes):
1
Window for batching multiple events into one alert

Step 4: Configure Notifications

Enable Email

☑ Email Notifications
Email Addresses: [email protected], [email protected]

Enable SMS (optional)

☐ SMS Notifications
Phone Numbers: +15551234567

Enable Slack (optional)

☐ Slack Notifications
Webhook URL: https://hooks.slack.com/services/...
At least one channel is required. You can enable multiple channels to ensure alerts are seen.

Step 5: Custom Message Templates (Optional)

Customize notification content:
Email Subject Template:
[{{SEVERITY}}] {{ROUTE_NAME}}: {{ALERT_NAME}}
Email Body Template:
Alert triggered: {{ALERT_NAME}}
Route: {{ROUTE_NAME}}
Severity: {{SEVERITY}}
Time: {{TIMESTAMP}}

Condition: {{CONDITION_DESCRIPTION}}

{{TRIGGER_DETAILS}}

View: https://admin.knoxcall.com/alerts/{{ALERT_ID}}
Available variables:
  • {{ALERT_NAME}} - Alert name
  • {{ROUTE_NAME}} - Route name
  • {{SEVERITY}} - low/medium/high/critical
  • {{TIMESTAMP}} - When alert triggered
  • {{CONDITION_DESCRIPTION}} - Human-readable condition
  • {{TRIGGER_DETAILS}} - Specific trigger info (errors, latency, etc.)
  • {{ALERT_ID}} - Alert ID
  • {{ROUTE_ID}} - Route ID
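The {{VARIABLE}} placeholders are simple string substitutions. A hedged sketch of how you could preview a template locally (illustrative Python; KnoxCall's own rendering and escaping may differ):
import re

SUBJECT_TEMPLATE = "[{{SEVERITY}}] {{ROUTE_NAME}}: {{ALERT_NAME}}"

def render(template: str, variables: dict) -> str:
    # Replace each {{NAME}} with its value; unknown variables are left as-is
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(variables.get(m.group(1), m.group(0))),
                  template)

print(render(SUBJECT_TEMPLATE, {
    "SEVERITY": "CRITICAL",
    "ROUTE_NAME": "stripe-payments",
    "ALERT_NAME": "Stripe Integration Failure",
}))
# [CRITICAL] stripe-payments: Stripe Integration Failure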

Step 6: Create Alert

Click Create Alert.
The alert is now active and monitoring your route! 🎉

Alert States

Alerts have three states:

1. OK (Green)

Meaning: Condition is not met, everything normal
Example:
Route: stripe-payments
Failures: 0 in last 5 minutes
State: OK ✓

2. Triggered (Red)

Meaning: Condition met, notification sent
Example:
Route: stripe-payments
Failures: 6 in last 5 minutes (threshold: 5)
State: TRIGGERED ⚠️
Notification sent: 2025-01-15 10:30:45 UTC

3. Cooldown (Yellow)

Meaning: Recently triggered, waiting for cooldown period
Example:
Route: stripe-payments
Last triggered: 5 minutes ago
Cooldown: 15 minutes
State: COOLDOWN (10 minutes remaining)
Purpose of cooldown:
  • Prevents notification spam
  • Gives time to fix issue
  • Won’t trigger again until cooldown expires
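The three states follow directly from the last trigger time and the cooldown window. A minimal sketch of the state logic (illustrative Python, not KnoxCall's actual state machine):
from datetime import datetime, timedelta, timezone

def alert_state(condition_met, last_triggered, cooldown_minutes):
    """Return OK, TRIGGERED, or COOLDOWN for a single evaluation pass."""
    now = datetime.now(timezone.utc)
    if last_triggered is not None and now - last_triggered < timedelta(minutes=cooldown_minutes):
        return "COOLDOWN"   # recently fired; suppress repeat notifications
    return "TRIGGERED" if condition_met else "OK"

five_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=5)
print(alert_state(True, five_minutes_ago, 15))  # COOLDOWN (10 minutes remaining)
print(alert_state(True, None, 15))              # TRIGGERED
print(alert_state(False, None, 15))             # OK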

Advanced Configuration

Condition Schemas

Different alert types have different configuration options:

Request Failures Schema

{
  "threshold": 5,                        // Number of failures to trigger
  "window_minutes": 5,                   // Time window to count failures
  "include_status_codes": [500, 502, 503, 504],  // Which status codes count as failures
  "exclude_status_codes": []             // Optional: exclude specific codes
}

High Latency Schema

{
  "threshold_ms": 2000,                  // Latency threshold in milliseconds
  "percentile": 95,                      // Percentile to monitor (50, 95, 99)
  "window_minutes": 5,                   // Time window to calculate percentile
  "min_requests": 10                     // Minimum requests needed (prevents false positives)
}

Rate Limit Exceeded Schema

{
  "threshold": 3,                        // Number of rate limit hits to trigger
  "window_minutes": 10                   // Time window to count hits
}

Unauthorized Client Schema

{
  "immediate": true,                     // Trigger on first occurrence
  "threshold": 1,                        // Usually 1 for immediate alerts
  "window_minutes": 5                    // Window for counting attempts
}
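If you build alert conditions programmatically, it can help to model the schemas explicitly. A hedged sketch of two of them using Python dataclasses (field names follow the examples above; the defaults mirror the documented examples, not guaranteed product defaults):
from dataclasses import asdict, dataclass, field
from typing import List
import json

@dataclass
class RequestFailuresCondition:
    threshold: int = 5
    window_minutes: int = 5
    include_status_codes: List[int] = field(default_factory=lambda: [500, 502, 503, 504])
    exclude_status_codes: List[int] = field(default_factory=list)  # optional

@dataclass
class HighLatencyCondition:
    threshold_ms: int = 2000
    percentile: int = 95        # 50, 95, or 99
    window_minutes: int = 5
    min_requests: int = 10

# Serialize a condition to the JSON shape shown above
print(json.dumps(asdict(RequestFailuresCondition(threshold=3)), indent=2))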

Multi-Status Code Filtering

Include only specific errors:
{
  "threshold": 3,
  "window_minutes": 5,
  "include_status_codes": [503, 504]     // Only 503 and 504
}
Exclude specific errors:
{
  "threshold": 3,
  "window_minutes": 5,
  "include_status_codes": [500, 502, 503, 504],
  "exclude_status_codes": [503]          // Ignore 503, alert on others
}
Use case: Ignore expected 503 errors during maintenance.
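A small sketch of how the two lists could combine, with exclude taking precedence as in the example above (illustrative Python):
def counts_as_failure(status_code, include_codes, exclude_codes=()):
    # A status code counts only if it is included and not explicitly excluded
    return status_code in include_codes and status_code not in exclude_codes

include = [500, 502, 503, 504]
exclude = [503]
print(counts_as_failure(503, include, exclude))  # False (ignored during maintenance)
print(counts_as_failure(502, include, exclude))  # True  (still counts toward the threshold)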

Latency Percentiles

P50 (Median):
{
  "threshold_ms": 500,
  "percentile": 50,
  "window_minutes": 5,
  "min_requests": 50
}
Alert if typical request exceeds 500ms
P95 (Recommended):
{
  "threshold_ms": 2000,
  "percentile": 95,
  "window_minutes": 5,
  "min_requests": 10
}
Alert if slowest 5% exceed 2 seconds
P99 (Outliers):
{
  "threshold_ms": 5000,
  "percentile": 99,
  "window_minutes": 5,
  "min_requests": 100
}
Alert on extreme outliers (slowest 1%)

Alert Management

Viewing Alert Status

Navigate to: Monitoring → Alerts
List view shows:
  • Alert name and description
  • Route name
  • Current state (OK / Triggered / Cooldown)
  • Severity
  • Trigger count (24h)
  • Last triggered time
  • Enabled/Disabled status
Filters:
  • Severity: Low, Medium, High, Critical
  • State: OK, Triggered, Cooldown
  • Enabled: All, Enabled, Disabled
  • Alert Type: Failures, Latency, Rate Limit, Unauthorized
  • Activity: 0 triggers, 1-5, 6-20, 20+

Viewing Alert Details

Click an alert name to see:
Overview:
  • Current state
  • Trigger history graph
  • Recent triggers list
Configuration:
  • Alert type and conditions
  • Notification channels
  • Cooldown and aggregation settings
Logs:
  • When alert triggered
  • Notification delivery status
  • Error details that triggered alert

Editing Alerts

  1. Navigate to alert details
  2. Click Edit Alert
  3. Modify settings:
    • Change threshold
    • Update notification emails/phones
    • Adjust cooldown
    • Change severity
  4. Click Save Changes
Changes take effect immediately.

Disabling Alerts

Temporarily disable:
  1. Navigate to alert details
  2. Toggle Enabled switch to OFF
  3. Alert stops monitoring (won’t trigger)
Use cases:
  • Scheduled maintenance
  • Known issue being fixed
  • Testing changes without spam
Re-enable:
  1. Toggle Enabled switch to ON
  2. Alert resumes monitoring

Deleting Alerts

  1. Navigate to alert details
  2. Click Delete Alert
  3. Confirm deletion
Deleting an alert is permanent! All trigger history and configuration are lost. Consider disabling instead if you might need it later.

Alert Logs

View alert trigger history.
Navigate to: Monitoring → Alert Logs
Shows:
  • When alert triggered
  • Which route
  • Severity
  • Trigger details (error messages, latency values, etc.)
  • Notification channels used
  • Delivery status (sent, failed)
Filter by:
  • Date range
  • Route
  • Severity
  • Alert name
Use cases:
  • Audit notification history
  • Troubleshoot missed alerts
  • Analyze incident patterns

Best Practices

1. Start with Templates

Use built-in templates when creating your first alerts.
Why:
  • Pre-configured with sensible defaults
  • Battle-tested thresholds
  • Clear descriptions
Don’t create custom alerts immediately

2. Set Appropriate Thresholds

Too sensitive (spam):
{
  "threshold": 1,
  "window_minutes": 1
}
Triggers on a single error, which is too noisy.
Too lenient (misses issues):
{
  "threshold": 50,
  "window_minutes": 60
}
50 errors in an hour may mean users are already frustrated.
Just right:
{
  "threshold": 5,
  "window_minutes": 5
}
Catches issues quickly without false positives
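To sanity-check a threshold, estimate how often it would fire at your normal traffic and background error rate. A rough back-of-the-envelope sketch (Python; the traffic figures are assumptions for illustration):
# Assumed traffic profile (replace with your own numbers)
requests_per_minute = 200
baseline_error_rate = 0.001   # 0.1% of requests fail even when the system is healthy

expected_errors = requests_per_minute * baseline_error_rate  # 0.2 errors per minute

# threshold=1, window=1: a stray background error appears roughly every 5 minutes,
# so that alert fires many times a day on noise alone.
print(expected_errors * 1)    # ~0.2 expected errors per 1-minute window

# threshold=5, window=5: only ~1 background error is expected per window,
# so reaching 5 almost always indicates a real incident.
print(expected_errors * 5)    # ~1.0 expected errors per 5-minute window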

3. Use Severity Correctly

Critical:
  • Production payment processing down
  • Complete API outage
  • Security breach attempt
High:
  • Partial service degradation
  • Single integration failing
  • Elevated error rate
Medium:
  • Performance degradation
  • Non-critical API slow
  • Occasional errors
Low:
  • Informational
  • Minor issues
  • For tracking/trending

4. Configure Cooldowns

Problem: Alert triggers every minute → 60 emails in 1 hour
Solution: Use cooldown
Cooldown: 30 minutes
Result: 1 alert, wait 30 min, then can alert again if still failing
Recommended cooldowns:
  • Critical alerts: 30-60 minutes
  • High alerts: 15-30 minutes
  • Medium alerts: 60 minutes
  • Low alerts: 2-4 hours

5. Use Multiple Notification Channels

Redundancy strategy:
Critical Alert:
- Email: [email protected]
- SMS: +15551234567 (on-call engineer)
- Slack: #incidents channel
Why:
  • Email might be missed
  • SMS ensures immediate attention
  • Slack allows team collaboration

6. Test Your Alerts

Before going live:
  1. Create test alert with low threshold:
    {
      "threshold": 1,
      "window_minutes": 1
    }
    
  2. Trigger condition (e.g., send a request that returns 500; see the sketch after this list)
  3. Verify notifications received:
    • Check email inbox
    • Check SMS received
    • Check Slack message
  4. Adjust configuration if needed
  5. Set production thresholds
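One way to trigger a test failure for step 2 is to point a temporary route at a backend that always returns a 5xx and call it a few times. A hedged sketch (Python; the route URL is a placeholder, and httpbin.org/status/500 is just one convenient always-failing backend):
import urllib.error
import urllib.request

# Placeholder: a temporary KnoxCall test route whose backend always returns 500
# (for example, a route proxying to https://httpbin.org/status/500)
TEST_ROUTE_URL = "https://api.example.com/test-route"

for attempt in range(3):
    try:
        urllib.request.urlopen(TEST_ROUTE_URL, timeout=10)
    except urllib.error.HTTPError as exc:
        # Each 5xx response counts toward the test alert's threshold of 1
        print(f"attempt {attempt + 1}: got {exc.code}")
    except urllib.error.URLError as exc:
        print(f"attempt {attempt + 1}: request failed ({exc.reason})")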

7. Monitor Alert Logs

Weekly review:
  • Which alerts triggered most?
  • Any false positives?
  • Any missed incidents?
Adjust thresholds based on patterns.

8. Document Your Alerts

In alert description, include:
Description: Alerts when Stripe API returns 5xx errors.
Indicates Stripe service outage or network issues between KnoxCall and Stripe.
Runbook: https://wiki.company.com/runbooks/stripe-outage
Benefits:
  • Team knows what alert means
  • Clear action steps
  • Faster incident resolution

Common Alert Scenarios

Scenario 1: Backend API Outage

Problem: Stripe API completely down
Alert Configuration:
Alert Type: Request Failures
Condition: {
  "threshold": 3,
  "window_minutes": 5,
  "include_status_codes": [502, 503, 504]
}
Severity: Critical
Cooldown: 30 minutes
Channels: Email + SMS + Slack
Expected behavior:
  • 3 failures → Alert triggers
  • Notifications sent to all channels
  • Team investigates
  • Issue resolved or escalated to Stripe
  • 30 minutes later, if still failing, alert again

Scenario 2: Gradual Performance Degradation

Problem: Database queries getting slower over time
Alert Configuration:
Alert Type: High Latency
Condition: {
  "threshold_ms": 1000,
  "percentile": 95,
  "window_minutes": 10,
  "min_requests": 20
}
Severity: Medium
Cooldown: 60 minutes
Channels: Email + Slack
Expected behavior:
  • P95 latency hits 1.5s → Alert triggers
  • Team reviews logs
  • Identifies slow query
  • Optimizes or adds caching
  • Latency returns to normal

Scenario 3: Security Incident

Problem: Unknown IP trying to access internal API
Alert Configuration:
Alert Type: Unauthorized Client
Condition: {
  "immediate": true,
  "threshold": 1,
  "window_minutes": 5
}
Severity: High
Cooldown: 60 minutes
Channels: Email + SMS + Slack
Expected behavior:
  • Unauthorized IP makes request → Immediate alert
  • Security team reviews
  • IP blocked if malicious
  • Client contacted if legitimate (e.g., moved servers)

Scenario 4: Rate Limit Abuse

Problem: Client’s script gone rogue, hitting rate limits
Alert Configuration:
Alert Type: Rate Limit Exceeded
Condition: {
  "threshold": 5,
  "window_minutes": 10
}
Severity: Medium
Cooldown: 30 minutes
Channels: Email
Expected behavior:
  • Client hits rate limit 5 times in 10 min → Alert
  • Review which client
  • Contact client
  • They fix infinite loop
  • Rate limit stops triggering

Troubleshooting

Issue: “Alert not triggering”

Check:
  1. Alert is enabled (not disabled)
  2. Route is active (not disabled)
  3. Condition threshold is correct (not too high)
  4. Check alert state (might be in cooldown)
Debug:
Route: stripe-payments
Alert: Stripe Integration Failure
State: Cooldown (15 minutes remaining)
Last triggered: 10 minutes ago
Alert won’t trigger again until cooldown expires

Issue: “Too many notifications”

Cause: Threshold too low or cooldown too short
Fix:
  1. Increase threshold: 1 → 5
  2. Increase cooldown: 5 minutes → 30 minutes
  3. Increase window: 1 minute → 5 minutes

Issue: “Not receiving email notifications”

Check:
  1. Email addresses correct (no typos)
  2. Check spam folder
  3. Email channel enabled
  4. Alert logs show “sent” status
Debug: Navigate to Alert Logs → Find trigger event → Check delivery status

Issue: “Slack notifications not working”

Check:
  1. Webhook URL correct (starts with https://hooks.slack.com)
  2. Slack channel enabled in alert config
  3. Webhook not revoked in Slack settings
Test webhook:
curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
  -H "Content-Type: application/json" \
  -d '{"text": "Test from KnoxCall"}'
You should see the test message appear in the Slack channel.

Next Steps

  • API Logs: View the detailed request history that triggered alerts
  • Analytics: Visualize trends and patterns in alert triggers
  • Audit Logs: Track who created or modified alerts

📊 Statistics

  • Level: beginner to intermediate
  • Time: 15 minutes

🏷️ Tags

alerts, monitoring, notifications, incidents, email, sms, slack