Spyd Documentation - AI-Powered Server Monitoring

Getting Started

Installation

One-Line Install BETA

curl -fsSL spyd.sh | sh

Downloads and installs the latest Spyd binary for your platform (Linux/macOS). Supports both x86_64 and ARM64 architectures.

Beta Notice: Spyd is currently in beta. We recommend using it on test/staging servers first before deploying to production.

Quick Start

# 1. Install Spyd
$ curl -fsSL spyd.sh | sh
Downloading spyd v0.1.0 for linux-amd64...
Installing to /usr/local/bin/spyd...
Spyd installed successfully!

# 2. Initialize configuration
$ spyd init
Welcome to Spyd Setup!
? AI Provider: OpenAI
? API Key: sk-...
? CPU Threshold (%): 80
? Memory Threshold (%): 85
? Disk Threshold (%): 90
Configuration saved to ~/.config/spyd/config.yaml

# 3. Start the daemon
$ spyd start
Daemon started (PID: 12345)

# 4. Check status
$ spyd status
Daemon: Running (PID: 12345)
CPU: 45.2% | Memory: 62.1% | Disk: 55.3%
Load: 1.2, 0.8, 0.6
AI: Last checked 2m ago - System healthy

Tip: Run spyd init interactively to configure AI provider, monitoring thresholds, and notification channels all at once.

How It Works

Architecture

Spyd runs as a background daemon that continuously monitors your server, detects anomalies, and provides AI-powered diagnosis when things go wrong.

Metrics Collector

Logs Watcher

Baselines Learner

▼

Anomaly Detector

▼

AI Investigator (Claude / GPT-4)

▼

Alert Manager

▼

Email Slack Telegram Discord Desktop

What Spyd Monitors

System Metrics

Collected every 60 seconds (configurable):

CPU

Overall %, per-core breakdown

Memory

RAM usage, swap usage

Disk

Per-partition usage

Load

1m, 5m, 15m averages

Processes

Top 20 by CPU with full command

Log Monitoring

Real-time pattern matching on configured log files:

Error Patterns: ERROR, FATAL, CRITICAL, Exception, panic

Security Patterns: authentication failed, permission denied, unauthorized

Performance Patterns: timeout, slow query, out of memory, OOM

Anomaly Detection

Spyd uses four complementary detection methods to catch issues:

1. Threshold Detection

Simple, fast checks against configured limits:

CPU > 80% → Warning
Memory > 85% → Warning
Disk > 90% → Warning
Any > 95% → Critical
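The threshold rules above can be sketched in a few lines. This is only an illustration using the documented default limits; the names `THRESHOLDS`, `CRITICAL_LIMIT`, and `classify` are hypothetical and are not taken from Spyd's source.

```python
from typing import Optional

# Documented default limits; the dict layout is an assumption.
THRESHOLDS = {"cpu": 80.0, "memory": 85.0, "disk": 90.0}
CRITICAL_LIMIT = 95.0  # "Any > 95% -> Critical"


def classify(metric: str, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for a percentage reading."""
    if value > CRITICAL_LIMIT:
        return "critical"
    if value > THRESHOLDS[metric]:
        return "warning"
    return None
```

The critical check runs first so that a reading above 95% is never reported as a mere warning.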

2. Statistical Detection (Baseline Learning)

Learns what's "normal" for YOUR server over ~7 days, then flags deviations:

# Example: Server normally uses 20-40% CPU
Mean: 30% | StdDev: 5%
# Suddenly CPU hits 60%
Z-score = (60 - 30) / 5 = 6.0
Z-score > 3.0 → Anomaly!
# Even though 60% is below threshold,
# it's unusual for THIS server
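A minimal sketch of the z-score check described above, assuming the baseline is simply a list of past readings and that the population standard deviation is used. How Spyd actually stores and weights its baseline is not documented here, and the function names are illustrative.

```python
import statistics
from typing import Sequence


def z_score(value: float, baseline: Sequence[float]) -> float:
    """Distance of value from the baseline mean, in standard deviations."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        return 0.0  # a perfectly flat baseline gives no scale to compare against
    return (value - mean) / stdev


def is_anomaly(value: float, baseline: Sequence[float], limit: float = 3.0) -> bool:
    """Flag readings more than `limit` standard deviations from the mean."""
    return abs(z_score(value, baseline)) > limit
```

With a baseline whose mean is 30 and standard deviation is 5, a reading of 60 scores 6.0 and is flagged, matching the worked example.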

3. Log Pattern Detection

Real-time matching triggers immediate anomalies:

# Log line appears:
"sshd: Failed password for root from 192.168.1.100"
↓
Matches: "Failed password"
↓
Security anomaly created → Alert sent
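The matching step can be approximated as a case-insensitive substring search over the pattern lists given under Log Monitoring, plus the "Failed password" pattern from the example. Case-insensitive matching and the first-match-wins order are assumptions, and `PATTERNS` and `match_line` are hypothetical names.

```python
from typing import Optional, Tuple

# Pattern lists copied from the Log Monitoring section; "Failed password"
# comes from the sshd example. The grouping into a dict is an assumption.
PATTERNS = {
    "error": ["ERROR", "FATAL", "CRITICAL", "Exception", "panic"],
    "security": ["authentication failed", "permission denied",
                 "unauthorized", "Failed password"],
    "performance": ["timeout", "slow query", "out of memory", "OOM"],
}


def match_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (category, pattern) for the first pattern found in the line."""
    lowered = line.lower()
    for category, patterns in PATTERNS.items():
        for pattern in patterns:
            if pattern.lower() in lowered:
                return category, pattern
    return None
```

Plain substring matching is deliberately naive: a short pattern such as "OOM" would also match inside unrelated words, so a real matcher would likely need word boundaries or regular expressions.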

4. Anomaly Prediction

Spyd analyzes trends to warn you before thresholds are breached:

# Sliding window tracks last 10 samples
CPU: 60% → 64% → 68% → 72% → 76%
# Linear regression detects upward trend
Slope: +4% per interval | Confidence: 95%
# Threshold: 80%, breach predicted in ~4 mins
→ Proactive alert sent BEFORE threshold hit

Proactive Alerting: Get warned about emerging issues before they become critical. Prediction only alerts when trend confidence is ≥70%.
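One way to sketch trend-based prediction is an ordinary least-squares fit over the sample window. The version below uses R² as a stand-in for "trend confidence", which is an assumption: the documentation does not say how Spyd computes confidence. `predict_breach` and its parameters are hypothetical names.

```python
from typing import Optional, Sequence, Tuple


def predict_breach(
    samples: Sequence[float],
    threshold: float,
    interval_s: float = 60.0,
    min_confidence: float = 0.70,
) -> Optional[Tuple[float, float, float]]:
    """Fit a line to the samples and project when it crosses the threshold.

    Returns (slope per interval, R^2, seconds until breach), or None when
    the trend is flat, falling, or too noisy to trust.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    syy = sum((y - mean_y) ** 2 for y in samples)
    r_squared = (sxy * sxy) / (sxx * syy)  # stand-in for trend confidence
    if r_squared < min_confidence:
        return None
    intercept = mean_y - slope * mean_x
    # Intervals between the most recent sample and the projected crossing.
    intervals_left = (threshold - intercept) / slope - (n - 1)
    return slope, r_squared, max(intervals_left, 0.0) * interval_s
```

The returned time scales with `interval_s`, so pass the sampling interval that is actually configured for the daemon.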

AI Diagnosis

When an anomaly is detected, Spyd gathers context and consults AI for root cause analysis:

Context Provided to AI

Anomaly Details

Type, metric, actual vs threshold, severity

System State

Current CPU, memory, disk, load snapshot

Top Processes

PID, name, CPU%, memory%, user, full command

Correlation Hints

High CPU + High Load = CPU-bound, etc.

Process Patterns

Fork bombs, single dominators, suspicious users

Recent Logs

Matched error/security events

AI Capabilities

The AI is configured as an L3+ Senior SysAdmin who:

1. Recognizes attack signatures (crypto miners, DDoS, brute force, fork bombs)

2. Performs triage (urgency, blast radius, trend analysis)

3. Identifies root cause with confidence level

4. Provides actionable fix commands

Smart Alerting: Alerts are deduplicated (same alert suppressed for 5 min) and throttled (max 10/hour) to prevent notification fatigue. Only actionable alerts reach you.
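The deduplication and throttling behavior can be illustrated with a small gate object. This is a sketch under assumptions: the alert key format, the sliding one-hour window, and the `AlertGate` class are hypothetical, and only the 5-minute and 10-per-hour limits come from the text above.

```python
import collections
from typing import Deque, Dict


class AlertGate:
    """Suppress repeats within a dedup window and cap alerts per hour."""

    def __init__(self, dedup_window: float = 300.0, max_per_hour: int = 10):
        self.dedup_window = dedup_window
        self.max_per_hour = max_per_hour
        self._last_sent: Dict[str, float] = {}
        self._sent_times: Deque[float] = collections.deque()

    def allow(self, key: str, now: float) -> bool:
        """Return True if an alert with this key may be sent at time `now`."""
        # Forget sends that are at least an hour old.
        while self._sent_times and now - self._sent_times[0] >= 3600:
            self._sent_times.popleft()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.dedup_window:
            return False  # duplicate of a recent alert
        if len(self._sent_times) >= self.max_per_hour:
            return False  # hourly budget exhausted
        self._last_sent[key] = now
        self._sent_times.append(now)
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the gate easy to test.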

Investigation Modes

Spyd offers three investigation modes that balance speed, cost, and thoroughness:

Basic Mode

Fast analysis using metrics and process list only. Best for routine warnings.

Anomaly → AI analyzes metrics + processes → Diagnosis
# Single API call, fastest response

Hybrid Mode

Thorough investigation where AI can request diagnostic commands:

Anomaly → AI analyzes → Requests commands
→ Spyd executes: lsof -p 1234, ps aux...
→ AI receives output → Final diagnosis
# Multi-round, deeper root cause analysis

Safe Execution: Only whitelisted read-only commands are allowed. Rate limited to 30 commands/minute.
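A whitelist check of this kind might look like the sketch below. The allowed set here contains only commands named elsewhere in this documentation (ps, df, tail, lsof); Spyd's real whitelist and validation rules are not documented, so `ALLOWED_COMMANDS` and `is_allowed` are illustrative assumptions. Rate limiting is not shown.

```python
import shlex

# Commands mentioned in this documentation; the real whitelist may differ.
ALLOWED_COMMANDS = {"ps", "df", "tail", "lsof"}
# Characters that would let a request chain, redirect, or substitute commands.
SHELL_METACHARACTERS = set(";|&><`$")


def is_allowed(command: str) -> bool:
    """Accept only a single whitelisted program with plain arguments."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return False
    return bool(parts) and parts[0] in ALLOWED_COMMANDS
```

Rejecting shell metacharacters before parsing means a request such as `ps aux; rm -rf /` fails even though it starts with an allowed program.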

Auto Mode (Default)

Automatically selects mode based on severity:

Warning severity → Basic mode (fast, cheap)
Critical severity → Hybrid mode (thorough)
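The selection rule is simple enough to express directly. The function below is a hypothetical illustration: the mode names match the documented `investigation_mode` values, while the function itself is not part of Spyd.

```python
def select_mode(configured: str, severity: str) -> str:
    """Resolve the investigation mode for one anomaly."""
    if configured in ("basic", "hybrid"):
        return configured  # an explicit setting always wins
    # "auto": spend the extra rounds only on critical anomalies.
    return "hybrid" if severity == "critical" else "basic"
```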

AI Response Caching

Similar anomalies get instant cached responses, reducing API costs:

How Caching Works

Cache key = metric + severity + top 3 processes

# First occurrence: CPU critical, nginx/mysql/php-fpm
Cache MISS → Call AI API → Store response (TTL: 30 min)
# Same anomaly within 30 minutes:
Cache HIT → Return cached diagnosis instantly
No API call → Faster response, zero cost
# Different processes = different cache key:
CPU critical, node/python/go → New API call
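The caching scheme can be sketched as a dictionary keyed by metric, severity, and the top three process names, with a timestamp per entry. The key format, the in-memory storage, and the `DiagnosisCache` class are assumptions for illustration; only the key ingredients and the 30-minute TTL come from the description above.

```python
from typing import Dict, Optional, Sequence, Tuple


class DiagnosisCache:
    """In-memory cache of AI diagnoses with a fixed time-to-live."""

    def __init__(self, ttl_s: float = 30 * 60):
        self.ttl_s = ttl_s
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def make_key(metric: str, severity: str, processes: Sequence[str]) -> str:
        """Build a key from the metric, severity, and top 3 process names."""
        return "|".join([metric, severity, ",".join(list(processes)[:3])])

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss
        stored_at, diagnosis = entry
        if now - stored_at >= self.ttl_s:
            del self._entries[key]  # expired
            return None
        return diagnosis

    def put(self, key: str, diagnosis: str, now: float) -> None:
        self._entries[key] = (now, diagnosis)
```

Because the process names are part of the key, the same metric and severity with a different top-three list produces a miss and a fresh API call.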

Command Reference

Core Commands

init

spyd init

Interactive setup wizard that guides you through initial configuration including AI provider, monitoring thresholds, and notification channels.

$ spyd init
Welcome to Spyd Setup!

AI Configuration
? Enable AI analysis? Yes
? AI Provider: OpenAI
? API Key: sk-...
? Model: gpt-4o-mini

Monitoring Thresholds
? CPU Threshold (%): 80
? Memory Threshold (%): 85
? Disk Threshold (%): 90

Configuration saved to ~/.config/spyd/config.yaml

start

spyd start

Starts the Spyd daemon in the background. The daemon collects system metrics every 60 seconds (configurable) and triggers AI analysis when thresholds are exceeded.

$ spyd start
Daemon started (PID: 12345)

stop

spyd stop

Gracefully stops the running daemon. Completes the current metric collection cycle before shutting down.

$ spyd stop
Daemon stopped

restart

spyd restart

Stops and restarts the daemon. Useful after configuration changes.

$ spyd restart
Daemon stopped
Daemon started (PID: 12346)

status

spyd status

Displays current system health snapshot including CPU, memory, disk usage, load average, and the most recent AI analysis.

$ spyd status
+---------------------------------------------------------------+
|                          Spyd Status                          |
+---------------------------------------------------------------+
Daemon: Running (PID: 12345)
Uptime: 2h 15m 30s

System Metrics:
  CPU:    45.2% [========----------]
  Memory: 62.1% [============------]
  Disk:   55.3% [===========-------]
  Load:   1.2, 0.8, 0.6

AI Analysis:
  Last checked: 2m ago
  Status: System healthy
  Provider: OpenAI (gpt-4o-mini)

Notifications:
  Telegram: Enabled
  Email: Enabled

start --foreground

spyd start -f

Runs the daemon in the foreground with verbose logging. Useful for debugging or when running under Docker or systemd.

$ spyd start --foreground
Spyd daemon started (PID: 12345)
2024-01-15T10:30:00Z INF Collecting metrics...
2024-01-15T10:30:01Z INF CPU: 45.2%, Memory: 62.1%, Disk: 55.3%
2024-01-15T10:30:01Z INF All metrics within thresholds

Monitoring & Logs

logs

spyd logs [-n COUNT] [--since DURATION]

View detected log events that matched configured patterns (ERROR, FATAL, etc.).

# Show recent log events (default: 20)
$ spyd logs

# Show last 50 events
$ spyd logs -n 50

# Events from last hour
$ spyd logs --since 1h

alerts

spyd alerts [-n COUNT] [--since DURATION] [--ack ID]

View and manage alerts generated by Spyd.

# Show recent alerts (default: last 20)
$ spyd alerts
Alerts (last 5):
=================
2024-01-15 10:30:00 [critical] ID: abc12345
CPU usage 95.3% exceeds threshold 80.0%
Metric: cpu | Value: 95.30 | Threshold: 80.00
AI: Fork bomb detected...

# Alerts from last 24 hours
$ spyd alerts --since 24h

# Acknowledge an alert
$ spyd alerts --ack abc12345

explain

spyd explain "question" [--context "additional context"]

Ask AI about a specific error, log, or topic. Get expert-level explanations for server issues.

$ spyd explain "what does OOM killer mean?"
Asking AI (openai)...
===============================================================
AI EXPLANATION
===============================================================
The OOM (Out of Memory) Killer is a Linux kernel mechanism that activates when the system runs critically low on memory. It selects and terminates processes to free up RAM and prevent a complete system crash.

When triggered, you'll see messages like:
"Out of memory: Killed process 1234 (nginx)"

Common causes:
1. Memory leak in an application
2. Insufficient RAM for workload
3. Too many processes running simultaneously

To investigate:
$ dmesg | grep -i "out of memory"
$ grep -i "killed process" /var/log/syslog

# With additional context
$ spyd explain "nginx error: upstream timed out" --context "high traffic spike"

test-alert

spyd test-alert [--channel CHANNEL] [--severity LEVEL]

Send a test alert through configured notification channels to verify setup.

# Test all enabled channels
$ spyd test-alert
+---------------------------------------------------------------+
|                        Spyd Test Alert                        |
+---------------------------------------------------------------+
Sending test alert...
Severity: warning
Message: This is a test notification to verify your Spyd alert configuration.
Channel: all

Channels tested:
  telegram: @your_chat_id
  email: [admin@example.com]

# Test specific channel
$ spyd test-alert --channel telegram

# Test with critical severity
$ spyd test-alert --severity critical

analyze

spyd analyze [--mode MODE]

Trigger manual AI analysis of current system state. Collects metrics, runs anomaly detection, and provides AI insights.

# Analyze current state (auto mode)
$ spyd analyze

# Quick analysis without running commands
$ spyd analyze --mode basic

# Thorough analysis with diagnostic commands
$ spyd analyze --mode hybrid

Configuration Commands

configure ai

spyd configure ai [flags]

Update AI provider settings. Shows full configuration summary when run without flags.

# Show current AI configuration
$ spyd configure ai

# Update provider and model
$ spyd configure ai --provider openai --model gpt-4o-mini

# Enable/disable AI command execution
$ spyd configure ai --command-enable
$ spyd configure ai --command-disable

configure monitoring

spyd configure monitoring

Interactive configuration for monitoring thresholds (CPU, Memory, Disk, Load Average).

configure alerts

spyd configure alerts [--channel CHANNEL]

Set up notification channels. Use --channel flag to configure a specific channel directly.

# Interactive mode (all channels)
$ spyd configure alerts

# Configure specific channel
$ spyd configure alerts --channel telegram
$ spyd configure alerts --channel email
$ spyd configure alerts --channel slack
$ spyd configure alerts --channel discord
$ spyd configure alerts --channel desktop
$ spyd configure alerts --channel webhook

configure show

spyd configure show

Display complete configuration summary including AI settings, monitoring thresholds, alert channels, and storage settings.

$ spyd configure show
Configuration: ~/.config/spyd/config.yaml

AI:
  Provider: openai
  Model: gpt-4o-mini
  Command Execution: enabled

Monitoring:
  Interval: 60s
  CPU Threshold: 80%
  Memory Threshold: 85%
  Disk Threshold: 90%

Notifications:
  Telegram: Enabled (chat: -123456789)
  Email: Enabled (to: admin@example.com)
  Slack: Disabled
  Discord: Disabled
  Desktop: Disabled

Storage:
  Database: ~/.local/share/spyd/spyd.db
  Retention: 30 days

Utility Commands

upgrade

spyd upgrade [--check] [--force]

Check for and install the latest version. Automatically stops the daemon, installs the new binary, and restarts the daemon.

$ spyd upgrade
Current version: 0.1.0
Checking for updates...
Latest version: 0.2.0
Downloading spyd_linux_amd64...
Stopping daemon...
Installing new version...
Restarting daemon...
Successfully upgraded to v0.2.0!

# Check only (don't upgrade)
$ spyd upgrade --check

# Force upgrade even if same version
$ spyd upgrade --force

uninstall

spyd uninstall [--yes]

Complete uninstallation with zero traces. Removes binary, configuration, data, logs, and service files. Use --yes to skip confirmation prompts.

$ spyd uninstall
Spyd Complete Uninstaller
This will remove Spyd and all its data with zero traces.
Continue with complete uninstall? [y/N] y
[1/7] Stopping Spyd daemon...
Stopping daemon (PID: 12345)...
[2/7] Removing systemd services...
[3/7] Skipping launchd (not macOS)
[4/7] Removing binary...
Removed: /usr/local/bin/spyd
[5/7] Removing v1 artifacts...
[6/7] Removing configuration, data, and logs...
Remove configuration? [y/N] y
Removed: ~/.config/spyd
Remove data and database? [y/N] y
Removed: ~/.local/share/spyd
[7/7] Cleaning up PID files...
Verifying cleanup...
Spyd has been completely uninstalled.

# Skip all confirmations
$ spyd uninstall --yes

version

spyd version [-f|--full]

Show version information. Use --full for banner and build details.

$ spyd version
Spyd Version 0.1.0

$ spyd version --full
[ASCII banner]
Version: 0.1.0
Commit: abc1234
Built: 2024-01-15T10:00:00Z
Go: go1.21.5

doctor

spyd doctor

Diagnose configuration and connectivity issues. Checks AI provider, notification channels, and system requirements.

$ spyd doctor
Spyd Doctor - Diagnosing your setup...

[Configuration]
Config file: ~/.config/spyd/config.yaml
Status: Valid

[AI Provider]
Provider: OpenAI
API Key: sk-...xxxx (set)
Connection: OK

[Notifications]
Telegram: OK (message sent)
Email: OK (SMTP connection successful)

[System]
OS: linux
Architecture: amd64
Memory: 8GB
Disk: 100GB free

All checks passed!

Configuration

Configuration file location: ~/.config/spyd/config.yaml

Full Configuration Example

# Server identification
server:
  name: my-server

# AI Configuration
ai:
  enabled: true
  provider: openai  # openai, anthropic, ollama, litellm, azure
  api_key: sk-...
  model: gpt-4o-mini
  timeout: 30  # seconds
  investigation_mode: auto  # basic, hybrid, auto
  hybrid_max_commands: 5
  cache:
    enabled: true
    ttl_minutes: 30

# Monitoring Thresholds
monitoring:
  interval: 60  # seconds between checks
  thresholds:
    cpu_percent: 80
    memory_percent: 85
    disk_percent: 90
    load_average: 0  # 0 = auto (based on CPU cores)
  baseline:
    enabled: true
    learning_days: 7
  process_count: 20  # top processes to track
  prediction:
    enabled: true  # proactive anomaly prediction
    window_size: 10  # samples for trend analysis
    horizon_mins: 30  # predict this far ahead

# Alert Settings
alerts:
  dedup_window: 300  # seconds (5 minutes)
  max_per_hour: 10
  min_severity: warning  # info, warning, critical

# Storage
storage:
  db_path: ~/.local/share/spyd/spyd.db
  retention_days: 30
  max_size_mb: 500

# Daemon
daemon:
  pid_file: /tmp/spyd.pid
  log_level: info  # debug, info, warn, error
  health:
    enabled: true
    port: 19999

Note: AI command execution is enabled by default. The AI can only run safe, read-only commands (ps, df, tail, etc.) to investigate issues. All commands are whitelisted and audited.

Alert Channels

Spyd supports 6 notification channels. Configure them using spyd configure alerts or by editing the config file.

📧

Email

💬

Slack

📱

Telegram

🎮

Discord

🔔

Desktop (ntfy)

🔗

Webhooks

Email

SMTP-based email notifications. Supports Gmail, Outlook, and custom SMTP servers.

notifications:
  email:
    enabled: true
    smtp_host: smtp.gmail.com
    smtp_port: 587
    smtp_user: alerts@example.com
    smtp_pass: your-app-password
    from: spyd@yourserver.com
    to:
      - admin@example.com
      - ops@example.com
    use_tls: true

📧 Sample Email Alert

Subject: [Spyd] CRITICAL - cpu on my-server

CRITICAL - cpu
CPU usage 95.3% exceeds threshold 80.0%

Server: my-server
Time: 2024-01-15 10:30:00

WHAT HAPPENED:
Your server is being hammered by automated scripts. Multiple bash processes are consuming all CPU - this looks like a stress test or denial-of-service attempt.
Confidence: 90%

FIX BY RUNNING:
$ kill -15 3685587 3685588 3685589 3685590

CAUTION: Review command before executing. Ensure you understand what it does.

-- Spyd Monitoring

Slack

Send alerts to Slack channels using incoming webhooks.

notifications:
  slack:
    enabled: true
    webhook_url: https://hooks.slack.com/services/T00/B00/XXX
    channel: "#alerts"  # optional, overrides webhook default (quoted so YAML does not read it as a comment)

Setup: Create an incoming webhook in your Slack workspace at api.slack.com/messaging/webhooks

💬 Sample Slack Alert

┌─────────────────────────────────────────────────────┐
│ 🚨 CRITICAL - cpu                                  │
├─────────────────────────────────────────────────────┤
│ CPU usage 95.3% exceeds threshold 80.0%             │
│                                                     │
│ *WHAT HAPPENED:*                                    │
│ Your server is being hammered by automated scripts. │
│ Multiple bash processes are consuming all CPU -     │
│ this looks like a stress test or denial-of-service  │
│ attempt. _Confidence: 90%_                          │
│                                                     │
│ *FIX BY RUNNING:*                                   │
│ `kill -15 3685587 3685588 3685589 3685590`          │
│                                                     │
│ :warning: _CAUTION: Review command before           │
│ executing._                                         │
├─────────────────────────────────────────────────────┤
│ Spyd • my-server                      Jan 15, 10:30 │
└─────────────────────────────────────────────────────┘

Telegram

Instant notifications via Telegram bot. Create a bot with @BotFather.

notifications:
  telegram:
    enabled: true
    bot_token: 123456789:ABC-DEF1234ghIkl-zyx57W2v1u123ew11
    chat_id: 123456789  # or -100123456789 for groups

Setup:

Message @BotFather on Telegram
Send /newbot and follow the prompts
Copy the bot token
Start a chat with your bot or add it to a group
Get your chat ID by messaging @userinfobot

📱 Sample Telegram Alert

🚨 CRITICAL - cpu

CPU usage 95.3% exceeds threshold 80.0%

WHAT HAPPENED:
Your server is being hammered by automated scripts. Multiple bash processes are consuming all CPU - this looks like a stress test or denial-of-service attempt.
Confidence: 90%

FIX BY RUNNING:
kill -15 3685587 3685588 3685589 3685590

⚠️ CAUTION: Review command before executing.

Discord

Send rich embed notifications to Discord channels via webhooks.

notifications:
  discord:
    enabled: true
    webhook_url: https://discord.com/api/webhooks/123/abc...

Setup: In Discord, go to Server Settings → Integrations → Webhooks → New Webhook. Copy the webhook URL.

🎮 Sample Discord Alert

┌──────────────────────────────────────────────────────┐
│ 🚨 CRITICAL - cpu                              [RED] │
├──────────────────────────────────────────────────────┤
│ CPU usage 95.3% exceeds threshold 80.0%              │
│                                                      │
│ **WHAT HAPPENED:**                                   │
│ Your server is being hammered by automated scripts.  │
│ Multiple bash processes are consuming all CPU -      │
│ this looks like a stress test or denial-of-service   │
│ attempt. *Confidence: 90%*                           │
│                                                      │
│ **FIX BY RUNNING:**                                  │
│ `kill -15 3685587 3685588 3685589 3685590`           │
│                                                      │
│ ⚠️ *CAUTION: Review command before executing.*       │
├──────────────────────────────────────────────────────┤
│ Spyd • my-server                    2024-01-15T10:30 │
└──────────────────────────────────────────────────────┘

Desktop (ntfy)

Cross-platform push notifications via ntfy.sh. Works on phones, desktops, and browsers.

notifications:
  desktop:
    enabled: true
    server: https://ntfy.sh  # or self-hosted
    topic: my-server-alerts  # your unique topic
    token:  # optional, for private topics
    priority:  # optional override (urgent, high, default)

Setup:

Install the ntfy app on your phone or desktop
Subscribe to your topic (e.g., my-server-alerts)
Configure Spyd with the same topic name
That's it! Alerts will push to all subscribed devices

🔔 Sample Desktop (ntfy) Alert

Title: 🚨 CRITICAL - cpu
Priority: urgent
Tags: rotating_light, skull

CPU usage 95.3% exceeds threshold 80.0%

WHAT HAPPENED:
Your server is being hammered by automated scripts. Multiple bash processes are consuming all CPU - this looks like a stress test or denial-of-service attempt.
Confidence: 90%

FIX BY RUNNING:
$ kill -15 3685587 3685588 3685589 3685590

⚠️ CAUTION: Review command before executing.

Webhooks

Generic HTTP webhooks for custom integrations, PagerDuty, Opsgenie, or any service that accepts webhooks.

notifications:
  webhook:
    enabled: true
    url: https://your-service.com/webhook/spyd
    method: POST  # POST or GET

🔗 Sample Webhook Payload (JSON)

{
  "id": "alert-1705312200",
  "title": "CRITICAL - cpu",
  "message": "CPU usage 95.3% exceeds threshold 80.0%",
  "severity": "critical",
  "metric": "cpu",
  "value": 95.3,
  "threshold": 80.0,
  "server": "my-server",
  "timestamp": "2024-01-15T10:30:00Z",
  "diagnosis": {
    "summary": "Your server is being hammered by automated scripts...",
    "root_cause": "Fork bomb pattern detected. PIDs 3685587, 3685588...",
    "suggestions": [
      "kill -15 3685587 3685588 3685589 3685590"
    ],
    "confidence": 0.90
  }
}
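On the receiving side, a custom integration could reduce this payload to a one-line summary. The sketch below relies only on the field names shown in the sample payload; `summarize_alert` is a hypothetical helper, and treating `diagnosis` as optional is an assumption.

```python
import json


def summarize_alert(raw: str) -> str:
    """Turn a webhook payload (JSON text) into a one-line summary."""
    alert = json.loads(raw)
    line = "[{}] {}: {}".format(
        alert["severity"].upper(), alert["server"], alert["message"]
    )
    # Assumed optional: alerts sent without AI analysis may lack a diagnosis.
    diagnosis = alert.get("diagnosis") or {}
    confidence = diagnosis.get("confidence")
    if confidence is not None:
        line += " (confidence {:.0%})".format(confidence)
    return line
```

The same function could sit behind any HTTP handler that reads the request body and passes it in as text.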

Testing Alerts: Use spyd test-alert to verify your notification setup is working correctly before relying on it for real alerts.

AI Providers

Spyd supports multiple AI providers. Choose based on your needs: cloud providers for convenience, or local models for privacy.

OpenAI

Fast, accurate analysis with GPT-4o-mini or other OpenAI models. Requires API key.

ai:
  enabled: true
  provider: openai
  api_key: sk-...
  model: gpt-4o-mini  # or gpt-4o, gpt-4-turbo

Anthropic (Claude)

Claude models for detailed, nuanced analysis. Requires API key.

ai:
  enabled: true
  provider: anthropic
  api_key: sk-ant-...
  model: claude-3-haiku-20240307

Ollama (Local)

Run AI locally with Ollama. 100% private, zero cost, no API key required.

ai:
  enabled: true
  provider: ollama
  base_url: http://localhost:11434  # default
  model: llama3  # or mistral, codellama, etc.

# Install Ollama and pull a model first
$ curl -fsSL https://ollama.ai/install.sh | sh
$ ollama pull llama3

LiteLLM

Unified API for 100+ LLM providers. Use any model through a single interface.

ai:
  enabled: true
  provider: litellm
  base_url: http://localhost:4000
  model: gpt-4o-mini

Azure OpenAI

For enterprise deployments with Azure compliance requirements.

ai:
  enabled: true
  provider: azure
  api_key: your-azure-key
  base_url: https://your-resource.openai.azure.com
  model: your-deployment-name

Privacy Note: When using cloud providers (OpenAI, Anthropic, Azure), only redacted system summaries are sent for analysis. Sensitive information (usernames, paths, IPs) is automatically removed using configurable privacy patterns.
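Pattern-based redaction of this kind can be sketched with regular expressions. The two patterns and placeholder strings below are illustrative assumptions, not Spyd's actual privacy patterns, which are not listed in this documentation.

```python
import re

# Hypothetical patterns: IPv4 addresses and the user segment of /home paths.
PRIVACY_PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    (re.compile(r"/home/[^/\s]+"), "/home/[USER]"),
]


def redact(text: str) -> str:
    """Apply each pattern in order, replacing matches with a placeholder."""
    for pattern, replacement in PRIVACY_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

The IPv4 pattern is intentionally loose and would also redact any four-part dotted number, such as some version strings.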

Help & Support

Built-in Help

spyd help

Show all available commands.

# General help
$ spyd help

# Command-specific help
$ spyd configure --help
$ spyd alerts --help

Need Help?

Run spyd doctor to diagnose configuration issues
Run spyd configure show to verify your settings
Check logs with spyd logs -n 100
Report issues at github.com/rz0re/spyd/issues