K3s Monitor
The K3s Monitor tool is a comprehensive Python utility designed to collect, analyze and report on resource utilization and performance metrics from a K3s cluster. This tool is particularly useful for diagnosing performance issues, capacity planning and understanding resource consumption patterns in production environments.
Tool Features
- Cluster Resource Monitoring: Collects various resource metrics from nodes and pods
- Component-Specific Monitoring: Tracks resource usage for all K3s Cluster components
- Log Collection: Gathers logs from system services and Kubernetes components
- Automated Analysis: Identifies high resource consumption and potential issues
- Comparative Reporting: Compares current metrics with previous monitoring runs
- Comprehensive Summary: Generates detailed reports with recommendations, ready for AI-assisted analysis with tools like Claude
Prerequisites
The following dependencies are required to run the K3s Monitor tool. They are deployed automatically by the Provisioning playbook:
- Python 3.8+
- python3-kubernetes library
- python3-yaml library
- kubectl configured to access the K3s cluster
- journalctl for log collection
- jq for JSON processing
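The prerequisites can be verified before a run with a short check such as the one below. This is a minimal sketch and not part of the K3s Monitor tool: the function name `missing_dependencies` is hypothetical, and it assumes the python3-kubernetes and python3-yaml packages expose the `kubernetes` and `yaml` modules.

```python
import importlib.util
import shutil
import sys

# Command-line tools and Python modules named in the prerequisites list.
REQUIRED_COMMANDS = ("kubectl", "journalctl", "jq")
REQUIRED_MODULES = ("kubernetes", "yaml")


def missing_dependencies(commands=REQUIRED_COMMANDS, modules=REQUIRED_MODULES):
    """Return a list describing every prerequisite that cannot be found."""
    missing = []
    if sys.version_info < (3, 8):
        missing.append("python: 3.8 or newer is required")
    for command in commands:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(command) is None:
            missing.append(f"command: {command}")
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(f"module: {module}")
    return missing


problems = missing_dependencies()
print("\n".join(problems) if problems else "All prerequisites found")
```

The check only confirms that kubectl is on the PATH. It does not verify that kubectl is configured to reach the K3s cluster.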
Generated Reports
The following reports are generated:
- cilium-metrics.log: Detailed Cilium networking status, endpoints and services information
- cluster-info.log: Basic information about the cluster
- comparison.log: Comparison with previous monitoring runs
- component-metrics.csv: Time-series data for component resource usage
- summary.log: Overall resource usage summary and recommendations
- etcd-metrics.log: Status of HA clusters, etcd cluster health and metrics
- k3s-monitor.log: Operational log of the monitoring tool itself, including all actions taken during execution
- log-summary.txt: Summary of important log events (errors, warnings)
- pod-metrics.csv: Detailed pod-level resource metrics
- sysctl.txt: System kernel parameter settings
The directory and file structure containing the generated reports is listed below.
Note
For AI-assisted analysis, upload the generated tarball to a chat with Claude and ask for an analysis of your K3s cluster metrics and performance.
- cilium-metrics.log
- cluster-info.log
- comparison.log
- component-metrics.csv
- etcd-metrics.log
- k3s-monitor.log
- log-summary.txt
- pod-metrics.csv
- argo-cd_YYYYMMDD-HHMMSS.log
- cert-manager_YYYYMMDD-HHMMSS.log
- cilium_YYYYMMDD-HHMMSS.log
- coredns_YYYYMMDD-HHMMSS.log
- external-dns_YYYYMMDD-HHMMSS.log
- kured_YYYYMMDD-HHMMSS.log
- longhorn_YYYYMMDD-HHMMSS.log
- metrics-server_YYYYMMDD-HHMMSS.log
- victorialogs_YYYYMMDD-HHMMSS.log
- victoriametrics_YYYYMMDD-HHMMSS.log
- containerd.log
- k3s.log
- kubelet.log
- summary.log
- sysctl.txt
- k3s-monitor-YYYYMMDD-HHMMSS.tar.gz
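Before uploading the tarball, it can be useful to confirm which reports it actually contains. The following is a minimal sketch using Python's standard `tarfile` module. The function name `list_report_files` is hypothetical, and the internal layout of the archive is not assumed beyond it being a gzip-compressed tar file.

```python
import tarfile


def list_report_files(tarball_path):
    """Return the sorted names of regular files stored in the tarball."""
    # Mode "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(tarball_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Calling `list_report_files()` with the path to a generated `k3s-monitor-YYYYMMDD-HHMMSS.tar.gz` file returns the report names without extracting anything to disk.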
Tool Usage
Log in to one of the server nodes and run the tool:
ssh apollo
sudo k3s-monitor --help
usage: k3s-monitor [-h] [-d DURATION] [-i INTERVAL] [-l LOG_DIR] [-m LOG_MAX_SIZE] [-n NAMESPACE] [-v]

K3s Cluster Monitor

options:
  -h, --help            show this help message and exit
  -d DURATION, --duration DURATION
                        Total monitoring duration in seconds (default: 3600)
  -i INTERVAL, --interval INTERVAL
                        Time between metric collections in seconds (default: 300)
  -l LOG_DIR, --log-dir LOG_DIR
                        Directory to store logs and reports (default: /var/log/k3s)
  -m LOG_MAX_SIZE, --log-max-size LOG_MAX_SIZE
                        Maximum log file size in MB (default: 50)
  -n NAMESPACE, --namespace NAMESPACE
                        Default namespace (default: kube-system)
  -v, --verbose         Enable verbose logging (default: False)
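For readers who want to script around these options, the help text above can be reproduced with Python's standard `argparse` module. This is a sketch under stated assumptions and not the tool's actual source: the option names, defaults and help strings come from the help output, while the function name `build_parser`, the `int` types and the choice of `ArgumentDefaultsHelpFormatter` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser whose options and defaults match the documented help."""
    parser = argparse.ArgumentParser(
        prog="k3s-monitor",
        description="K3s Cluster Monitor",
        # Appends "(default: ...)" to each option's help string.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("-d", "--duration", type=int, default=3600,
                        help="Total monitoring duration in seconds")
    parser.add_argument("-i", "--interval", type=int, default=300,
                        help="Time between metric collections in seconds")
    parser.add_argument("-l", "--log-dir", default="/var/log/k3s",
                        help="Directory to store logs and reports")
    parser.add_argument("-m", "--log-max-size", type=int, default=50,
                        help="Maximum log file size in MB")
    parser.add_argument("-n", "--namespace", default="kube-system",
                        help="Default namespace")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser


# Parse an explicit argument list; unspecified options keep their defaults.
args = build_parser().parse_args(["--duration", "600", "--interval", "60"])
print(args.duration, args.interval, args.log_dir, args.namespace, args.verbose)
# prints: 600 60 /var/log/k3s kube-system False
```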
The examples below show common ways to run the K3s Monitor tool.
Examples
Monitor components for 24 hours with 15-minute intervals:
sudo k3s-monitor --duration 86400 --interval 900
Store logs in a custom directory with verbose output:
sudo k3s-monitor --log-dir /home/user/k3s-monitoring --verbose
Monitor components deployed in a different namespace:
sudo k3s-monitor --namespace monitoring
Run a quick 10-minute check with 1-minute intervals:
sudo k3s-monitor --duration 600 --interval 60
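The duration and interval values together determine how many metric collections a run performs. The sketch below works through that arithmetic for the default settings and for the examples above. It assumes one collection per elapsed interval. If the tool also collects once at startup, each count would be one higher. The function name `collection_count` is hypothetical.

```python
def collection_count(duration, interval):
    """Number of whole intervals that fit into the monitoring duration."""
    if duration <= 0 or interval <= 0:
        raise ValueError("duration and interval must be positive")
    return duration // interval


# Default run, 24-hour run and quick 10-minute check, all in seconds.
for duration, interval in [(3600, 300), (86400, 900), (600, 60)]:
    count = collection_count(duration, interval)
    print(f"{duration}s / {interval}s -> {count} collections")
# prints:
# 3600s / 300s -> 12 collections
# 86400s / 900s -> 96 collections
# 600s / 60s -> 10 collections
```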
Best Practices
- Regular Monitoring: Run the tool periodically (e.g., weekly) to establish baseline metrics
- After Changes: Run after cluster upgrades or significant workload changes
- Retention: Keep monitoring results for trend analysis
- Size Appropriately: Adjust duration and interval based on cluster size:
  - Small clusters: 1-hour duration, 5-minute intervals
  - Large clusters: 6-hour duration, 15-minute intervals
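The sizing recommendations translate directly into `--duration` and `--interval` values in seconds. The following sketch builds the matching command line for each case. The profile names `small` and `large` and the function `monitor_command` are hypothetical, and the point at which a cluster counts as large is left to the operator.

```python
import shlex

# Duration and interval, in seconds, for the two sizing recommendations.
PROFILES = {
    "small": {"duration": 1 * 3600, "interval": 5 * 60},
    "large": {"duration": 6 * 3600, "interval": 15 * 60},
}


def monitor_command(profile):
    """Return the k3s-monitor command line for a sizing profile as a list."""
    settings = PROFILES[profile]
    return [
        "sudo", "k3s-monitor",
        "--duration", str(settings["duration"]),
        "--interval", str(settings["interval"]),
    ]


for name in PROFILES:
    print(name, "->", shlex.join(monitor_command(name)))
# prints:
# small -> sudo k3s-monitor --duration 3600 --interval 300
# large -> sudo k3s-monitor --duration 21600 --interval 900
```

Returning the command as a list keeps it ready for `subprocess.run()` without shell quoting concerns, for example when the periodic runs are scheduled from a script.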