Troubleshooting
This page covers common issues with the Crucible agent and Forge dashboard, along with step-by-step solutions.
Crucible service fails to start
Symptom: systemctl status crucible shows failed or inactive (dead).
Steps:
- Check the service logs:
journalctl -u crucible --no-pager -n 50
- If you see "config: parse error", validate the YAML:
crucible config check
Common YAML mistakes include using tabs instead of spaces, missing quotes around strings with special characters, and incorrect indentation.
- If you see "permission denied", ensure the configuration file is readable:
ls -la /etc/glassmkr/collector.yaml
The file should be owned by root with mode 0600 or 0640.
- If you see "bind: address already in use", another instance of Crucible may be running:
pgrep -a crucible
Kill the stale process and try again.
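The three log messages above can be matched mechanically. The following is a minimal sketch, not part of Crucible: the function name crucible_start_hint is hypothetical, and it assumes the log lines contain the message text exactly as quoted in the steps.

```shell
# Hypothetical helper: read Crucible log lines on stdin and print one hint
# for each line that contains an error message from the steps above.
crucible_start_hint() {
    while IFS= read -r line; do
        case "$line" in
            *"config: parse error"*)
                echo "config: run 'crucible config check'" ;;
            *"permission denied"*)
                echo "permissions: run 'ls -la /etc/glassmkr/collector.yaml'" ;;
            *"bind: address already in use"*)
                echo "port: run 'pgrep -a crucible'" ;;
        esac
    done
}

# Example usage:
#   journalctl -u crucible --no-pager -n 50 | crucible_start_hint
```

Lines that match none of the three patterns produce no output, so an empty result means the cause is something other than the cases listed here.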
Server shows "offline" in the dashboard
Symptom: The server card in Forge shows a gray status indicator and "last seen" is more than 5 minutes ago.
Steps:
- Check that Crucible is running:
systemctl status crucible
- Check network connectivity to the API:
curl -s -o /dev/null -w "%{http_code}" https://forge.glassmkr.com/api/v1/health
You should get 200. If not, check DNS resolution, firewall rules, and proxy settings.
- Check if the token is valid:
sudo journalctl -u glassmkr-crucible --since "5 min ago" --no-pager
If you see "auth error: 401", generate a new token in the Forge dashboard and update /etc/glassmkr/collector.yaml.
- Check for network-level blocks. Some firewalls or security groups block outbound HTTPS. Verify that port 443 to forge.glassmkr.com is open:
nc -zv forge.glassmkr.com 443
- If you are behind a proxy, configure it in collector.yaml:
proxy:
  https: http://proxy.internal:3128
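The health check above prints only a status code, so a small wrapper can turn it into a next step. This is a sketch under assumptions: health_code_hint is a hypothetical name, and the only documented fact it relies on is that 200 means healthy. curl prints 000 for %{http_code} when it receives no HTTP response at all.

```shell
# Hypothetical helper: map the status code printed by the curl health check
# to the follow-up suggested in the steps above.
health_code_hint() {
    case "$1" in
        200) echo "ok" ;;
        000) echo "no response: check DNS, firewall rules, and proxy settings" ;;
        *)   echo "unexpected status $1: check DNS, firewall rules, and proxy settings" ;;
    esac
}

# Example usage:
#   health_code_hint "$(curl -s -o /dev/null -w '%{http_code}' https://forge.glassmkr.com/api/v1/health)"
```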
Metrics are delayed or missing
Symptom: The dashboard shows gaps in charts or data arrives minutes late.
Steps:
- Check the agent's push timing:
sudo journalctl -u glassmkr-crucible --since "5 min ago" --no-pager
The "Last push" value should be close to the configured interval (default: 300 seconds).
- If pushes are slow, check the agent log for timeout errors:
grep -i "timeout\|retry" /var/log/glassmkr/crucible.log | tail -20
- If the server's clock is significantly off, metrics may be dropped. Verify NTP is working:
timedatectl status
The system clock should be synchronized. If not, enable NTP:
sudo timedatectl set-ntp true
- If specific collectors are slow (e.g., SMART queries on many disks), they can delay the entire push. Check collector timing:
sudo journalctl -u glassmkr-crucible -f
Consider increasing the collection interval or disabling slow collectors.
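To get a quick sense of how often pushes are failing, the grep from the steps above can be turned into a count. This is a minimal sketch: count_push_problems is a hypothetical name, and the default path is simply the log file mentioned above.

```shell
# Hypothetical helper: count log lines that mention a timeout or a retry
# (case-insensitive). A line that mentions both is counted once.
count_push_problems() {
    grep -ci -e 'timeout' -e 'retry' "${1:-/var/log/glassmkr/crucible.log}"
}

# Example usage:
#   count_push_problems                      # default log path
#   count_push_problems /path/to/other.log   # explicit log file
```

A count that keeps growing between runs suggests the delays are caused by the network path rather than by slow collectors.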
SMART data is not appearing
Symptom: The Disk tab in the dashboard shows no SMART information.
Steps:
- Ensure smartmontools is installed:
# Debian/Ubuntu
sudo apt install smartmontools
# RHEL/Rocky/Alma
sudo dnf install smartmontools
- Verify that smartctl can read your drives:
sudo smartctl -a /dev/sda
If this fails with a permission error, Crucible needs to run as root (which is the default for the systemd service).
- For hardware RAID controllers, drives behind the controller are not visible to smartctl without the -d flag. Check if your controller is supported:
sudo smartctl -a /dev/sda -d megaraid,0
- Verify the SMART collector is enabled in collector.yaml:
collectors:
  smart:
    enabled: true
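On hosts with several drives, the smartctl check above can be repeated for each whole disk. The sketch below is one way to build the device list; disks_from_lsblk is a hypothetical name, and it assumes input in the two-column format printed by lsblk -dn -o NAME,TYPE.

```shell
# Hypothetical helper: read "NAME TYPE" pairs on stdin and print the device
# path of every entry whose type is "disk" (skips rom, loop, and so on).
disks_from_lsblk() {
    awk '$2 == "disk" { print "/dev/" $1 }'
}

# Example usage: run a SMART health check on every whole disk.
#   lsblk -dn -o NAME,TYPE | disks_from_lsblk | while read -r dev; do
#       sudo smartctl -H "$dev"
#   done
```

Drives behind a hardware RAID controller still need the -d flag described above, so this loop only covers directly attached disks.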
Telegram notifications are not arriving
Symptom: Alerts fire in the dashboard but no Telegram messages are received.
Steps:
- Test the channel from the dashboard or API:
curl -X POST https://forge.glassmkr.com/api/v1/channels/CHANNEL_ID/test \
  -H "Authorization: Bearer YOUR_TOKEN"
- If the test fails with "401 Unauthorized", the bot token is invalid. Create a new bot with BotFather or regenerate the token.
- If the test fails with "400 Bad Request: chat not found", the chat ID is wrong. Common mistakes:
  - Missing the -100 prefix for supergroups.
  - The bot was removed from the group after setup.
  - The bot has not received any messages in the chat yet (send a message to the bot first).
- If the test succeeds but real alerts do not arrive, check the channel routing. Go to Settings > Alert Defaults and verify that your Telegram channel is listed.
- Check the alert cooldown. By default, Forge only sends one notification per alert per hour. If you acknowledged the alert or it was recently notified, additional notifications are suppressed.
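A missing -100 prefix is easy to spot before saving the channel. The following is a rough heuristic, not a Forge or Telegram tool: chat_id_kind is a hypothetical name, and the classification assumes the usual Bot API convention that private chats have positive IDs, basic groups have negative IDs, and supergroups and channels have IDs starting with -100.

```shell
# Hypothetical helper: guess what kind of chat a Telegram chat ID refers to,
# based only on its sign and prefix. This is a heuristic sanity check.
chat_id_kind() {
    if printf '%s\n' "$1" | grep -Eq '^-100[0-9]+$'; then
        echo "supergroup-or-channel"
    elif printf '%s\n' "$1" | grep -Eq '^-[0-9]+$'; then
        echo "group"
    elif printf '%s\n' "$1" | grep -Eq '^[0-9]+$'; then
        echo "private"
    else
        echo "invalid"
    fi
}

# Example usage:
#   chat_id_kind "-1001234567890"
```

If a chat you know to be a supergroup is reported as "private" or "group", the -100 prefix was probably dropped when the ID was copied.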
Email notifications go to spam
Symptom: Test emails arrive in the spam folder.
Steps:
- Check the spam folder and mark messages as "not spam" to train your mail provider.
- Add [email protected] to your contacts or safe senders list.
- If you control the recipient domain, add an SPF record allowing Glassmkr's mail servers. Contact support for the current IP ranges.
- For better deliverability, use a custom SMTP server with your own domain. See the Channels page for setup instructions.
Temperature or IPMI data is missing
Symptom: The Hardware tab shows no temperature, fan, or PSU data.
Steps:
- Install lm-sensors for hwmon data:
# Debian/Ubuntu
sudo apt install lm-sensors
sudo sensors-detect --auto
- For IPMI data, install ipmitool:
sudo apt install ipmitool
Verify it works:
sudo ipmitool sdr list
- If IPMI is not available (common on consumer hardware and many cloud VMs), Crucible falls back to hwmon. Virtual machines typically have no thermal sensors at all.
- Check that the thermal collector is not disabled:
collectors:
  thermal:
    enabled: true
    source: auto
High CPU usage by Crucible
Symptom: The Crucible process uses more than 1-2% CPU consistently.
Steps:
- Check which collectors are running:
sudo journalctl -u glassmkr-crucible -f
- SMART queries on many disks can be expensive. If you have more than 20 disks, increase the interval or limit which disks are scanned:
collectors:
  smart:
    devices:
      - /dev/sda
      - /dev/sdb
- Per-core CPU metrics on machines with 64+ cores generate a lot of data. Disable per-core reporting if you do not need it:
collectors:
  cpu:
    per_core: false
- If the collection interval is set very low (e.g., 10 seconds), increase it to reduce overhead:
collectors:
  interval: 300
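To confirm the symptom before changing any settings, the process's CPU share can be compared against the 2% figure above. This is a sketch under assumptions: cpu_above is a hypothetical name, and the usage line assumes the process is named crucible, as in the pgrep example earlier on this page. Note that the %cpu value reported by ps on Linux is an average over the process lifetime, not an instantaneous reading.

```shell
# Hypothetical helper: succeed (exit 0) if the first number is greater than
# the second. awk is used because the shell cannot compare decimals.
cpu_above() {
    awk -v usage="$1" -v limit="$2" 'BEGIN { exit !(usage + 0 > limit + 0) }'
}

# Example usage (assumes a single process named "crucible"):
#   cpu_above "$(ps -C crucible -o %cpu=)" 2 && echo "Crucible CPU usage is above 2%"
```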
Registration fails with "server limit reached"
Symptom: Adding a server in the Forge dashboard ("+ Add Server") returns an error about the server limit.
Steps:
- Check your current plan limits in the Forge dashboard under Settings > Account.
- The Free plan allows up to 3 servers. The Pro and Enterprise plans have no server limit.
- If you have decommissioned servers that are still registered, delete them from the dashboard to free up slots.
- To upgrade your plan, go to Settings > Account > Billing.
Configuration changes are not taking effect
Symptom: You edited collector.yaml but Crucible still uses the old settings.
Steps:
- Restart the service after any configuration change:
sudo systemctl restart crucible
- Verify the configuration was parsed correctly:
crucible config check
- Check that you edited the correct file. If the CRUCIBLE_CONFIG environment variable is set, it may point to a different location:
systemctl show crucible -p Environment
- Environment variables override the config file. Check if any CRUCIBLE_* variables are set in the systemd unit or the shell environment.
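Both places named in the last step can be checked with the same filter. The sketch below is illustrative only: crucible_vars is a hypothetical name, and splitting the systemd Environment value on spaces assumes that no variable value contains a space.

```shell
# Hypothetical helper: keep only NAME=value lines whose name starts with
# CRUCIBLE_. Prints nothing (and still succeeds) when there are none.
crucible_vars() {
    grep '^CRUCIBLE_' || true
}

# Example usage:
#   env | crucible_vars                                   # current shell
#   systemctl show crucible -p Environment --value \
#       | tr ' ' '\n' | crucible_vars                     # systemd unit
```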
Per-core CPU data is not showing
Symptom: The per-core CPU chart does not appear in the expanded CPU view, or per-core data is missing from AI analysis.
Steps:
- Per-core monitoring requires Crucible 0.3.0 or later. Check your version:
crucible --version
- Ensure per-core monitoring is enabled in the configuration:
collectors:
  cpu:
    per_core: true
- Restart Crucible after changing the configuration:
sudo systemctl restart crucible
- Wait for the next collection interval (default: 5 minutes) for data to appear.
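The version requirement in the first step can be checked in a script. This is a minimal sketch: version_at_least is a hypothetical name, it relies on the version sort (-V) option of GNU sort, and the usage line assumes that the version number is the last word printed by crucible --version, which may not match the real output format.

```shell
# Hypothetical helper: succeed (exit 0) if version $1 is greater than or
# equal to version $2, using version-aware sorting.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (output format of `crucible --version` is an assumption):
#   version_at_least "$(crucible --version | awk '{ print $NF }')" 0.3.0 \
#       || echo "Upgrade Crucible to 0.3.0 or later for per-core data"
```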
Muted rules are still firing
Symptom: You muted a rule but it continues to fire alerts or send notifications.
Steps:
- Muting takes effect on the next ingest cycle. Wait for at least one full collection interval (default: 5 minutes) after muting.
- If you muted via the configuration file, restart Crucible for the change to take effect:
sudo systemctl restart crucible
- If you muted via the dashboard, no restart is needed, but the change applies on the next push from that server.
- Verify the rule is muted in the dashboard under the server's Alerts tab. Muted rules show a mute icon.
Getting help
If your issue is not covered here:
- Run crucible debug to generate a diagnostic bundle. This collects logs, configuration (with tokens redacted), system info, and recent metrics. Attach it when contacting support.
- Email [email protected] with your server ID and a description of the issue.