Release Date: 30 May 2023
Overview
This release introduces user-configurable metric alerting, along with a set of standard system alerts. Alerts can also be sent as notifications by email or to webhooks. The release also adds support for extended metric retention (30 days), along with several smaller enhancements and bug fixes.
New Intelligent Monitoring framework
- Ability to view all alert conditions for a DBMS.
- Ability to enable, disable and delete conditions as required.
- Ability to create custom alert conditions based on thresholds for metrics. This is done via the charts on the Metrics dashboard.
- Provision of pre-configured event-based alert conditions for Agents:
  - Unreachable/Offline agents detected
  - Incompatible agent connected
  - Agent update failed
  - Failed to parse config
  - Metric collection failed
  - Failed read configuration
- Provision of pre-configured event-based alert conditions for Instances:
  - Offline instances detected
  - Metric collection failed
  - Bloom license error
  - Could not connect to query log streaming
  - Query log buffer overflow
  - Failed to parse a query log message
  - Query execution error
  - Authorization failure
- Provision of pre-configured event-based alert conditions for DBMSs:
  - Multiple versions detected
  - Missing members in topology
- Provision of pre-configured metric threshold-based alert conditions:
  - High CPU usage (idle CPU is less than 2%)
  - High memory usage (free memory is less than 1GiB)
  - Disk space low (free disk space is less than 5GiB)
  - High swap usage (swap usage is greater than 5GiB)
  - Low page cache hit ratio (page cache hit ratio is less than 0.98, i.e. 98%)
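The pre-configured metric thresholds listed above can be pictured as simple comparisons against a metric sample. The following Python sketch is illustrative only: the `ThresholdCondition` class, the metric key names, and the `breached` helper are hypothetical and are not part of NOM; only the condition names and threshold values come from the list above.

```python
from dataclasses import dataclass
import operator

GIB = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class ThresholdCondition:
    name: str         # condition name as listed in the release notes
    metric: str       # hypothetical metric key, not a NOM identifier
    compare: str      # "lt" (alert when below) or "gt" (alert when above)
    threshold: float

    def is_breached(self, value: float) -> bool:
        op = operator.lt if self.compare == "lt" else operator.gt
        return op(value, self.threshold)


# Threshold values taken from the list above; metric keys are assumptions.
DEFAULT_CONDITIONS = [
    ThresholdCondition("High CPU usage", "cpu_idle_percent", "lt", 2.0),
    ThresholdCondition("High memory usage", "memory_free_bytes", "lt", 1 * GIB),
    ThresholdCondition("Disk space low", "disk_free_bytes", "lt", 5 * GIB),
    ThresholdCondition("High swap usage", "swap_used_bytes", "gt", 5 * GIB),
    ThresholdCondition("Low page cache hit ratio", "page_cache_hit_ratio", "lt", 0.98),
]


def breached(sample: dict) -> list:
    """Return the names of all conditions breached by one metric sample.

    Metrics missing from the sample are skipped rather than treated as breaches.
    """
    return [
        c.name
        for c in DEFAULT_CONDITIONS
        if c.metric in sample and c.is_breached(sample[c.metric])
    ]


if __name__ == "__main__":
    sample = {
        "cpu_idle_percent": 1.5,
        "memory_free_bytes": 2 * GIB,
        "page_cache_hit_ratio": 0.97,
    }
    print(breached(sample))
```

Note that the page cache hit ratio is compared as a ratio between 0 and 1, not as a percentage.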
Notifications for NOM alerts
- Ability to configure an email channel.
- Ability to configure webhooks for specific apps
- Ability to configure a webhook that will accept the generic payload NOM provides.
- Ability to set up notifications per DBMS
- Alert levels can be specified to limit notifications as required.
- Notifications can be sent to the configured webhooks or email.
Aggregated metrics data
- Metrics can now be viewed for up to one month back.
- Metrics captured more than 3 days ago are aggregated into a single aggregated metric node that holds a median, a minimum, and a maximum value.
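As a rough illustration of the aggregation described above, the Python sketch below collapses samples older than 3 days into one summary with a median, minimum, and maximum. It is a sketch under assumptions: the `aggregate_old_samples` function, the `(timestamp, value)` sample shape, and the choice to fold all older samples into a single summary are hypothetical, since these notes do not describe how NOM groups or stores aggregated data.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def aggregate_old_samples(samples, now, max_age=timedelta(days=3)):
    """Split (timestamp, value) samples into recent samples and one summary.

    Samples captured more than `max_age` before `now` are replaced by a
    single summary dict; newer samples are returned unchanged.
    """
    cutoff = now - max_age
    recent = [(ts, value) for ts, value in samples if ts >= cutoff]
    old_values = [value for ts, value in samples if ts < cutoff]

    summary = None
    if old_values:
        summary = {
            "median": median(old_values),
            "min": min(old_values),
            "max": max(old_values),
        }
    return recent, summary


if __name__ == "__main__":
    now = datetime(2023, 5, 30, tzinfo=timezone.utc)
    samples = [
        (now - timedelta(days=6), 20.0),
        (now - timedelta(days=5), 30.0),
        (now - timedelta(days=4), 10.0),
        (now - timedelta(days=1), 99.0),
    ]
    recent, summary = aggregate_old_samples(samples, now)
    print(recent, summary)
```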
Fixes
- Fixed an issue where NOM incorrectly stored details for instances with GDS installed, which prevented NOM from correctly displaying the topology of the managed estate.
Known issues
- Threshold alert condition configuration for page cache metrics is displayed incorrectly. The metrics are shown as % values but are actually stored as a ratio between 0 and 1. As a workaround, set the threshold to a value between 0 and 1 and ignore the % sign (e.g. to set the threshold to 98%, enter 0.98, which is displayed as 0.98%).
- Threshold alert condition configuration for disk usage metrics incorrectly applies the threshold condition to all ‘Disk Used’ metrics instead of only the one specified in the current chart.
- The search in the ‘Conditions’ table does not search in the ‘Location’ column.
- When upgrading an agent that runs as a service, the agent sometimes needs to be restarted manually before query logs are streamed to NOM again after the upgrade.
- Agent screen auto-update can stop working. You may need to refresh the screen to get the latest agent status.
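For the page cache threshold workaround noted in this list, the value to enter is the intended percentage divided by 100. The helper below is a hypothetical convenience for that arithmetic and is not part of NOM.

```python
def percent_to_ratio(percent: float) -> float:
    """Convert an intended percentage (0-100) to the 0-1 ratio to enter."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent / 100


if __name__ == "__main__":
    # For an intended threshold of 98%, enter the printed value in the threshold field.
    print(percent_to_ratio(98))
```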