Release Date: 30 May 2023

Highlights (1.6.2)

Overview

This release introduces user-configurable metric alerting, along with a set of standard system alerts. Alerts can also be sent as notifications by email or to webhooks. The release also adds support for extended metric retention (30 days), along with a number of small enhancements and bug fixes.

New Intelligent Monitoring framework

  • Ability to view all alert conditions for a DBMS.
  • Ability to enable, disable and delete conditions as required.
  • Ability to create custom alert conditions based on thresholds for metrics. This is done via the charts on the Metrics dashboard.
  • Provision of pre-configured event-based alert conditions for Agents
    • Unreachable/Offline agents detected
    • Incompatible agent connected
    • Agent update failed
    • Failed to parse config
    • Metric collection failed
    • Failed to read configuration
  • Provision of pre-configured event-based alert conditions for Instances
    • Offline instances detected
    • Metric collection failed
    • Bloom license error
    • Could not connect to query log streaming
    • Query log buffer overflow
    • Failed to parse a query log message
    • Query execution error
    • Authorization failure
  • Provision of pre-configured event-based alert conditions for DBMSs
    • Multiple versions detected
    • Missing members in topology
  • Provision of pre-configured metric-threshold-based alert conditions
    • High CPU usage (idle CPU is less than 2%)
    • High memory usage (free memory is less than 1 GiB)
    • Disk space low (free disk space is less than 5 GiB)
    • High swap usage (swap usage is greater than 5 GiB)
    • Low page cache hit ratio (page cache hit ratio is less than 0.98, i.e. 98%)
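
To make the pre-configured thresholds above concrete, here is a minimal Python sketch of how such conditions could be evaluated against one metrics sample. It is illustrative only and is not NOM's implementation: the metric keys (cpu_idle_percent, memory_free_bytes and so on), the ThresholdCondition class and the triggered function are hypothetical names chosen for this example. Only the threshold values come from the list above.

```python
from dataclasses import dataclass
import operator

GIB = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class ThresholdCondition:
    name: str         # alert name as listed in the release notes
    metric: str       # hypothetical metric key, not a NOM identifier
    comparison: str   # "lt" (less than) or "gt" (greater than)
    threshold: float


_OPS = {"lt": operator.lt, "gt": operator.gt}

# Threshold values taken from the release notes; metric keys are assumptions.
DEFAULT_CONDITIONS = [
    ThresholdCondition("High CPU usage", "cpu_idle_percent", "lt", 2.0),
    ThresholdCondition("High memory usage", "memory_free_bytes", "lt", 1 * GIB),
    ThresholdCondition("Disk space low", "disk_free_bytes", "lt", 5 * GIB),
    ThresholdCondition("High swap usage", "swap_used_bytes", "gt", 5 * GIB),
    ThresholdCondition("Low page cache hit ratio", "page_cache_hit_ratio", "lt", 0.98),
]


def triggered(conditions, sample):
    """Return the names of conditions whose threshold the sample breaches."""
    names = []
    for cond in conditions:
        value = sample.get(cond.metric)
        if value is None:
            continue  # metric not present in this sample
        if _OPS[cond.comparison](value, cond.threshold):
            names.append(cond.name)
    return names


sample = {
    "cpu_idle_percent": 1.5,
    "memory_free_bytes": 4 * GIB,
    "disk_free_bytes": 3 * GIB,
    "swap_used_bytes": 0,
    "page_cache_hit_ratio": 0.99,
}
print(triggered(DEFAULT_CONDITIONS, sample))
# → ['High CPU usage', 'Disk space low']
```

Note that the page cache hit ratio is compared as a ratio between 0 and 1, not as a percentage.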

Notifications for NOM alerts

  • Ability to configure an email channel.
  • Ability to configure webhooks for specific apps
    • Slack
    • Discord
    • Teams
  • Ability to configure a webhook that will accept the generic payload NOM provides.
  • Ability to set up notifications per DBMS
    • Alert levels can be specified to limit notifications as required.
    • Notifications can be sent to the configured webhooks or email.
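
For the generic webhook option above, the receiving side needs an HTTP endpoint that accepts a POST request. The following Python sketch shows one possible receiver built on the standard library. The structure of the generic payload is not described in these notes, so the sketch assumes only that the body is JSON and stores it without interpreting any fields. The names AlertWebhookHandler, handle_body, make_server and received are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # decoded payloads, kept in memory for illustration only


def handle_body(raw: bytes):
    """Decode a request body; return (HTTP status, payload or None)."""
    try:
        payload = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, None
    return 204, payload


class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_body(self.rfile.read(length))
        if payload is not None:
            # No payload fields are assumed; the body is stored as received.
            received.append(payload)
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(host="127.0.0.1", port=8080):
    """Create (but do not start) a server for the webhook endpoint."""
    return HTTPServer((host, port), AlertWebhookHandler)
```

Calling make_server().serve_forever() starts the listener. The host and port are placeholders, and the URL entered in the webhook configuration would need to point at wherever this endpoint is reachable.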
 

Aggregated metrics data

  • Metrics can now be viewed for up to one month back.
  • Metrics captured more than 3 days ago are aggregated into a single aggregated metric node that holds the median, minimum and maximum values.
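
The aggregation described above can be sketched in Python as follows. This is a simplified illustration, not NOM's implementation: these notes do not state how aggregated nodes are bucketed over time, so the sketch collapses all samples older than 3 days into one summary. The function name aggregate_old_samples and the (timestamp, value) sample shape are assumptions.

```python
from datetime import datetime, timedelta, timezone
from statistics import median

AGGREGATE_AFTER = timedelta(days=3)  # age threshold from the release notes


def aggregate_old_samples(samples, now):
    """Split (timestamp, value) samples into a summary of old ones and the recent raw ones.

    Samples strictly older than AGGREGATE_AFTER are reduced to a single
    median/min/max summary; the rest are returned unchanged.
    """
    cutoff = now - AGGREGATE_AFTER
    old = [value for ts, value in samples if ts < cutoff]
    recent = [(ts, value) for ts, value in samples if ts >= cutoff]
    summary = None
    if old:
        summary = {"median": median(old), "min": min(old), "max": max(old)}
    return summary, recent


now = datetime(2023, 5, 30, tzinfo=timezone.utc)
samples = [
    (now - timedelta(days=5), 10.0),
    (now - timedelta(days=4), 30.0),
    (now - timedelta(days=3, hours=1), 20.0),
    (now - timedelta(days=1), 40.0),
]
summary, recent = aggregate_old_samples(samples, now)
print(summary)      # → {'median': 20.0, 'min': 10.0, 'max': 30.0}
print(len(recent))  # → 1
```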

Fixes

  • Fixed an issue where NOM incorrectly stored details for instances with GDS installed. This prevented NOM from correctly displaying the topology of the managed estate.

Known Issues

  • Threshold alert condition configuration for page cache metrics is displayed incorrectly. The metrics are shown as % values but are actually stored as a ratio between 0 and 1. As a workaround, set the threshold to a value between 0 and 1 and ignore the % sign (e.g. to set the threshold to 98%, enter 0.98, which is displayed as 0.98%).
  • Threshold alert condition configuration for disk usage metrics incorrectly applies the threshold condition to all ‘Disk Used’ metrics instead of only the one specified in the current chart. 
  • The search in the ‘Conditions’ table does not search in the ‘Location’ column. 
  • When upgrading an agent that is running as a service, the agent sometimes needs to be restarted manually before query logs are streamed to NOM again after the upgrade.
  • Agent screen auto-update can stop working. You may need to refresh the screen to get the latest agent status.
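
For the page cache threshold issue listed above, the value to enter is the intended percentage divided by 100. A small Python helper illustrates the conversion. The function name percent_to_ratio is hypothetical and is not part of NOM.

```python
def percent_to_ratio(percent: float) -> float:
    """Convert an intended percentage (0 to 100) to the 0 to 1 ratio to enter."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent / 100


print(percent_to_ratio(98))  # → 0.98
```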