Eliminating Repeating Alerts
Learn how to eliminate repeating alerts in the given data.
In the previous lesson, we eliminated potential anomalies with less than 10 entries. Using thresholds, we were able to remove some non-interesting anomalies.
Let’s have a look at the data for status code 400
after applying the threshold:
The first alert happened at 17:59, and a minute later, the z-score was still high with a large number of entries, and so we classified the next rows at 18:00 as an anomaly as well.
If you think of an alerting system, we want to send an alert only when an anomaly first happens. We do not want to send an alert every minute until the z-score comes back below the threshold. In this case, we only want to send one alert at 17:59. We do not want to send another alert a minute later at 18:00.
Let’s remove alerts where the previous period was also classified as an alert:
WITH calculations_over_window AS (SELECTstatus_code,period,entries,AVG(entries) OVER status_window as mean_entries,STDDEV(entries) OVER status_window as stddev_entriesFROMserver_log_summaryWINDOW status_window AS (PARTITION BY status_codeORDER BY periodROWS BETWEEN 60 PRECEDING AND CURRENT ROW)),with_zscore AS (SELECT*,(entries - mean_entries) / NULLIF(stddev_entries::float, 0) as zscoreFROMcalculations_over_window),with_alert AS (SELECT*,entries > 10 AND zscore > 3 AS alertFROMwith_zscore),with_previous_alert AS (SELECT*,LAG(alert) OVER (PARTITION BY status_code ORDER BY period) AS previous_alertFROMwith_alert)SELECTstatus_code,period,entries,zscore,alertFROMwith_previous_alertWHEREalert AND NOT previous_alertORDER BYperiod DESC;
By eliminating alerts that were already triggered we get a very small list of anomalies that may have happened during the day. Looking at the results we can see what anomalies we would have discovered:
- Anomaly in status code
400
at 17:59: We also found that one earlier.
- Anomaly in status code
500
: We spotted this one on the chart when we started.
- Anomaly in status code
404
: This is a hidden anomaly that we did not know about until now.
The query can now be used to fire alerts when it encounters an anomaly.