The z-score is a numerical measure of how far a data point lies from the mean, expressed in units of standard deviation.
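The z-score of a data point $x$ is calculated as follows:

$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean of the dataset and $\sigma$ is its standard deviation.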
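The z-score formula, $z = (x - \mu)/\sigma$, translates directly into code. Below is a minimal Python sketch; the function name `z_score` is our own illustrative choice, and it assumes the mean and standard deviation have already been computed.

```python
def z_score(x: float, mean: float, std: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    if std == 0:
        # A zero standard deviation means every point equals the mean,
        # so the z-score is undefined.
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / std


# Example: the point 19 in a dataset with mean 5.9 and standard deviation 4.6
print(round(z_score(19, 5.9, 4.6), 1))  # prints 2.8
```

A positive result means the point lies above the mean; a negative result means it lies below.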
Z-scores can be a vital tool for statisticians and developers alike. For example, developers can use them to detect anomalies in a dataset.
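The sign and size of a z-score tell us more about a data point. A positive z-score means the point lies above the mean, a negative z-score means it lies below the mean, and a z-score of 0 means it equals the mean. The absolute value gives the distance from the mean in standard deviations.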
By computing the z-score of a data point, we can measure how close to or far from the mean it lies. If we set a range of acceptable z-scores (e.g., $-1 \le z \le 1$), any point whose z-score falls outside that range can be flagged as an anomaly.
A range of $-1 \le z \le 1$ means that we treat points within one standard deviation of the mean as acceptable. All other points are considered anomalies, or outliers.
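A minimal sketch of this range check in Python, assuming an acceptable range of −1 to 1 by default. The function `find_anomalies` and its `lower`/`upper` parameters are hypothetical names chosen for this example, not a library API. It uses the population standard deviation (`statistics.pstdev`).

```python
from statistics import mean, pstdev


def find_anomalies(data, lower=-1.0, upper=1.0):
    """Return the points whose z-score falls outside [lower, upper].

    Uses the population standard deviation (divides by n).
    """
    mu = mean(data)
    sigma = pstdev(data)
    if sigma == 0:
        return []  # all points are identical, so none stand out
    return [x for x in data if not lower <= (x - mu) / sigma <= upper]


print(find_anomalies([2, 3, 5, 4, 7, 19, 6, 4, 3, 6]))  # prints [19]
```

Widening the range (for example to −3 and 3) makes the check more tolerant, so fewer points are flagged.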
Let’s consider the following dataset:
[2, 3, 5, 4, 7, 19, 6, 4, 3, 6]
First, we calculate the mean and the (population) standard deviation of the dataset:

$$\mu = 5.9, \qquad \sigma \approx 4.6$$
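These values can be checked with Python's standard `statistics` module. The value 4.6 corresponds to the population standard deviation (dividing by n); the sample standard deviation (dividing by n − 1) comes out slightly larger, which would shift every z-score a little.

```python
from statistics import mean, pstdev, stdev

data = [2, 3, 5, 4, 7, 19, 6, 4, 3, 6]

mu = mean(data)       # arithmetic mean
sigma = pstdev(data)  # population standard deviation (divides by n)
s = stdev(data)       # sample standard deviation (divides by n - 1)

print(round(mu, 1), round(sigma, 1), round(s, 1))  # prints 5.9 4.6 4.9
```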
Next, we calculate the z-score of each data point using the formula above.
| Data point | z-score |
|---|---|
| 2 | -0.8 |
| 3 | -0.6 |
| 5 | -0.2 |
| 4 | -0.4 |
| 7 | 0.2 |
| 19 | 2.8 |
| 6 | 0.02 |
| 4 | -0.4 |
| 3 | -0.6 |
| 6 | 0.02 |
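The table above can be reproduced in a few lines of Python. This sketch computes each z-score with the unrounded mean and population standard deviation, so the printed values may differ from the table in the last digit because of rounding.

```python
from statistics import mean, pstdev

data = [2, 3, 5, 4, 7, 19, 6, 4, 3, 6]
mu = mean(data)
sigma = pstdev(data)

# Compute the z-score of every point and print it next to the point
z_scores = [(x - mu) / sigma for x in data]
for x, z in zip(data, z_scores):
    print(f"{x:>4} | {z:6.2f}")
```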
From the table, we can see that the data point 19 has the highest z-score, 2.8. It lies 2.8 standard deviations above the mean, well beyond one standard deviation, so it can be considered an anomaly.
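To pick out the most extreme point programmatically, one option (our own choice, not something the method prescribes) is to take the point with the largest absolute z-score, so that points far below the mean are caught as well as points far above it.

```python
from statistics import mean, pstdev

data = [2, 3, 5, 4, 7, 19, 6, 4, 3, 6]
mu = mean(data)
sigma = pstdev(data)

# Pick the point farthest from the mean, measured by absolute z-score
most_extreme = max(data, key=lambda x: abs(x - mu) / sigma)
z = (most_extreme - mu) / sigma
print(most_extreme, round(z, 1))  # prints 19 2.8
```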
Note:
The z-score is also known as the standard score.