What is a proximity measure for ordinal attributes?

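Proximity measures quantify how similar or dissimilar data points are. Ordinal data (ranked or rated values) needs particular care because its categories have a meaningful order but no fixed spacing between them. The association between two ordinal variables is often summarized with rank-based coefficients such as Spearman’s rank correlation or Goodman and Kruskal’s gamma. To compare individual data points on an ordinal attribute, a common approach is to replace each value with its rank, normalize the ranks, and then apply a distance measure. The resulting similarity or dissimilarity scores support tasks such as clustering and classification by revealing patterns and structure in the data.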

Let's see how to calculate a proximity measure for an ordinal attribute using the example below.

Example: proximity measure for ordinal attributes

Suppose we have an ordinal attribute named Test whose values fall into five ordered levels: Excellent, Very Good, Good, Fair, and Poor. The Test values for ten objects are given below:

Object Identifier | Test
------------------+-----------
1                 | Excellent
2                 | Good
3                 | Poor
4                 | Good
5                 | Very Good
6                 | Fair
7                 | Poor
8                 | Good
9                 | Fair
10                | Very Good

Step 1: Replace each Test value with its rank

For each object, replace its Test value with a numeric rank. Ranks preserve the order of the categories, which is what makes it possible to measure distance or similarity between them on a numeric scale. Ranks are assigned in ascending order, from 1 for the lowest level to 5 for the highest: Poor = 1, Fair = 2, Good = 3, Very Good = 4, Excellent = 5. For example:

  • Object 1 has the value Excellent, so it gets rank 5, the highest level on the scale.

  • Object 2 has the value Good, so it gets rank 3, the third-highest level on the scale.

The updated table after assigning ranks looks like this:

Object Identifier | Test (rank)
------------------+------------
1                 | 5
2                 | 3
3                 | 1
4                 | 3
5                 | 4
6                 | 2
7                 | 1
8                 | 3
9                 | 2
10                | 4
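The rank assignment in Step 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the five levels listed above; the names `LEVELS`, `RANK`, and `test_values` are hypothetical and chosen only for this example.

```python
# Hypothetical names for illustration: the ordered levels of the Test
# attribute, listed from lowest to highest.
LEVELS = ["Poor", "Fair", "Good", "Very Good", "Excellent"]

# Rank 1 for the lowest level, up to len(LEVELS) for the highest.
RANK = {level: position for position, level in enumerate(LEVELS, start=1)}

# Test values for objects 1..10, in object-identifier order.
test_values = ["Excellent", "Good", "Poor", "Good", "Very Good",
               "Fair", "Poor", "Good", "Fair", "Very Good"]

# Replace each value with its numeric rank.
ranks = [RANK[value] for value in test_values]
print(ranks)  # [5, 3, 1, 3, 4, 2, 1, 3, 2, 4]
```

Building the mapping from an ordered list keeps the category order in one place, so adding or reordering a level only requires editing `LEVELS`.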

Step 2: Normalize the ranks

Now that each object has a rank, the next step is to normalize the ranks so that they fall in the range 0.0 to 1.0. This puts ordinal attributes with different numbers of levels on the same scale.

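Each rank is mapped with the following formula:

$$z = \frac{X - 1}{M_f - 1}$$

Here, $M_f$ is the total number of ordered levels (rank categories) of the Test attribute, which is 5 in our case, and $X$ is the object's numeric rank. For example, Good has rank 3, so its normalized value is (3 - 1) / (5 - 1) = 0.5.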

Replacing each object's rank with its normalized value gives the following table:

Object Identifier | Test (normalized)
------------------+------------------
1                 | 1
2                 | 0.5
3                 | 0
4                 | 0.5
5                 | 0.75
6                 | 0.25
7                 | 0
8                 | 0.5
9                 | 0.25
10                | 0.75
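The normalization formula from Step 2 translates directly into a small function. This is a sketch, not a library routine: `normalize_rank` is a hypothetical helper, and the ranks list is copied from the Step 1 table.

```python
def normalize_rank(rank: int, num_levels: int) -> float:
    """Map a rank in 1..num_levels onto [0.0, 1.0] as (rank - 1) / (num_levels - 1).

    Hypothetical helper; num_levels plays the role of M_f in the formula.
    """
    if num_levels < 2:
        raise ValueError("need at least two ordered levels")
    if not 1 <= rank <= num_levels:
        raise ValueError(f"rank {rank} is outside 1..{num_levels}")
    return (rank - 1) / (num_levels - 1)


ranks = [5, 3, 1, 3, 4, 2, 1, 3, 2, 4]  # ranks from Step 1
num_levels = 5                          # M_f for the Test attribute

normalized = [normalize_rank(r, num_levels) for r in ranks]
print(normalized)  # [1.0, 0.5, 0.0, 0.5, 0.75, 0.25, 0.0, 0.5, 0.25, 0.75]
```

The guard on `num_levels` avoids a division by zero for a degenerate attribute with a single level.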

Step 3: Use Euclidean distance to find the dissimilarity matrix

With the normalized ranks, we can compute the dissimilarity between each pair of objects using the Euclidean distance. For two points $x_1$ and $x_2$ in one dimension, the Euclidean distance reduces to the absolute difference:

$$d(x_1, x_2) = \sqrt{(x_1 - x_2)^2} = |x_1 - x_2|$$

For Object 1 (normalized value 1), this gives:

  • Distance between Objects 1 and 2: |1 - 0.5| = 0.5

  • Distance between Objects 1 and 3: |1 - 0| = 1

  • Distance between Objects 1 and 4: |1 - 0.5| = 0.5

  • Distance between Objects 1 and 5: |1 - 0.75| = 0.25

  • Distance between Objects 1 and 6: |1 - 0.25| = 0.75

  • Distance between Objects 1 and 7: |1 - 0| = 1

  • Distance between Objects 1 and 8: |1 - 0.5| = 0.5

  • Distance between Objects 1 and 9: |1 - 0.25| = 0.75

  • Distance between Objects 1 and 10: |1 - 0.75| = 0.25
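The Object 1 distances above can be reproduced with a short Python sketch. It assumes the normalized values from Step 2; `euclidean_1d` is a hypothetical helper that spells out the square-root form of the formula rather than calling `abs` directly, to show that the two agree for a single attribute.

```python
import math


def euclidean_1d(a: float, b: float) -> float:
    """Euclidean distance on one attribute: sqrt((a - b)^2), which equals |a - b|.

    Hypothetical helper written for this example.
    """
    return math.sqrt((a - b) ** 2)


# Normalized Test values for objects 1..10 (from the Step 2 table).
normalized = [1.0, 0.5, 0.0, 0.5, 0.75, 0.25, 0.0, 0.5, 0.25, 0.75]

# Object identifiers are 1-based; list indices are 0-based.
for j in range(2, len(normalized) + 1):
    d = euclidean_1d(normalized[0], normalized[j - 1])
    print(f"d(1, {j}) = {d:.2f}")
```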

Note: Because the distance is symmetric (d(i, j) = d(j, i)) and every object has distance 0 to itself, only the lower triangle of the dissimilarity matrix needs to be computed; the upper triangle mirrors it.

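Repeating this calculation for every remaining pair gives the lower-triangular dissimilarity matrix:

    1     2     3     4     5     6     7     8     9     10
1   0.00
2   0.50  0.00
3   1.00  0.50  0.00
4   0.50  0.00  0.50  0.00
5   0.25  0.25  0.75  0.25  0.00
6   0.75  0.25  0.25  0.25  0.50  0.00
7   1.00  0.50  0.00  0.50  0.75  0.25  0.00
8   0.50  0.00  0.50  0.00  0.25  0.25  0.50  0.00
9   0.75  0.25  0.25  0.25  0.50  0.00  0.25  0.25  0.00
10  0.25  0.25  0.75  0.25  0.00  0.50  0.75  0.25  0.50  0.00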
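The full lower-triangular matrix can be generated rather than filled in by hand. This minimal Python sketch assumes the normalized values from Step 2 and uses only the standard library; the variable names are illustrative.

```python
# Normalized Test values for objects 1..10 (from the Step 2 table).
normalized = [1.0, 0.5, 0.0, 0.5, 0.75, 0.25, 0.0, 0.5, 0.25, 0.75]
n = len(normalized)

# Lower triangle only: row i holds d(i, j) = |z_i - z_j| for j <= i.
# The distance is symmetric, so the upper triangle would mirror these values.
lower = [[abs(normalized[i] - normalized[j]) for j in range(i + 1)]
         for i in range(n)]

# Print with 1-based object identifiers; each column is 6 characters wide.
print("    " + "".join(f"{j:<6}" for j in range(1, n + 1)))
for i, row in enumerate(lower, start=1):
    print(f"{i:<4}" + "  ".join(f"{d:.2f}" for d in row))
```

Storing only the lower triangle roughly halves the work and memory, which is exactly the symmetry argument from the note above.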

As a result, we can observe that:

  • Objects 1 and 3 (Excellent and Poor) are the most dissimilar, with a dissimilarity score of 1.00. Objects 1 and 7 share this maximum for the same reason.

  • Objects 3 and 7 both have the value Poor, so their dissimilarity score is 0.00.

  • Objects 5 and 10 both have the value Very Good, so their dissimilarity score is also 0.00.

  • Objects 1 and 6 are highly dissimilar, with a dissimilarity score of 0.75.

  • Objects 3 and 9 are fairly similar, with a dissimilarity score of 0.25.

  • Objects 1 and 2 are moderately dissimilar, with a dissimilarity score of 0.50.

  • Objects 2 and 4 both have the value Good, so their dissimilarity score is 0.00 (the same holds for Objects 2 and 8, and for Objects 4 and 8).
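Observations like these can also be read off programmatically. The following Python sketch, assuming the normalized values from Step 2, collects every unordered pair with its dissimilarity and then picks out the maximum-distance pairs and the pairs with identical ranks; the names `pairs`, `most_dissimilar`, and `identical` are hypothetical.

```python
from itertools import combinations

# Normalized Test values for objects 1..10 (from the Step 2 table).
normalized = [1.0, 0.5, 0.0, 0.5, 0.75, 0.25, 0.0, 0.5, 0.25, 0.75]

# Every unordered pair of 1-based object identifiers, mapped to its dissimilarity.
pairs = {
    (i + 1, j + 1): abs(normalized[i] - normalized[j])
    for i, j in combinations(range(len(normalized)), 2)
}

max_d = max(pairs.values())
most_dissimilar = [p for p, d in pairs.items() if d == max_d]
identical = [p for p, d in pairs.items() if d == 0.0]

print("most dissimilar:", most_dissimilar, max_d)
print("identical rank:", identical)
```

Running it lists (1, 3) and (1, 7) as the most dissimilar pairs and six pairs with identical ranks: (2, 4), (2, 8), (3, 7), (4, 8), (5, 10), and (6, 9).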

In conclusion, converting ordinal values to ranks, normalizing the ranks to [0, 1], and applying a distance measure produces a dissimilarity matrix that reveals patterns within ranked data and can feed clustering and classification methods.


Copyright ©2024 Educative, Inc. All rights reserved