For many of us, our first introduction to the union-find data structure (also known as the disjoint-set data structure) comes from seeing a well-known implementation of Kruskal’s algorithm. In many situations, the union-find data structure is used for storing objects that are connected. These objects can be pixels of an image, elements of a grid, or nodes of a network. However, there are many other situations that warrant the use of the union-find data structure. To be able to identify the union-find data structure as an appropriate data structure for a particular problem, we need to keep in mind that the primary use case of the union-find data structure is representing what are known as equivalence classes.
Objects in a set can be related to each other in many different ways. Some of these relationships are so pervasive and useful that they are given special names. In this blog, we talk about one such relation called an equivalence relation. We also discuss how an equivalence relation on a set causes some of its elements to be considered equivalent in some sense. These subsets of equivalent elements are called equivalence classes. We conclude by looking at some of the ways in which these abstract ideas are useful in the real world.
Let’s consider a set of objects. We can express a relationship between some elements of as another set . The set contains ordered pairs of the form , representing that the element of is related to the element of in some sense.
Example 1: Suppose is the set of all individuals. We define a relation on as:
It is the set of all ordered pairs , where and are any individuals that are born in the same month. For instance, both Charles Babbage and Ada Lovelace were born in December, so we have:
But beware! In general, if is in a relation, we may not necessarily have in that relation.
Example 2: Suppose is a set of all people. We define a relation on as:
contains all ordered pairs , where and are individuals such that is a sister of .
Example 3: Let be the set of all people living in the same US state. We define a relation on as:
contains all ordered pairs , where and live in the same US state, and are friends with each other.
All of the above relations are defined on sets containing individuals, but they may very well be defined on other kinds of collections, such as a set of numbers, a collection of virtual machines, a set of variables, and so on. We’ll show four more examples of relations defined on sets of numbers.
Example 4: Let be the set of real numbers. We can define a relation that contains all pairs containing a real number and its additive inverse:
This is the set of all ordered pairs , where and are reals, and is the additive inverse of .
Example 5: We can define an “is greater than” relation as the following relation on the set of reals:
This is the set of all ordered pairs , where and are reals, and is greater than .
Example 6: Suppose is a set of positive integers. We define a relation “is divisible by” on as follows:
This is the set of all ordered pairs , where is divisible by for positive integers and .
Example 7: Suppose is a set of
This relates all natural numbers and that have the same remainder when divided by .
To understand equivalence relations, we first need to understand three other types of relations:
A relation on a set is called reflexive if for each element , the pair .
Example: The relation is a reflexive relation on the set of positive integers because each positive integer divides itself. So, contains , , and so on.
Nonexample: The following relation is not a reflexive relation on the set of reals because it does not contain all pairs of the kind , where is a real:
For instance, it does not contain the pair or .
Notice that we gave a general argument to show that a relation is reflexive. But to show that a relation is not reflexive, a single counterexample is enough.
A relation is called symmetric if for each pair , it’s also the case that .
Example: The following is a symmetric relation on the set of all individuals:
To see this for any two individuals and , if and are born in the same month, then it’s perfectly legitimate to say that and are born in the same month. In other words, if , then
Nonexample: The following relation is not a symmetric relation on the set of positive integers.
To see this, observe that as , but as .
A relation is called transitive if for each and , it’s also the case that .
Example: The following is a transitive relation on the set of all positive integers:
Suppose and . Since , it follows that , and so is a factor of . Similarly, implies that , which implies that is a factor of . As is a factor of , and is a factor of , it follows that is a factor of . In other words, , implying .
Nonexample: The following relation, , is not a transitive relation on the set of all real numbers.
The pairs and belong to . If was transitive, then and would be present in , but they aren’t. So, is not transitive.
An equivalence relation on a set is a relation that is reflexive, symmetric, and transitive. Of all the relations shown at the top of this page, only two are equivalence relations.
Example: The following is an equivalence relation on the set of all individuals.
Test your understanding of equivalence relations:
Is the following an equivalence relation on the set of all natural numbers?
Two elements and in a set are said to be equivalent under an equivalence relation if . For example, under the following equivalence relation on the set of all individuals, all individuals born in December are considered equivalent.
A subset of a set that contains all elements that are equivalent under an equivalence relation is called an equivalence class.
Under the relation defined above, all individuals born in the same month fall into one equivalence class. So, there are 12 equivalence classes corresponding to the 12 months of the year, and each individual falls into exactly one of those 12 classes.
Notice how these equivalence classes are actually disjoint subsets since no person falls into two different equivalence classes. Notice also that since each person belongs to some equivalence class, if we take a union of these disjoint equivalence classes, we get the entire set of individuals.
This observation is of utmost importance because equivalence classes are primarily useful for thinking of a set as being partitioned into these disjoint subsets, with each subset consisting of objects that are deemed equivalent. We express this observation as the following fact:
Fact: Any equivalence relation on a set S imposes a grouping of the elements of into disjoint subsets of , called its equivalence classes, such that a union of these equivalence classes is .
A proof of this fact follows from the definitions.
Notation: In mathematics, equivalence classes are typically represented using a notation , where is an arbitrary element in the equivalence class.
Example: Consider the following equivalence relation:
Since the remainder of any natural number by is an integer in the range to , it follows that there are seven equivalence classes:
Note that all seven equivalence classes here are infinite sets. It would not be incorrect to write as or as because all multiples of belong to , and any number in this equivalence class can equally well serve as its representative.
The idea of equivalence classes gives us a useful abstraction for looking at the world around us. We’ll briefly touch upon a few of the ways in which the notion of equivalence classes has been put to use in different applications of computer science.
An important use case is the applications that use the union-find data structure to either help identify equivalence classes or enable working with equivalence classes. In fact, this is the reason the data structure was invented in the first place: to identify variables that are equivalent in the sense that they represent the same memory locations.
The most significant ways in which the union-find data structure has been effectively used in real-world applications centers around definitions of equivalence relations that can be abstracted as “an entity is connected to entity in a specific way.”
For instance, the connected components of a graph form equivalence classes under the following equivalence relation defined on the set of vertices of a graph as:
Well-known implementations of Kruskal’s algorithm use the union-find data structure to keep track of connected subgraphs that coalesce into a minimum spanning tree.
There are similar applications in image processing. The Matlab function bwlabel
takes a 2-dimensional boolean matrix as an input and returns a matrix containing different labels for each connected structure in the input matrix. The precise definition of “connected” can vary depending on a function parameter. For example, the illustration below shows this for an image that has three 4-connected regions, where two pixels that are adjacent along the horizontal or vertical axis are said to be part of the same 4-connected region.
Each colored region, labeled 1, 2, and 3 in the illustration on the right below, is an equivalence class showing pixels that are 4-connected to each other. The function bwlabel
uses the union-find data structure to identify these. The function can also be used for identifying equivalence classes under other equivalence relations, such as the equivalence relation of belonging to the same
- The coding interview courses in Educative, such as this course, contain lots of interesting problems involving the union-find data structure and equivalence classes.
Many other similar applications require either identifying equivalence classes as their final goal or as a means to an end.
One instance where the concept of equivalence classes is useful is in black-box testing. To test a software the test cases are designed so that they correspond to equivalence classes. Each equivalence class contains elements that are equivalent in the sense that they are expected to elicit a similar response from the system. The idea is to test software more effectively by testing representative data elements from equivalence classes instead of having to test the entire input space. Having said this, figuring out which data falls in the same equivalence class requires skills. This use case is a little more unusual than the other abovementioned use cases because the focus is on identifying equivalence classes based on the requirement specifications and the engineer’s judgment call of what’s considered equivalent rather than beginning with a mathematically crisp definition of what constitutes an equivalence relation.
There are similar instances in experimental works where looking at the search space or a dataset through the lens of equivalence classes has provided some kind of theoretical justification.
One of the takeaways here is what was left unsaid: it’s important to have the basics covered and be well-prepared because one never knows where one simple abstract idea can be applied in so many diverse ways.
That’s all for this blog. We hope this was useful for you!
Free Resources