Similarity and Equivalence
Learn about similarity in complex networks and the idea of equivalence.
Similarity is an important concept in data science in general. Being able to compare instances and determine how similar or dissimilar they are opens up a lot of possibilities to improve our analysis and prescriptions.
Defining similarity in graphs
Similarity in graphs can take several forms. We can define it on at least three levels:
Node similarity
Edge similarity
Graph/subgraph similarity
Let’s explore how each one of them can be defined.
Node similarity
Node similarity tries to answer whether two nodes are alike in some way. This is tricky to pin down, because "similar" can be defined in many ways.
One node can be similar to another one if they have the same centrality measures. Another definition can be that nodes are similar if they’re linked to the same set of other nodes. Yet another way to say that nodes are similar is if the amount of information that passes through them is similar. There is no single definition, but each definition can be useful depending on our objectives.
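To make two of these definitions concrete, here is a minimal sketch in plain Python. The toy graph, the choice of degree centrality as the centrality measure, and the helper names `same_centrality` and `same_neighbors` are all assumptions made for illustration; other centrality measures would work the same way.

```python
# Toy undirected graph as an adjacency dict (hypothetical data).
graph = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B", "E"},
    "D": {"A", "B"},
    "E": {"C"},
}


def degree_centrality(g, node):
    """Fraction of the other nodes that `node` is directly connected to."""
    return len(g[node]) / (len(g) - 1)


def same_centrality(g, u, v, tol=1e-9):
    """Definition 1: nodes are similar if their centrality is (almost) equal."""
    return abs(degree_centrality(g, u) - degree_centrality(g, v)) <= tol


def same_neighbors(g, u, v):
    """Definition 2: nodes are similar if they link to the same other nodes."""
    # Ignore u and v themselves so that an edge u-v does not break the match.
    return g[u] - {v} == g[v] - {u}


print(same_centrality(graph, "A", "B"), same_neighbors(graph, "A", "B"))
print(same_centrality(graph, "A", "D"), same_neighbors(graph, "A", "D"))
```

In this toy graph, A and B are similar under both definitions, while A and D have the same degree centrality but different neighbors, which shows how the definitions can disagree.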
By having some measure of similarity between nodes, we can try to make some inferences:
If node A is similar to node B and node B likes action movies, maybe node A will also like them.
If node A is similar to node B, maybe they'll generate an edge between them in the future.
If node A has a lot of common friends with node B, maybe they know each other and we should recommend they send a friendship request to each other.
If node A commits fraudulent activity and node B is very similar to it, maybe we should also keep an eye on node B for fraudulent activity.
Answering these types of questions can generate a lot of value in our analysis, justifying why we should try to look for similarities between nodes.
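As an illustration of the common-friends case, the sketch below counts shared neighbors for every pair of nodes that are not yet connected and suggests the pairs above a threshold. The friendship data, the `recommend` helper, and the `min_common` threshold are hypothetical; real systems typically use more refined scores.

```python
from itertools import combinations

# Hypothetical friendship graph as an adjacency dict.
friends = {
    "ana": {"bo", "cy", "di"},
    "bo": {"ana", "cy", "eli"},
    "cy": {"ana", "bo", "eli"},
    "di": {"ana"},
    "eli": {"bo", "cy"},
}


def recommend(g, min_common=2):
    """Return (u, v, shared) for unconnected pairs with enough common friends."""
    suggestions = []
    for u, v in combinations(sorted(g), 2):
        if v in g[u]:
            continue  # already friends, nothing to recommend
        shared = len(g[u] & g[v])
        if shared >= min_common:
            suggestions.append((u, v, shared))
    # Strongest candidates first.
    return sorted(suggestions, key=lambda s: -s[2])


print(recommend(friends))
```

Here only ana and eli share two friends without being connected, so they are the single suggestion at the default threshold.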
Edge similarity
Edge similarity is not so common in complex network analysis. Usually, one looks for similarities between nodes and then estimates if an edge should be created between those nodes.
However, for some specific types of models, edge similarity can tell us something about the phenomenon being modeled.
We can say that two edges are similar if they represent the same phenomenon and also share a similar set of characteristics of that phenomenon.
For example, in a graph of companies and workers, the similarity between two edges can reflect how similar the type of work is that each worker does at their company.
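One possible way to quantify this, sketched under assumptions: each worker-company edge carries a hypothetical attribute describing weekly hours per task type, and two edges are compared with cosine similarity. Both the attribute and the metric are choices made for this example, not something the lesson prescribes.

```python
import math

# (worker, company) -> hours per week by task type (hypothetical attributes).
edges = {
    ("w1", "acme"): {"coding": 30, "meetings": 10},
    ("w2", "globex"): {"coding": 24, "meetings": 8},
    ("w3", "acme"): {"sales": 35, "meetings": 5},
}


def cosine(a, b):
    """Cosine similarity between two sparse attribute vectors (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


same_kind = cosine(edges[("w1", "acme")], edges[("w2", "globex")])
different_kind = cosine(edges[("w1", "acme")], edges[("w3", "acme")])
print(round(same_kind, 3), round(different_kind, 3))
```

The first pair of edges has the same mix of work even though the companies differ, so it scores close to 1; the second pair shares a company but almost no work type, so it scores close to 0.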
Graph similarity
Another type of similarity that can be defined is graph similarity.
How do we say that one graph is similar to another? Depending on how strict we want to be, we can say that two graphs are similar if:
They have the same number of nodes and edges.
They have the same distribution of centrality measures.
They are isomorphic, meaning they’re the same graph, with their nodes named differently.
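The three criteria above can be sketched in plain Python for small, simple undirected graphs (no self-loops or duplicate edges). This is an illustrative sketch only: the sorted degree sequence stands in for "distribution of centrality measures", and the isomorphism test is a brute-force search over relabelings, which is only practical for a handful of nodes. All function names are hypothetical.

```python
from itertools import permutations


def edge_set(edges):
    """Undirected edges as a set of frozensets, so (u, v) equals (v, u)."""
    return {frozenset(e) for e in edges}


def same_size(nodes1, edges1, nodes2, edges2):
    """Criterion 1: same number of nodes and edges."""
    return len(nodes1) == len(nodes2) and len(edges1) == len(edges2)


def degree_sequence(nodes, edges):
    """Criterion 2 (one instance): sorted list of node degrees."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())


def is_isomorphic(nodes1, edges1, nodes2, edges2):
    """Criterion 3: some renaming of nodes maps one edge set onto the other."""
    if not same_size(nodes1, edges1, nodes2, edges2):
        return False
    if degree_sequence(nodes1, edges1) != degree_sequence(nodes2, edges2):
        return False
    target = edge_set(edges2)
    for perm in permutations(nodes2):  # try every possible relabeling
        mapping = dict(zip(nodes1, perm))
        relabeled = {frozenset((mapping[u], mapping[v])) for u, v in edges1}
        if relabeled == target:
            return True
    return False


path = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
renamed_path = (["a", "b", "c", "d"], [("c", "a"), ("a", "d"), ("d", "b")])
star = ([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)])

print(same_size(*path, *star))
print(degree_sequence(*path), degree_sequence(*star))
print(is_isomorphic(*path, *renamed_path), is_isomorphic(*path, *star))
```

The path and the star have the same number of nodes and edges but different degree sequences, so they pass the first criterion and fail the other two, while the renamed path passes all three.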
Some insights can be generated when we think in terms of graph similarity for some types of complex networks:
If a given protein can be used to make some medicine, similar proteins can also be tested for new medicines.
If a graph of molecules has some characteristics, such as electric response or magnetic response, similar molecules might also have those characteristics.
Defining similarity between graphs is very important but is less common than finding node similarities.
Let's now explore the concept of equivalence and how it can be used to assess the similarity of nodes.
Structural equivalence
Structural equivalence is the idea that two nodes are similar if they share many of the same neighbors.
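One common way to turn "share a lot of the same neighbors" into a number is the Jaccard index of the two neighbor sets: the size of their intersection divided by the size of their union. The sketch below assumes this measure and a hypothetical toy graph; it is one possible choice, and other overlap measures (such as cosine similarity) follow the same pattern.

```python
# Hypothetical undirected graph as an adjacency dict.
graph = {
    "u": {"x", "y", "z"},
    "v": {"x", "y", "z"},
    "w": {"z"},
    "x": {"u", "v"},
    "y": {"u", "v"},
    "z": {"u", "v", "w"},
}


def jaccard(g, a, b):
    """Shared neighbors of a and b divided by all neighbors of either node."""
    # Exclude a and b themselves so a direct edge a-b does not skew the score.
    na, nb = g[a] - {b}, g[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


print(jaccard(graph, "u", "v"))  # identical neighbor sets
print(jaccard(graph, "u", "w"))  # only one shared neighbor out of three
```

A score of 1 means the two nodes have exactly the same neighbors, which is the strictest reading of structural equivalence; lower scores indicate partial overlap.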