Similarity and Equivalence

Learn about similarity in complex networks and the idea of equivalence.

Similarity is an important concept in data science in general. Being able to compare instances and determine how similar or dissimilar they are opens up a lot of possibilities to improve our analysis and prescriptions.

Defining similarity in graphs

Similarity in graphs can be defined in several ways and at no fewer than three levels:

  • Node similarity

  • Edge similarity

  • Graph/subgraph similarity

Let’s explore how each one of them can be defined.

Node similarity

Node similarity tries to answer whether two nodes are alike in some way. This is tricky to define, because "similar" can mean many different things for a node.

One node can be similar to another one if they have the same centrality measures. Another definition can be that nodes are similar if they’re linked to the same set of other nodes. Yet another way to say that nodes are similar is if the amount of information that passes through them is similar. There is no single definition, but each definition can be useful depending on our objectives.
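As a minimal sketch of the second definition above (nodes are similar if they're linked to the same set of other nodes), we can compare the neighbor sets of two nodes. The toy graph, the function names `common_neighbors` and `jaccard_similarity`, and the choice of the Jaccard index as the score are all assumptions made for illustration; other measures would fit the same definition.

```python
# Hypothetical toy graph stored as an adjacency dict: node -> set of neighbors.
graph = {
    "A": {"C", "D", "E"},
    "B": {"C", "D", "F"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
    "F": {"B"},
}


def common_neighbors(graph, u, v):
    """Return the set of nodes linked to both u and v."""
    return graph[u] & graph[v]


def jaccard_similarity(graph, u, v):
    """Shared neighbors divided by all neighbors of u or v (0.0 if neither has any)."""
    union = graph[u] | graph[v]
    if not union:
        return 0.0
    return len(graph[u] & graph[v]) / len(union)


# A and B share two of their four distinct neighbors.
print(sorted(common_neighbors(graph, "A", "B")))
print(jaccard_similarity(graph, "A", "B"))
# C and D are linked to exactly the same nodes; E and F share none.
print(jaccard_similarity(graph, "C", "D"))
print(jaccard_similarity(graph, "E", "F"))
```

A score of 1 means the two nodes have identical neighbor sets, and 0 means they share no neighbors. Dividing by the size of the union keeps high-degree nodes from looking similar to everything just because they have many links.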

By having some measure of similarity between nodes, we can try to make some inferences:

  • If node A is similar to node B and node A likes action movies, maybe node B will also like them.

  • If node X is similar to node Y, maybe an edge will form between them in the future.

  • If node Z has a lot of friends in common with node L, maybe they know each other and we should recommend that they send each other a friend request.

  • If node A ...