The PageRank Algorithm
Learn the PageRank algorithm, an extension to the Katz centrality.
The PageRank algorithm, famously created by Google, is an interesting extension of the Katz centrality. It’s considered a generalization of the Katz centrality to overcome one of its downsides.
The problem with Katz centrality
Let’s think about an example. We have a high-importance node in a network with a relatively high Katz centrality. This node has out-edges going to 10 other nodes. In that same graph, we have another important node with the same Katz centrality as the previous one, but this node only has two out-edges.
If we remember the equation of the Katz centrality, it’s defined as:
In this case, because we’re looking only at out-edges for these nodes, all ten neighbors for the first node and the two neighbors for the second node will have the same Katz centrality.
This raises the question: should they have the same centrality? Wouldn’t it make more sense to have a node that passes its importance to a lot of other nodes to split this importance, not just copy it? This is why the PageRank algorithm was created.
Here, notice that the out-degree of the neighbor is what’s going to influence the importance of our node of interest. In other metrics, such as the eigenvector centrality, this doesn’t happen.
The PageRank algorithm
The PageRank algorithm was usually used to define the importance of web pages that would then be returned by the Google search engine.
Its main intuition is that if a site with a high reputation has a link to it, then it should receive a lot of reputation, too. But if that site has a link to every other site on the internet, then it should not be considered special.
Therefore, the importance of a node that’s connected to a very important node should be smaller if this high-importance neighbor is also pointing to lots of other nodes. This way, it isn’t enough to be connected to an important node. For our node of interest to also be important, it needs to have some exclusivity.
We can verify it using the out-degree of each node, which we are going to define as:
With this value in hands, we can change the calculation of the Katz centrality to take it into account: ...