...


Graph Convolutional Networks

Discover how Graph Convolutional Networks are conceived: a deep dive into the mathematics behind them.

Types of graph tasks: graph and node classification

We discussed a bit about the input representation. But what about the target (output)? The most basic tasks in graph neural networks are:

  • Graph classification: We have a lot of graphs and we would like to find a single label for each individual graph (similar to image classification). This task is cast as a standard supervised learning problem. In graphs, you will see the term inductive learning for this task.

  • Node classification: Usually, in this type of task, we have a huge graph (>5000 nodes) and we try to find a label for each node (similar to image segmentation). Importantly, we have very few labeled nodes to train the model (for instance <5%). The aim is to predict the missing labels for all the other nodes in the graph. That is why this task is formulated as a semi-supervised learning task, or equivalently transductive learning. It is called semi-supervised because, even though most nodes do not have labels, we feed the whole graph (with all the nodes) into the neural network and formulate a supervised loss term for the labeled nodes only (a minimal sketch of this masked loss follows this list).
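To make the masked loss concrete, here is a minimal PyTorch sketch. All tensors are made-up stand-ins: the logits play the role of a graph network's output over every node, and only the masked (labeled) nodes contribute to the loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy numbers: 6 nodes, 3 classes.
num_nodes, num_classes = 6, 3
logits = torch.randn(num_nodes, num_classes)  # stand-in for a GNN's output over the whole graph
labels = torch.tensor([0, 2, 1, 0, 0, 1])     # ground-truth classes (mostly unknown in practice)

# Only a handful of nodes are labeled -> the "semi-supervised" training mask
train_mask = torch.tensor([True, False, False, True, False, False])

# Supervised loss term computed on the labeled nodes only
loss = F.cross_entropy(logits[train_mask], labels[train_mask])
print(loss)
```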

Next, you will be provided with some minimal theory on how graph data is processed.

How are graph convolution layers formed

Principle: Convolution in the vertex domain is equivalent to multiplication in the graph spectral domain.

The most straightforward implementation of a graph neural network would be something like this:

$Y = (AX)W$

Where $W$ is a trainable parameter matrix and $Y$ is the output.

Why? Because now we have to account not just for the signal $X$, but also for the structure/connectivity ($A$).

The thing is that $A$ is usually a binary matrix and carries no interpretation beyond raw connectivity.

By definition, multiplying a graph operator by a graph signal will compute a weighted sum of each node’s neighborhood.
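As a quick sanity check of both statements, here is a toy PyTorch sketch (a hypothetical 3-node path graph with random weights) showing that $AX$ sums each node's neighborhood before the linear transform $W$ is applied:

```python
import torch

# A tiny path graph 0 - 1 - 2 (binary adjacency, no self-loops)
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])

X = torch.tensor([[1., 0.],
                  [0., 1.],
                  [2., 2.]])   # one 2-dim feature vector per node
W = torch.randn(2, 4)          # trainable weights (random here)

# Row i of A @ X is the (unweighted) sum of node i's neighbors' features:
# node 1 is connected to nodes 0 and 2, so row 1 equals X[0] + X[2].
print(A @ X)

# The simplest possible graph layer: aggregate, then transform.
Y = (A @ X) @ W
print(Y.shape)   # (3 nodes, 4 output features)
```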

It is amazing that this can be expressed with a simple matrix multiplication!

Instead of A, we would like to have a more expressive operator. In general, when you multiply a graph operator by a graph signal, you get a transformed graph signal.

That is why we use the normalized Laplacian $L_{norm}$ instead of $A$. Furthermore, the graph Laplacian $L$ has a direct interpretation: when raised to the power $K$, it traverses nodes that are up to $K$ steps (hops) away, thus introducing the notion of locality.

Formally, multiplying the signal by the Laplacian power $L^K$ can be interpreted as aggregating over the graph constructed from the $K$-hop neighborhoods, thus providing the desired localization property (similar to convolutions).
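One way to see the locality claim is a toy PyTorch example (a hypothetical 4-node path graph; the binary adjacency $A$ is used in place of the Laplacian purely for readability, since powers of either operator only reach nodes within $K$ hops):

```python
import torch

# Path graph 0 - 1 - 2 - 3: node 0 and node 3 are 3 hops apart.
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])

# Entry (i, j) of A^K counts the walks of length K from i to j,
# so it can only be non-zero for nodes within K hops of each other.
for K in range(1, 4):
    print(K, torch.linalg.matrix_power(A, K))
```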

In the simplest case, we have the GCN layer:

$Y = L_{norm} X W$

$L_{norm} = D^{-\frac{1}{2}}(A+I)D^{-\frac{1}{2}}$

Notice that a slightly modified version of the Laplacian is used, which helps counter training instabilities by adding self-loops.

Self-loops are introduced by adding the identity matrix to the adjacency matrix and recomputing the degree matrix accordingly.
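Putting the two equations together, a minimal dense-tensor sketch of such a layer (self-loops added first, degrees recomputed from $A + I$) might look like this:

```python
import torch

def gcn_layer(A, X, W):
    """One GCN layer: Y = D^{-1/2} (A + I) D^{-1/2} X W (dense-tensor sketch)."""
    A_hat = A + torch.eye(A.size(0))          # add self-loops
    deg = A_hat.sum(dim=1)                    # recompute the degree from A + I
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    L_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return L_norm @ X @ W

# Toy usage with random data
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
X = torch.randn(3, 4)
W = torch.randn(4, 2)
print(gcn_layer(A, X, W).shape)   # (3, 2)
```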

In this case, each layer will consider only its direct neighbors since we use the first power of the Laplacian, $L^1$. This is similar to a 3x3 kernel in classical image convolution, where we aggregate information from each pixel's direct neighborhood.

We may extend this idea. In fact, the originally proposed graph convolution was defined using higher powers of the graph Laplacian.
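As a rough sketch of that idea (a generic polynomial filter over Laplacian powers, not the exact original spectral formulation discussed next), each power $k$ lets the layer mix information from $k$-hop neighborhoods:

```python
import torch

def poly_graph_filter(L, X, weights):
    """Sum_k L^k X W_k: the k-th power mixes information from k-hop neighborhoods."""
    out = 0.0
    for k, W_k in enumerate(weights):
        out = out + torch.linalg.matrix_power(L, k) @ X @ W_k
    return out

# Toy usage: three weight matrices W_0, W_1, W_2 -> the layer sees up to 2-hop neighbors.
L = torch.rand(5, 5)                              # stand-in for a (normalized) Laplacian
X = torch.randn(5, 3)
weights = [torch.randn(3, 2) for _ in range(3)]
print(poly_graph_filter(L, X, weights).shape)     # (5, 2)
```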

The background theory of spectral graph convolutional networks

Feel free to skip this section if you don’t really care about the underlying math.

The convolution of a graph signal $X$ can be defined in the “spectral” domain.

Spectral basically means that we will utilize the Laplacian eigenvectors. Therefore, convolution in graphs can be ...