Implementation of a GCN

Develop and train a Graph Convolutional Network (GCN) in Pytorch.

Implementing a 1-hop GCN layer in Pytorch

For this tutorial, we will train a simple 1-hop GCN layer in a small graph dataset. We will use the open-source graph data (you can find the link in the appendix) from the University of Dortmund. In particular, we will play with the MUTAG dataset because it is small enough to train something to solidify your understanding.

MUTAG dataset characteristics

Each node contains a label from 0 to 6 which will be used as a one-hot-encoding feature vector. From the 188 graphs nodes, we will use 150 for training and the rest for validation. Finally, we have two classes.

The goal is to demonstrate that graph neural networks are a great fit for such data. You can find the data-loading part as well as the training loop code in the notebook. Here, they are omitted them for clarity. You are instead shown the result in terms of accuracy.

But first things first. As an exercise, you will need to build a simple Graph Convolutional Layer that we will incorporate into our network.

Our GCN layer will be defined by the following equations:

Y=LnormXW{Y} = {L}_{norm} {X} {W}

Lnorm=D12(A+I)D12{L}_{norm} = {D}^{-\frac{1}{2}}({A}+{I}){D}^{-\frac{1}{2}}

Once you build the GCN layer, you can test your implementation by training a 3-layer custom graph neural network on the MUTAG dataset. The code for loading the data and for the custom GNN is provided by us. All you have to do is build the GCN layer. And of course, play with the code, try different hyperparameters, and model architectures.

For comparison, here are the results of training as it runs on our local PC:

Epoch: 000, Train Acc: 61.3333, Val Acc: 68.4211
Epoch: 010, Train Acc: 66.0000, Val Acc: 68.4211
Epoch: 020, Train Acc: 66.0000, Val Acc: 68.4211
Epoch: 030, Train Acc: 66.6667, Val Acc: 71.0526
Epoch: 040, Train Acc: 71.3333, Val Acc: 71.0526
Epoch: 050, Train Acc: 72.6667, Val Acc: 65.7895
Epoch: 060, Train Acc: 74.6667, Val Acc: 68.4211
Epoch: 070, Train Acc: 73.3333, Val Acc: 71.0526
Epoch: 080, Train Acc: 72.6667, Val Acc: 68.4211
Epoch: 090, Train Acc: 74.0000, Val Acc: 65.7895
Epoch: 100, Train Acc: 75.3333, Val Acc: 73.6842
Epoch: 110, Train Acc: 74.0000, Val Acc: 73.6842
Epoch: 120, Train Acc: 77.3333, Val Acc: 71.0526
Epoch: 130, Train Acc: 76.6667, Val Acc: 71.0526
Epoch: 140, Train Acc: 77.3333, Val Acc: 73.6842
Epoch: 150, Train Acc: 78.0000, Val Acc: 73.6842
Epoch: 160, Train Acc: 75.3333, Val Acc: 73.6842
Epoch: 170, Train Acc: 72.6667, Val Acc: 71.0526
Epoch: 180, Train Acc: 75.3333, Val Acc: 71.0526
Epoch: 190, Train Acc: 76.0000, Val Acc: 76.3158
Epoch: 200, Train Acc: 75.3333, Val Acc: 63.1579
Epoch: 210, Train Acc: 78.0000, Val Acc: 65.7895
Epoch: 220, Train Acc: 80.0000, Val Acc: 73.6842
Epoch: 230, Train Acc: 76.6667, Val Acc: 65.7895
Epoch: 240, Train Acc: 75.3333, Val Acc: 71.0526

Finished training. Best obtained val score: 78.9474

Insight: It may sound counter-intuitive and obscure, but the adjacency matrix is used in all the graph convolutional layers of the architecture. This gives graph neural networks a strong inductive bias to respect the initial graph structure in all their layers.

A side note before you start coding:

Practical issues when dealing with graphs

One of the practical problems with graph data is the varying number of nodes that makes batching graph data difficult. The solution?

Block-diagonal matrices:

Block diagonal matrices for input batching
Block diagonal matrices for input batching n

This matrix operation actually gives us a bigger graph with batch size (2 here) non-connected components (subgraphs). You can guess what will happen to the eigenvalues.

Since this part was purposely avoided in this lesson, a batch size of 1 has been used.

Another issue, especially for graph classification, is how to produce a single label when you have the node embeddings. Node embeddings refer to the transformed learned feature representation for each node (X1’ and X2’ in the figure above). This is called the readout layer in the graph literature. Basically, we will aggregate the node representations. Of course, there exists a lot of “readout” layers, but the most common is to use a mean or max operation on the node embeddings.

Other misleading experimental results may be the instability in training compared with classical CNN in images that produce super smooth curves. Don’t worry, it is extremely rare that you will get a textbook’s training curve!

You are ready now:

Get hands-on with 1300+ tech skills courses.