Gradient-weighted CAM
A gradient-weighted CAM or Grad-CAM is another class activation map algorithm that uses the gradients of the predicted class propagating into the final convolutional layer to weigh the feature maps used for creating maps.
Let's assume that Fl∈Rh×w is the lth feature map obtained from the penultimate convolutional layer (h and w are the height and width of the feature map) and Fl(i,j) represents the activation of the (i,j)th neuron in the lth feature map (total L feature maps in F).
Now we compute the gradient Gl(i,j)=∇Fl(i,j)fk∗(X) of the prediction score fk∗(X) with respect to the feature Fl(i,j). The gradient Gl(i,j)∈Rh×w is then average pooled across height and width to be used as weights in the CAM equation. This looks as follows: