Grad-CAM with PyTorch
PyTorch implementation of Grad-CAM (Gradient-weighted Class Activation Mapping) [1] for image classification. This repository also contains implementations of vanilla backpropagation, guided backpropagation [2], deconvnet [2], guided Grad-CAM [1], and occlusion sensitivity maps [3].
Requirements
Python 2.7 / 3.+
$ pip install click opencv-python matplotlib tqdm numpy
$ pip install "torch>=0.4.1" torchvision
Basic usage
python main.py [DEMO_ID] [OPTIONS]
Demo ID: demo1, demo2, or demo3
Options:
- -i, --image-paths: image path, which can be provided multiple times (required)
- -a, --arch: a model name from torchvision.models, e.g. "resnet152" (required)
- -t, --target-layer: a module name to be visualized, e.g. "layer4.2" (required)
- -k, --topk: the number of classes to generate (default: 3)
- -o, --output-dir: a directory to store results (default: ./results)
- --cuda/--cpu: GPU or CPU
The command above generates, for top k classes:
- Gradients by vanilla backpropagation
- Gradients by guided backpropagation [2]
- Gradients by deconvnet [2]
- Grad-CAM [1]
- Guided Grad-CAM [1]
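For reference, the Grad-CAM map for a class c is defined in [1] roughly as follows. The notation is that of the paper and is not taken from this code: y^c is the class score before the softmax, A^k is the k-th feature map of the target layer, and Z is the number of spatial positions in that map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_k \alpha_k^c A^k \right)
```

Guided Grad-CAM [1] is then the element-wise product of the guided backpropagation map and the Grad-CAM map upsampled to the input resolution.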
The guided-* methods support only nn.ReLU modules, not the functional F.relu, in this code.
For instance, with the off-the-shelf inception_v3, negative gradients cannot be cut off during the backward pass (issue #2).
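The three gradient-based methods differ only in how a gradient is passed back through each ReLU [2]. The sketch below illustrates those rules on plain Python lists rather than tensors; the function name `relu_backward` and its `mode` argument are hypothetical and are not part of this repository.

```python
def relu_backward(inputs, grad_outputs, mode="vanilla"):
    """Pass gradients back through a ReLU under one of three rules.

    inputs:       values that entered the ReLU in the forward pass
    grad_outputs: gradients arriving from the layer above
    """
    grads = []
    for x, g in zip(inputs, grad_outputs):
        if mode == "vanilla":
            keep = x > 0            # ordinary ReLU derivative
        elif mode == "deconvnet":
            keep = g > 0            # ignore the forward input, keep positive gradients
        elif mode == "guided":
            keep = x > 0 and g > 0  # both conditions must hold
        else:
            raise ValueError("unknown mode: {}".format(mode))
        grads.append(g if keep else 0.0)
    return grads


inputs = [1.0, -2.0, 3.0, -4.0]
grad_outputs = [0.5, 0.7, -0.9, -0.3]

vanilla = relu_backward(inputs, grad_outputs, "vanilla")
deconvnet = relu_backward(inputs, grad_outputs, "deconvnet")
guided = relu_backward(inputs, grad_outputs, "guided")
```

Only the first element survives the guided rule, because it is the only position where both the forward input and the incoming gradient are positive.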
Demo 1
Generate all kinds of visualization maps given a torchvision model, a target layer, and images.
python main.py demo1 -a resnet152 -t layer4 \
-i samples/cat_dog.png -i samples/vegetables.jpg # You can add more images
Predicted class | #1 boxer | #2 bull mastiff | #3 tiger cat |
---|---|---|---|
Grad-CAM [1] | |||
Vanilla backpropagation | |||
"Deconvnet" [2] | |||
Guided backpropagation [2] | |||
Guided Grad-CAM [1] |
Grad-CAM with different models for "bull mastiff" class
Model | resnet152 | vgg19 | vgg19_bn | densenet201 | squeezenet1_1 |
---|---|---|---|---|---|
Layer | layer4 | features | features | features | features |
Grad-CAM [1] | |||||
Demo 2
Generate Grad-CAM maps for the "bull mastiff" class at different layers of ResNet-152 (hardcoded).
python main.py demo2 -i samples/cat_dog.png
Layer | relu | layer1 | layer2 | layer3 | layer4 |
---|---|---|---|---|---|
Grad-CAM [1] | |||||
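Whichever layer is chosen, the map is built from that layer's activations and the gradients of the class score with respect to them [1]. The following is a minimal sketch on nested Python lists, assuming the activations and gradients for one image and one class have already been captured; the function name `grad_cam` is hypothetical, and the resizing and normalization needed for display are left out.

```python
def grad_cam(activations, gradients):
    """Compute channel weights and the Grad-CAM map.

    activations, gradients: nested lists of shape [K][H][W]
    (K channels of the target layer, H x W spatial positions).
    """
    num_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weight: gradient averaged over all spatial positions.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]

    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j]
                         for k in range(num_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    return weights, cam


# Toy example: two 2x2 feature maps with constant gradients.
activations = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 3.0], [2.0, 1.0]],
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
weights, cam = grad_cam(activations, gradients)
```

In the toy example only the bottom-right position keeps a positive value after the ReLU. Earlier layers have larger H x W, so their maps are finer, while deeper layers give coarser maps.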
Demo 3
Generate the occlusion sensitivity map [1, 3] based on logit scores. Red and blue regions indicate a relative increase and decrease, respectively, from the non-occluded score. The blue regions are critical: occluding them lowers the score.
python main.py demo3 -a resnet152 -i samples/cat_dog.png
Patch size | 10x10 | 15x15 | 25x25 | 35x35 | 45x45 | 90x90 |
---|---|---|---|---|---|---|
"boxer" sensitivity | ||||||
"bull mastiff" sensitivity | ||||||
"tiger cat" sensitivity |
This demo is slow because it computes a logit for every occluded position.
You can control the resolution by changing the sampling stride (--stride), or increase the batch size to fit your GPUs (--n-batches). The model is wrapped with torch.nn.DataParallel so that it runs on multiple GPUs by default.
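The role of the patch size and stride can be illustrated with a minimal sketch of the sliding-window procedure on a 2D list. It is not this repository's implementation: `occlusion_sensitivity`, `score_fn`, `patch`, `stride`, and `fill` are hypothetical names, the score function stands in for a model's logit, and each position is scored one at a time instead of in batches.

```python
def occlusion_sensitivity(image, score_fn, patch=3, stride=1, fill=0.0):
    """Return score(occluded) - score(original) for each sampled position.

    image:    2D nested list of pixel values
    score_fn: callable mapping an image to a scalar score
    """
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    half = patch // 2

    sensitivity = []
    for cy in range(0, height, stride):
        row = []
        for cx in range(0, width, stride):
            # Copy the image and blank out a patch centred at (cy, cx).
            occluded = [list(r) for r in image]
            for y in range(max(0, cy - half), min(height, cy + half + 1)):
                for x in range(max(0, cx - half), min(width, cx + half + 1)):
                    occluded[y][x] = fill
            row.append(score_fn(occluded) - baseline)
        sensitivity.append(row)
    return sensitivity


def total_intensity(img):
    """Stand-in for a class logit: the sum of all pixel values."""
    return sum(sum(r) for r in img)


# A 4x4 image with a single bright pixel at row 1, column 2.
image = [[0.0] * 4 for _ in range(4)]
image[1][2] = 1.0

sens_fine = occlusion_sensitivity(image, total_intensity, patch=1, stride=1)
sens_coarse = occlusion_sensitivity(image, total_intensity, patch=3, stride=2)
```

Negative entries correspond to the blue regions described above. With stride 2 the map has a quarter as many entries as with stride 1, which is why a larger stride reduces both the resolution and the number of forward passes.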
References
1. R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In ICCV, 2017
2. J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for Simplicity: The All Convolutional Net. arXiv, 2014
3. M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In ECCV, 2014