Reading-List

Reading list on deep learning.

Basic Network and Techniques

AlexNet: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. ⭐⭐⭐⭐⭐
Dropout: Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958. ⭐⭐⭐⭐
VGG: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014). ⭐⭐⭐⭐⭐
GoogLeNet: Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. ⭐⭐⭐⭐⭐
Batch Normalization: Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015). [Inception v2] ⭐⭐⭐⭐⭐
PReLU & MSRA Initialization: He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015. ⭐⭐⭐⭐⭐
InceptionV3: Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
Identity ResNet: He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐⭐
CReLU: Shang, Wenling, et al. "Understanding and improving convolutional neural networks via concatenated rectified linear units." Proceedings of the International Conference on Machine Learning (ICML). 2016. ⭐⭐⭐
InceptionV4 & Inception-ResNet: Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016). ⭐⭐⭐⭐
ResNeXt: Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431 (2016). ⭐⭐⭐⭐
Batch Renormalization: Ioffe, Sergey. "Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models." arXiv preprint arXiv:1702.03275 (2017). ⭐⭐⭐⭐
Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." arXiv preprint arXiv:1610.02357 (2016). ⭐⭐⭐
MobileNets: Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017). ⭐⭐⭐
DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016). ⭐⭐⭐⭐⭐
PolyNet: Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." arXiv preprint arXiv:1611.05725 (2016). Slides ⭐⭐⭐⭐
IRNN: Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. "A simple way to initialize recurrent networks of rectified linear units." arXiv preprint arXiv:1504.00941 (2015). ⭐⭐⭐
ReNet: Visin, Francesco, et al. "ReNet: A recurrent neural network based alternative to convolutional networks." arXiv preprint arXiv:1505.00393 (2015). ⭐⭐⭐⭐
Non-local Neural Network: Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local Neural Networks." arXiv preprint arXiv:1711.07971 (2017). ⭐⭐⭐⭐
Group Normalization: Wu, Yuxin, and Kaiming He. "Group normalization." In ECCV (2018). ⭐⭐⭐⭐⭐
SENet: Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In CVPR (2018). ⭐⭐⭐⭐⭐
Rethinking ImageNet Pre-training: He, Kaiming, Ross Girshick, and Piotr Dollár. "Rethinking ImageNet Pre-training." arXiv preprint arXiv:1811.08883 (2018). ⭐⭐⭐⭐
CBAM: Woo, Sanghyun, et al. "CBAM: Convolutional block attention module." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐⭐
Network generator: Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He. Exploring Randomly Wired Neural Networks for Image Recognition. arXiv:1904.01569 (2019). ⭐⭐⭐⭐⭐
GCNet: Cao, Yue, et al. "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond." arXiv preprint arXiv:1904.11492 (2019). ⭐⭐⭐⭐
SqueezeNet: Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. In ICLR, 2017. ⭐⭐⭐⭐
Dynamic Filter: Jia, Xu, et al. "Dynamic filter networks." Advances in neural information processing systems. 2016. ⭐⭐⭐⭐
CondConv: Yang, Brandon, et al. "Condconv: Conditionally parameterized convolutions for efficient inference." Advances in Neural Information Processing Systems. 2019. ⭐⭐⭐⭐
SimSiam: Chen, X., & He, K. (2020). Exploring Simple Siamese Representation Learning. In CVPR 2021. ⭐⭐⭐⭐
CycleMLP: Chen, S., Xie, E., Ge, C., Liang, D., & Luo, P. (2021). CycleMLP: A MLP-like Architecture for Dense Prediction. arXiv preprint arXiv:2107.10224. ⭐⭐⭐⭐
EfficientNet: Tan, Mingxing, and Quoc Le. "EfficientNet: Rethinking model scaling for convolutional neural networks." International Conference on Machine Learning. PMLR, 2019. ⭐⭐⭐⭐
ConvNeXt: Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545. ⭐⭐⭐⭐⭐
CoAtNet: Dai, Z., Liu, H., Le, Q., & Tan, M. (2021). CoAtNet: Marrying convolution and attention for all data sizes. Advances in Neural Information Processing Systems, 34. ⭐⭐⭐⭐⭐
Large Kernel: Ding, X., Zhang, X., Zhou, Y., Han, J., Ding, G., & Sun, J. (2022). Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. arXiv preprint arXiv:2203.06717. ⭐⭐⭐⭐⭐
MPViT: Lee, Y., Kim, J., Willette, J., & Hwang, S. J. MPViT: Multi-Path Vision Transformer for Dense Prediction. In CVPR 2022. ⭐⭐⭐
Deformable Attention: Xia, Z., Pan, X., Song, S., Li, L. E., & Huang, G. (2022). Vision Transformer with Deformable Attention. arXiv preprint arXiv:2201.00520. ⭐⭐⭐⭐
HaloNets: Vaswani, Ashish, et al. "Scaling local self-attention for parameter efficient visual backbones." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐⭐
SLaK: Liu S, Chen T, Chen X, et al. More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity. arXiv preprint arXiv:2207.03620, 2022. ⭐⭐⭐⭐
MetaFormer: Yu, Weihao, et al. "Metaformer is actually what you need for vision." In CVPR. 2022. ⭐⭐⭐
Resnet strikes back: Wightman, R., Touvron, H., & Jégou, H. (2021). Resnet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476. ⭐⭐⭐
VSA: Zhang, Qiming, et al. "VSA: Learning Varied-Size Window Attention in Vision Transformers." arXiv preprint arXiv:2204.08446 (2022). ⭐⭐⭐⭐

Object Detection

Overfeat: Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013). ⭐⭐⭐⭐
RCNN: Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. ⭐⭐⭐⭐⭐
SPP: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014. ⭐⭐⭐⭐⭐
Fast RCNN: Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐⭐⭐⭐⭐
Faster RCNN: Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015. ⭐⭐⭐⭐⭐
R-CNN minus R: Lenc, Karel, and Andrea Vedaldi. "R-cnn minus r." arXiv preprint arXiv:1506.06981 (2015). ⭐
End-to-end people detection in crowded scenes: Stewart, Russell, Mykhaylo Andriluka, and Andrew Y. Ng. "End-to-end people detection in crowded scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐
YOLO: Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
ION: Bell, Sean, et al. "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
MultiPath: Zagoruyko, Sergey, et al. "A multipath network for object detection." arXiv preprint arXiv:1604.02135 (2016). ⭐⭐⭐
SSD: Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐⭐
OHEM: Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. "Training region-based object detectors with online hard example mining." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
HyperNet: Kong, Tao, et al. "HyperNet: towards accurate region proposal generation and joint object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
SDP: Yang, Fan, Wongun Choi, and Yuanqing Lin. "Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
SubCNN: Xiang, Yu, et al. "Subcategory-aware convolutional neural networks for object proposals and detection." Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on. IEEE, 2017. ⭐⭐⭐
MSCNN: Cai, Zhaowei, et al. "A unified multi-scale deep convolutional neural network for fast object detection." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐
RFCN: Dai, Jifeng, Yi Li, Kaiming He, and Jian Sun. "R-fcn: Object detection via region-based fully convolutional networks." Advances in Neural Information Processing Systems. 2016. ⭐⭐⭐⭐⭐
Shallow Network: Ashraf, Khalid, et al. "Shallow networks for high-accuracy road object-detection." arXiv preprint arXiv:1606.01561 (2016). ⭐⭐
Is Faster R-CNN Doing Well for Pedestrian Detection: Zhang, Liliang, et al. "Is Faster R-CNN Doing Well for Pedestrian Detection?." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐
GCNN: Najibi, Mahyar, Mohammad Rastegari, and Larry S. Davis. "G-cnn: an iterative grid based object detector." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐
LocNet: Gidaris, Spyros, and Nikos Komodakis. "Locnet: Improving localization accuracy for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐
PVANet: Kim, Kye-Hyeon, et al. "PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection." arXiv preprint arXiv:1608.08021 (2016). ⭐⭐⭐⭐
FPN: Lin, Tsung-Yi, et al. "Feature Pyramid Networks for Object Detection." arXiv preprint arXiv:1612.03144 (2016). ⭐⭐⭐⭐⭐
TDM: Shrivastava, Abhinav, et al. "Beyond Skip Connections: Top-Down Modulation for Object Detection." arXiv preprint arXiv:1612.06851 (2016). ⭐⭐⭐⭐
YOLO9000: Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." arXiv preprint arXiv:1612.08242 (2016). ⭐⭐⭐⭐
Speed/accuracy trade-offs for modern convolutional object detectors: Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." arXiv preprint arXiv:1611.10012 (2016). ⭐⭐
GBD-Net: Zeng, Xingyu, et al. "Crafting GBD-Net for Object Detection." arXiv preprint arXiv:1610.02579 (2016). Slides ⭐⭐⭐⭐
WRInception: Lee, Youngwan, et al. "Wide-Residual-Inception Networks for Real-time Object Detection." arXiv preprint arXiv:1702.01243 (2017). ⭐
DSSD: Fu, Cheng-Yang, et al. "DSSD: Deconvolutional Single Shot Detector." arXiv preprint arXiv:1701.06659 (2017). ⭐⭐⭐⭐
A-Fast-RCNN (Hard positive generation): Wang, Xiaolong, Abhinav Shrivastava, and Abhinav Gupta. "A-fast-rcnn: Hard positive generation via adversary for object detection." arXiv preprint arXiv:1704.03414 (2017). ⭐⭐⭐ code
RRC: Ren, Jimmy, et al. "Accurate Single Stage Detector Using Recurrent Rolling Convolution." arXiv preprint arXiv:1704.05776 (2017). ⭐⭐⭐
Deformable ConvNets: Dai, Jifeng, et al. "Deformable Convolutional Networks." arXiv preprint arXiv:1703.06211 (2017). ⭐⭐⭐⭐
RSSD: Jeong, Jisoo, Hyojin Park, and Nojun Kwak. "Enhancement of SSD by concatenating feature maps for object detection." arXiv preprint arXiv:1705.09587 (2017). ⭐⭐
Perceptual GAN: Li, Jianan, et al. "Perceptual Generative Adversarial Networks for Small Object Detection." arXiv preprint arXiv:1706.05274 (2017). ⭐⭐⭐
RetinaNet (Focal Loss): Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. "Focal Loss for Dense Object Detection." In ICCV. 2017. ⭐⭐⭐⭐⭐
YOLOv3: Redmon, Joseph, and Ali Farhadi. "YOLOv3: An Incremental Improvement." arXiv preprint arXiv:1804.02767 (2018). ⭐⭐⭐
Domain Adaptive Faster R-CNN: Chen, Yuhua, et al. "Domain adaptive faster r-cnn for object detection in the wild." In CVPR, 2018. ⭐⭐⭐⭐
OMNIA Faster R-CNN: Rame, Alexandre, et al. "OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation." arXiv preprint arXiv:1812.02611 (2018). [Omni-Supervised across different datasets for object detection] ⭐⭐⭐⭐
Libra R-CNN: Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., & Lin, D. (2019). Libra R-CNN: Towards Balanced Learning for Object Detection. arXiv preprint arXiv:1904.02701. ⭐⭐⭐⭐
FCOS: Tian, Zhi, et al. "FCOS: Fully Convolutional One-Stage Object Detection." arXiv preprint arXiv:1904.01355 (2019). ⭐⭐⭐⭐⭐
POTO: Prediction-aware One-To-One (POTO) label assignment: Wang, J., Song, L., Li, Z., Sun, H., Sun, J., & Zheng, N. (2020). End-to-end object detection with fully convolutional network. arXiv preprint arXiv:2012.03544. ⭐⭐⭐⭐
FaPN: Huang, Shihua, et al. "FaPN: Feature-aligned Pyramid Network for Dense Image Prediction." arXiv preprint arXiv:2108.07058 (2021). ⭐⭐⭐
DETR: Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. End-to-end object detection with transformers. In ECCV, 2020. ⭐⭐⭐⭐⭐
Sparse R-CNN: Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., ... & Luo, P. Sparse r-cnn: End-to-end object detection with learnable proposals. In CVPR (pp. 14454-14463), 2021. ⭐⭐⭐⭐⭐

Semantic Segmentation

FCN: Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. ⭐⭐⭐⭐⭐
Deconvolution Network for Segmentation: Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐⭐⭐
U-Net: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015. ⭐⭐⭐⭐⭐
CRF as RNN: Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." In ICCV. 2015. ⭐⭐⭐⭐
PSPNet: Zhao, Hengshuang, et al. "Pyramid scene parsing network." arXiv preprint arXiv:1612.01105 (2016). ⭐⭐⭐
Deeplab v1v2: Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." IEEE transactions on pattern analysis and machine intelligence 40.4 (2018): 834-848. ⭐⭐⭐⭐⭐
Deeplab v3: Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017). ⭐⭐⭐
Deeplab v3+: Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." arXiv preprint arXiv:1802.02611 (2018). ⭐⭐⭐
PSANet: Zhao, Hengshuang, et al. "PSANet: Point-wise Spatial Attention Network for Scene Parsing." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐⭐ [good summary of context information]
OCNet: Yuan, Yuhui, and Jingdong Wang. "OCNet: Object Context Network for Scene Parsing." arXiv preprint arXiv:1809.00916 (2018). ⭐⭐⭐
ReSeg: Visin, Francesco, et al. "Reseg: A recurrent neural network-based model for semantic segmentation." In CVPR Workshops. 2016. ⭐⭐
CCNet: Huang, Zilong, et al. "CCNet: Criss-Cross Attention for Semantic Segmentation." arXiv preprint arXiv:1811.11721 (2018). ⭐⭐⭐
Depth-aware CNN: Wang, Weiyue, and Ulrich Neumann. "Depth-aware CNN for RGB-D Segmentation." In ECCV, 2018. ⭐⭐⭐⭐⭐
DFANet: Li, H., Xiong, P., Fan, H., & Sun, J. (2019). DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation. arXiv preprint arXiv:1904.02216. ⭐⭐
DADA: Vu, Tuan-Hung, et al. "DADA: Depth-aware Domain Adaptation in Semantic Segmentation." arXiv preprint arXiv:1904.01886 (2019). ⭐⭐⭐⭐
CFNet: Zhang, Hang, et al. "Co-Occurrent Features in Semantic Segmentation." In CVPR, 2019. ⭐⭐⭐
PointRend: Kirillov, A., Wu, Y., He, K., & Girshick, R. (2019). PointRend: Image Segmentation as Rendering. arXiv preprint arXiv:1912.08193. ⭐⭐⭐⭐
Trans2Seg: Xie, E., Wang, W., Wang, W., Sun, P., Xu, H., Liang, D., & Luo, P. (2021). Segmenting transparent object in the wild with transformer. arXiv preprint arXiv:2101.08461. ⭐⭐⭐⭐
Swin-Unet: Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., & Wang, M. (2021). Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv preprint arXiv:2105.05537. ⭐⭐⭐⭐
SegFormer: Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv preprint arXiv:2105.15203. ⭐⭐⭐⭐

Instance Segmentation

MNC: Dai, Jifeng, Kaiming He, and Jian Sun. "Instance-aware semantic segmentation via multi-task network cascades." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐⭐
InstanceFCN: Dai, Jifeng, et al. "Instance-sensitive fully convolutional networks." arXiv preprint arXiv:1603.08678 (2016). ⭐⭐⭐⭐
FCIS: Li, Yi, et al. "Fully convolutional instance-aware semantic segmentation." arXiv preprint arXiv:1611.07709 (2016). ⭐⭐⭐⭐⭐
Mask R-CNN: He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In ICCV. 2017. ⭐⭐⭐⭐⭐
Learning to Segment Every Thing (Mask^X R-CNN): Hu, Ronghang, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. "Learning to Segment Every Thing." arXiv preprint arXiv:1711.10370 (2017). ⭐⭐⭐⭐⭐
PANet: Liu, Shu, et al. "Path aggregation network for instance segmentation." arXiv preprint arXiv:1803.01534 (2018). ⭐⭐⭐⭐
Panoptic Segmentation: Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2018). Panoptic Segmentation. arXiv preprint arXiv:1801.00868. ⭐⭐⭐⭐
Panoptic FPN: Kirillov, A., Girshick, R., He, K., & Dollár, P. (2019). Panoptic Feature Pyramid Networks. arXiv preprint arXiv:1901.02446. ⭐⭐⭐⭐⭐
Mask Scoring R-CNN: Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask Scoring R-CNN. arXiv preprint arXiv:1903.00241. ⭐⭐⭐⭐
TensorMask: Chen, X., Girshick, R., He, K., & Dollár, P. (2019). TensorMask: A Foundation for Dense Object Segmentation. arXiv preprint arXiv:1903.12174. ⭐⭐⭐⭐
SSAP: Gao, Naiyu, et al. "SSAP: Single-shot instance segmentation with affinity pyramid." Proceedings of the IEEE International Conference on Computer Vision. 2019. ⭐⭐⭐
EmbedMask: Ying, H., Huang, Z., Liu, S., Shao, T., & Zhou, K. (2019). EmbedMask: Embedding Coupling for One-stage Instance Segmentation. arXiv preprint arXiv:1912.01954. ⭐⭐⭐⭐⭐
CondInst: Tian, Z., Shen, C., & Chen, H. (2020). Conditional Convolutions for Instance Segmentation. In ECCV 2020. ⭐⭐⭐⭐⭐
MaskFormer: Cheng, B., Schwing, A. G., & Kirillov, A. (2021). Per-Pixel Classification is Not All You Need for Semantic Segmentation. arXiv preprint arXiv:2107.06278. ⭐⭐⭐⭐⭐ (Semantic+Instance)
SOLQ: Dong, B., Zeng, F., Wang, T., Zhang, X., & Wei, Y. (2021). SOLQ: Segmenting Objects by Learning Queries. arXiv preprint arXiv:2106.02351. ⭐⭐⭐
QueryInst: Yang, S., Fang, Y., Wang, X., Li, Y., Shan, Y., Feng, B., & Liu, W. (2021). Tracking Instances as Queries. arXiv preprint arXiv:2106.11963. ⭐⭐⭐⭐⭐
ISTR: Hu, J., Cao, L., Lu, Y., Zhang, S., Wang, Y., Li, K., ... & Ji, R. (2021). ISTR: End-to-End Instance Segmentation with Transformers. arXiv preprint arXiv:2105.00637. ⭐⭐⭐

Weakly Supervised

Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning: Cinbis, Ramazan Gokberk, Jakob Verbeek, and Cordelia Schmid. "Weakly supervised object localization with multi-fold multiple instance learning." IEEE transactions on pattern analysis and machine intelligence 39.1 (2017): 189-203. ⭐⭐⭐
Weakly Supervised Deep Detection Networks: Bilen, Hakan, and Andrea Vedaldi. "Weakly supervised deep detection networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
Weakly- and Semi-Supervised Learning: Papandreou, George, et al. "Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐⭐⭐⭐
Image-level to pixel-level labeling: Pinheiro, Pedro O., and Ronan Collobert. "From image-level to pixel-level labeling with convolutional networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Weakly Supervised Localization using Deep Feature Maps: Bency, Archith J., et al. "Weakly supervised localization using deep feature maps." arXiv preprint arXiv:1603.00489 (2016).
WELDON: Durand, Thibaut, Nicolas Thome, and Matthieu Cord. "Weldon: Weakly supervised learning of deep convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
WILDCAT: Durand, Thibaut, et al. "WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
SGDL: Lai, Baisheng, and Xiaojin Gong. "Saliency guided dictionary learning for weakly-supervised image parsing." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Unsupervised/Self-supervised

Learning Features by Watching Objects Move: Pathak, Deepak, et al. "Learning Features by Watching Objects Move." arXiv preprint arXiv:1612.06370 (2016). ⭐⭐⭐⭐⭐
SimGAN: Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv preprint arXiv:1612.07828 (2016). ⭐⭐⭐
OPN: Lee, Hsin-Ying, et al. "Unsupervised Representation Learning by Sorting Sequences." arXiv preprint arXiv:1708.01246 (2017). ⭐⭐⭐
Transitive Invariance for Self-supervised Visual Representation Learning: Wang, Xiaolong, et al. "Transitive Invariance for Self-supervised Visual Representation Learning." Proceedings of the IEEE International Conference on Computer Vision. 2017. ⭐⭐⭐ code
Omni-Supervised Learning: Radosavovic, I., Dollár, P., Girshick, R., Gkioxari, G., & He, K. Data Distillation: Towards Omni-Supervised Learning. In CVPR, 2018. ⭐⭐⭐⭐⭐
MAE: He, Kaiming, et al. "Masked autoencoders are scalable vision learners." In CVPR 2022. ⭐⭐⭐⭐⭐
SimMIM: Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., ... & Hu, H. (2021). SimMIM: A simple framework for masked image modeling. arXiv preprint arXiv:2111.09886. ⭐⭐⭐⭐⭐
ConvMAE: Gao, P., Ma, T., Li, H., Dai, J., & Qiao, Y. (2022). ConvMAE: Masked Convolution Meets Masked Autoencoders. arXiv preprint arXiv:2205.03892. ⭐⭐⭐

Semi-supervised

Adversarial Self-Supervised Learning: Si, C., Nie, X., Wang, W., Wang, L., Tan, T., & Feng, J. (2020). Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition. In ECCV 2020. ⭐⭐⭐
Directional Context-Aware Consistency: Lai, Xin, et al. "Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐⭐
Cross Pseudo Supervision: Chen, X., Yuan, Y., Zeng, G., & Wang, J. (2021). Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2613-2622). ⭐⭐⭐⭐
CutMix: French, G., Laine, S., Aila, T., & Mackiewicz, M. (2019). Semi-supervised semantic segmentation needs strong, varied perturbations. In BMVC. ⭐⭐⭐
GCT: Ke, Z., Qiu, D., Li, K., Yan, Q., & Lau, R. W. (2020). Guided collaborative training for pixel-wise semi-supervised learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIII 16 (pp. 429-445). Springer International Publishing. ⭐⭐⭐⭐
Robust Mutual Learning: Zhang, P., Zhang, B., Zhang, T., Chen, D., & Wen, F. (2021). Robust Mutual Learning for Semi-supervised Semantic Segmentation. arXiv preprint arXiv:2106.00609. ⭐⭐⭐⭐

Domain Adaptation

Learning from Synthetic Animals: Mu, J., Qiu, W., Hager, G. D., & Yuille, A. L. (2020). Learning from Synthetic Animals. In CVPR 2020. ⭐⭐⭐⭐
CD3A: Kurmi, Vinod Kumar, et al. "Curriculum based dropout discriminator for domain adaptation." arXiv preprint arXiv:1907.10628 (2019). ⭐⭐⭐
Open compound domain adaptation: Liu, Z., Miao, Z., Pan, X., Zhan, X., Lin, D., Yu, S. X., & Gong, B. (2020). Open compound domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12406-12415). ⭐⭐⭐⭐⭐

Domain Generalization

Extrinsic and Intrinsic: Wang, Shujun, et al. "Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization." In ECCV (2020). ⭐⭐⭐⭐
DoFE: Wang, Shujun, et al. "DoFE: Domain-oriented Feature Embedding for Generalizable Fundus Image Segmentation on Unseen Datasets." IEEE Transactions on Medical Imaging (2020). ⭐⭐⭐⭐
Self-Challenging: Huang, Zeyi, et al. "Self-Challenging Improves Cross-Domain Generalization." arXiv preprint arXiv:2007.02454 (2020). ⭐⭐⭐⭐
Generate Novel Domains: Zhou, Kaiyang, et al. "Learning to Generate Novel Domains for Domain Generalization." arXiv preprint arXiv:2007.03304 (2020). ⭐⭐⭐
Jigsaw puzzles: Carlucci, Fabio M., et al. "Domain generalization by solving jigsaw puzzles." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐

Video

Semi-supervised, memory network: Oh, S. W., Lee, J. Y., Xu, N., & Kim, S. J. (2019). Video object segmentation using space-time memory networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9226-9235). ⭐⭐⭐⭐

Saliency

DHSNet: Liu, Nian, and Junwei Han. "Dhsnet: Deep hierarchical saliency network for salient object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
RFCN: Wang, Linzhao, et al. "Saliency detection with recurrent fully convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐⭐⭐⭐
RACDNN: Kuen, Jason, Zhenhua Wang, and Gang Wang. "Recurrent attentional networks for saliency detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐⭐
NLDF: Luo, Zhiming, et al. "Non-Local Deep Features for Salient Object Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. ⭐⭐⭐
DSS: Hou, Qibin, et al. "Deeply supervised salient object detection with short connections." arXiv preprint arXiv:1611.04849 (2016). ⭐⭐⭐⭐
MSRNet: Li, Guanbin, et al. "Instance-Level Salient Object Segmentation." arXiv preprint arXiv:1704.03604 (2017). ⭐⭐⭐⭐
Amulet: Zhang, Pingping, et al. "Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection." arXiv preprint arXiv:1708.02001 (2017). ⭐⭐⭐⭐
UCF: Zhang, Pingping, et al. "Learning Uncertain Convolutional Features for Accurate Saliency Detection." arXiv preprint arXiv:1708.02031 (2017). ⭐⭐⭐⭐
SRM: Wang, Tiantian, et al. "A Stagewise Refinement Model for Detecting Salient Objects in Images." In ICCV. 2017. ⭐⭐⭐⭐
S4Net: Fan, Ruochen, et al. "S4Net: Single Stage Salient-Instance Segmentation." arXiv preprint arXiv:1711.07618 (2017). ⭐⭐⭐⭐⭐
Deep Edge-Aware Saliency Detection: Zhang, Jing, Yuchao Dai, Fatih Porikli, and Mingyi He. "Deep Edge-Aware Saliency Detection." arXiv preprint arXiv:1708.04366 (2017). ⭐⭐⭐
Bi-Directional Message Passing Model: Zhang, Lu, et al. "A Bi-Directional Message Passing Model for Salient Object Detection." In CVPR. 2018. ⭐⭐⭐
PiCANet: Liu, Nian, Junwei Han, and Ming-Hsuan Yang. "PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection." In CVPR. 2018. ⭐⭐⭐⭐⭐
Detect Globally, Refine Locally: A Novel Approach to Saliency Detection: Wang, Tiantian, et al. "Detect Globally, Refine Locally: A Novel Approach to Saliency Detection." In CVPR. 2018. ⭐⭐⭐
PAGRN: Zhang, Xiaoning, et al. "Progressive Attention Guided Recurrent Network for Salient Object Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. ⭐⭐⭐
Reverse Attention for Salient Object Detection: Chen, Shuhan, et al. "Reverse Attention for Salient Object Detection." In ECCV, 2018. ⭐⭐
CA-Fuse: Chen, Hao, and Youfu Li. "Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection." In CVPR. 2018. ⭐⭐⭐
SOC dataset: Fan, Deng-Ping, et al. "Salient objects in clutter: Bringing salient object detection to the foreground." In ECCV. 2018. ⭐⭐⭐⭐⭐ [complex dataset + instance level]
DNA: Liu, Yun, et al. "DNA: Deeply-supervised Nonlinear Aggregation for Salient Object Detection." arXiv preprint arXiv:1903.12476 (2019). ⭐⭐⭐
SE2Net: Zhou, S., Wang, J., Wang, F., & Huang, D. SE2Net: Siamese Edge-Enhancement Network for Salient Object Detection. ⭐⭐⭐⭐⭐
PFAN: Zhao, T., & Wu, X. (2019). Pyramid Feature Attention Network for Saliency Detection. In CVPR 2019. ⭐⭐
PoolNet: Liu, Jiang-Jiang, et al. "A Simple Pooling-Based Design for Real-Time Salient Object Detection." In CVPR 2019. ⭐⭐⭐⭐

Attention

SRN: Zhu, Feng, et al. "Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification." arXiv preprint arXiv:1702.05891 (2017). ⭐⭐⭐⭐
Zoom-in-Net: Wang, Zhe, et al. "Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection." arXiv preprint arXiv:1706.04372 (2017). ⭐⭐⭐⭐
Multi-context attention: Chu, Xiao, et al. "Multi-context attention for human pose estimation." arXiv preprint arXiv:1702.07432 (2017). ⭐⭐⭐

Depth Information and Stereo Vision

HFM-Net: Zeng, J., Tong, Y., Huang, Y., Yan, Q., Sun, W., Chen, J., & Wang, Y. (2019). Deep Surface Normal Estimation with Hierarchical RGB-D Fusion. arXiv preprint arXiv:1904.03405. ⭐⭐⭐
MADNet: Tonioni, Alessio, et al. "Real-time self-adaptive deep stereo." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐ (offline domain adaption)
Geometry-Aware Distillation: Jiao, Jianbo, et al. "Geometry-Aware Distillation for Indoor Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐
DiverseDepth: Yin, W., Wang, X., Shen, C., Liu, Y., Tian, Z., Xu, S., ... & Renyin, D. (2020). DiverseDepth: Affine-invariant depth prediction using diverse data. arXiv preprint arXiv:2002.00569. ⭐⭐⭐⭐

Shadow Detection/Removal

DeshadowNet: Qu, Liangqiong, et al. "DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. ⭐⭐⭐
scGAN: Nguyen, Vu, et al. "Shadow Detection with Conditional Generative Adversarial Networks." In ICCV. 2017. ⭐⭐
Patched CNN: Hosseinzadeh, Sepideh, Moein Shakeri, and Hong Zhang. "Fast Shadow Detection from a Single Image Using a Patched Convolutional Neural Network." arXiv preprint arXiv:1709.09283 (2017). ⭐
ST-CGAN: Wang, Jifeng, et al. "Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal." arXiv preprint arXiv:1712.02478 (2017). ⭐⭐ (ISTD dataset)
A+D Net: Le, Hieu, et al. "A+ D net: Training a shadow detector with adversarial shadow attenuation." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐
Lazy annotation for immature SBU: Vicente, Yago, et al. "Noisy label recovery for shadow detection in unfamiliar domains." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐⭐⭐
StackedCNN + SBU: Vicente, Tomás F. Yago, et al. "Large-scale training of shadow detectors with noisily-annotated shadow examples." European Conference on Computer Vision. Springer, Cham, 2016. ⭐⭐⭐⭐ (SBU dataset)
CPAdv-Net: Mohajerani, Sorour, and Parvaneh Saeedi. "Shadow Detection in Single RGB Images Using a Context Preserver Convolutional Neural Network Trained by Multiple Adversarial Examples." IEEE Transactions on Image Processing (2019). ⭐⭐
Color Constancy: Sidorov, Oleksii. "Conditional GANs for Multi-Illuminant Color Constancy: Revolution or Yet Another Approach?." CVPR workshop, 2019. ⭐⭐
DSDNet: Zheng, Quanlong, et al. "Distraction-aware Shadow Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
ARGAN: Ding, Bin, et al. "ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal." In ICCV, (2019). ⭐⭐⭐
SP+M-Net: Le, H., & Samaras, D. (2019). Shadow removal via shadow image decomposition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 8578-8587). ⭐⭐⭐⭐
Portrait Shadow Manipulation: Zhang, Xuaner Cecilia, et al. "Portrait Shadow Manipulation." In SIGGRAPH (2020). ⭐⭐⭐⭐⭐
Weakly-supervised shadow decomposition: Le, Hieu, and Dimitris Samaras. "From Shadow Segmentation to Shadow Removal." arXiv preprint arXiv:2008.00267 (2020). ⭐⭐⭐⭐⭐ (Video Shadow Removal Dataset)
AEF: Fu, Lan, et al. "Auto-Exposure Fusion for Single-Image Shadow Removal." CVPR 2021. ⭐⭐⭐
G2R-ShadowNet: Liu, Zhihao, et al. "From Shadow Generation to Shadow Removal." arXiv preprint arXiv:2103.12997 (2021). ⭐⭐⭐⭐⭐
Video Shadow: Chen, Z., Wan, L., Zhu, L., Shen, J., Fu, H., Liu, W., & Qin, J. (2021). Triple-cooperative Video Shadow Detection. In CVPR 2021. ⭐⭐⭐⭐
Removing Objects and their Shadows: Zhang, E., Martin-Brualla, R., Kontkanen, J., & Curless, B. L. (2021). No Shadow Left Behind: Removing Objects and their Shadows using Approximate Lighting and Geometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16397-16406). ⭐⭐⭐⭐⭐
Shadow Generation + DE-SOBA dataset: Hong, Y., Niu, L., Zhang, J., & Zhang, L. (2021). Shadow Generation for Composite Image in Real-world Scenes. arXiv preprint arXiv:2104.10338. ⭐⭐⭐⭐
G2R-ShadowNet: Liu, Zhihao, et al. "From Shadow Generation to Shadow Removal." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐⭐
Temporal Feature Warping: Hu, S., Le, H., & Samaras, D. (2021). Temporal Feature Warping for Video Shadow Detection. arXiv preprint arXiv:2107.14287. ⭐⭐⭐⭐
CANet: Chen, Z., Long, C., Zhang, L., & Xiao, C. (2021). CANet: A Context-Aware Network for Shadow Removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4743-4752). ⭐⭐⭐⭐⭐
FDRNet: Zhu, L., Xu, K., Ke, Z., & Lau, R. W. (2021). Mitigating Intensity Bias in Shadow Detection via Feature Decomposition and Reweighting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4702-4711). ⭐⭐⭐⭐⭐
SADC: Xu, Yimin, et al. "Shadow-Aware Dynamic Convolution for Shadow Removal." arXiv preprint arXiv:2205.04908 (2022). ⭐⭐

Image Restoration

DRRN: Tai, Ying, Jian Yang, and Xiaoming Liu. "Image super-resolution via deep recursive residual network." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. ⭐⭐⭐⭐
DID-MDN: Zhang, He, and Vishal M. Patel. "Density-aware Single Image De-raining using a Multi-stream Dense Network." arXiv preprint arXiv:1802.07412 (2018). ⭐⭐
IDN: Hui, Zheng, Xiumei Wang, and Xinbo Gao. "Fast and Accurate Single Image Super-Resolution via Information Distillation Network." In CVPR. 2018. ⭐⭐⭐
SFT-GAN: Wang, X., Yu, K., Dong, C., & Loy, C. C. (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR. 2018. ⭐⭐⭐
Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring: Nah, Seungjun, Tae Hyun Kim, and Kyoung Mu Lee. "Deep multi-scale convolutional neural network for dynamic scene deblurring." In CVPR, 2017. ⭐⭐⭐
Enhanced Deep Residual Networks for Single Image Super-Resolution: Lim, Bee, et al. "Enhanced deep residual networks for single image super-resolution." The CVPR workshops, 2017. ⭐
AGAN for Raindrop Removal: Qian, Rui, et al. "Attentive Generative Adversarial Network for Raindrop Removal from A Single Image." In CVPR. 2018. ⭐⭐⭐⭐⭐
DCPDN: Zhang, He, and Vishal M. Patel. "Densely connected pyramid dehazing network." In CVPR, 2018. ⭐⭐⭐
GFN: Ren, W., Ma, L., Zhang, J., Pan, J., Cao, X., Liu, W., & Yang, M. H. (2018). Gated fusion network for single image dehazing. In CVPR, 2018. ⭐⭐⭐⭐
SIDCGAN: Li, Runde, et al. "Single Image Dehazing via Conditional Generative Adversarial Network." In CVPR, 2018. ⭐⭐
Dehaze Benchmark: Li, Boyi, et al. "Benchmarking Single Image Dehazing and Beyond." IEEE Transactions on Image Processing (2018). ⭐⭐⭐⭐⭐
Cityscapes + Haze: Sakaridis, Christos, Dengxin Dai, and Luc Van Gool. "Semantic foggy scene understanding with synthetic data." International Journal of Computer Vision (2018): 1-20. ⭐⭐⭐⭐⭐
RESCAN: Li, Xia, et al. "Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining." European Conference on Computer Vision. Springer, Cham, 2018. ⭐⭐⭐
UD-GAN: Jin, Xin, et al. "Unsupervised Single Image Deraining with Self-supervised Constraints." arXiv preprint arXiv:1811.08575 (2018). ⭐⭐⭐⭐⭐
Deep Tree-Structured Fusion Model: Fu, Xueyang, et al. "A Deep Tree-Structured Fusion Model for Single Image Deraining." arXiv preprint arXiv:1811.08632 (2018). ⭐⭐
Dual CNN: Pan, J., Liu, S., Sun, D., Zhang, J., Liu, Y., Ren, J., ... & Yang, M. H. Learning Dual Convolutional Neural Networks for Low-Level Vision. In CVPR, 2018 (pp. 3070-3079). ⭐⭐⭐
RAM: Kim, Jun-Hyuk, et al. "RAM: Residual Attention Module for Single Image Super-Resolution." arXiv preprint arXiv:1811.12043 (2018). ⭐⭐⭐
DNSR (Bi-cycle GAN): Zhao, Tianyu, et al. "Unsupervised Degradation Learning for Single Image Super-Resolution." arXiv preprint arXiv:1812.04240 (2018). ⭐⭐⭐⭐⭐
Cycle-Defog2Refog: Liu, Wei, et al. "End-to-End Single Image Fog Removal using Enhanced Cycle Consistent Adversarial Networks." arXiv preprint arXiv:1902.01374 (2019). ⭐⭐
SPANet: Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, Rynson W.H. Lau. "Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset." In CVPR 2019. ⭐⭐⭐⭐
remove rain streaks and rain accumulation: Ruoteng Li, Loong-Fah Cheong, and Robby T. Tan. "Heavy Rain Image Restoration: Integrating Physics Model and Conditional Adversarial Learning." In CVPR 2019. ⭐⭐⭐⭐⭐
Rain O’er Me: Huangxing Lin, Yanlong Li, Xinghao Ding, Weihong Zeng, Yue Huang, John Paisley: "Rain O’er Me: Synthesizing real rain to derain with data distillation." arXiv preprint arXiv:1904.04605 (2019). ⭐⭐⭐⭐
RNAN: Zhang, Y., Li, K., Li, K., Zhong, B., & Fu, Y. (2019). Residual Non-local Attention Networks for Image Restoration. arXiv preprint arXiv:1903.10082. ⭐⭐⭐⭐⭐
Perceptual GAN loss + TV loss: Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... & Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In CVPR (pp. 4681-4690). (code) ⭐⭐⭐⭐⭐
PReNet: Ren, Dongwei, et al. "Progressive Image Deraining Networks: A Better and Simpler Baseline." In CVPR, 2019. ⭐⭐⭐
Zoom to Learn, Learn to Zoom: Zhang, Xuaner, et al. "Zoom to Learn, Learn to Zoom." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
Derain Benchmark: Li, Siyuan, et al. "Single image deraining: A comprehensive benchmark analysis." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐
Dual residual block: Liu, Xing, et al. "Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐
Semi-supervised Transfer Learning for Image Rain Removal: Wei, Wei, et al. "Semi-Supervised Transfer Learning for Image Rain Removal." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
UMRL: Yasarla, Rajeev, and Vishal M. Patel. "Uncertainty Guided Multi-Scale Residual Learning-using a Cycle Spinning CNN for Single Image De-Raining." CVPR 2019. ⭐⭐⭐⭐
NASNet: Qin, Xu, and Zhilin Wang. "NASNet: A Neuron Attention Stage-by-Stage Net for Single Image Deraining." arXiv preprint arXiv:1912.03151 (2019). ⭐⭐⭐⭐
DerainCycleGAN: Wei, Yanyan, et al. "DerainCycleGAN: An Attention-guided Unsupervised Benchmark for Single Image Deraining and Rainmaking." arXiv preprint arXiv:1912.07015 (2019). ⭐⭐⭐⭐
Physics-Based Rain Rendering: Halder, Shirsendu Sukanta, Jean-François Lalonde, and Raoul de Charette. "Physics-Based Rendering for Improving Robustness to Rain." In ICCV, 2019 (pp. 10203-10212). ⭐⭐⭐⭐⭐
Partial Convolution (mask-guided): Liu, Guilin, et al. "Image inpainting for irregular holes using partial convolutions." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐⭐⭐
Derain Survey: Wang, H., Li, M., Wu, Y., Zhao, Q., & Meng, D. (2019). A Survey on Rain Removal from Video and Single Image. arXiv preprint arXiv:1909.08326. ⭐⭐⭐⭐
Deep Adversarial Decomposition: Zou, Zhengxia, et al. "Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. ⭐⭐⭐⭐
CARN: Ahn, Namhyuk, Byungkon Kang, and Kyung-Ah Sohn. "Fast, accurate, and lightweight super-resolution with cascading residual network." Proceedings of the European Conference on Computer Vision (ECCV). 2018. ⭐⭐⭐
Semi-supervised derain with Gaussian processes: Yasarla, Rajeev, Vishwanath A. Sindagi, and Vishal M. Patel. "Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes." In CVPR. 2020. ⭐⭐⭐⭐
EPDN: Qu, Yanyun, et al. "Enhanced pix2pix dehazing network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐
PEPSI: Shin, Yong-Goo, et al. "PEPSI++: Fast and lightweight network for image inpainting." IEEE Transactions on Neural Networks and Learning Systems (2020). ⭐⭐⭐
holistic attention network: Niu, Ben, et al. "Single image super-resolution via a holistic attention network." European Conference on Computer Vision. Springer, Cham, 2020. ⭐⭐⭐
SNet, VNet, and ANet: Wang, Yinglong, et al. "Rethinking image deraining via rain streaks and vapors." European Conference on Computer Vision. Springer, Cham, 2020. ⭐⭐⭐
JRGR (Disentangled): Ye, Y., Chang, Y., Zhou, H., & Yan, L. (2021). Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation. In CVPR 2021. ⭐⭐⭐⭐⭐
AECR-Net: Wu, H., Qu, Y., Lin, S., Zhou, J., Qiao, R., Zhang, Z., ... & Ma, L. (2021). Contrastive Learning for Compact Single Image Dehazing. In CVPR 2021. ⭐⭐⭐
MPRNet: Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., Yang, M. H., & Shao, L. (2021). Multi-stage progressive image restoration. arXiv preprint arXiv:2102.02808.
AdderSR: Song, D., Wang, Y., Chen, H., Xu, C., Xu, C., & Tao, D. (2020). AdderSR: Towards energy efficient image super-resolution. In CVPR 2021. ⭐⭐⭐⭐
RICNet: Ni, Siqi, et al. "Controlling the Rain: From Removal to Rendering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐
Video rain streaks+fog: Yan, W., Tan, R. T., Yang, W., & Dai, D. (2021). Self-Aligned Video Deraining With Transmission-Depth Consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11966-11976). ⭐⭐⭐⭐
Uformer: Wang, Z., Cun, X., Bao, J., & Liu, J. (2021). Uformer: A General U-Shaped Transformer for Image Restoration. arXiv preprint arXiv:2106.03106. ⭐⭐⭐
Real Video Dehaze Data: Zhang, Xinyi, et al. "Learning To Restore Hazy Video: A New Real-World Dataset and a New Method." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. ⭐⭐⭐⭐
Hybrid Local-Global Transformer: Zhao, D., Li, J., Li, H., & Xu, L. (2021). Hybrid Local-Global Transformer for Image Dehazing. arXiv preprint arXiv:2109.07100. ⭐⭐⭐⭐
Restormer: Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., & Yang, M. H. (2021). Restormer: Efficient Transformer for High-Resolution Image Restoration. arXiv preprint arXiv:2111.09881. ⭐⭐⭐⭐
MAXIM: Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., & Li, Y. (2022). MAXIM: Multi-Axis MLP for Image Processing. arXiv preprint arXiv:2201.02973. ⭐⭐⭐
NAFNet: Chen, L., Chu, X., Zhang, X., & Sun, J. (2022). Simple Baselines for Image Restoration. arXiv preprint arXiv:2204.04676. ⭐⭐⭐⭐⭐
KCKE: Chen, Wei-Ting, et al. "Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model." CVPR. 2022. ⭐⭐⭐⭐

Nighttime & Low-light

dehaze + nighttime: Yan, Wending, Robby T. Tan, and Dengxin Dai. "Nighttime defogging using high-low frequency decomposition and grayscale-color networks." In ECCV, 2020. ⭐⭐⭐⭐⭐
Nighttime Visibility Enhancement: Sharma, Aashish, and Robby T. Tan. "Nighttime Visibility Enhancement by Increasing the Dynamic Range and Suppression of Light Effects." In CVPR. 2021. ⭐⭐⭐⭐

Image Synthesis

Let there be Color!: Iizuka, Satoshi, Edgar Simo-Serra, and Hiroshi Ishikawa. "Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification." ACM Transactions on Graphics (TOG) 35.4 (2016): 110. ⭐⭐⭐⭐⭐
Colorful Image Colorization: Zhang, Richard, Phillip Isola, and Alexei A. Efros. "Colorful image colorization." European Conference on Computer Vision. Springer, Cham, 2016. ⭐⭐⭐⭐
Neural Style: Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015). ⭐⭐⭐⭐⭐
Texture Synthesis: Gatys, Leon, Alexander S. Ecker, and Matthias Bethge. "Texture synthesis using convolutional neural networks." Advances in Neural Information Processing Systems. 2015. ⭐⭐⭐⭐
Semantic Annotation Artwork: Champandard, Alex J. "Semantic style transfer and turning two-bit doodles into fine artworks." arXiv preprint arXiv:1603.01768 (2016). ⭐⭐⭐
MRF+CNN Image Synthesis: Li, Chuan, and Michael Wand. "Combining markov random fields and convolutional neural networks for image synthesis." In CVPR. 2016. ⭐⭐⭐⭐
More Experiments on Neural Style: Novak, Roman, and Yaroslav Nikulin. "Improving the neural algorithm of artistic style." arXiv preprint arXiv:1605.04603 (2016). ⭐⭐
Deep Photo Style Transfer: Luan, Fujun, et al. "Deep photo style transfer." In CVPR. 2017. ⭐⭐⭐⭐⭐
Pretraining is All You Need + Diffusion: Wang, Tengfei, et al. "Pretraining is All You Need for Image-to-Image Translation." arXiv preprint arXiv:2205.12952 (2022). ⭐⭐⭐

Computational Photography

Multi-Illumination Dataset: Murmann, Lukas, et al. "A Dataset of Multi-Illumination Images in the Wild." Proceedings of the IEEE International Conference on Computer Vision. 2019. ⭐⭐⭐⭐⭐
WESPE: Ignatov, Andrey, et al. "WESPE: weakly supervised photo enhancer for digital cameras." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018. ⭐⭐⭐
Zurich RAW to RGB dataset + PyNet: Ignatov, Andrey, Luc Van Gool, and Radu Timofte. "Replacing Mobile Camera ISP with a Single Deep Learning Model." arXiv preprint arXiv:2002.05509 (2020). ⭐⭐⭐⭐

GAN

GAN: Goodfellow, Ian, et al. "Generative adversarial nets." In NIPS. 2014. ⭐⭐⭐⭐⭐
cGAN: Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014). ⭐⭐⭐⭐⭐
Image-to-Image Translation with Conditional Adversarial Networks: Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." arXiv preprint (2017). ⭐⭐⭐⭐⭐
CycleGAN: Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." arXiv preprint (2017). ⭐⭐⭐⭐⭐
StarGAN: Choi, Yunjey, et al. "StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation." In CVPR 2018. ⭐⭐⭐⭐
E-GAN: Wang, C., Xu, C., Yao, X., & Tao, D. (2018). Evolutionary Generative Adversarial Networks. arXiv preprint arXiv:1803.00657. ⭐⭐⭐⭐
DCGAN: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015). ⭐⭐⭐⭐
GANtruth: Bujwid, Sebastian, et al. "GANtruth-an unpaired image-to-image translation method for driving scenarios." arXiv preprint arXiv:1812.01710 (2018). ⭐⭐⭐
AttentionGAN: Tang, H., Liu, H., Xu, D., Torr, P. H.S., & Sebe, N. (2019). AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks. arXiv preprint arXiv:1911.11897. ⭐⭐⭐⭐
Multiclass Sketch-to-Image Translation: Ghosh, A., Zhang, R., Dokania, P. K., Wang, O., Efros, A. A., Torr, P. H.S., & Shechtman, E. (2019). Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1171-1180). ⭐⭐⭐
RealnessGAN: Xiangli, Yuanbo, et al. "Real or not real, that is the question." In ICLR 2020. ⭐⭐⭐⭐
Domain-bridged GAN: Pizzati, Fabio, et al. "Domain bridge for unpaired image-to-image translation and unsupervised domain adaptation." The IEEE Winter Conference on Applications of Computer Vision. 2020. ⭐⭐⭐⭐
SinGAN: Shaham, T. R., Dekel, T., & Michaeli, T. (2019). SinGAN: Learning a generative model from a single natural image. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4570-4580). ⭐⭐⭐⭐⭐
CUT: Park, T., Efros, A. A., Zhang, R., & Zhu, J. Y. (2020, August). Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision (pp. 319-345). Springer, Cham. ⭐⭐⭐⭐⭐

Disentangled

Deblur+Disentangled: Lu, Boyu, Jun-Cheng Chen, and Rama Chellappa. "Unsupervised domain-specific deblurring via disentangled representations." In CVPR. 2019. ⭐⭐⭐⭐⭐
One-Shot Unsupervised Image Translation: Cohen, Tomer, and Lior Wolf. "Bidirectional One-Shot Unsupervised Domain Mapping." Proceedings of the IEEE International Conference on Computer Vision. 2019. ⭐⭐⭐⭐

AR/VR

Indoor Lighting Estimation: Garon, Mathieu, et al. "Fast Spatially-Varying Indoor Lighting Estimation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐

Person Re-ID

IANet: Hou, Ruibing, et al. "Interaction-And-Aggregation Network for Person Re-Identification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐⭐
AlignedReID: Zhang, Xuan, et al. "AlignedReID: Surpassing human-level performance in person re-identification." arXiv preprint arXiv:1711.08184 (2017). ⭐⭐⭐⭐⭐

Distillation

Knowledge Distillation: Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. ⭐⭐⭐⭐⭐
Deep Mutual Learning: Zhang, Ying, et al. "Deep mutual learning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. ⭐⭐⭐⭐⭐
Cooperative learning: Batra, Tanmay, and Devi Parikh. "Cooperative learning with visual attributes." arXiv preprint arXiv:1705.05512 (2017). ⭐⭐⭐
Deeply-supervised Knowledge Synergy: Sun, D., Yao, A., Zhou, A., & Zhao, H. (2019). Deeply-supervised Knowledge Synergy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6997-7006). ⭐⭐⭐⭐⭐
ONE: Lan, Xu, Xiatian Zhu, and Shaogang Gong. "Knowledge distillation by On-the-fly Native Ensemble." Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., 2018. ⭐⭐⭐⭐⭐
Segmentation Distillation: Liu, Yifan, et al. "Structured Knowledge Distillation for Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. ⭐⭐⭐⭐

Uncertainty

aleatoric uncertainty and epistemic uncertainty: Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." Advances in neural information processing systems. 2017. ⭐⭐⭐⭐⭐
Learning Model Confidence: Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez. "Addressing Failure Prediction by Learning Model Confidence." NeurIPS, 2019. ⭐⭐⭐⭐

Transformer

Transformer: Vaswani, Ashish, et al. "Attention is all you need." arXiv preprint arXiv:1706.03762 (2017). ⭐⭐⭐⭐⭐
Pre-trained image processing transformer: Chen, Hanting, et al. "Pre-trained image processing transformer." arXiv preprint arXiv:2012.00364 (2020). ⭐⭐⭐⭐
texture transformer for Super-resolution: Yang, Fuzhi, et al. "Learning texture transformer network for image super-resolution." In CVPR, 2020. ⭐⭐⭐⭐
TransUnet: Chen, Jieneng, et al. "TransUnet: Transformers make strong encoders for medical image segmentation." arXiv preprint arXiv:2102.04306 (2021). ⭐⭐⭐⭐
Swin transformer: Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030. ⭐⭐⭐⭐⭐
VOLO: Yuan, Li, et al. "VOLO: Vision Outlooker for Visual Recognition." arXiv preprint arXiv:2106.13112 (2021). ⭐⭐⭐⭐⭐
Video Swin Transformer: Liu, Ze, et al. "Video Swin Transformer." arXiv preprint arXiv:2106.13230 (2021). ⭐⭐⭐
Focal Transformer: Yang, J., Li, C., Zhang, P., Dai, X., Xiao, B., Yuan, L., & Gao, J. (2021). Focal Self-attention for Local-Global Interactions in Vision Transformers. arXiv preprint arXiv:2107.00641. ⭐⭐⭐⭐⭐
Pyramid vision transformer: Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., ... & Shao, L. (2021). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122. ⭐⭐⭐⭐
Pyramid vision transformer V2: Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., ... & Shao, L. (2021). PVTv2: Improved Baselines with Pyramid Vision Transformer. arXiv preprint arXiv:2106.13797. ⭐⭐⭐⭐
Swin Transformer V2: Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2021). Swin Transformer V2: Scaling Up Capacity and Resolution. arXiv preprint arXiv:2111.09883. ⭐⭐⭐⭐
DeiT: Touvron, Hugo, et al. "Training data-efficient image transformers & distillation through attention." International Conference on Machine Learning. 2021. ⭐⭐⭐⭐

General Perception

Perceiver: Jaegle, Andrew, et al. "Perceiver: General perception with iterative attention." International Conference on Machine Learning. PMLR, 2021. ⭐⭐⭐⭐⭐
Perceiver IO: Jaegle, Andrew, et al. "Perceiver IO: A general architecture for structured inputs & outputs." arXiv preprint arXiv:2107.14795 (2021). ⭐⭐⭐⭐
Florence: Yuan, Lu, et al. "Florence: A New Foundation Model for Computer Vision." arXiv preprint arXiv:2111.11432 (2021). ⭐⭐⭐⭐⭐
Unified-IO: Lu, Jiasen, et al. "Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks." arXiv preprint arXiv:2206.08916 (2022). ⭐⭐⭐⭐
CoCa: Yu, Jiahui, et al. "CoCa: Contrastive captioners are image-text foundation models." arXiv preprint arXiv:2205.01917 (2022). ⭐⭐⭐⭐⭐

Traditional Method

Rolling Guidance Filter: Zhang, Q., Shen, X., Xu, L., & Jia, J. Rolling guidance filter. In ECCV, 2014. ⭐⭐⭐⭐⭐

Talks

G-RMI: Google. (Object Detection) slides
2017 CVPR Tutorial: video and slides
16-18 Computer Vision Conferences: https://www.youtube.com/channel/UC0n76gicaarsN_Y9YShWwhw/playlists
