Reading-List
Reading list on deep learning.
Basic Network and Techniques
- AlexNet: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
⭐ ⭐ ⭐ ⭐ ⭐ - Dropout: Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
⭐ ⭐ ⭐ ⭐ - VGG: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
⭐ ⭐ ⭐ ⭐ ⭐ - GoogLeNet: Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
⭐ ⭐ ⭐ ⭐ ⭐ - Batch Normalization: Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015). [Inception v2]
⭐ ⭐ ⭐ ⭐ ⭐ - PReLU & MSRA Initialization: He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015.
⭐ ⭐ ⭐ ⭐ ⭐ - InceptionV3: Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ - ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ ⭐ - Identity ResNet: He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016.
⭐ ⭐ ⭐ ⭐ ⭐ - CReLU: Shang, Wenling, et al. "Understanding and improving convolutional neural networks via concatenated rectified linear units." Proceedings of the International Conference on Machine Learning (ICML). 2016.
⭐ ⭐ ⭐ - InceptionV4 & Inception-ResNet: Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016).
⭐ ⭐ ⭐ ⭐ - ResNeXt: Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431 (2016).
⭐ ⭐ ⭐ ⭐ - Batch Renormalization: Ioffe, Sergey. "Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models." arXiv preprint arXiv:1702.03275 (2017).
⭐ ⭐ ⭐ ⭐ - Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." arXiv preprint arXiv:1610.02357 (2016).
⭐ ⭐ ⭐ - MobileNets: Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
⭐ ⭐ ⭐ - DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016).
⭐ ⭐ ⭐ ⭐ ⭐ - PolyNet: Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." arXiv preprint arXiv:1611.05725 (2016). Slides
⭐ ⭐ ⭐ ⭐ - IRNN: Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. "A simple way to initialize recurrent networks of rectified linear units." arXiv preprint arXiv:1504.00941 (2015).
⭐ ⭐ ⭐ - ReNet: Visin, Francesco, et al. "ReNet: A recurrent neural network based alternative to convolutional networks." arXiv preprint arXiv:1505.00393 (2015).
⭐ ⭐ ⭐ ⭐ - Non-local Neural Network: Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local Neural Networks." arXiv preprint arXiv:1711.07971 (2017).
⭐ ⭐ ⭐ ⭐ - Group Normalization: Wu, Yuxin, and Kaiming He. "Group normalization." In ECCV (2018).
⭐ ⭐ ⭐ ⭐ ⭐ - SENet: Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In CVPR (2018).
⭐ ⭐ ⭐ ⭐ ⭐ - Rethinking ImageNet Pre-training: He, Kaiming, Ross Girshick, and Piotr Dollár. "Rethinking ImageNet Pre-training." arXiv preprint arXiv:1811.08883 (2018).
⭐ ⭐ ⭐ ⭐ - CBAM: Woo, Sanghyun, et al. "CBAM: Convolutional block attention module." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
⭐ ⭐ ⭐ ⭐ - Network generator: Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He. Exploring Randomly Wired Neural Networks for Image Recognition. arXiv:1904.01569 (2019).
⭐ ⭐ ⭐ ⭐ ⭐ - GCNet: Cao, Yue, et al. "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond." arXiv preprint arXiv:1904.11492 (2019).
⭐ ⭐ ⭐ ⭐ - SqueezeNet: Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. In ICLR, 2017.
⭐ ⭐ ⭐ ⭐ - Dynamic Filter: Jia, Xu, et al. "Dynamic filter networks." Advances in neural information processing systems. 2016.
⭐ ⭐ ⭐ ⭐ - CondConv: Yang, Brandon, et al. "Condconv: Conditionally parameterized convolutions for efficient inference." Advances in Neural Information Processing Systems. 2019.
⭐ ⭐ ⭐ ⭐ - SimSiam: Chen, X., & He, K. (2020). Exploring Simple Siamese Representation Learning. In CVPR 2021.
⭐ ⭐ ⭐ ⭐ - CycleMLP: Chen, S., Xie, E., Ge, C., Liang, D., & Luo, P. (2021). CycleMLP: A MLP-like Architecture for Dense Prediction. arXiv preprint arXiv:2107.10224.
⭐ ⭐ ⭐ ⭐ - EfficientNet: Tan, Mingxing, and Quoc Le. "EfficientNet: Rethinking model scaling for convolutional neural networks." International Conference on Machine Learning. PMLR, 2019.
⭐ ⭐ ⭐ ⭐ - ConvNeXt: Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545
⭐ ⭐ ⭐ ⭐ ⭐ - CoAtNet: Dai, Z., Liu, H., Le, Q., & Tan, M. (2021). CoAtNet: Marrying convolution and attention for all data sizes. Advances in Neural Information Processing Systems, 34.
⭐ ⭐ ⭐ ⭐ ⭐ - Large_Kernel: Ding, X., Zhang, X., Zhou, Y., Han, J., Ding, G., & Sun, J. (2022). Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. arXiv preprint arXiv:2203.06717.
⭐ ⭐ ⭐ ⭐ ⭐ - MPViT: Lee, Y., Kim, J., Willette, J., & Hwang, S. J. MPViT: Multi-Path Vision Transformer for Dense Prediction. In CVPR 2022.
⭐ ⭐ ⭐ - Deformable Attention: Xia, Z., Pan, X., Song, S., Li, L. E., & Huang, G. (2022). Vision Transformer with Deformable Attention. arXiv preprint arXiv:2201.00520.
⭐ ⭐ ⭐ ⭐ - HaloNets: Vaswani, Ashish, et al. "Scaling local self-attention for parameter efficient visual backbones." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
⭐ ⭐ ⭐ ⭐ - SLaK: Liu S, Chen T, Chen X, et al. More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity[J]. arXiv preprint arXiv:2207.03620, 2022.
⭐ ⭐ ⭐ ⭐ - MetaFormer: Yu, Weihao, et al. "Metaformer is actually what you need for vision." In CVPR. 2022.
⭐ ⭐ ⭐ - Resnet strikes back: Wightman, R., Touvron, H., & Jégou, H. (2021). Resnet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476.
⭐ ⭐ ⭐ - VSA: Zhang, Qiming, et al. "VSA: Learning Varied-Size Window Attention in Vision Transformers." arXiv preprint arXiv:2204.08446 (2022).
⭐ ⭐ ⭐ ⭐
Object Detection
- Overfeat: Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013).
⭐ ⭐ ⭐ ⭐ - RCNN: Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
⭐ ⭐ ⭐ ⭐ ⭐ - SPP: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014.
⭐ ⭐ ⭐ ⭐ ⭐ - Fast RCNN: Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015.
⭐ ⭐ ⭐ ⭐ ⭐ - Faster RCNN: Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
⭐ ⭐ ⭐ ⭐ ⭐ - R-CNN minus R: Lenc, Karel, and Andrea Vedaldi. "R-cnn minus r." arXiv preprint arXiv:1506.06981 (2015).
⭐ - End-to-end people detection in crowded scenes: Stewart, Russell, Mykhaylo Andriluka, and Andrew Y. Ng. "End-to-end people detection in crowded scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ - YOLO: Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ ⭐ - ION: Bell, Sean, et al. "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ - MultiPath: Zagoruyko, Sergey, et al. "A multipath network for object detection." arXiv preprint arXiv:1604.02135 (2016).
⭐ ⭐ ⭐ - SSD: Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
⭐ ⭐ ⭐ ⭐ ⭐ - OHEM: Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. "Training region-based object detectors with online hard example mining." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ ⭐ - HyperNet: Kong, Tao, et al. "HyperNet: towards accurate region proposal generation and joint object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ - SDP: Yang, Fan, Wongun Choi, and Yuanqing Lin. "Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ - SubCNN: Xiang, Yu, et al. "Subcategory-aware convolutional neural networks for object proposals and detection." Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on. IEEE, 2017.
⭐ ⭐ ⭐ - MSCNN: Cai, Zhaowei, et al. "A unified multi-scale deep convolutional neural network for fast object detection." European Conference on Computer Vision. Springer International Publishing, 2016.
⭐ ⭐ ⭐ ⭐ - RFCN: Li, Yi, Kaiming He, and Jian Sun. "R-fcn: Object detection via region-based fully convolutional networks." Advances in Neural Information Processing Systems. 2016.
⭐ ⭐ ⭐ ⭐ ⭐ - Shallow Network: Ashraf, Khalid, et al. "Shallow networks for high-accuracy road object-detection." arXiv preprint arXiv:1606.01561 (2016).
⭐ ⭐ - Is Faster R-CNN Doing Well for Pedestrian Detection: Zhang, Liliang, et al. "Is Faster R-CNN Doing Well for Pedestrian Detection?." European Conference on Computer Vision. Springer International Publishing, 2016.
⭐ ⭐ - GCNN: Najibi, Mahyar, Mohammad Rastegari, and Larry S. Davis. "G-cnn: an iterative grid based object detector." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ - LocNet: Gidaris, Spyros, and Nikos Komodakis. "Locnet: Improving localization accuracy for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ - PVANet: Kim, Kye-Hyeon, et al. "PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection." arXiv preprint arXiv:1608.08021 (2016).
⭐ ⭐ ⭐ ⭐ - FPN: Lin, Tsung-Yi, et al. "Feature Pyramid Networks for Object Detection." arXiv preprint arXiv:1612.03144 (2016).
⭐ ⭐ ⭐ ⭐ ⭐ - TDM: Shrivastava, Abhinav, et al. "Beyond Skip Connections: Top-Down Modulation for Object Detection." arXiv preprint arXiv:1612.06851 (2016).
⭐ ⭐ ⭐ ⭐ - YOLO9000: Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." arXiv preprint arXiv:1612.08242 (2016).
⭐ ⭐ ⭐ ⭐ - Speed/accuracy trade-offs for modern convolutional object detectors: Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." arXiv preprint arXiv:1611.10012 (2016).
⭐ ⭐ - GBD-Net: Zeng, Xingyu, et al. "Crafting GBD-Net for Object Detection." arXiv preprint arXiv:1610.02579 (2016). Slides
⭐ ⭐ ⭐ ⭐ - WRInception: Lee, Youngwan, et al. "Wide-Residual-Inception Networks for Real-time Object Detection." arXiv preprint arXiv:1702.01243 (2017).
⭐ - DSSD: Fu, Cheng-Yang, et al. "DSSD: Deconvolutional Single Shot Detector." arXiv preprint arXiv:1701.06659 (2017).
⭐ ⭐ ⭐ ⭐ - A-Fast-RCNN (Hard positive generation): Wang, Xiaolong, Abhinav Shrivastava, and Abhinav Gupta. "A-fast-rcnn: Hard positive generation via adversary for object detection." arXiv preprint arXiv:1704.03414 (2017).
⭐ ⭐ ⭐ code - RRC: Ren, Jimmy, et al. "Accurate Single Stage Detector Using Recurrent Rolling Convolution." arXiv preprint arXiv:1704.05776 (2017).
⭐ ⭐ ⭐ - Deformable ConvNets: Dai, Jifeng, et al. "Deformable Convolutional Networks." arXiv preprint arXiv:1703.06211 (2017).
⭐ ⭐ ⭐ ⭐ - RSSD: Jeong, Jisoo, Hyojin Park, and Nojun Kwak. "Enhancement of SSD by concatenating feature maps for object detection." arXiv preprint arXiv:1705.09587 (2017).
⭐ ⭐ - Perceptual GAN: Li, Jianan, et al. "Perceptual Generative Adversarial Networks for Small Object Detection." arXiv preprint arXiv:1706.05274 (2017).
⭐ ⭐ ⭐ - RetinaNet (Focal Loss): Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. "Focal Loss for Dense Object Detection." In ICCV. 2017.
⭐ ⭐ ⭐ ⭐ ⭐ - YOLOv3: Redmon, Joseph, and Ali Farhadi. "YOLOv3: An Incremental Improvement." arXiv preprint arXiv:1804.02767 (2018).
⭐ ⭐ ⭐ - Domain Adaptive Faster R-CNN: Chen, Yuhua, et al. "Domain adaptive faster r-cnn for object detection in the wild." In CVPR, 2018.
⭐ ⭐ ⭐ ⭐ - OMNIA Faster R-CNN: Rame, Alexandre, et al. "OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation." arXiv preprint arXiv:1812.02611 (2018). [Omni-Supervised across different datasets for object detection]
⭐ ⭐ ⭐ ⭐ - Libra R-CNN: Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., & Lin, D. (2019). Libra R-CNN: Towards Balanced Learning for Object Detection. arXiv preprint arXiv:1904.02701.
⭐ ⭐ ⭐ ⭐ - FCOS: Tian, Zhi, et al. "FCOS: Fully Convolutional One-Stage Object Detection." arXiv preprint arXiv:1904.01355 (2019).
⭐ ⭐ ⭐ ⭐ ⭐ - POTO (Prediction-aware One-To-One label assignment): Wang, J., Song, L., Li, Z., Sun, H., Sun, J., & Zheng, N. (2020). End-to-end object detection with fully convolutional network. arXiv preprint arXiv:2012.03544.
⭐ ⭐ ⭐ ⭐ - FaPN: Huang, Shihua, et al. "FaPN: Feature-aligned Pyramid Network for Dense Image Prediction." arXiv preprint arXiv:2108.07058 (2021).
⭐ ⭐ ⭐ - DETR: Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. End-to-end object detection with transformers. In ECCV, 2020.
⭐ ⭐ ⭐ ⭐ ⭐ - Sparse R-CNN: Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., ... & Luo, P. Sparse r-cnn: End-to-end object detection with learnable proposals. In CVPR (pp. 14454-14463), 2021.
⭐ ⭐ ⭐ ⭐ ⭐
Semantic Segmentation
- FCN: Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
⭐ ⭐ ⭐ ⭐ ⭐ - Deconvolution Network for Segmentation: Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2015.
⭐ ⭐ ⭐ - U-Net: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
⭐ ⭐ ⭐ ⭐ ⭐ - CRF as RNN: Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." In ICCV. 2015.
⭐ ⭐ ⭐ ⭐ - PSPNet: Zhao, Hengshuang, et al. "Pyramid scene parsing network." arXiv preprint arXiv:1612.01105 (2016).
⭐ ⭐ ⭐ - Deeplab v1v2: Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." IEEE transactions on pattern analysis and machine intelligence 40.4 (2018): 834-848.
⭐ ⭐ ⭐ ⭐ ⭐ - Deeplab v3: Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
⭐ ⭐ ⭐ - Deeplab v3+: Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." arXiv preprint arXiv:1802.02611 (2018).
⭐ ⭐ ⭐ - PSANet: Zhao, Hengshuang, et al. "PSANet: Point-wise Spatial Attention Network for Scene Parsing." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
⭐ ⭐ ⭐ ⭐ [good summary of context information] - OCNet: Yuan, Yuhui, and Jingdong Wang. "OCNet: Object Context Network for Scene Parsing." arXiv preprint arXiv:1809.00916 (2018).
⭐ ⭐ ⭐ - ReSeg: Visin, Francesco, et al. "Reseg: A recurrent neural network-based model for semantic segmentation." In CVPR Workshops. 2016.
⭐ ⭐ - CCNet: Huang, Zilong, et al. "CCNet: Criss-Cross Attention for Semantic Segmentation." arXiv preprint arXiv:1811.11721 (2018).
⭐ ⭐ ⭐ - Depth-aware CNN: Wang, Weiyue, and Ulrich Neumann. "Depth-aware CNN for RGB-D Segmentation." In ECCV, 2018.
⭐ ⭐ ⭐ ⭐ ⭐ - DFANet: Li, H., Xiong, P., Fan, H., & Sun, J. (2019). DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation. arXiv preprint arXiv:1904.02216.
⭐ ⭐ - DADA: Vu, Tuan-Hung, et al. "DADA: Depth-aware Domain Adaptation in Semantic Segmentation." arXiv preprint arXiv:1904.01886 (2019).
⭐ ⭐ ⭐ ⭐ - CFNet: Zhang, Hang, et al. "Co-Occurrent Features in Semantic Segmentation." In CVPR, 2019.
⭐ ⭐ ⭐ - PointRend: Kirillov, A., Wu, Y., He, K., & Girshick, R. (2019). PointRend: Image Segmentation as Rendering. arXiv preprint arXiv:1912.08193.
⭐ ⭐ ⭐ ⭐ - Trans2Seg: Xie, E., Wang, W., Wang, W., Sun, P., Xu, H., Liang, D., & Luo, P. (2021). Segmenting transparent object in the wild with transformer. arXiv preprint arXiv:2101.08461.
⭐ ⭐ ⭐ ⭐ - Swin-Unet: Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., & Wang, M. (2021). Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv preprint arXiv:2105.05537.
⭐ ⭐ ⭐ ⭐ - SegFormer: Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv preprint arXiv:2105.15203.
⭐ ⭐ ⭐ ⭐
Instance Segmentation
- MNC: Dai, Jifeng, Kaiming He, and Jian Sun. "Instance-aware semantic segmentation via multi-task network cascades." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ ⭐ - InstanceFCN: Dai, Jifeng, et al. "Instance-sensitive fully convolutional networks." arXiv preprint arXiv:1603.08678 (2016).
⭐ ⭐ ⭐ ⭐ - FCIS: Li, Yi, et al. "Fully convolutional instance-aware semantic segmentation." arXiv preprint arXiv:1611.07709 (2016).
⭐ ⭐ ⭐ ⭐ ⭐ - Mask R-CNN: He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In ICCV. 2017.
⭐ ⭐ ⭐ ⭐ ⭐ - Learning to Segment Every Thing (Mask^X R-CNN): Hu, Ronghang, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. "Learning to Segment Every Thing." arXiv preprint arXiv:1711.10370 (2017).
⭐ ⭐ ⭐ ⭐ ⭐ - PANet: Liu, Shu, et al. "Path aggregation network for instance segmentation." arXiv preprint arXiv:1803.01534 (2018).
⭐ ⭐ ⭐ ⭐ - Panoptic Segmentation: Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2018). Panoptic Segmentation. arXiv preprint arXiv:1801.00868.
⭐ ⭐ ⭐ ⭐ - Panoptic FPN: Kirillov, A., Girshick, R., He, K., & Dollár, P. (2019). Panoptic Feature Pyramid Networks. arXiv preprint arXiv:1901.02446.
⭐ ⭐ ⭐ ⭐ ⭐ - Mask Scoring R-CNN: Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask Scoring R-CNN. arXiv preprint arXiv:1903.00241.
⭐ ⭐ ⭐ ⭐ - TensorMask: Chen, X., Girshick, R., He, K., & Dollár, P. (2019). TensorMask: A Foundation for Dense Object Segmentation. arXiv preprint arXiv:1903.12174.
⭐ ⭐ ⭐ ⭐ - SSAP: Gao, Naiyu, et al. "SSAP: Single-shot instance segmentation with affinity pyramid." Proceedings of the IEEE International Conference on Computer Vision. 2019.
⭐ ⭐ ⭐ - EmbedMask: Ying, H., Huang, Z., Liu, S., Shao, T., & Zhou, K. (2019). EmbedMask: Embedding Coupling for One-stage Instance Segmentation. arXiv preprint arXiv:1912.01954.
⭐ ⭐ ⭐ ⭐ ⭐ - CondInst: Tian, Z., Shen, C., & Chen, H. (2020). Conditional Convolutions for Instance Segmentation. In ECCV 2020.
⭐ ⭐ ⭐ ⭐ ⭐ - MaskFormer: Cheng, B., Schwing, A. G., & Kirillov, A. (2021). Per-Pixel Classification is Not All You Need for Semantic Segmentation. arXiv preprint arXiv:2107.06278.
⭐ ⭐ ⭐ ⭐ ⭐ (Semantic+Instance) - SOLQ: Dong, B., Zeng, F., Wang, T., Zhang, X., & Wei, Y. (2021). SOLQ: Segmenting Objects by Learning Queries. arXiv preprint arXiv:2106.02351.
⭐ ⭐ ⭐ - QueryInst: Yang, S., Fang, Y., Wang, X., Li, Y., Shan, Y., Feng, B., & Liu, W. (2021). Tracking Instances as Queries. arXiv preprint arXiv:2106.11963.
⭐ ⭐ ⭐ ⭐ ⭐ - ISTR: Hu, J., Cao, L., Lu, Y., Zhang, S., Wang, Y., Li, K., ... & Ji, R. (2021). ISTR: End-to-End Instance Segmentation with Transformers. arXiv preprint arXiv:2105.00637.
⭐ ⭐ ⭐
Weakly Supervised
- Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning: Cinbis, Ramazan Gokberk, Jakob Verbeek, and Cordelia Schmid. "Weakly supervised object localization with multi-fold multiple instance learning." IEEE transactions on pattern analysis and machine intelligence 39.1 (2017): 189-203.
⭐ ⭐ ⭐ - Weakly Supervised Deep Detection Networks: Bilen, Hakan, and Andrea Vedaldi. "Weakly supervised deep detection networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ - Weakly- and Semi-Supervised Learning: Papandreou, George, et al. "Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2015.
⭐ ⭐ ⭐ ⭐ - Image-level to pixel-level labeling: Pinheiro, Pedro O., and Ronan Collobert. "From image-level to pixel-level labeling with convolutional networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Weakly Supervised Localization using Deep Feature Maps: Bency, Archith J., et al. "Weakly supervised localization using deep feature maps." arXiv preprint arXiv:1603.00489 (2016).
- WELDON: Durand, Thibaut, Nicolas Thome, and Matthieu Cord. "Weldon: Weakly supervised learning of deep convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
- WILDCAT: Durand, Thibaut, et al. "WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
- SGDL: Lai, Baisheng, and Xiaojin Gong. "Saliency guided dictionary learning for weakly-supervised image parsing." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Unsupervised/Self-supervised
- Learning Features by Watching Objects Move: Pathak, Deepak, et al. "Learning Features by Watching Objects Move." arXiv preprint arXiv:1612.06370 (2016).
⭐ ⭐ ⭐ ⭐ ⭐ - SimGAN: Shrivastava, Ashish, et al. "Learning from simulated and unsupervised images through adversarial training." arXiv preprint arXiv:1612.07828 (2016).
⭐ ⭐ ⭐ - OPN: Lee, Hsin-Ying, et al. "Unsupervised Representation Learning by Sorting Sequences." arXiv preprint arXiv:1708.01246 (2017).
⭐ ⭐ ⭐ - Transitive Invariance for Self-supervised Visual Representation Learning: Wang, Xiaolong, et al. "Transitive Invariance for Self-supervised Visual Representation Learning" Proceedings of the IEEE International Conference on Computer Vision. 2017.
⭐ ⭐ ⭐ code - Omni-Supervised Learning: Radosavovic, I., Dollár, P., Girshick, R., Gkioxari, G., & He, K. Data Distillation: Towards Omni-Supervised Learning. In CVPR, 2018.
⭐ ⭐ ⭐ ⭐ ⭐ - MAE: He, Kaiming, et al. "Masked autoencoders are scalable vision learners." In CVPR 2022.
⭐ ⭐ ⭐ ⭐ ⭐ - SimMIM: Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., ... & Hu, H. (2021). SimMIM: A simple framework for masked image modeling. arXiv preprint arXiv:2111.09886.
⭐ ⭐ ⭐ ⭐ ⭐ - ConvMAE: Gao, P., Ma, T., Li, H., Dai, J., & Qiao, Y. (2022). ConvMAE: Masked Convolution Meets Masked Autoencoders. arXiv preprint arXiv:2205.03892.
⭐ ⭐ ⭐
Semi-supervised
- Adversarial Self-Supervised Learning: Si, C., Nie, X., Wang, W., Wang, L., Tan, T., & Feng, J. (2020). Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition. In ECCV 2020.
⭐ ⭐ ⭐ - Directional Context-Aware Consistency: Lai, Xin, et al. "Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
⭐ ⭐ ⭐ ⭐ - Cross Pseudo Supervision: Chen, X., Yuan, Y., Zeng, G., & Wang, J. (2021). Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2613-2622).
⭐ ⭐ ⭐ ⭐ - CutMix: French, G., Laine, S., Aila, T., & Mackiewicz, M. (2019). Semi-supervised semantic segmentation needs strong, varied perturbations. In BMVC.
⭐ ⭐ ⭐ - GCT: Ke, Z., Qiu, D., Li, K., Yan, Q., & Lau, R. W. (2020). Guided collaborative training for pixel-wise semi-supervised learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIII 16 (pp. 429-445). Springer International Publishing.
⭐ ⭐ ⭐ ⭐ - Robust Mutual Learning: Zhang, P., Zhang, B., Zhang, T., Chen, D., & Wen, F. (2021). Robust Mutual Learning for Semi-supervised Semantic Segmentation. arXiv preprint arXiv:2106.00609.
⭐ ⭐ ⭐ ⭐
Domain Adaptation
- Learning from Synthetic Animals: Mu, J., Qiu, W., Hager, G. D., & Yuille, A. L. (2020). Learning from Synthetic Animals. In CVPR 2020.
⭐ ⭐ ⭐ ⭐ - CD3A: Kurmi, Vinod Kumar, et al. "Curriculum based dropout discriminator for domain adaptation." arXiv preprint arXiv:1907.10628 (2019).
⭐ ⭐ ⭐ - Open compound domain adaptation: Liu, Z., Miao, Z., Pan, X., Zhan, X., Lin, D., Yu, S. X., & Gong, B. (2020). Open compound domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12406-12415).
⭐ ⭐ ⭐ ⭐ ⭐
Domain Generalization
- Extrinsic and Intrinsic: Wang, Shujun, et al. "Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization." In ECCV (2020).
⭐ ⭐ ⭐ ⭐ - DoFE: Wang, Shujun, et al. "DoFE: Domain-oriented Feature Embedding for Generalizable Fundus Image Segmentation on Unseen Datasets." IEEE Transactions on Medical Imaging (2020).
⭐ ⭐ ⭐ ⭐ - Self-Challenging: Huang, Zeyi, et al. "Self-Challenging Improves Cross-Domain Generalization." arXiv preprint arXiv:2007.02454 (2020).
⭐ ⭐ ⭐ ⭐ - Generate Novel Domains: Zhou, Kaiyang, et al. "Learning to Generate Novel Domains for Domain Generalization." arXiv preprint arXiv:2007.03304 (2020).
⭐ ⭐ ⭐ - Jigsaw puzzles: Carlucci, Fabio M., et al. "Domain generalization by solving jigsaw puzzles." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐
Video
- Semi-supervised, memory network: Oh, S. W., Lee, J. Y., Xu, N., & Kim, S. J. (2019). Video object segmentation using space-time memory networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9226-9235).
⭐ ⭐ ⭐ ⭐
Saliency
- DHSNet: Liu, Nian, and Junwei Han. "Dhsnet: Deep hierarchical saliency network for salient object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ - RFCN: Wang, Linzhao, et al. "Saliency detection with recurrent fully convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2016.
⭐ ⭐ ⭐ ⭐ - RACDNN: Kuen, Jason, Zhenhua Wang, and Gang Wang. "Recurrent attentional networks for saliency detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ ⭐ - NLDF: Luo, Zhiming, et al. "Non-Local Deep Features for Salient Object Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
⭐ ⭐ ⭐ - DSS: Hou, Qibin, et al. "Deeply supervised salient object detection with short connections." arXiv preprint arXiv:1611.04849 (2016).
⭐ ⭐ ⭐ ⭐ - MSRNet: Li, Guanbin, et al. "Instance-Level Salient Object Segmentation." arXiv preprint arXiv:1704.03604 (2017).
⭐ ⭐ ⭐ ⭐ - Amulet: Zhang, Pingping, et al. "Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection." arXiv preprint arXiv:1708.02001 (2017).
⭐ ⭐ ⭐ ⭐ - UCF: Zhang, Pingping, et al. "Learning Uncertain Convolutional Features for Accurate Saliency Detection." arXiv preprint arXiv:1708.02031 (2017).
⭐ ⭐ ⭐ ⭐ - SRM: Wang, Tiantian, et al. "A Stagewise Refinement Model for Detecting Salient Objects in Images." In ICCV. 2017.
⭐ ⭐ ⭐ ⭐ - S4Net: Fan, Ruochen, et al. "$S^4$Net: Single Stage Salient-Instance Segmentation." arXiv preprint arXiv:1711.07618 (2017).
⭐ ⭐ ⭐ ⭐ ⭐ - Deep Edge-Aware Saliency Detection: Zhang, Jing, Yuchao Dai, Fatih Porikli, and Mingyi He. "Deep Edge-Aware Saliency Detection." arXiv preprint arXiv:1708.04366 (2017).
⭐ ⭐ ⭐ - Bi-Directional Message Passing Model: Zhang, Lu, et al. "A Bi-Directional Message Passing Model for Salient Object Detection." In CVPR. 2018.
⭐ ⭐ ⭐ - PiCANet: Liu, Nian, Junwei Han, and Ming-Hsuan Yang. "PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection." In CVPR. 2018.
⭐ ⭐ ⭐ ⭐ ⭐ - Detect Globally, Refine Locally: A Novel Approach to Saliency Detection: Wang, Tiantian, et al. "Detect Globally, Refine Locally: A Novel Approach to Saliency Detection." In CVPR. 2018.
⭐ ⭐ ⭐ - PAGRN: Zhang, Xiaoning, et al. "Progressive Attention Guided Recurrent Network for Salient Object Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
⭐ ⭐ ⭐ - Reverse Attention for Salient Object Detection: Chen, Shuhan, et al. "Reverse Attention for Salient Object Detection." In ECCV, 2018.
⭐ ⭐ - CA-Fuse: Chen, Hao, and Youfu Li. "Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection." In CVPR. 2018.
⭐ ⭐ ⭐ - SOC dataset: Fan, Deng-Ping, et al. "Salient objects in clutter: Bringing salient object detection to the foreground." In ECCV. 2018.
⭐ ⭐ ⭐ ⭐ ⭐ [complex dataset + instance level] - DNA: Liu, Yun, et al. "DNA: Deeply-supervised Nonlinear Aggregation for Salient Object Detection." arXiv preprint arXiv:1903.12476 (2019).
⭐ ⭐ ⭐ - SE2Net: Zhou, S., Wang, J., Wang, F., & Huang, D. SE2Net: Siamese Edge-Enhancement Network for Salient Object Detection.
⭐ ⭐ ⭐ ⭐ ⭐ - PFAN: Zhao, T., & Wu, X. (2019). Pyramid Feature Attention Network for Saliency Detection. In CVPR 2019.
⭐ ⭐ - PoolNet: Liu, Jiang-Jiang, et al. "A Simple Pooling-Based Design for Real-Time Salient Object Detection." In CVPR 2019.
⭐ ⭐ ⭐ ⭐
Attention
- SRN: Zhu, Feng, et al. "Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification." arXiv preprint arXiv:1702.05891 (2017).
⭐ ⭐ ⭐ ⭐ - Zoom-in-Net: Wang, Zhe, et al. "Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection." arXiv preprint arXiv:1706.04372 (2017).
⭐ ⭐ ⭐ ⭐ - Multi-context attention: Chu, Xiao, et al. "Multi-context attention for human pose estimation." arXiv preprint arXiv:1702.07432 (2017).
⭐ ⭐ ⭐
Depth Information and Stereo Vision
- HFM-Net: Zeng, J., Tong, Y., Huang, Y., Yan, Q., Sun, W., Chen, J., & Wang, Y. (2019). Deep Surface Normal Estimation with Hierarchical RGB-D Fusion. arXiv preprint arXiv:1904.03405.
⭐ ⭐ ⭐ - MADNet: Tonioni, Alessio, et al. "Real-time self-adaptive deep stereo." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐ (offline domain adaption) - Geometry-Aware Distillation: Jiao, Jianbo, et al. "Geometry-Aware Distillation for Indoor Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ - DiverseDepth: Yin, W., Wang, X., Shen, C., Liu, Y., Tian, Z., Xu, S., ... & Renyin, D. (2020). DiverseDepth: Affine-invariant depth prediction using diverse data. arXiv preprint arXiv:2002.00569.
⭐ ⭐ ⭐ ⭐
Shadow Detection/Removal
- DeshadowNet: Qu, Liangqiong, et al. "DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
⭐ ⭐ ⭐ - scGAN: Nguyen, Vu, et al. "Shadow Detection with Conditional Generative Adversarial Networks." In ICCV. 2017.
⭐ ⭐ - Patched CNN: Hosseinzadeh, Sepideh, Moein Shakeri, and Hong Zhang. "Fast Shadow Detection from a Single Image Using a Patched Convolutional Neural Network." arXiv preprint arXiv:1709.09283 (2017).
⭐ - ST-CGAN: Wang, Jifeng, et al. "Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal." arXiv preprint arXiv:1712.02478 (2017).
⭐ ⭐ (ISTD dataset) - A+D Net: Le, Hieu, et al. "A+D Net: Training a shadow detector with adversarial shadow attenuation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
⭐ ⭐ ⭐ - Lazy annotation for immature SBU: Vicente, Yago, et al. "Noisy label recovery for shadow detection in unfamiliar domains." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
⭐ ⭐ ⭐ - StackedCNN + SBU: Vicente, Tomás F. Yago, et al. "Large-scale training of shadow detectors with noisily-annotated shadow examples." European Conference on Computer Vision. Springer, Cham, 2016.
⭐ ⭐ ⭐ ⭐ (SBU dataset) - CPAdv-Net: Mohajerani, Sorour, and Parvaneh Saeedi. "Shadow Detection in Single RGB Images Using a Context Preserver Convolutional Neural Network Trained by Multiple Adversarial Examples." IEEE Transactions on Image Processing (2019).
⭐ ⭐ - Color Constancy: Sidorov, Oleksii. "Conditional GANs for Multi-Illuminant Color Constancy: Revolution or Yet Another Approach?." CVPR workshop, 2019.
⭐ ⭐ - DSDNet: Zheng, Quanlong, et al. "Distraction-aware Shadow Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐ - ARGAN: Ding, Bin, et al. "ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal." In ICCV, 2019.
⭐ ⭐ ⭐ - SP+M-Net: Le, H., & Samaras, D. (2019). Shadow removal via shadow image decomposition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 8578-8587).
⭐ ⭐ ⭐ ⭐ - Portrait Shadow Manipulation: Zhang, Xuaner Cecilia, et al. "Portrait Shadow Manipulation." In SIGGRAPH (2020).
⭐ ⭐ ⭐ ⭐ ⭐ - Weakly-supervised shadow decomposition: Le, Hieu, and Dimitris Samaras. "From Shadow Segmentation to Shadow Removal." arXiv preprint arXiv:2008.00267 (2020).
⭐ ⭐ ⭐ ⭐ ⭐ (Video Shadow Removal Dataset) - AEF: Fu, Lan, et al. "Auto-Exposure Fusion for Single-Image Shadow Removal." CVPR 2021.
⭐ ⭐ ⭐ - G2R-ShadowNet: Liu, Zhihao, et al. "From Shadow Generation to Shadow Removal." arXiv preprint arXiv:2103.12997 (2021).
⭐ ⭐ ⭐ ⭐ ⭐ - Video Shadow: Chen, Z., Wan, L., Zhu, L., Shen, J., Fu, H., Liu, W., & Qin, J. (2021). Triple-cooperative Video Shadow Detection. In CVPR 2021.
⭐ ⭐ ⭐ ⭐ - Removing Objects and their Shadows: Zhang, E., Martin-Brualla, R., Kontkanen, J., & Curless, B. L. (2021). No Shadow Left Behind: Removing Objects and their Shadows using Approximate Lighting and Geometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16397-16406).
⭐ ⭐ ⭐ ⭐ ⭐ - Shadow Generation + DE-SOBA dataset: Hong, Y., Niu, L., Zhang, J., & Zhang, L. (2021). Shadow Generation for Composite Image in Real-world Scenes. arXiv preprint arXiv:2104.10338.
⭐ ⭐ ⭐ ⭐ - G2R-ShadowNet: Liu, Zhihao, et al. "From Shadow Generation to Shadow Removal." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
⭐ ⭐ ⭐ ⭐ - Temporal Feature Warping: Hu, S., Le, H., & Samaras, D. (2021). Temporal Feature Warping for Video Shadow Detection. arXiv preprint arXiv:2107.14287.
⭐ ⭐ ⭐ ⭐ - CANet: Chen, Z., Long, C., Zhang, L., & Xiao, C. (2021). CANet: A Context-Aware Network for Shadow Removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4743-4752).
⭐ ⭐ ⭐ ⭐ ⭐ - FDRNet: Zhu, L., Xu, K., Ke, Z., & Lau, R. W. (2021). Mitigating Intensity Bias in Shadow Detection via Feature Decomposition and Reweighting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4702-4711).
⭐ ⭐ ⭐ ⭐ ⭐ - SADC: Xu, Yimin, et al. "Shadow-Aware Dynamic Convolution for Shadow Removal." arXiv preprint arXiv:2205.04908 (2022).
⭐ ⭐
Image Restoration
- DRRN: Tai, Ying, Jian Yang, and Xiaoming Liu. "Image super-resolution via deep recursive residual network." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
⭐ ⭐ ⭐ ⭐ - DID-MDN: Zhang, He, and Vishal M. Patel. "Density-aware Single Image De-raining using a Multi-stream Dense Network." arXiv preprint arXiv:1802.07412 (2018).
⭐ ⭐ - IDN: Hui, Zheng, Xiumei Wang, and Xinbo Gao. "Fast and Accurate Single Image Super-Resolution via Information Distillation Network." In CVPR. 2018.
⭐ ⭐ ⭐ - SFT-GAN: Wang, X., Yu, K., Dong, C., & Loy, C. C. (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR. 2018.
⭐ ⭐ ⭐ - Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring: Nah, Seungjun, Tae Hyun Kim, and Kyoung Mu Lee. "Deep multi-scale convolutional neural network for dynamic scene deblurring." In CVPR, 2017.
⭐ ⭐ ⭐ - Enhanced Deep Residual Networks for Single Image Super-Resolution: Lim, Bee, et al. "Enhanced deep residual networks for single image super-resolution." The CVPR workshops, 2017.
⭐ - AGAN for Raindrop Removal: Qian, Rui, et al. "Attentive Generative Adversarial Network for Raindrop Removal from A Single Image." In CVPR. 2018.
⭐ ⭐ ⭐ ⭐ ⭐ - DCPDN: Zhang, He, and Vishal M. Patel. "Densely connected pyramid dehazing network." In CVPR, 2018.
⭐ ⭐ ⭐ - GFN: Ren, W., Ma, L., Zhang, J., Pan, J., Cao, X., Liu, W., & Yang, M. H. (2018). Gated fusion network for single image dehazing. In CVPR, 2018.
⭐ ⭐ ⭐ ⭐ - SIDCGAN: Li, Runde, et al. "Single Image Dehazing via Conditional Generative Adversarial Network." In CVPR, 2018.
⭐ ⭐ - Dehaze Benchmark: Li, Boyi, et al. "Benchmarking Single Image Dehazing and Beyond." IEEE Transactions on Image Processing (2018).
⭐ ⭐ ⭐ ⭐ ⭐ - Cityscapes + Haze: Sakaridis, Christos, Dengxin Dai, and Luc Van Gool. "Semantic foggy scene understanding with synthetic data." International Journal of Computer Vision (2018): 1-20.
⭐ ⭐ ⭐ ⭐ ⭐ - RESCAN: Li, Xia, et al. "Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining." European Conference on Computer Vision. Springer, Cham, 2018.
⭐ ⭐ ⭐ - UD-GAN: Jin, Xin, et al. "Unsupervised Single Image Deraining with Self-supervised Constraints." arXiv preprint arXiv:1811.08575 (2018).
⭐ ⭐ ⭐ ⭐ ⭐ - Deep Tree-Structured Fusion Model: Fu, Xueyang, et al. "A Deep Tree-Structured Fusion Model for Single Image Deraining." arXiv preprint arXiv:1811.08632 (2018).
⭐ ⭐ - Dual CNN: Pan, J., Liu, S., Sun, D., Zhang, J., Liu, Y., Ren, J., ... & Yang, M. H. Learning Dual Convolutional Neural Networks for Low-Level Vision. In CVPR, 2018 (pp. 3070-3079).
⭐ ⭐ ⭐ - RAM: Kim, Jun-Hyuk, et al. "RAM: Residual Attention Module for Single Image Super-Resolution." arXiv preprint arXiv:1811.12043 (2018).
⭐ ⭐ ⭐ - DNSR (Bi-cycle GAN): Zhao, Tianyu, et al. "Unsupervised Degradation Learning for Single Image Super-Resolution." arXiv preprint arXiv:1812.04240 (2018).
⭐ ⭐ ⭐ ⭐ ⭐ - Cycle-Defog2Refog: Liu, Wei, et al. "End-to-End Single Image Fog Removal using Enhanced Cycle Consistent Adversarial Networks." arXiv preprint arXiv:1902.01374 (2019).
⭐ ⭐ - SPANet: Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, Rynson W.H. Lau. "Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset." In CVPR 2019.
⭐ ⭐ ⭐ ⭐ - remove rain streaks and rain accumulation: Ruoteng Li, Loong-Fah Cheong, and Robby T. Tan. "Heavy Rain Image Restoration: Integrating Physics Model and Conditional Adversarial Learning." In CVPR 2019.
⭐ ⭐ ⭐ ⭐ ⭐ - Rain O’er Me: Huangxing Lin, Yanlong Li, Xinghao Ding, Weihong Zeng, Yue Huang, John Paisley: "Rain O’er Me: Synthesizing real rain to derain with data distillation." arXiv preprint arXiv:1904.04605 (2019).
⭐ ⭐ ⭐ ⭐ - RNAN: Zhang, Y., Li, K., Li, K., Zhong, B., & Fu, Y. (2019). Residual Non-local Attention Networks for Image Restoration. arXiv preprint arXiv:1903.10082.
⭐ ⭐ ⭐ ⭐ ⭐ - Perceptual GAN loss + TV loss: Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., ... & Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In CVPR (pp. 4681-4690).(code)
⭐ ⭐ ⭐ ⭐ ⭐ - PReNet: Ren, Dongwei, et al. "Progressive Image Deraining Networks: A Better and Simpler Baseline." In CVPR, 2019.
⭐ ⭐ ⭐ - Zoom to Learn, Learn to Zoom: Zhang, Xuaner, et al. "Zoom to Learn, Learn to Zoom." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐ - Derain Benchmark: Li, Siyuan, et al. "Single image deraining: A comprehensive benchmark analysis." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ - Dual residual block: Liu, Xing, et al. "Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ - Semi-supervised Transfer Learning for Image Rain Removal: Wei, Wei, et al. "Semi-Supervised Transfer Learning for Image Rain Removal." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐ - UMRL: Yasarla, Rajeev, and Vishal M. Patel. "Uncertainty Guided Multi-Scale Residual Learning-using a Cycle Spinning CNN for Single Image De-Raining." CVPR 2019.
⭐ ⭐ ⭐ ⭐ - NASNet: Qin, Xu, and Zhilin Wang. "NASNet: A Neuron Attention Stage-by-Stage Net for Single Image Deraining." arXiv preprint arXiv:1912.03151 (2019).
⭐ ⭐ ⭐ ⭐ - DerainCycleGAN: Wei, Yanyan, et al. "DerainCycleGAN: An Attention-guided Unsupervised Benchmark for Single Image Deraining and Rainmaking." arXiv preprint arXiv:1912.07015 (2019).
⭐ ⭐ ⭐ ⭐ - Physics-Based Rain Rendering: Halder, Shirsendu Sukanta, Jean-François Lalonde, and Raoul de Charette. "Physics-Based Rendering for Improving Robustness to Rain." In ICCV, 2019 (pp. 10203-10212).
⭐ ⭐ ⭐ ⭐ ⭐ - Partial Convolution (mask-guided): Liu, Guilin, et al. "Image inpainting for irregular holes using partial convolutions." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
⭐ ⭐ ⭐ ⭐ ⭐ - Derain Survey: Wang, H., Li, M., Wu, Y., Zhao, Q., & Meng, D. (2019). A Survey on Rain Removal from Video and Single Image. arXiv preprint arXiv:1909.08326.
⭐ ⭐ ⭐ ⭐ - Deep Adversarial Decomposition: Zou, Zhengxia, et al. "Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
⭐ ⭐ ⭐ ⭐ - CARN: Ahn, Namhyuk, Byungkon Kang, and Kyung-Ah Sohn. "Fast, accurate, and lightweight super-resolution with cascading residual network." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
⭐ ⭐ ⭐ - Semi-supervised derain with Gaussian processes: Yasarla, Rajeev, Vishwanath A. Sindagi, and Vishal M. Patel. "Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes." In CVPR. 2020.
⭐ ⭐ ⭐ ⭐ - EPDN: Qu, Yanyun, et al. "Enhanced pix2pix dehazing network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐ - PEPSI: Shin, Yong-Goo, et al. "PEPSI++: Fast and lightweight network for image inpainting." IEEE Transactions on Neural Networks and Learning Systems (2020).
⭐ ⭐ ⭐ - holistic attention network: Niu, Ben, et al. "Single image super-resolution via a holistic attention network." European Conference on Computer Vision. Springer, Cham, 2020.
⭐ ⭐ ⭐ - SNet, VNet, and ANet: Wang, Yinglong, et al. "Rethinking image deraining via rain streaks and vapors." European Conference on Computer Vision. Springer, Cham, 2020.
⭐ ⭐ ⭐ - JRGR (Disentangled): Ye, Y., Chang, Y., Zhou, H., & Yan, L. (2021). Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation. In CVPR 2021.
⭐ ⭐ ⭐ ⭐ ⭐ - AECR-Net: Wu, H., Qu, Y., Lin, S., Zhou, J., Qiao, R., Zhang, Z., ... & Ma, L. (2021). Contrastive Learning for Compact Single Image Dehazing. In CVPR 2021.
⭐ ⭐ ⭐ - MPRNet: Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., Yang, M. H., & Shao, L. (2021). Multi-stage progressive image restoration. arXiv preprint arXiv:2102.02808.
- AdderSR: Song, D., Wang, Y., Chen, H., Xu, C., Xu, C., & Tao, D. (2020). AdderSR: Towards energy efficient image super-resolution. In CVPR 2021.
⭐ ⭐ ⭐ ⭐ - RICNet: Ni, Siqi, et al. "Controlling the Rain: From Removal to Rendering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
⭐ ⭐ ⭐ - Video rain streaks+fog: Yan, W., Tan, R. T., Yang, W., & Dai, D. (2021). Self-Aligned Video Deraining With Transmission-Depth Consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11966-11976).
⭐ ⭐ ⭐ ⭐ - Uformer: Wang, Z., Cun, X., Bao, J., & Liu, J. (2021). Uformer: A General U-Shaped Transformer for Image Restoration. arXiv preprint arXiv:2106.03106.
⭐ ⭐ ⭐ - Real Video Dehaze Data: Zhang, Xinyi, et al. "Learning To Restore Hazy Video: A New Real-World Dataset and a New Method." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
⭐ ⭐ ⭐ ⭐ - Hybrid Local-Global Transformer: Zhao, D., Li, J., Li, H., & Xu, L. (2021). Hybrid Local-Global Transformer for Image Dehazing. arXiv preprint arXiv:2109.07100.
⭐ ⭐ ⭐ ⭐ - Restormer: Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., & Yang, M. H. (2021). Restormer: Efficient Transformer for High-Resolution Image Restoration. arXiv preprint arXiv:2111.09881.
⭐ ⭐ ⭐ ⭐ - MAXIM: Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., & Li, Y. (2022). MAXIM: Multi-Axis MLP for Image Processing. arXiv preprint arXiv:2201.02973.
⭐ ⭐ ⭐ - NAFNet: Chen, L., Chu, X., Zhang, X., & Sun, J. (2022). Simple Baselines for Image Restoration. arXiv preprint arXiv:2204.04676.
⭐ ⭐ ⭐ ⭐ ⭐ - KCKE: Chen, Wei-Ting, et al. "Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model." CVPR. 2022.
⭐ ⭐ ⭐ ⭐
Nighttime & Low-light
- dehaze + nighttime: Yan, Wending, Robby T. Tan, and Dengxin Dai. "Nighttime defogging using high-low frequency decomposition and grayscale-color networks." In ECCV, 2020.
⭐ ⭐ ⭐ ⭐ ⭐ - Nighttime Visibility Enhancement: Sharma, Aashish, and Robby T. Tan. "Nighttime Visibility Enhancement by Increasing the Dynamic Range and Suppression of Light Effects." In CVPR. 2021.
⭐ ⭐ ⭐ ⭐
Image Synthesis
- Let there be Color!: Iizuka, Satoshi, Edgar Simo-Serra, and Hiroshi Ishikawa. "Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification." ACM Transactions on Graphics (TOG) 35.4 (2016): 110.
⭐ ⭐ ⭐ ⭐ ⭐ - Colorful Image Colorization: Zhang, Richard, Phillip Isola, and Alexei A. Efros. "Colorful image colorization." European Conference on Computer Vision. Springer, Cham, 2016.
⭐ ⭐ ⭐ ⭐ - Neural Style: Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015).
⭐ ⭐ ⭐ ⭐ ⭐ - Texture Synthesis: Gatys, Leon, Alexander S. Ecker, and Matthias Bethge. "Texture synthesis using convolutional neural networks." Advances in Neural Information Processing Systems. 2015.
⭐ ⭐ ⭐ ⭐ - Semantic Annotation Artwork: Champandard, Alex J. "Semantic style transfer and turning two-bit doodles into fine artworks." arXiv preprint arXiv:1603.01768 (2016).
⭐ ⭐ ⭐ - MRF+CNN Image Synthesis: Li, Chuan, and Michael Wand. "Combining markov random fields and convolutional neural networks for image synthesis." In CVPR. 2016.
⭐ ⭐ ⭐ ⭐ - More Experiments on Neural Style: Novak, Roman, and Yaroslav Nikulin. "Improving the neural algorithm of artistic style." arXiv preprint arXiv:1605.04603 (2016).
⭐ ⭐ - Deep Photo Style Transfer: Luan, Fujun, et al. "Deep photo style transfer." In CVPR. 2017.
⭐ ⭐ ⭐ ⭐ ⭐ - Pretraining is All You Need + Diffusion: Wang, Tengfei, et al. "Pretraining is All You Need for Image-to-Image Translation." arXiv preprint arXiv:2205.12952 (2022).
⭐ ⭐ ⭐
Computational Photography
- Multi-Illumination Dataset: Murmann, Lukas, et al. "A Dataset of Multi-Illumination Images in the Wild." Proceedings of the IEEE International Conference on Computer Vision. 2019.
⭐ ⭐ ⭐ ⭐ ⭐ - WESPE: Ignatov, Andrey, et al. "WESPE: weakly supervised photo enhancer for digital cameras." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018.
⭐ ⭐ ⭐ - Zurich RAW to RGB dataset + PyNet: Ignatov, Andrey, Luc Van Gool, and Radu Timofte. "Replacing Mobile Camera ISP with a Single Deep Learning Model." arXiv preprint arXiv:2002.05509 (2020).
⭐ ⭐ ⭐ ⭐
GAN
- GAN: Goodfellow, Ian, et al. "Generative adversarial nets." In NIPS. 2014.
⭐ ⭐ ⭐ ⭐ ⭐ - cGAN: Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014).
⭐ ⭐ ⭐ ⭐ ⭐ - Image-to-Image Translation with Conditional Adversarial Networks: Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." arXiv preprint (2017).
⭐ ⭐ ⭐ ⭐ ⭐ - CycleGAN: Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." arXiv preprint (2017).
⭐ ⭐ ⭐ ⭐ ⭐ - StarGAN: Choi, Yunjey, et al. "Stargan: Unified generative adversarial networks for multi-domain image-to-image translation." In CVPR 2018.
⭐ ⭐ ⭐ ⭐ - E-GAN: Wang, C., Xu, C., Yao, X., & Tao, D. (2018). Evolutionary Generative Adversarial Networks. arXiv preprint arXiv:1803.00657.
⭐ ⭐ ⭐ ⭐ - DCGAN: Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
⭐ ⭐ ⭐ ⭐ - GANtruth: Bujwid, Sebastian, et al. "GANtruth-an unpaired image-to-image translation method for driving scenarios." arXiv preprint arXiv:1812.01710 (2018).
⭐ ⭐ ⭐ - AttentionGAN: Tang, H., Liu, H., Xu, D., Torr, P. H.S., & Sebe, N. (2019). AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks. arXiv preprint arXiv:1911.11897.
⭐ ⭐ ⭐ ⭐ - Multiclass Sketch-to-Image Translation: Ghosh, A., Zhang, R., Dokania, P. K., Wang, O., Efros, A. A., Torr, P. H.S., & Shechtman, E. (2019). Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1171-1180).
⭐ ⭐ ⭐ - RealnessGAN: Xiangli, Yuanbo, et al. "Real or not real, that is the question." In ICLR 2020.
⭐ ⭐ ⭐ ⭐ - Domain-bridged GAN: Pizzati, Fabio, et al. "Domain bridge for unpaired image-to-image translation and unsupervised domain adaptation." The IEEE Winter Conference on Applications of Computer Vision. 2020.
⭐ ⭐ ⭐ ⭐ - SinGAN: Shaham, T. R., Dekel, T., & Michaeli, T. (2019). SinGAN: Learning a generative model from a single natural image. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4570-4580).
⭐ ⭐ ⭐ ⭐ ⭐ - CUT: Park, T., Efros, A. A., Zhang, R., & Zhu, J. Y. (2020, August). Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision (pp. 319-345). Springer, Cham.
⭐ ⭐ ⭐ ⭐ ⭐
Disentangled
- Deblur+Disentangled: Lu, Boyu, Jun-Cheng Chen, and Rama Chellappa. "Unsupervised domain-specific deblurring via disentangled representations." In CVPR. 2019.
⭐ ⭐ ⭐ ⭐ ⭐ - One-Shot Unsupervised Image Translation: Cohen, Tomer, and Lior Wolf. "Bidirectional One-Shot Unsupervised Domain Mapping." Proceedings of the IEEE International Conference on Computer Vision. 2019.
⭐ ⭐ ⭐ ⭐
AR/VR
- Indoor Lighting Estimation: Garon, Mathieu, et al. "Fast Spatially-Varying Indoor Lighting Estimation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐
Person Re-ID
- IANet: Hou, Ruibing, et al. "Interaction-And-Aggregation Network for Person Re-Identification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐ ⭐ - AlignedReID: Zhang, Xuan, et al. "AlignedReID: Surpassing human-level performance in person re-identification." arXiv preprint arXiv:1711.08184 (2017).
⭐ ⭐ ⭐ ⭐ ⭐
Distillation
- Knowledge Distillation: Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
⭐ ⭐ ⭐ ⭐ ⭐ - Deep Mutual Learning: Zhang, Ying, et al. "Deep mutual learning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
⭐ ⭐ ⭐ ⭐ ⭐ - Cooperative learning: Batra, Tanmay, and Devi Parikh. "Cooperative learning with visual attributes." arXiv preprint arXiv:1705.05512 (2017).
⭐ ⭐ ⭐ - Deeply-supervised Knowledge Synergy: Sun, D., Yao, A., Zhou, A., & Zhao, H. (2019). Deeply-supervised Knowledge Synergy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6997-7006).
⭐ ⭐ ⭐ ⭐ ⭐ - ONE: Lan, Xu, Xiatian Zhu, and Shaogang Gong. "Knowledge distillation by On-the-fly Native Ensemble." Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., 2018.
⭐ ⭐ ⭐ ⭐ ⭐ - Segmentation Distillation: Liu, Yifan, et al. "Structured Knowledge Distillation for Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
⭐ ⭐ ⭐ ⭐
Uncertainty
- aleatoric uncertainty and epistemic uncertainty: Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." Advances in neural information processing systems. 2017.
⭐ ⭐ ⭐ ⭐ ⭐ - Learning Model Confidence: Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez. "Addressing Failure Prediction by Learning Model Confidence." In NeurIPS, 2019.
⭐ ⭐ ⭐ ⭐
Transformer
- Transformer: Vaswani, Ashish, et al. "Attention is all you need." arXiv preprint arXiv:1706.03762 (2017).
⭐ ⭐ ⭐ ⭐ ⭐ - Pre-trained image processing transformer: Chen, Hanting, et al. "Pre-trained image processing transformer." arXiv preprint arXiv:2012.00364 (2020).
⭐ ⭐ ⭐ ⭐ - texture transformer for Super-resolution: Yang, Fuzhi, et al. "Learning texture transformer network for image super-resolution." In CVPR, 2020.
⭐ ⭐ ⭐ ⭐ - TransUnet: Chen, Jieneng, et al. "TransUnet: Transformers make strong encoders for medical image segmentation." arXiv preprint arXiv:2102.04306 (2021).
⭐ ⭐ ⭐ ⭐ - Swin transformer: Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030.
⭐ ⭐ ⭐ ⭐ ⭐ - VOLO: Yuan, Li, et al. "VOLO: Vision Outlooker for Visual Recognition." arXiv preprint arXiv:2106.13112 (2021).
⭐ ⭐ ⭐ ⭐ ⭐ - Video Swin Transformer: Liu, Ze, et al. "Video Swin Transformer." arXiv preprint arXiv:2106.13230 (2021).
⭐ ⭐ ⭐ - Focal Transformer: Yang, J., Li, C., Zhang, P., Dai, X., Xiao, B., Yuan, L., & Gao, J. (2021). Focal Self-attention for Local-Global Interactions in Vision Transformers. arXiv preprint arXiv:2107.00641.
⭐ ⭐ ⭐ ⭐ ⭐ - Pyramid vision transformer: Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., ... & Shao, L. (2021). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122.
⭐ ⭐ ⭐ ⭐ - Pyramid vision transformer V2: Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., ... & Shao, L. (2021). PVTv2: Improved Baselines with Pyramid Vision Transformer. arXiv preprint arXiv:2106.13797.
⭐ ⭐ ⭐ ⭐ - Swin Transformer V2: Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2021). Swin Transformer V2: Scaling Up Capacity and Resolution. arXiv preprint arXiv:2111.09883.
⭐ ⭐ ⭐ ⭐ - DeiT: Touvron, Hugo, et al. "Training data-efficient image transformers & distillation through attention." International Conference on Machine Learning. 2021.
⭐ ⭐ ⭐ ⭐
General Perception
- Perceiver: Jaegle, Andrew, et al. "Perceiver: General perception with iterative attention." International Conference on Machine Learning. PMLR, 2021.
⭐ ⭐ ⭐ ⭐ ⭐ - Perceiver IO: Jaegle, Andrew, et al. "Perceiver IO: A general architecture for structured inputs & outputs." arXiv preprint arXiv:2107.14795 (2021).
⭐ ⭐ ⭐ ⭐ - Florence: Yuan, Lu, et al. "Florence: A New Foundation Model for Computer Vision." arXiv preprint arXiv:2111.11432 (2021).
⭐ ⭐ ⭐ ⭐ ⭐ - Unified-IO: Lu, Jiasen, et al. "Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks." arXiv preprint arXiv:2206.08916 (2022).
⭐ ⭐ ⭐ ⭐ - CoCa: Yu, Jiahui, et al. "CoCa: Contrastive captioners are image-text foundation models." arXiv preprint arXiv:2205.01917 (2022).
⭐ ⭐ ⭐ ⭐ ⭐
Traditional Method
- Rolling Guidance Filter: Zhang, Q., Shen, X., Xu, L., & Jia, J. Rolling guidance filter. In ECCV, 2014.
⭐ ⭐ ⭐ ⭐ ⭐
Talks
- G-RMI: Google. (Object Detection) slides
- 2017 CVPR Tutorial: video and slides
- 16-18 Computer Vision Conferences: https://www.youtube.com/channel/UC0n76gicaarsN_Y9YShWwhw/playlists