Paper Collection - A List of Computer Vision Papers and Notes
- Image Classification
- Popular Module
- Object Detection in Image
- Image Caption
- Image Generations
- Image and Language
- Activation Maximization
- Style Transfer
- Low-level vision
- Image Segmentation
- Open Courses
- Online Books
Image Classification
Network in Network [Paper] [Note] [Torch Code]
- Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
VGG [Paper] [Note] [Torch Code]
- Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
GoogLeNet [Paper] [Note] [Torch Code]
- Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
ResNet [Paper] [Note] [Torch Code]
- He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Popular Module
Dropout
- Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
Batch Normalization [Paper] [Note]
- Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
Object Detection in Image
R-CNN
- Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014.
Spatial pyramid pooling in deep convolutional networks for visual recognition [Paper](http://arxiv.org/abs/1406.4729) [Note] [Code]
- He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2015, 37(9): 1904-1916.
Fast R-CNN [Paper](http://arxiv.org/pdf/1504.08083) [Note] [Code]
- Ross Girshick, Fast R-CNN, arXiv:1504.08083.
Faster R-CNN, Microsoft Research [Paper](http://arxiv.org/pdf/1506.01497) [Note] [Code] [Python Code]
- Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.
End-to-end people detection in crowded scenes [Paper](http://arxiv.org/abs/1506.04878) [Note] [Code]
- Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded scenes, arXiv:1506.04878.
You Only Look Once: Unified, Real-Time Object Detection [Paper](http://arxiv.org/abs/1506.02640) [Note] [Code]
- Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640.
Adaptive Object Detection Using Adjacency and Zoom Prediction [Paper](http://arxiv.org/abs/1512.07711) [Note]
- Lu Y, Javidi T, Lazebnik S. Adaptive Object Detection Using Adjacency and Zoom Prediction[J]. arXiv:1512.07711, 2015.
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [Paper] [Note]
- Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick. arXiv:1512.04143, 2015.
G-CNN: an Iterative Grid Based Object Detector [Paper]
- Mahyar Najibi, Mohammad Rastegari, Larry S. Davis. arXiv:1512.07729, 2015.
Seq-NMS for Video Object Detection [Paper] [Note]
- Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. Seq-NMS for Video Object Detection. arXiv preprint arXiv:1602.08465, 2016.
Image Caption
Exploring Nearest Neighbor Approaches for Image Captioning [Paper]
- Devlin J, Gupta S, Girshick R, et al. Exploring Nearest Neighbor Approaches for Image Captioning[J]. arXiv preprint arXiv:1505.04467, 2015.
Show and Tell: A Neural Image Caption Generator [Paper] [Note]
- Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Image Generations
Pixel Recurrent Neural Networks [Paper] [Note]
- van den Oord A, Kalchbrenner N, Kavukcuoglu K. Pixel Recurrent Neural Networks[J]. arXiv preprint arXiv:1601.06759, 2016.
Variational Autoencoder [Paper] [Note]
- Kingma D P, Welling M. Auto-encoding variational bayes[J]. arXiv preprint arXiv:1312.6114, 2013.
DRAW: A recurrent neural network for image generation [Paper] [Torch Code] [Tensorflow Code] [Note]
- Gregor K, Danihelka I, Graves A, et al. DRAW: A recurrent neural network for image generation[J]. arXiv preprint arXiv:1502.04623, 2015.
Scribbler: Controlling Deep Image Synthesis with Sketch and Color [Paper] [Note]
- Patsorn Sangkloy, Jingwan Lu, et al. Scribbler: Controlling Deep Image Synthesis with Sketch and Color. arXiv preprint arXiv:1612.00835, 2016.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [Paper]
- Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks[J]. arXiv preprint arXiv:1511.06434, 2015.
Improved Techniques for Training GANs [Paper]
- Salimans T, Goodfellow I, Zaremba W, et al. Improved Techniques for Training GANs[J]. arXiv preprint arXiv:1606.03498, 2016.
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [Paper]
- Chen X, Duan Y, Houthooft R, et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets[J]. arXiv preprint arXiv:1606.03657, 2016.
Image-to-Image Translation with Conditional Adversarial Networks [Paper] [Note] [Torch Code] [Tensorflow Code]
- Isola P, Zhu J Y, Zhou T, et al. Image-to-Image Translation with Conditional Adversarial Networks[J]. arXiv preprint arXiv:1611.07004, 2016.
Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts [Paper] [Note]
- Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem. Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts [J]. arXiv preprint arXiv:1612.00215, 2016.
Learning to Discover Cross-Domain Relations with Generative Adversarial Networks [Paper] [Note]
- Kim, Taeksoo, et al. "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks." arXiv preprint arXiv:1703.05192 (2017).
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks [Paper] [Note]
- Zhu J Y, Park T, Isola P, et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks[J]. arXiv preprint arXiv:1703.10593, 2017.
BEGAN: Boundary Equilibrium Generative Adversarial Networks [Paper] [Note]
- Berthelot, David, Tom Schumm, and Luke Metz. "BEGAN: Boundary Equilibrium Generative Adversarial Networks." arXiv preprint arXiv:1703.10717 (2017).
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Paper] [Note] [Tensorflow Code]
- Zhang, Han, et al. "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks." arXiv preprint arXiv:1612.03242 (2016).
Invertible Conditional GANs for image editing [Paper] [Note]
- Perarnau G, van de Weijer J, Raducanu B, et al. Invertible Conditional GANs for image editing[J]. arXiv preprint arXiv:1611.06355, 2016.
Stacked Generative Adversarial Networks [Paper] [Note]
- Huang X, Li Y, Poursaeed O, et al. Stacked generative adversarial networks[J]. arXiv preprint arXiv:1612.04357, 2016.
Rotating Your Face Using Multi-task Deep Neural Network [Paper] [Note]
- Yim J, Jung H, Yoo B I, et al. Rotating your face using multi-task deep neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 676-684.
Image and Language
Learning Deep Representations of Fine-Grained Visual Descriptions [Paper] [Note]
- Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Activation Maximization
Synthesizing the preferred inputs for neurons in neural networks via deep generator networks [Paper] [Note]
- Nguyen A, Dosovitskiy A, Yosinski J, et al. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks[J]. arXiv preprint arXiv:1605.09304, 2016.
Style Transfer
A neural algorithm of artistic style [Paper] [Note]
- Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv preprint arXiv:1508.06576, 2015.
Perceptual losses for real-time style transfer and super-resolution [Paper] [Note]
- Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[J]. arXiv preprint arXiv:1603.08155, 2016.
Preserving Color in Neural Artistic Style Transfer [Paper] [Note] [Pytorch Code]
- Gatys, Leon A., et al. "Preserving color in neural artistic style transfer." arXiv preprint arXiv:1606.05897 (2016).
A Learned Representation For Artistic Style [Paper] [Note] [Tensorflow Code] [Lasagne Code]
- Dumoulin, Vincent, Jonathon Shlens, and Manjunath Kudlur. "A learned representation for artistic style." International Conference on Learning Representations (2017).
Demystifying Neural Style Transfer [Paper]
- Li, Yanghao, et al. "Demystifying Neural Style Transfer." arXiv preprint arXiv:1701.01036 (2017).
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [Paper]
- Huang, Xun, and Serge Belongie. "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization." arXiv preprint arXiv:1703.06868 (2017).
Fast Patch-based Style Transfer of Arbitrary Style [Paper]
- Chen, Tian Qi, and Mark Schmidt. "Fast Patch-based Style Transfer of Arbitrary Style." arXiv preprint arXiv:1612.04337 (2016).
Low-level vision
Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution [Paper] [Note]
- Il Jun Ahn, Woo Hyun Nam. Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution [J]. arXiv preprint arXiv:1612.00085, 2016.
Deep Joint Image Filtering [Paper] [Note]
- Li Y, Huang J B, Ahuja N, et al. Deep joint image filtering[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 154-169.
Image Segmentation
Fully convolutional networks for semantic segmentation [Paper] [Note]
- Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.
Video Editing
Deep Video Color Propagation [Paper] [Note]
- Meyer S, Cornillère V, Djelouah A, et al. Deep Video Color Propagation. BMVC 2018.
Deep Matching
AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching [Paper] [Note]
- Novotný D, Larlus D, Vedaldi A. AnchorNet: A Weakly Supervised Network to Learn Geometry-Sensitive Features for Semantic Matching. CVPR, 2017.
Open Courses
- CS231n: Convolutional Neural Networks for Visual Recognition [Course Page]
- CS224d: Deep Learning for Natural Language Processing [Course Page]
Online Books
- Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
Mathematics
- Introduction to Probability Models, Sheldon M. Ross
Misc
k-means++: The advantages of careful seeding [Paper] [Note]
- Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding[C]//Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007: 1027-1035.