Image captioning is the task of generating a textual description of an image. It combines Natural Language Processing and Computer Vision to produce the captions. The dataset takes the form [image to captions]: it consists of input images paired with their corresponding output captions.

Encoder: A Convolutional Neural Network (CNN) acts as the encoder. The input image is passed through the CNN to extract features, and the last hidden state of the CNN is connected to the decoder.

Decoder: The decoder is a Recurrent Neural Network (RNN) that performs word-level language modelling. At the first time step it receives the encoded output from the encoder together with the START vector.

Implementation: A ResNet-50 model pretrained on the ImageNet dataset (publicly available) is used for image feature extraction, and a 4-layer RNN model, together with the other relevant layers, is built for caption generation.
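A minimal sketch of this encoder-decoder setup, assuming PyTorch and a recent torchvision. The embedding size, hidden size, vocabulary size, and START-token handling are illustrative assumptions, not the repository's actual code; only the ResNet-50 backbone and the 4-layer RNN come from the description above.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """ResNet-50 pretrained on ImageNet with the classification head removed,
    used as a frozen feature extractor (the 'encoder')."""

    def __init__(self, embed_size=256):
        super().__init__()
        resnet = models.resnet50(weights="IMAGENET1K_V1")
        # Keep everything up to the global pooling layer; drop the final FC.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Project the 2048-dim ResNet feature to the decoder's embedding size.
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        feats = self.backbone(images)   # (B, 2048, 1, 1)
        feats = feats.flatten(1)        # (B, 2048)
        return self.fc(feats)           # (B, embed_size)


class DecoderRNN(nn.Module):
    """Word-level language model: the image feature is fed as the first time
    step, followed by the embedded caption tokens (beginning with <START>)."""

    def __init__(self, embed_size=256, hidden_size=512, vocab_size=5000, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, image_feats, captions):
        emb = self.embed(captions)                              # (B, T, E)
        # Prepend the encoded image as the first "token" of the sequence.
        inputs = torch.cat([image_feats.unsqueeze(1), emb], dim=1)
        out, _ = self.rnn(inputs)                               # (B, T+1, H)
        return self.fc(out)                                     # (B, T+1, V)
```

In this sketch the decoder is trained with teacher forcing on the ground-truth captions; at inference time the highest-scoring word at each step would be fed back in until an END token is produced.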