There are no reviews yet. Be the first to send feedback to the community and the maintainers!
Repository Details
Applications of LSTM - Image caption Generation. For generating captions for images, we will use a popular dataset for image captioning tasks known as Microsoft Common Objects in Context (MS-COCO). We will �rst process images from the dataset (MS-COCO) to obtain an encoding of the images with a pretrained Convolutional Neural Network (CNN), which is already good at classifying images. The CNN will take a �xed-size image as the input and output the class the image belongs to (for example, cat, dog, bus, and tree). Using this CNN, we can obtain compressed encoded vectors describing images. Then we will process the captions of the images to learn the word embeddings of the words found in captions. We can also use pretrained word vectors for this task. Finally, having obtained both the image and word encodings, we will feed them into an LSTM and train it on the images and their respective captions.