• Stars: 1
• Language: Jupyter Notebook
• Created about 2 years ago
• Updated about 2 years ago

Repository Details

This project implements an image caption generator based on a CNN-LSTM model. The model can only produce words that appear in its training vocabulary, so it cannot predict out-of-vocabulary words. The dataset contains 8,091 images, of which 6,480 (about 80%) were used for training. Production-level models with higher accuracy need to be trained on datasets of more than 100,000 images. Because the training set is small and lacks variety, the generated descriptions are often inaccurate. Because the dataset is also imbalanced, the model overfits: captions are more accurate for images seen during training than for unseen images. The resulting system accepts input images in .jpg or .png format and returns a caption for each image. This project sparked our interest in applying deep learning to computer vision, and we expect to explore the area further.
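The vocabulary limitation can be illustrated with a minimal sketch. It assumes a simple lowercase, whitespace-based tokenizer; the project's actual preprocessing is not shown in this description, and the captions and the helper names `build_vocab` and `out_of_vocab` are hypothetical. The sketch builds a vocabulary from training captions and lists the words of a new caption that a model restricted to that vocabulary could never emit.

```python
from collections import Counter


def build_vocab(captions, min_count=1):
    """Collect every word seen at least min_count times in the captions."""
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    return {word for word, n in counts.items() if n >= min_count}


def out_of_vocab(caption, vocab):
    """Return the words of a caption that are missing from the vocabulary."""
    return [word for word in caption.lower().split() if word not in vocab]


# Hypothetical training captions, not taken from the project's dataset.
train_captions = [
    "a dog runs across the grass",
    "a child plays in the snow",
    "two dogs play with a ball",
]
vocab = build_vocab(train_captions)

# Words the decoder has never seen cannot appear in a generated caption.
missing = out_of_vocab("a giraffe runs across the savanna", vocab)
print(missing)  # ['giraffe', 'savanna']
```

A larger and more varied training set shrinks the list of missing words, which is one reason the description above recommends far more than 8,091 images for production use.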