Discover sriragjayakumar/Show-and-Speak-DETR Open Source project

Stars: 1
Language: Python
License: MIT License
Created about 2 years ago
Updated about 2 years ago

sriragjayakumar


Recent work in image captioning has produced models such as Show and Speak (SAS), which synthesize spoken descriptions of images directly, bypassing the need for any text or phonemes. SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech describing that image. The final speech audio is obtained from the predicted spectrogram via WaveNet. SAS uses recurrent neural network-based models such as LSTMs for speech production. In this study we propose to investigate the use of transformers for SAS models, given the superior performance of transformers in generating sequential data.

Table_detector

Detects tables in a PDF using Faster R-CNN

Python

Table-column-detection

Python

10-arm-testbed

Re-implement in Python the results presented in Figure 2.2 of the Sutton & Barto book, comparing a greedy method with two ε-greedy methods (ε = 0.01 and ε = 0.1) on the 10-armed testbed, and present your code and results. Include a discussion of the exploration-exploitation dilemma in relation to your findings.

Python
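
The ε-greedy comparison described for 10-arm-testbed can be sketched roughly as below. This is a minimal illustration and not the repository's code: the function names run_bandit and average_reward, the default run count, and the seed are assumptions made here. It assumes the usual testbed setup from the book, where each arm's true value is drawn from a unit normal, rewards are the true value plus unit-normal noise, and value estimates are sample averages. Setting epsilon to 0 gives the greedy method.

```python
import random


def run_bandit(epsilon, n_arms=10, n_steps=1000, rng=None):
    """Run one epsilon-greedy agent on one randomly generated bandit task.

    Returns the list of rewards received at each step.
    (Hypothetical helper for illustration only.)
    """
    if rng is None:
        rng = random.Random()
    # True action values, one per arm, drawn from a unit normal.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    estimates = [0.0] * n_arms  # sample-average value estimates
    counts = [0] * n_arms       # number of pulls per arm
    rewards = []
    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick an arm with the highest estimate, ties broken at random.
            best = max(estimates)
            arm = rng.choice([a for a, q in enumerate(estimates) if q == best])
        # Reward is the arm's true value plus unit-normal noise.
        reward = rng.gauss(true_values[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        rewards.append(reward)
    return rewards


def average_reward(epsilon, n_runs=200, n_steps=1000, seed=0):
    """Average the per-step reward over n_runs independent bandit tasks."""
    rng = random.Random(seed)
    totals = [0.0] * n_steps
    for _ in range(n_runs):
        for t, r in enumerate(run_bandit(epsilon, n_steps=n_steps, rng=rng)):
            totals[t] += r
    return [total / n_runs for total in totals]


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        curve = average_reward(eps)
        # Mean reward over the last 100 steps as a rough summary of each curve.
        print(eps, sum(curve[-100:]) / 100)
```

The book's figure averages over 2000 runs; the smaller default here only keeps the sketch quick to run. Plotting the three curves returned by average_reward against the step index would give a figure comparable in shape to the one the description refers to.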
