Repository Details
Multi-stage pipeline for generating an audio-visual speech recognition (AVSR) dataset of active-speaker face tracks paired with their transcriptions, built from widely available video sources such as TV broadcasts.