4MC-4M-Image-Text-Pairs-with-CLIP-embeddings
I have created a dataset of image-text pairs derived from YFCC100M, using the cosine similarity between the CLIP embeddings of each image and its caption. I have also added probabilities from an NSFW detector and more.
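
For readers unfamiliar with the measure, here is a minimal sketch of cosine similarity between two embedding vectors. It is an illustration only, not the code used to build this dataset: the function name `cosine_similarity` and the toy 3-dimensional vectors are assumptions, and real CLIP embeddings have several hundred dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for an image embedding and a caption embedding
# (hypothetical values, not taken from the dataset).
image_embedding = [1.0, 0.0, 1.0]
caption_embedding = [1.0, 0.0, 0.0]

similarity = cosine_similarity(image_embedding, caption_embedding)
print(f"cosine similarity: {similarity:.4f}")
```

A value close to 1 means the image and caption embeddings point in nearly the same direction, while a value near 0 means they are unrelated in the embedding space.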