  • Stars: 232
  • Rank: 172,847 (Top 4%)
  • Language: Python
  • License: MIT License
  • Created: about 5 years ago
  • Updated: almost 2 years ago


Repository Details

Image search engine

Building a Deep Image Search Engine using tf.Keras

Motivation:

Imagine having a collection of hundreds of thousands to millions of images without any metadata describing the content of each one. How can we build a system that finds the subset of those images that best answers a user's search query?
What we need is a search engine that can rank image results by how well they correspond to the search query, which can be expressed either in natural language or by another query image.
The way we will solve the problem in this post is by training a deep neural model that learns a fixed-length representation (or embedding) of any input image and text, such that the representations are close in Euclidean space whenever a text-image or image-image pair is "similar".

Dataset:

I could not find a search-result ranking dataset that is big enough, but I was able to get this dataset: http://jmcauley.ucsd.edu/data/amazon/, which links e-commerce item images to their titles and descriptions. We will use this metadata as the supervision source to learn meaningful joint text-image representations. The experiments were limited to fashion items (Clothing, Shoes and Jewelry) and to 500,000 images in order to keep the computation and storage costs manageable.

Problem setting:

The dataset links each image to a description written in natural language, so we define a task in which we want to learn a joint, fixed-length representation for images and text such that each image representation is close to the representation of its description.
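
For concreteness, here is a minimal sketch of how training examples could be drawn from these image-description pairs; the negative text used later in the triplet loss is simply the description of another randomly chosen item. The variable names (image_paths, descriptions) are hypothetical, not from the original code:

```python
import random

import numpy as np


def sample_triplets(image_paths, descriptions, batch_size=32):
    """Yield batches of (anchor image path, positive text, negative text).

    Assumes image_paths[i] and descriptions[i] (title + description)
    refer to the same item; negatives are descriptions of other items.
    """
    n = len(image_paths)
    while True:
        idx = np.random.randint(0, n, size=batch_size)
        anchors = [image_paths[i] for i in idx]
        positives = [descriptions[i] for i in idx]
        # A randomly drawn description is very unlikely to match the anchor item.
        negatives = [descriptions[random.randrange(n)] for _ in idx]
        yield anchors, positives, negatives
```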

Model:

The model takes three inputs: the image (the anchor), the image title+description (the positive example) and some randomly sampled text (the negative example).
We then define two sub-models:

  • Image encoder: ResNet50 pre-trained on ImageNet + GlobalMaxPooling2D
  • Text encoder: GRU + GlobalMaxPooling1D

The image sub-model produces the embedding for the anchor, **E_a**, and the text sub-model outputs the embedding for the positive title+description, **E_p**, and the embedding for the negative text, **E_n**.
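
A minimal tf.Keras sketch of these two encoders is shown below; the shared embedding size, vocabulary size, sequence length and the final Dense projection are assumptions of this sketch, not values taken from the post:

```python
import tensorflow as tf
from tensorflow.keras import layers

EMB_DIM = 128       # shared embedding size (assumed)
VOCAB_SIZE = 20000  # vocabulary size (assumed)
MAX_LEN = 64        # maximum text length in tokens (assumed)


def build_image_encoder():
    # ResNet50 pre-trained on ImageNet, followed by global max pooling.
    backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet")
    inp = layers.Input(shape=(224, 224, 3))
    x = backbone(inp)
    x = layers.GlobalMaxPooling2D()(x)
    out = layers.Dense(EMB_DIM)(x)  # projection to the shared space (assumption)
    return tf.keras.Model(inp, out, name="image_encoder")


def build_text_encoder():
    # Token embedding -> GRU -> global max pooling over time steps.
    inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inp)
    x = layers.GRU(EMB_DIM, return_sequences=True)(x)
    out = layers.GlobalMaxPooling1D()(x)
    return tf.keras.Model(inp, out, name="text_encoder")
```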

We then train by minimizing the following triplet loss:

L = max(d(E_a, E_p) - d(E_a, E_n) + alpha, 0)

where d is the Euclidean distance and alpha is a hyper-parameter, set to 0.4 in this experiment.

This loss makes **d(E_a, E_p)** small and **d(E_a, E_n)** large, so that each image embedding is close to the embedding of its description and far from the embedding of random text.
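
Written directly from the formula above, a small TensorFlow implementation could look like this (only the margin value 0.4 comes from the post; averaging over the batch is an assumption):

```python
import tensorflow as tf

ALPHA = 0.4  # margin used in this experiment


def triplet_loss(e_a, e_p, e_n, alpha=ALPHA):
    """Triplet margin loss with Euclidean distances, as defined above."""
    d_pos = tf.norm(e_a - e_p, axis=-1)  # d(E_a, E_p)
    d_neg = tf.norm(e_a - e_n, axis=-1)  # d(E_a, E_n)
    return tf.reduce_mean(tf.maximum(d_pos - d_neg + alpha, 0.0))
```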

Visualization Results:

Once the image and text embedding models are trained, we can visualize the embeddings by projecting them into two dimensions using t-SNE (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).

Test images and their corresponding text descriptions are linked by green lines.

The plot shows that, in the embedding space, images and their corresponding descriptions are generally close, which is what we would expect given the training loss.
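
A sketch of this projection using scikit-learn's TSNE; the embeddings below are random placeholders standing in for the outputs of the two trained encoders on a test set:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder embeddings; in practice these come from the trained encoders.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(200, 128))
text_embeddings = image_embeddings + 0.1 * rng.normal(size=(200, 128))

points = TSNE(n_components=2, random_state=0).fit_transform(
    np.concatenate([image_embeddings, text_embeddings], axis=0)
)
n = len(image_embeddings)
img_pts, txt_pts = points[:n], points[n:]

plt.scatter(img_pts[:, 0], img_pts[:, 1], s=8, label="images")
plt.scatter(txt_pts[:, 0], txt_pts[:, 1], s=8, label="descriptions")
for i in range(n):  # green line between each image and its description
    plt.plot([img_pts[i, 0], txt_pts[i, 0]], [img_pts[i, 1], txt_pts[i, 1]],
             c="green", linewidth=0.3)
plt.legend()
plt.show()
```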

Text-Image Search:

Here we use a few example text queries to search for the best matches in a set of 70,000 images. We compute the text embedding for the query and the embedding for each image in the collection, and finally select the nine images closest to the query in the embedding space.
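
A minimal sketch of this ranking step; the helper name top_k_matches and the brute-force distance computation are illustrative choices, not the post's exact code:

```python
import numpy as np


def top_k_matches(query_embedding, image_embeddings, k=9):
    """Indices of the k images closest to the query in the embedding space."""
    dists = np.linalg.norm(image_embeddings - query_embedding, axis=1)
    return np.argsort(dists)[:k]


# Hypothetical usage with the encoders sketched above:
#   query_vec = text_encoder.predict(tokenize(["query text"]))[0]
#   best = top_k_matches(query_vec, all_image_embeddings, k=9)
```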

These examples show that the embedding models are able to learn useful representations of images and of simple compositions of words.

Image-Image Search:

Here we use an image as the query and search the database of 70,000 images for the examples most similar to it. The ranking is determined by how close each pair of images is in the embedding space, using the Euclidean distance.
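
The same nearest-neighbour ranking applies when the query is an image embedding. One possible way to index the collection is scikit-learn's NearestNeighbors; the 128-dimensional random embeddings below are stand-ins for the real ones:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder embeddings for the 70,000-image collection; in practice these
# come from applying the image encoder to every image once.
rng = np.random.default_rng(0)
collection = rng.normal(size=(70_000, 128)).astype("float32")

# Build the index once, then query it with any image embedding.
index = NearestNeighbors(n_neighbors=9, metric="euclidean").fit(collection)
query = collection[42:43]   # here: an image that is already in the collection
distances, indices = index.kneighbors(query)
print(indices[0])           # positions of the 9 most similar images
```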

The results illustrate that the generated embeddings are high-level representations of images that capture the most important characteristics of the objects depicted, without being excessively influenced by orientation, lighting or minor local details, even though they were never explicitly trained to do so.

Conclusion:

In this project we worked on the machine learning building blocks of a keyword- and image-based search engine applied to a collection of images. The basic idea is to learn a meaningful joint embedding function for text and images, and then use the distance between items in the embedding space to rank search results.


More Repositories

1. time_series_forecasting (Python, 194 stars)
2. ECG_Heartbeat_Classification - CNN for heartbeat classification (Python, 148 stars)
3. EEG_classification - EEG sleep stage classification using CNN with Keras (Python, 145 stars)
4. medical_image_segmentation - Medical image segmentation (eye vessel segmentation) (Python, 124 stars)
5. audio_classification - CNN 1D vs 2D audio classification (Jupyter Notebook, 103 stars)
6. recommender_transformer (Python, 82 stars)
7. graph_classification - Learning from graph data using Keras (Python, 64 stars)
8. music_genre_classification - Music genre classification: LSTM vs Transformer (Python, 61 stars)
9. rubiks_cube - Rubik's Cube solver using reinforcement learning (Python, 53 stars)
10. kinship_prediction - Deep neural networks for kinship prediction using face photos (Python, 47 stars)
11. face_age_gender - Can we predict the age and gender of someone given a picture of their face? (Python, 42 stars)
12. COLA_pytorch - COLA contrastive pre-training method implemented in PyTorch (Python, 40 stars)
13. DeepTabular (Python, 37 stars)
14. fingerprint_denoising - U-Net for fingerprint denoising (Python, 26 stars)
15. IntegratedGradientsPytorch - Integrated gradients attribution method implemented in PyTorch (Python, 23 stars)
16. sudoku_solver - Solving a Sudoku puzzle from a screenshot (Python, 22 stars)
17. llm-serve-tutorial (Python, 21 stars)
18. RL - RL algorithm implementations from scratch (Python, 17 stars)
19. distill-llm (Python, 17 stars)
20. FastImageClassification - A step-by-step tutorial to build and deploy an image classification API (Python, 14 stars)
21. Recommender_keras - Basic recommendation system for the MovieLens dataset using Keras (Python, 12 stars)
22. xumi (Python, 9 stars)
23. ner_playground (Python, 6 stars)
24. celery_ml_deploy (Python, 6 stars)
25. knowledge_distillation - Knowledge Distillation (Python, 5 stars)
26. gcp_model_deploy_example (Python, 5 stars)
27. handwriting_forensics (Python, 5 stars)
28. TagSuggestionImages - Suggest multiple tags/labels that better fit an image (Python, 4 stars)
29. active_learning - Active learning applied to image and tabular data (Python, 4 stars)
30. code_search (Python, 4 stars)
31. nicegui_tutorial (Python, 4 stars)
32. learning_to_abstain - Know what you don't know (Python, 3 stars)
33. malignancy_detection - Malignancy detection using CNNs with Keras (Python, 3 stars)
34. LLM-Voice (Python, 3 stars)
35. ReconstructionAuxLoss - Improve a neural network's generalization performance by adding an unsupervised auxiliary loss (PyTorch Lightning) (Python, 2 stars)
36. bleach_bot (Python, 2 stars)
37. doc-llm (Python, 2 stars)
38. interpretable_nlp (Python, 1 star)
39. streamlit_demo (Python, 1 star)
40. prefect_mlops (1 star)
41. dimensionality_reduction (HTML, 1 star)
42. ToyImageClassificationDataset - Toy image classification dataset annotated with Labelme (Python, 1 star)
43. constrained_llm_generation (Python, 1 star)