• Stars
    103
  • Rank 331,177 (Top 7 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created over 7 years ago
  • Updated over 7 years ago

Repository Details

Finding similar images in the Holidays dataset

holiday-similarity

Various attempts to build a neural network to distinguish between similar and different images from the INRIA Holiday Photos Dataset.

The project started out as an attempt to figure out how to use the Keras ImageDataGenerator to apply the same transformation to both images of an input pair. This can be found in 01-holidays-data-augment.ipynb.
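The core idea, drawing the random transformation once and applying it to both images of a pair, can be sketched without Keras. The following is a minimal NumPy illustration and not the code from the notebook: the helper names (random_params, apply_params, augment_pair) and the choice of a flip plus a wrap-around shift are assumptions made for the example.

```python
import numpy as np


def random_params(rng, max_shift=4):
    """Draw one set of augmentation parameters (hypothetical helper)."""
    return {
        "flip": bool(rng.integers(0, 2)),                    # horizontal flip or not
        "dx": int(rng.integers(-max_shift, max_shift + 1)),  # horizontal shift in pixels
        "dy": int(rng.integers(-max_shift, max_shift + 1)),  # vertical shift in pixels
    }


def apply_params(image, params):
    """Apply a fixed set of parameters to one (height, width, channels) image."""
    out = image[:, ::-1, :] if params["flip"] else image
    # np.roll wraps pixels around the border; a real pipeline would more
    # likely pad or crop, this just keeps the sketch short.
    return np.roll(out, shift=(params["dy"], params["dx"]), axis=(0, 1))


def augment_pair(left, right, rng):
    """Draw the parameters once, then transform both images identically."""
    params = random_params(rng)
    return apply_params(left, params), apply_params(right, params)
```

Calling augment_pair(left, right, np.random.default_rng(42)) returns two arrays with the same shapes as the inputs. Because both go through the same parameters, a pair labeled similar stays aligned after augmentation.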

I then attempted to build a Siamese network based on the Keras example mnist_siamese_graph.py. This uses a shared convolutional network with contrastive loss as the loss function. The network predicts a number between 0 and 1, with 1 being very similar and 0 being very dissimilar. For the convolutional network I used a LeNet configuration, which is probably not powerful enough to learn anything useful on the Holidays dataset. You can see the code in 02-holidays-siamese-network.ipynb. Even with a slightly more complex network configuration than the one I started out with, I was unable to get more than 60% accuracy, so I abandoned this approach.
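For reference, the loss in the Keras Siamese example is a contrastive loss computed on the distance between the two embeddings produced by the shared network. Below is a minimal NumPy sketch of one common formulation, with label 1 meaning similar and a margin of 1.0. The function names are illustrative, and the notebook's own Keras code may differ in its details.

```python
import numpy as np


def euclidean_distance(left, right):
    """Row-wise Euclidean distance between two (n_pairs, dim) embedding arrays."""
    return np.sqrt(np.sum((left - right) ** 2, axis=1))


def contrastive_loss(labels, distances, margin=1.0):
    """Mean contrastive loss; labels are 1 for similar pairs, 0 for dissimilar.

    Similar pairs are penalized by their squared distance. Dissimilar pairs
    are penalized only when they are closer than the margin.
    """
    labels = np.asarray(labels, dtype=float)
    distances = np.asarray(distances, dtype=float)
    similar_term = labels * distances ** 2
    dissimilar_term = (1.0 - labels) * np.maximum(margin - distances, 0.0) ** 2
    return float(np.mean(similar_term + dissimilar_term))
```

In this formulation the network output is a distance, so a small value means similar. A similarity score between 0 and 1 would need an extra mapping, which this sketch leaves out.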

Next I tried to set up a baseline by generating image vectors from several of the pretrained networks available as part of Keras. Image vectors were generated using the VGG-16, VGG-19, ResNet50, InceptionV3, and Xception networks. The notebook for vector generation is 03-pretrained-nets-vectorizers.ipynb.

To train a classifier to predict whether an image pair is similar or not, I take each training triple (image_left, image_right, label), look up the vectors for the two images, and merge the two vectors using a merge strategy. The strategies used are element-wise cosine (dot), element-wise absolute difference (l1) and element-wise Euclidean distance (l2). For each of these strategies, the resulting vector is fed into Naive Bayes, SVM, XGBoost and Random Forest classifiers in turn. All but XGBoost are from scikit-learn; XGBoost is from the XGBoost package. The notebooks for these are 04-pretrained-vec-dot-classifier.ipynb, 05-pretrained-vec-l1-classifier.ipynb and 06-pretrained-vec-l2-classifier.ipynb. The best results were with the XGBoost classifier, and the top two vectorizer networks were ResNet50 and InceptionV3.
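The three merge strategies can be sketched as element-wise operations on a pair of image vectors. The exact definitions used in the notebooks are not spelled out here, so the ones below are assumptions: "dot" is taken to be the element-wise product, "l1" the element-wise absolute difference, and "l2" the element-wise squared difference. The function name merge_vectors is hypothetical.

```python
import numpy as np


def merge_vectors(left, right, strategy):
    """Merge two image vectors of equal shape into one feature vector.

    The definitions are assumptions for illustration, not taken from the notebooks.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if strategy == "dot":
        return left * right          # element-wise product
    if strategy == "l1":
        return np.abs(left - right)  # element-wise absolute difference
    if strategy == "l2":
        return (left - right) ** 2   # element-wise squared difference
    raise ValueError(f"unknown merge strategy: {strategy!r}")
```

Each strategy returns a vector of the same length as the inputs, so the same classifier code can be reused across strategies. Only the features it sees change.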

We then replace the XGBoost classifier with a three-layer fully connected network and repeat the experiment using the ResNet50 and InceptionV3 vectors with each of the three merge strategies plus concatenation, in 07-pretrained-vec-nn-classifier.ipynb. Here we find that the best results come from the dot product and InceptionV3 vectors.
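To make the shape of such a classifier concrete, here is a NumPy sketch of the forward pass of a three-layer fully connected network applied to concatenated image vectors. The layer sizes, the ReLU and sigmoid activations, and the function names are assumptions for illustration. The weights are random and untrained, and the notebook itself builds and trains its network in Keras.

```python
import numpy as np


def init_mlp(input_dim, hidden_dims=(64, 16), seed=0):
    """Create random weights for two hidden layers plus one output unit."""
    rng = np.random.default_rng(seed)
    dims = [input_dim, *hidden_dims, 1]
    return [
        (rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def predict_similarity(params, merged):
    """Forward pass: ReLU hidden layers, sigmoid output strictly between 0 and 1."""
    hidden = merged
    for weights, bias in params[:-1]:
        hidden = np.maximum(hidden @ weights + bias, 0.0)
    weights, bias = params[-1]
    logits = hidden @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))


# Concatenation merge: stack the left and right vectors side by side.
left = np.ones((5, 8))
right = np.zeros((5, 8))
merged = np.concatenate([left, right], axis=1)  # shape (5, 16)
scores = predict_similarity(init_mlp(merged.shape[1]), merged)
```

Concatenation doubles the input width of the first layer, while the other merge strategies keep it equal to the vector length.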

Finally, we replace the vectorizer backend with a pre-trained InceptionV3 network with the prediction layer removed, similar to how it is used for generating image vectors, and reuse the fully connected network (FCN) we trained in the previous notebook as the head of this network, in 08-holidays-siamese-finetune.ipynb. A Siamese network requires that we share the CNN, but since we treat the weights of the Inception network as frozen, it makes no difference whether we share the network or use copies of it. We also use the ImageDataGenerator to augment our images, something we cannot do when using precomputed vectors. Unfortunately, the resulting model is not as good as the one that uses image vectors from a pre-trained network with an FCN classifier.

More Repositories

1

statlearning-notebooks

Python notebooks for exercises covered in Stanford statlearning class (where exercises were in R).
376
star
2

eeap-examples

Code for Document Similarity on Reuters dataset using Encode, Embed, Attend, Predict recipe
Jupyter Notebook
259
star
3

dl-models-for-qa

Keras DL models to answer 8th grade science multiple choice questions (Kaggle AllenAI competition).
Python
237
star
4

nltk-examples

Worked examples from the NLTK Book
Python
183
star
5

fttl-with-keras

Transfer Learning and Fine Tuning for Cross Domain Image Classification with Keras
Jupyter Notebook
83
star
6

ner-re-with-transformers-odsc2022

Building NER and RE components using HuggingFace Transformers
Jupyter Notebook
47
star
7

hia-examples

Hadoop In Action Examples
Java
39
star
8

mlia-examples

Python and R Examples
Python
39
star
9

pytorch-gnn-tutorial-odsc2021

Repository for GNN tutorial using PyTorch and PyTorch Geometric (PyG) for ODSC 2021
Jupyter Notebook
36
star
10

reuters-docsim

Different approaches to computing document similarity
Python
28
star
11

mia-scala-examples

Mahout Examples
Scala
26
star
12

keras-tutorial-odsc2020

Notebooks for Keras Tutorial presented at ODSC West 2020
Jupyter Notebook
26
star
13

ltr-examples

Supporting code for Learning to Rank (LTR) presentation
Jupyter Notebook
16
star
14

polydlot

My attempt to learn more than one Deep Learning framework
Jupyter Notebook
16
star
15

intro-dl-talk-code

Jupyter notebooks and code for Intro to DL talk at Genesys
Jupyter Notebook
14
star
16

scalcium

Scala NLP Algorithms
Scala
10
star
17

solr4-extras

Random solr4 customizations
Scala
10
star
18

delsym

An actor based content ingestion pipeline
Scala
10
star
19

nlp-graph-examples

Examples for Graphorum 2019 presentation -- Graph Techniques for Natural Language Processing
Jupyter Notebook
10
star
20

esc

Scala client for ElasticSearch
Scala
9
star
21

deeplearning-ai-examples

Jupyter Notebook
8
star
22

bpwj

Java Parser Development Framework from Steven Metsker's "Building Parsers With Java" book
Java
8
star
23

thinkstats-examples

Worked examples for exercises in Think Stats using the Scientific Python stack.
Jupyter Notebook
8
star
24

saturn-scispacy

SaturnCloud notebooks to extract annotations from CORD-19 dataset using SciSpacy pretrained models
Jupyter Notebook
8
star
25

llm-rag-eval

Large Language Model (LLM) powered evaluator for Retrieval Augmented Generation (RAG) pipelines.
Python
8
star
26

content-engineering-tutorial

Jupyter Notebook
7
star
27

vespa-poc

Small Proof of Concept to familiarize myself with Vespa.ai functionality
Python
7
star
28

neural-re-experiments

Jupyter Notebook
5
star
29

kg-aligned-entity-linker

Knowledge Graph Aligned Entity Linker using BERT and Sentence Transformers
Jupyter Notebook
5
star
30

bayesian-stats-examples

Python versions of things taught in the Bayesian Statistics courses on Coursera
Jupyter Notebook
4
star
31

neurips-papers-node2vec

Jupyter Notebook
3
star
32

tgni

Experimental NER techniques to address common (for me) text analysis problems.
Java
3
star
33

snorkel-pytorch-lstm-gpu

Code for my GPU port of Snorkel's PyTorch discriminative model (LSTM)
Python
2
star
34

compmethods-notebooks

Python Notebooks for the Computational Methods for Data Analysis course on Coursera.
2
star
35

claimintel

Descriptive Stats on Claims Data
Scala
1
star
36

misc-docs

Account for storing miscellaneous text files for sharing
1
star
37

spark-data-algorithms

Implementations of common data algorithms in Spark
1
star
38

sherpa

Django based web application to help with organizing a conference (summit)
Python
1
star
39

pytorch-drl-examples

Reimplementation of Deep Reinforcement Learning examples from "Deep Reinforcement Learning with Python" by Sudharsan Ravichandran
1
star