  • Stars: 139
  • Rank: 254,745 (Top 6 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created almost 4 years ago
  • Updated 12 months ago

Repository Details

image_embeddings

Using efficientnet to provide embeddings for retrieval. Read the blog post at https://medium.com/@rom1504/image-embeddings-ed1b194d113e

Why this repo? Embeddings are a widely used technique that is well known in scientific circles, but they seem underused and little known among engineers. I want to show how easy it is to represent things as embeddings, and how many applications this unlocks. Check out the demo first!

Workflow

  1. download some pictures
  2. run inference on them to get embeddings
  3. run a simple knn example to see the point: click on some pictures and see their nearest neighbors

Simple Install

Run pip install image_embeddings

Example workflow

  1. run image_embeddings save_examples_to_folder --images_count=1000 --output_folder=tf_flower_images; this retrieves 1000 image files from https://www.tensorflow.org/datasets/catalog/tf_flowers (you can also pick any other dataset)
  2. produce tf records with image_embeddings write_tfrecord --image_folder=tf_flower_images --output_folder=tf_flower_tf_records --shards=10
  3. run the inference with image_embeddings run_inference --tfrecords_folder=tf_flower_tf_records --output_folder=tf_flower_embeddings
  4. run a random knn search on them with image_embeddings random_search --path=tf_flower_embeddings

Optionally, if you want to use the embeddings as numpy files (for example, to read them from other languages), run image_embeddings embeddings_to_numpy --input_path=tf_flower_embeddings --output_path=tf_flower_numpy. In particular, this can be used in the web demo.

$ image_embeddings random_search --path=tf_flower_embeddings
image_roses_261
160.83 image_roses_261
114.36 image_roses_118
102.77 image_roses_537
92.95 image_roses_659
88.49 image_roses_197
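
The four steps of the example workflow can also be scripted. The sketch below only assembles the command lines documented above and, optionally, runs them with subprocess. The helper names build_pipeline_commands and run_pipeline, and the prefix parameter, are hypothetical and not part of image_embeddings; the subcommands and flags are copied from the steps above.

```python
import shlex
import subprocess


def build_pipeline_commands(prefix="tf_flower", images_count=1000, shards=10):
    """Return the four documented CLI steps as argv lists (hypothetical helper)."""
    images = f"{prefix}_images"
    records = f"{prefix}_tf_records"
    embeddings = f"{prefix}_embeddings"
    return [
        ["image_embeddings", "save_examples_to_folder",
         f"--images_count={images_count}", f"--output_folder={images}"],
        ["image_embeddings", "write_tfrecord",
         f"--image_folder={images}", f"--output_folder={records}",
         f"--shards={shards}"],
        ["image_embeddings", "run_inference",
         f"--tfrecords_folder={records}", f"--output_folder={embeddings}"],
        ["image_embeddings", "random_search", f"--path={embeddings}"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(argv, check=True)
```

Calling run_pipeline(build_pipeline_commands()) prints the commands without executing anything; passing dry_run=False requires image_embeddings to be installed and on the PATH.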

Explore the Simple notebook for more details.

You can try it locally or try it in colab

The From scratch notebook provides an explanation on how to build this from scratch.

API

image_embeddings.downloader

Downloader for tensorflow datasets. Any other set of images could be used instead.

image_embeddings.downloader.save_examples_to_folder(output_folder, images_count=1000, dataset="tf_flowers")

Save images from https://www.tensorflow.org/datasets/catalog/tf_flowers to a folder. Also works with other tf datasets.

image_embeddings.inference

Create tf records from image files, and apply inference with an efficientnet model. Other models could be used.

image_embeddings.inference.write_tfrecord(image_folder, output_folder, num_shards=100)

Write tf records from an image folder, split into num_shards files
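
To illustrate what splitting a folder into shards means, here is a minimal sketch that distributes file names round-robin across a fixed number of shards. How write_tfrecord actually assigns images to shards is not described here, so the round-robin strategy and the assign_shards helper are assumptions made for illustration only.

```python
def assign_shards(file_names, num_shards):
    """Distribute file names round-robin across num_shards lists.

    Hypothetical helper: the library's real sharding strategy may differ.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [[] for _ in range(num_shards)]
    # Sorting first makes the assignment deterministic across runs.
    for position, name in enumerate(sorted(file_names)):
        shards[position % num_shards].append(name)
    return shards


files = [f"image_{i}.jpeg" for i in range(10)]
for shard_id, shard in enumerate(assign_shards(files, num_shards=3)):
    print(shard_id, shard)
```

With this scheme shard sizes differ by at most one, which keeps the shards balanced when they are read in parallel.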

image_embeddings.inference.run_inference(tfrecords_folder, output_folder, batch_size=1000)

Run inference on the provided tf records and save the embeddings to the output folder

image_embeddings.knn

Convenience methods to read embeddings, build indices and run searches on them. These methods are provided as examples. Use faiss directly for bigger datasets.

image_embeddings.knn.read_embeddings(path)

Read embeddings from path and return a tuple with

  • embeddings as a numpy matrix
  • an id to name dictionary
  • a name to id dictionary
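
The two dictionaries are inverses of each other. A minimal sketch of how they relate, assuming that an id is the row index of the image in the embedding matrix (an assumption, not confirmed above); build_name_mappings is a hypothetical helper, and the image names are taken from the sample output earlier in this README.

```python
def build_name_mappings(names):
    """Build id -> name and name -> id dictionaries from an ordered list of names.

    Assumption: the id of an image is its row index in the embedding matrix.
    """
    id_to_name = dict(enumerate(names))
    name_to_id = {name: idx for idx, name in id_to_name.items()}
    return id_to_name, name_to_id


names = ["image_roses_261", "image_roses_118", "image_roses_537"]
id_to_name, name_to_id = build_name_mappings(names)

row = name_to_id["image_roses_118"]  # row of this image in the matrix
print(row, id_to_name[row])
```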

image_embeddings.knn.build_index(emb)

Build a simple faiss inner product index using the provided matrix of embeddings
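
With an inner product index, scores depend on vector length as well as direction. If the vectors are L2-normalized first, the inner product equals the cosine similarity. The sketch below shows this in plain Python; it does not call faiss or image_embeddings, and whether the library normalizes its embeddings is not stated here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice as long

print(inner_product(a, b))  # raw score grows with vector length
print(inner_product(l2_normalize(a), l2_normalize(b)))  # cosine similarity, close to 1
```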

image_embeddings.knn.search(index, id_to_name, emb, k=5)

Search the index for the query embedding and return an array of (distance, name) pairs for the k closest images
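
For intuition, the same kind of result can be computed without an index by scoring every row against the query and keeping the top k. This is a brute-force sketch in plain Python, not the library's implementation; brute_force_search and the toy data are hypothetical. Higher inner product means more similar, so results are sorted in descending order.

```python
def brute_force_search(embeddings, id_to_name, query, k=5):
    """Return the k (score, name) pairs with the highest inner product to query."""
    scored = [
        (sum(q * x for q, x in zip(query, row)), id_to_name[row_id])
        for row_id, row in enumerate(embeddings)
    ]
    # Sort by score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
id_to_name = {0: "image_a", 1: "image_b", 2: "image_c"}

for score, name in brute_force_search(embeddings, id_to_name, query=[1.0, 0.0], k=2):
    print(f"{score:.2f} {name}")
```

This costs one pass over all embeddings per query, which is fine for a few thousand images; for larger collections an index such as faiss is the better choice, as noted above.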

image_embeddings.knn.display_picture(image_path, image_name)

Display one picture from the given path and image name in jupyter

image_embeddings.knn.display_results(image_path, results)

Display the results from search method

image_embeddings.knn.random_search(path)

Load the embeddings, apply a random search on them and display the result

image_embeddings.knn.embeddings_to_numpy(input_path, output_folder)

Load the embeddings stored as parquet in the input folder and save them as

  • json for the id -> name mapping
  • numpy for the embeddings

Particularly useful to read the embeddings from other languages

Advanced Installation

Prerequisites

Make sure you use python>=3.6 and an up-to-date version of pip and setuptools

python --version
pip install -U pip setuptools

It is recommended to install image_embeddings in a new virtual environment. For example:

python3 -m venv image_embeddings_env
source image_embeddings_env/bin/activate
pip install -U pip setuptools
pip install image_embeddings

Using Pip

pip install image_embeddings

From Source

First, clone the image_embeddings repo on your local machine with

git clone https://github.com/rom1504/image_embeddings.git
cd image_embeddings
make install

To install development tools and test requirements, run

make install-dev

Test

To run unit tests in your current environment, run

make test

To run lint + unit tests in a fresh virtual environment, run

make venv-lint-test

Lint

To run black --check:

make lint

To auto-format the code using black:

make black
