Visualize HNSW, Faiss, and other ANNS indexes

Feder

What is feder

Feder is a JavaScript tool designed to aid in the comprehension of embedding vectors. It visualizes index files from Faiss, HNSWlib, and other ANN libraries to provide insight into how these libraries function and the concept of high-dimensional vector embeddings. Currently, Feder is primarily focused on the IVF_FLAT index file type from Faiss and the HNSW index file type from HNSWlib, though additional index types will be added in the future.

Feder is written in JavaScript. We also provide a Python library, federpy, which is built on federjs.

NOTE:

  • In an IPython environment, federpy can render the visualization directly.
  • In other environments, federpy can export the visualization as an HTML file, which you can open in a browser (the file needs to be served by a web server).

Online demos

How feder works

Wiki

HNSW visualization screenshots


IVF_Flat visualization screenshots


Quick Start

Installation

Use npm or yarn:

npm install @zilliz/feder

or

yarn add @zilliz/feder

Material Preparation

Make sure you have built an index with Faiss or HNSWlib and dumped it to an index file.

Init Feder

Specify the DOM container in which to render the visualizations.

import { Feder } from '@zilliz/feder';

const feder = new Feder({
  filePath: 'faiss_file', // file path
  source: 'faiss', // faiss | hnswlib
  domSelector: '#container', // attach dom to render
  viewParams: {}, // optional
});

Visualize the index structure.

  • HNSW - Feder shows the top three levels of the HNSW graph.
  • IVF_Flat - Feder shows all the clusters.

feder.overview();
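The upper HNSW levels are small by construction. In the HNSW paper (and in hnswlib), each inserted node draws its top level as floor(-ln(U) * mL), with U uniform in (0, 1) and mL = 1 / ln(M), so roughly a fraction M^-l of the nodes reach level l. A minimal sketch of that rule; `assignLevel` and `fractionAtLeast` are illustrative helpers, not part of Feder:

```javascript
// Standard HNSW level assignment: level = floor(-ln(U) * mL), mL = 1 / ln(M).
// `u` is a uniform random number in (0, 1); M is the HNSW connectivity parameter.
function assignLevel(u, M) {
  const mL = 1 / Math.log(M);
  return Math.floor(-Math.log(u) * mL);
}

// Expected fraction of nodes whose top level is at least `level`: M^-level.
function fractionAtLeast(level, M) {
  return Math.pow(M, -level);
}

console.log(assignLevel(0.5, 16));   // 0
console.log(assignLevel(0.001, 16)); // 2
console.log(fractionAtLeast(2, 16)); // 1/256 of the nodes
```

With M = 16, only about one node in 256 reaches level 2, which keeps the upper levels small enough to draw in full.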

Explore the search process.

Optionally set the search parameters, then specify the query vector.

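feder
  .setSearchParams({
    k: 8, // number of results to return (hnsw, ivf_flat)
    ef: 100, // size of the candidate list during search (hnsw, ef_search)
    nprobe: 8, // number of clusters to scan (ivf_flat)
  })
  .search(target_vector);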
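For IVF_Flat, nprobe decides how many clusters are scanned and k how many results are kept. A minimal sketch of that two-stage search, assuming squared L2 distance (the Faiss default for IndexIVFFlat) and an in-memory index given as centroids plus inverted lists; `ivfFlatSearch` and its data layout are illustrative, not Feder's or Faiss's internal API:

```javascript
// Squared Euclidean distance between two equal-length vectors.
function l2sq(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// centroids: array of vectors; lists[c]: array of { id, vector } assigned to centroid c.
function ivfFlatSearch(query, centroids, lists, { k, nprobe }) {
  // 1. Coarse step: rank clusters by centroid distance and keep the nprobe closest.
  const probed = centroids
    .map((c, idx) => ({ idx, dist: l2sq(query, c) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, nprobe)
    .map((c) => c.idx);

  // 2. Fine step: exhaustively compare the query with every vector in the probed lists.
  const candidates = [];
  for (const c of probed) {
    for (const { id, vector } of lists[c]) {
      candidates.push({ id, dist: l2sq(query, vector) });
    }
  }

  // 3. Keep the k closest candidates.
  return candidates.sort((a, b) => a.dist - b.dist).slice(0, k);
}

// Toy index: two clusters in 2-D.
const centroids = [[0, 0], [10, 10]];
const lists = [
  [{ id: 0, vector: [0, 1] }, { id: 1, vector: [1, 0] }],
  [{ id: 2, vector: [9, 9] }, { id: 3, vector: [11, 11] }],
];
// With nprobe = 1 only the cluster around [10, 10] is scanned.
console.log(ivfFlatSearch([6, 6], centroids, lists, { k: 2, nprobe: 1 }).map((r) => r.id)); // [2, 3]
```

Raising nprobe scans more clusters, which improves recall at the cost of more distance computations.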

Examples

We provide a simple example: visualizations of HNSW and IVF_Flat indexes built from 17,000+ vectors embedded from the VOC 2012 dataset.

git clone git@github.com:zilliztech/feder.git
cd feder
yarn install
yarn dev

Then open http://localhost:12355/

It will show 4 visualizations:

  • hnsw overview
  • hnsw search view
  • ivf_flat overview
  • ivf_flat search view

Feder for Large Index

Feder consists of three components:

  • FederIndex - parses the index file. It requires a lot of memory.
  • FederLayout - computes the layout. It is computationally expensive.
  • FederView - handles rendering and interaction.
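To see why parsing a large index is memory-hungry, here is a rough estimate of the raw bytes an index holds. A sketch under stated assumptions: float32 vectors, and for HNSW the level-0 record layout used by hnswlib (2*M neighbor ids of 4 bytes, a 4-byte neighbor count, the vector, and an 8-byte label per vector). Upper levels and the overhead of any JavaScript representation are ignored, so real usage will be higher; the workload numbers are hypothetical.

```javascript
// Raw size of n float32 vectors of dimension `dim` (4 bytes per component).
function flatBytes(n, dim) {
  return n * dim * 4;
}

// hnswlib-style level-0 storage: links + vector + label for each of n vectors.
function hnswLevel0Bytes(n, dim, M) {
  const linkBytes = 2 * M * 4 + 4; // 2*M neighbor ids + neighbor count
  const vectorBytes = dim * 4;     // float32 components
  const labelBytes = 8;            // external label (size_t on 64-bit builds)
  return n * (linkBytes + vectorBytes + labelBytes);
}

const MB = 1024 * 1024;
// Hypothetical workload: 1,000,000 vectors of dimension 512, HNSW with M = 16.
console.log((flatBytes(1000000, 512) / MB).toFixed(0));           // 1953
console.log((hnswLevel0Bytes(1000000, 512, 16) / MB).toFixed(0)); // 2087
```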

When the amount of data is large, the computation-heavy parts can be separated out and run on a Node.js server. There are two options:

  • oneServer
    • federServer (with FederIndex and FederLayout).
  • twoServer
    • indexServer (with FederIndex)
    • layoutServer (with FederLayout)

See case/oneServer and case/twoServer.

Example with One Server

  1. Launch the server:
     yarn test_one_server_backend
  2. Launch the front-end web service:
     yarn test_one_server_front
  3. Open http://localhost:8000

Example with Two Servers

  1. Launch the FederIndex server:
     yarn test_two_server_feder_index
  2. Launch the FederLayout server:
     yarn test_two_server_feder_layout
  3. Launch the front-end web service:
     yarn test_two_server_front
  4. Open http://localhost:8000

Pipeline - explore a new dataset with feder

Step 1. Dataset preparation

Put all images in test/data/images/ (example dataset: VOC 2012).

Alternatively, generate random vectors (skipping the embedding step) to build the index, and go straight to step 3.

Step 2. Generate embedding vectors

We recommend towhee, which can generate embedding vectors with a single line of code.

We have the encoded vectors ready for you.

Step 3. Build an index and dump it.

You can use faiss or hnswlib to build the index.

(For detailed procedures, refer to their tutorials.)

See test/data/gen_hnswlib_index_*.py or test/data/gen_faiss_index_*.py.

Or use the index file we have prepared for you.

Step 4. Init Feder.

import { Feder } from '@zilliz/feder';
import * as d3 from 'd3';

const domSelector = '#container';
const filePath = [index_file_path];
const source = "hnswlib"; // "hnswlib" or "faiss"

const mediaCallback = (rowId) => mediaUrl; // return the media (e.g. image) URL for this row id

const feder = new Feder({
  filePath,
  source,
  domSelector,
  viewParams: {
    mediaType: 'img',
    mediaCallback,
  },
});
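`mediaCallback` takes a row id and returns the URL of the media for that vector. A minimal sketch, assuming (hypothetically) that the images were embedded and added to the index in the order of a known file list, so that a row id is the position in that list. `makeMediaCallback`, the base URL, and the file names are illustrative placeholders:

```javascript
// Builds a rowId -> URL function from a base URL and an ordered list of file names.
function makeMediaCallback(baseUrl, fileNames) {
  return (rowId) => {
    const name = fileNames[rowId];
    // Hypothetical fallback: return null for unknown ids instead of a broken URL.
    return name === undefined ? null : `${baseUrl}/${encodeURIComponent(name)}`;
  };
}

const mediaCallback = makeMediaCallback('https://example.com/voc2012', [
  '2008_000002.jpg',
  '2008_000003.jpg',
]);
console.log(mediaCallback(1)); // https://example.com/voc2012/2008_000003.jpg
```

How Feder treats a null return is not specified here; adjust the fallback to whatever your viewer expects.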

If you use random data, there is no need to specify mediaType, and viewParams can be omitted.

import { Feder } from '@zilliz/feder';
import * as d3 from 'd3';

const domSelector = '#container';
const filePath = [index_file_path];

const feder = new Feder({
  filePath,
  source: 'hnswlib',
  domSelector,
});
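With a random-data index, you may also want a reproducible query vector. A sketch under stated assumptions: `randomVector` is a hypothetical helper, components are drawn uniformly from [-1, 1) (match this to the distribution you used to build the index), the dimension must equal the index dimension, and a plain array of numbers is assumed to be an acceptable `target_vector`. It uses mulberry32, a small well-known seeded PRNG, so the same seed always yields the same vector.

```javascript
// mulberry32: tiny 32-bit seeded PRNG (not cryptographically secure).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Returns `dim` components drawn uniformly from [-1, 1).
function randomVector(dim, seed) {
  const next = mulberry32(seed);
  return Array.from({ length: dim }, () => next() * 2 - 1);
}

const target_vector = randomVector(128, 42); // 128 is an assumed dimension
// feder.search(target_vector);
```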

Step 5. Explore the index!

Visualize the overview

feder.overview();

or visualize the search process.

feder.search(target_vector[, targetMediaUrl]);

or randomly select a vector as the target to visualize the search process.

feder.searchRandTestVec();
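In the standard HNSW algorithm, the search being visualized descends layer by layer, and each layer is explored with a best-first search whose result list is capped at ef (upper layers use ef = 1; the bottom layer uses ef_search, the `ef` search parameter). A minimal sketch of that per-layer routine (SEARCH-LAYER in the HNSW paper), assuming squared L2 distance and a toy adjacency-list graph; `searchLayer` and the data layout are illustrative, not Feder's or hnswlib's API:

```javascript
// Squared Euclidean distance between two equal-length vectors.
function l2sq(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// graph[id]: neighbor ids of `id` on this layer; vectors[id]: the vector of `id`.
function searchLayer(query, entryId, ef, graph, vectors) {
  const dist = (id) => l2sq(query, vectors[id]);
  const byDist = (a, b) => a.d - b.d;
  const visited = new Set([entryId]);
  const candidates = [{ id: entryId, d: dist(entryId) }]; // C: nodes still to expand
  const found = [{ id: entryId, d: dist(entryId) }];      // W: best <= ef nodes, sorted

  while (candidates.length > 0) {
    candidates.sort(byDist);
    const c = candidates.shift();              // nearest unexpanded candidate
    const furthest = found[found.length - 1];  // worst node kept so far
    if (c.d > furthest.d) break;               // no closer node can be reached
    for (const e of graph[c.id] || []) {
      if (visited.has(e)) continue;
      visited.add(e);
      const d = dist(e);
      if (found.length < ef || d < found[found.length - 1].d) {
        candidates.push({ id: e, d });
        found.push({ id: e, d });
        found.sort(byDist);
        if (found.length > ef) found.pop();    // drop the furthest result
      }
    }
  }
  return found;
}

// Toy layer: five 1-D points connected as a path 0-1-2-3-4.
const vectors = [[0], [1], [2], [3], [4]];
const graph = [[1], [0, 2], [1, 3], [2, 4], [3]];
console.log(searchLayer([3.2], 0, 2, graph, vectors).map((r) => r.id)); // [3, 4]
```

Re-sorting plain arrays keeps the sketch short; real implementations keep the candidates in a min-heap and the results in a max-heap.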

For more cases, see test/test.js.

Blogs

Roadmap

We're still in the early stages. We plan to support more types of ANNS indexes and more viewers for unstructured data. Stay tuned.

Acknowledgments
