• Stars: 268
• Rank: 153,144 (Top 4%)
• Language: Jupyter Notebook
• Created over 2 years ago
• Updated over 1 year ago

Repository Details

The definitive guide to using Vector Search to solve your semantic search production workload needs.

Vector search engines let developers store vectors in indexes built around particular algorithms (e.g., k-nearest neighbors, or KNN) and query them with a similarity measure (such as cosine distance) to determine which vectors are related.
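To make the store-and-compare idea concrete, here is a minimal sketch of exact (brute-force) KNN search over an in-memory index, ranked by cosine similarity. The document IDs, the 3-dimensional vectors, and the `knn_search` helper are all hypothetical; production engines typically use approximate indexes rather than scoring every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_search(query, index, k=2):
    """Exact KNN: score every stored vector, then keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings; real models emit hundreds of dimensions.
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_taxes": [0.0, 0.2, 0.9],
}

results = knn_search([1.0, 0.2, 0.0], index, k=2)
print(results)  # list of (doc_id, similarity) pairs, best match first
```

The query points in nearly the same direction as `doc_cats` and `doc_dogs`, so those two are returned ahead of `doc_taxes`.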

This repository provides a comprehensive overview of the vector search landscape, including tutorials, guides, best practices, and extended learning. Please review the Education section to learn more.

Here is how you may use a Vector Search engine within your application search architecture:

Topics

๐Ÿง‘โ€๐Ÿซ Foundation - Learn the core concepts of vector-based information retrieval.

๐ŸŽฌ Use Cases - Understand where it makes sense to use vector search.

๐Ÿ’ต Architecture - Guides on how to use vector search in your architecture.

Foundations

# | Label | Description
1 | Keyword vs Vector Search | The difference between standard (TF-IDF) text search and vector search and when to use each.
2 | Sparse Vector Tutorial | A walkthrough of building your own sparse vector feature extraction engine.
3 | Dense Vector Tutorial | A walkthrough of building your own dense vector feature extraction engine.
4 | Atlas Vector Search Engine | Guides that showcase MongoDB Atlas' vector search implementation.
5 | Vector Search Comparisons | A comparison of the most popular vector search engines.
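As a rough illustration of the sparse side of the foundations listed above, the sketch below builds TF-IDF vectors as dictionaries that store only the terms present in each document. It assumes whitespace tokenization and one common weighting (term frequency times `log(N / document_frequency)`); the `build_tfidf` helper and the corpus are hypothetical, and library implementations use different smoothing.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def build_tfidf(corpus):
    """Return one sparse vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "vector search finds similar documents",
]
vectors = build_tfidf(corpus)
print(vectors[0])  # weights only for terms that occur in the first document
```

Rare terms such as "cat" receive a higher weight than terms shared across documents such as "sat". Because each dimension is a literal term, a sparse vector cannot match "cat" against "kitten"; that gap is what dense vectors aim to close.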

Use Cases

# | Label | Description
1 | Sentence Similarity | Determining how similar two texts are.
2 | Token Classification | Classification of text into pre-defined categories.
3 | Question and Answering | Building systems that automatically answer questions.
4 | Personalization | Using client data to personalize query results.
5 | Automated Synonym Creation | Enriching synonym collections automatically.
6 | Summarization | Reconstruction of a corpus into fewer words.
7 | Conversational | Dialogue response generation.
8 | File Search | Searching the contents of files across multiple modalities.

Architecture

# | Source | Description
1 | Reference Architecture | Common best practices for deploying vector search architecture in production.
2 | Model Hosting | Suggestions on how to host your vector models.
3 | Model Versioning | Common best practices for versioning your models as they evolve.
4 | Feedback Loops | Query re-ranking, learning-to-rank, and more.
5 | Selecting Models | Which model supports your domain-specific tasks best?

Education

Although vector search is a challenging topic to grasp, there is a myriad of educational resources at your disposal.

Information Retrieval

The overarching field of study.

Transformer Architecture

Transformers are the latest breakthrough in converting human content (text, images, etc.) into vector representations. They are deep learning models that use "self-attention" to differentially weigh the significance of each part of the input data.
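The weighting idea behind self-attention can be sketched in a few lines. This is a deliberately simplified version, assuming queries, keys, and values are all the raw token vectors; real transformers apply learned projection matrices and multiple attention heads, which are omitted here. The token vectors and function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with queries = keys = values = tokens."""
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Score this token against every token, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
print(weights[0])  # how much the first token attends to each token
```

The first token gives more weight to the second token, which points in a similar direction, than to the third, which is orthogonal to it.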

In order to determine what is deemed relevant, computers need to measure the distance between points, in this case vectors.
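As a small worked example of measuring distance between vectors, the sketch below compares three commonly used measures. The sample vectors are hypothetical and chosen to show that the measures can disagree: Euclidean distance is sensitive to vector length, while cosine similarity only looks at direction.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more aligned (and longer)."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between vectors; 1 means same direction."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, but longer
orthogonal = [0.0, 1.0]      # unrelated direction, same length as the query

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(name, dot_product(query, vec), euclidean_distance(query, vec), cosine_similarity(query, vec))
```

Here cosine similarity ranks `same_direction` as the better match, while Euclidean distance ranks `orthogonal` as closer, which is why the choice of measure should match how the embedding model was trained.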

Gratitude

This repository wouldn't be possible without several key individuals:

Watch for Changes

This is a living repository and will evolve as I learn and the landscape changes. Please subscribe to changes accordingly via:
