  • Stars: 165
  • Rank: 228,906 (Top 5%)
  • Language: Jupyter Notebook
  • License: Apache License 2.0
  • Created over 5 years ago
  • Updated 2 months ago

Repository Details

Set of Jupyter notebooks demonstrating Learning to Rank integrated with Solr and Elasticsearch

Hello LTR :)

The overall goal of this project is to demonstrate all of the steps required to work with LTR in Elasticsearch or Solr. There are two ways to run it: run and edit the notebooks entirely inside a Docker container, or develop locally (which still requires Docker to run the search engine).

No fuss setup: You just want to play with LTR

Follow these steps if you're just playing around and are OK with possibly losing some work (the notebooks exist only inside the Docker container).

With Docker and docker-compose installed, simply run

docker-compose up

at the root dir and go to town!

This will run Jupyter and all search engines in Docker containers. Check that each is up at its default port:

  • Solr: localhost:8983
  • Elasticsearch: localhost:9200
  • Kibana: localhost:5601
  • Jupyter: localhost:8888
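As a convenience, the port check above can be scripted. The following is a minimal sketch, not part of the project: the helper names `port_is_open` and `check_services` are hypothetical, and it only confirms that something accepts TCP connections on each port, not that the service behind it is healthy.

```python
import socket

# Default ports from the list above. The names and helpers here are
# illustrative assumptions, not part of the hello-ltr code base.
SERVICES = {
    "Solr": 8983,
    "Elasticsearch": 9200,
    "Kibana": 5601,
    "Jupyter": 8888,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'up' if is_up else 'not reachable'}")
```

The containers can take a little while to start, so a "not reachable" result right after `docker-compose up` may simply mean the service is still booting.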

You want to build your own LTR notebooks

Follow these steps if you want to do more serious work with the notebooks, for example to build a demo with your work's data or to produce something you want to preserve.

Run your search engine with Docker

You probably want to work with just one search engine, so launch only that one in Docker.

Running Solr w/ LTR

Set up Solr with docker-compose to work with just the Solr examples:

cd notebooks/solr
docker-compose up

Running Elasticsearch w/ LTR

Set up Elasticsearch with docker-compose to work with just the Elasticsearch examples:

cd notebooks/elasticsearch
docker-compose up

Run Jupyter locally w/ Python 3 and all prereqs

Set up Python requirements

  • Ensure Python 3.7 or later is installed on your system
  • Create a virtual environment: python3 -m venv venv
  • Start the virtual environment: source venv/bin/activate
  • Check that the install tooling is up to date: python -m pip install -U pip wheel setuptools
  • Install the requirements: pip install -r requirements.txt

Note: The above commands should be run from the root folder of the project.
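Before installing the requirements, it can help to confirm that the interpreter meets the stated minimum version and that the virtual environment is actually active. This is a small sketch using only the standard library; the function names are hypothetical.

```python
import sys

# Minimum version stated in the setup steps above.
MIN_PYTHON = (3, 7)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Compare the (major, minor) part of a version tuple with the minimum."""
    return tuple(version_info[:2]) >= minimum


def in_virtual_environment():
    """Return True when running inside a venv.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix points at the interpreter the venv was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version OK:", python_is_recent_enough())
    print("Virtual environment active:", in_virtual_environment())
```

Run it with the same `python` that you will use for `pip install`, so that the result reflects the activated environment.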

Start Jupyter notebook and confirm operation

  • Run jupyter notebook
  • Browse to notebooks/{search_engine}/{collection}
  • Open either "hello-ltr (Solr)" or "hello-ltr (ES)", as appropriate, and ensure you get a graph at the last cell

Tests

Automatically run everything...

NB: On macOS it may be necessary to raise the limit on open files above the default of 256 for the tests to complete successfully. Use:

$ ulimit -n 4096

to increase the value to a sensible amount.
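To see which limit a process actually inherits, the standard-library `resource` module (Unix only, including macOS) can report it. The sketch below is illustrative: the helper names are hypothetical, and 4096 is simply the value from the `ulimit` command above.

```python
import resource

# Value used in the ulimit command above.
RECOMMENDED_LIMIT = 4096


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def soft_limit_is_sufficient(soft, recommended=RECOMMENDED_LIMIT):
    """Return True if the soft limit is unlimited or at least the recommended value."""
    return soft == resource.RLIM_INFINITY or soft >= recommended


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    if not soft_limit_is_sufficient(soft):
        print(f"Consider running: ulimit -n {RECOMMENDED_LIMIT}")
```

The limit is inherited from the shell that starts the process, so run this from the same terminal session in which you ran `ulimit` and will run the tests.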

To run the full test suite, for example to verify a PR, simply run

./tests/test.sh

Optionally, with the containers rebuilt first:

./tests/test.sh --rebuild-containers

Failing tests will have their output in tests/last_run.ipynb.

While developing...

For more informal development:

  • Start up the Solr and ES Docker containers
  • Do your development
  • Run the command as needed: python tests/run_most_nbs.py
  • Tests fail if notebooks return any errors
    • The failing notebook will be stored at tests/last_run.ipynb
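When a run fails, the stored notebook can be scanned for the cells that raised. The sketch below is not how `tests/run_most_nbs.py` works; it is an independent illustration that relies on the Jupyter notebook (v4) JSON layout, where a failed code cell carries an output with `"output_type": "error"`. Whether `tests/last_run.ipynb` retains those outputs is an assumption, and the function names are hypothetical.

```python
import json


def find_notebook_errors(notebook):
    """Return (cell_index, ename, evalue) for every error output in a notebook dict."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # only code cells have outputs
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


def load_notebook(path):
    """Read a .ipynb file, which is plain JSON, into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Path taken from the text above; run from the project root.
    for index, ename, evalue in find_notebook_errors(load_notebook("tests/last_run.ipynb")):
        print(f"cell {index}: {ename}: {evalue}")
```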

More Repositories

1. elasticsearch-learning-to-rank (Java, 1,480 stars): Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch
2. relevant-search-book (Jupyter Notebook, 300 stars): Code and Examples for Relevant Search
3. quepid (Ruby, 284 stars): Improve your Elasticsearch, OpenSearch, Solr, Vectara, Algolia and Custom Search search quality.
4. elyzer (Python, 153 stars): "Stop worrying about Elasticsearch analyzers", my therapist says
5. splainer (JavaScript, 135 stars): Elasticsearch/Solr Sandbox for exploring explain information and tweaking
6. hello-nlp (Python, 96 stars): A natural language search microservice
7. awesome-search-relevance (82 stars): Tools and other things for people who work on search relevance & information retrieval
8. Spyglass (JavaScript, 58 stars): Simple search results with Solr and EmberJS
9. solr-to-es (Python, 54 stars): Migrate a Solr node to an Elasticsearch index.
10. lucene-query-example (Java, 48 stars): Educational Example of a custom Lucene Query & Scorer
11. solr_nginx (47 stars): Starter Reverse Proxy Configuration for Solr
12. RankyMcRankFace (Java, 44 stars): Hardened Fork of Ranklib learning to rank library
13. SemanticSearchInNumpy (XSLT, 44 stars)
14. hangry (Java, 43 stars): Vector search in Lucene based search attempting to use just the existing Lucene data structures (experimental)
15. trireme (Python, 37 stars): Migration tool providing support for Apache Cassandra, DataStax Enterprise Cassandra, & DataStax Enterprise Solr.
16. elastic-graph-recommender (JavaScript, 37 stars): Building recommenders with Elastic Graph!
17. elasticsearch-ltr-demo (HTML, 36 stars): This demo uses data from TheMovieDB (TMDB) to demonstrate using Ranklib learning to rank models with Elasticsearch.
18. lazy-semantic-indexing (Python, 33 stars): Elasticsearch Latent Semantic Indexing experimentation
19. pdf-discovery-demo (JavaScript, 30 stars): Demonstration of searching PDF documents with Solr, Tika, and Tesseract
20. match-query-parser (Java, 25 stars): Search a single field with different query time analyzers in Solr
21. splainer-search (JavaScript, 25 stars): Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services
22. tmdb_dump (Python, 23 stars): Dump TheMovieDB
23. es-tmdb (Python, 21 stars): Elasticsearch TMDB examples
24. solr-tmdb (Python, 19 stars): TheMovieDB in Solr
25. StackExchangeSolrIndexing (XML, 18 stars): AutoTaxonomyExtractionAndTagging
26. skipchunk (Python, 18 stars): Extracts a latent knowledge graph from text and index/query it in elasticsearch or solr
27. cfn-solr (Shell, 16 stars): Cloud formation script for solr servers
28. search-metrics (Python, 15 stars): Python functions for popular relevance metrics (ndcg, err, etc)
29. solr_angular_demo (JavaScript, 15 stars): A little search widget for instant Solr search with angular
30. lucene-bm25f (Java, 15 stars): BM25F demo with lucene using BlendedTermQuery and a custom similarity
31. bearded-wookie (CSS, 15 stars): An experiment in visualizing your Solr index via term counts, document counts, and memory usage per field and data type.
32. elasticsearch-image-search (Jupyter Notebook, 14 stars): Stupid Experiments in Elasticsearch Image Search
33. ubi (13 stars): User Behavior Insights standard schema
34. solr-movielens-recommender (Python, 11 stars): Movielens collaborative filtering with Solr streaming expression
35. grand_central (Java, 11 stars): Docker & Kubernetes deployment system for dynamic environments.
36. agent_q (Ruby, 10 stars): Headless agent for test driven relevancy with Quepid.com
37. ltr-synth-judg (Python, 9 stars): Experiments in creating synthetic training data for learning to rank
38. payload-component (Java, 9 stars): Solr component that surfaces payloads for matching terms
39. goRank (Go, 8 stars): click tracking for creating judgement lists for search-y stuff
40. puppet-solr (Shell, 7 stars): Puppet module for installing solr with a stand alone jetty server
41. SolrSwan (Java, 7 stars): SolrSwan is a query parser and highlighter for Solr that accepts proximity and Boolean queries.
42. semantic-search-course (Python, 7 stars): Semantic Search Course, Originally delivered at Code4Lib
43. Sample-Spark-Project (Scala, 7 stars): Sample Spark project with Scala and SBT
44. solr_dump (Python, 7 stars): Dump Solr docs to file; Write dumped docs to a Solr
45. lucene_codec_hello_world (Java, 7 stars): Starting point and instructions on developing a Lucene Codec
46. solr-docker (6 stars): Sample Dockerfiles for running Solr in a container
47. o19s-lambda (JavaScript, 6 stars): AWS Lambda Functions to make your life easier.
48. StackExchangeElasticSearch (Python, 6 stars): Playing with ElasticSearch and the SciFi Stackexchange Dataset
49. highlighting-pdf-viewer (Vue, 6 stars): A component (written in Vue) that supports highlighting of words in the PDF document.
50. opensearch-ubi (Java, 6 stars): OpenSearch plugin for User Behavior Insights
51. elasticsearch-vagrant (Shell, 5 stars): An ubuntu 14.04 vagrant box running Elasticsearch
52. jackhanna (Java, 5 stars): Simple CLI for Zookeeper
53. tlre-nlp (Jupyter Notebook, 5 stars): Materials for "Think Like A Relevance Engineer - NLP" Training
54. keel (Ruby, 5 stars): This gem provides a few easy to run rake tasks to deploy your Rails application to a Kubernetes cluster.
55. bad-libs (Jupyter Notebook, 4 stars): Automatically converts any book into a Mad-Libs style game of silliness using spaCy. Free Charles Dickens included!
56. elasticsearch-query-builder-example (Java, 4 stars): Basic Elasticsearch Query Builder Plugin
57. natural-language-search (Jupyter Notebook, 4 stars): Colaboratory notebooks for OSC's Natural Language Search training
58. word2vec-experiments (Jupyter Notebook, 3 stars): Some experimentation with word2vec
59. trec-news-index (Jupyter Notebook, 3 stars): Index for the TREC Washington Post corpus
60. twittalytics (Python, 3 stars): Twitter Analytics with Cassandra
61. solr-monitor (Java, 2 stars)
62. search-viz (JavaScript, 2 stars): Various experiments demonstrating pairing realtime visualizations with search results.
63. tm-import (Go, 2 stars): Importing public domain Trademark XML from Google
64. elasticsearch-heatmap (Java, 2 stars)
65. o19s-blog-ltr (Python, 2 stars): Using the Elasticsearch LTR demo w/ some hand-created judgments
66. JodaTimeCodecs (Java, 2 stars): A collection of Cassandra TypeCodecs for serializing and deserializing Joda Time objects.
67. Spark-Cassandra-Demo (Java, 2 stars): Demo code for loading data into Cassandra and Solr with Spark.
68. trec-podcasts-index (Python, 2 stars): Index Spotify's 100k podcasts dataset into Elasticsearch
69. ispy_component (Java, 2 stars): Relevance debugging component for Solr
70. quepid-jupyterlite (Jupyter Notebook, 2 stars): Jupyter notebooks to help with search relevancy measurements, optimized for Quepid.
71. clustering-lowes-grouts (JavaScript, 2 stars): Code to support a blog post about extracting tags from Lowes.com for clustering unsanded grout search results
72. visualizing-signals (Shell, 2 stars): A Practical Introduction to Exploring and Visualizing E-Commerce Search Signal Data
73. solr-query-parser-demo (Java, 2 stars): A "surround"-like and capitalization custom query parsers demo
74. user-behavior-insights-elasticsearch (Java, 2 stars): User Behavior Insights (UBI) plugin for Elasticsearch
75. metric-plots (JavaScript, 1 star): Plots for search metrics nDCG and ERR
76. jupyter-blogs (Python, 1 star): Drafts of Doug's Jupyter Notebook Blog Posts
77. os-tmdb (Python, 1 star): TLRE OpenSearch
78. ndoch-trademark-challenge (Ruby, 1 star): Applications built for the National Day of Civic Hacking's USPTO Trademarks Challenge
79. movielens-judgments (Python, 1 star): experiments using movielens genome tags as an experimental ltr training set
80. training_coms (R, 1 star): R scripts to manage bulk training communications and certificate generation
81. jarjar (Jupyter Notebook, 1 star): Joint Analysis Review of Judgements And Raters
82. puppet-modules (Puppet, 1 star): puppet modules for o19s
83. thats-trackable (Ruby, 1 star): Running app for XC team.
84. ggoodggraphics (Jupyter Notebook, 1 star): The grammar of graphics is powerful and now in Python thanks to `plotnine`!