• Stars
    star
    4,701
  • Rank 9,005 (Top 0.2 %)
  • Language
    TypeScript
  • License
    MIT License
  • Created about 6 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Tiny and powerful JavaScript full-text search engine for browser and Node

MiniSearch

CI Build Coverage Status Minzipped Size npm npm downloads types

MiniSearch is a tiny but powerful in-memory fulltext search engine written in JavaScript. It is respectful of resources, and it can comfortably run both in Node and in the browser.

Try out the demo application.

Find the complete documentation and API reference here, and more background about MiniSearch, including a comparison with other similar libraries, in this blog post.

MiniSearch follows semantic versioning, and documents releases and changes in the changelog.

Use case

MiniSearch addresses use cases where full-text search features are needed (e.g. prefix search, fuzzy search, ranking, boosting of fields…), but the data to be indexed can fit locally in the process memory. While you won't index the whole Internet with it, there are surprisingly many use cases that are served well by MiniSearch. By storing the index in local memory, MiniSearch can work offline, and can process queries quickly, without network latency.

A prominent use-case is real time search "as you type" in web and mobile applications, where keeping the index on the client enables fast and reactive UIs, removing the need to make requests to a search server.

Features

  • Memory-efficient index, designed to support memory-constrained use cases like mobile browsers.

  • Exact match, prefix search, fuzzy match, field boosting.

  • Auto-suggestion engine, for auto-completion of search queries.

  • Modern search result ranking algorithm.

  • Documents can be added and removed from the index at any time.

  • Zero external dependencies.

MiniSearch strives to expose a simple API that provides the building blocks to build custom solutions, while keeping a small and well tested codebase.

Installation

With npm:

npm install minisearch

With yarn:

yarn add minisearch

Then require or import it in your project:

// If you are using import:
import MiniSearch from 'minisearch'

// If you are using require:
const MiniSearch = require('minisearch')

Alternatively, if you prefer to use a <script> tag, you can require MiniSearch from a CDN:

<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/umd/index.min.js"></script>

In this case, MiniSearch will appear as a global variable in your project.

Finally, if you want to manually build the library, clone the repository and run yarn build (or yarn build-minified for a minified version + source maps). The compiled source will be created in the dist folder (UMD, ES6 and ES2015 module versions are provided).

Usage

Basic usage

// A collection of documents for our examples
const documents = [
  {
    id: 1,
    title: 'Moby Dick',
    text: 'Call me Ishmael. Some years ago...',
    category: 'fiction'
  },
  {
    id: 2,
    title: 'Zen and the Art of Motorcycle Maintenance',
    text: 'I can see by my watch...',
    category: 'fiction'
  },
  {
    id: 3,
    title: 'Neuromancer',
    text: 'The sky above the port was...',
    category: 'fiction'
  },
  {
    id: 4,
    title: 'Zen and the Art of Archery',
    text: 'At first sight it must seem...',
    category: 'non-fiction'
  },
  // ...and more
]

let miniSearch = new MiniSearch({
  fields: ['title', 'text'], // fields to index for full-text search
  storeFields: ['title', 'category'] // fields to return with search results
})

// Index all documents
miniSearch.addAll(documents)

// Search with default options
let results = miniSearch.search('zen art motorcycle')
// => [
//   { id: 2, title: 'Zen and the Art of Motorcycle Maintenance', category: 'fiction', score: 2.77258, match: { ... } },
//   { id: 4, title: 'Zen and the Art of Archery', category: 'non-fiction', score: 1.38629, match: { ... } }
// ]

Search options

MiniSearch supports several options for more advanced search behavior:

// Search only specific fields
miniSearch.search('zen', { fields: ['title'] })

// Boost some fields (here "title")
miniSearch.search('zen', { boost: { title: 2 } })

// Prefix search (so that 'moto' will match 'motorcycle')
miniSearch.search('moto', { prefix: true })

// Search within a specific category
miniSearch.search('zen', {
  filter: (result) => result.category === 'fiction'
})

// Fuzzy search, in this example, with a max edit distance of 0.2 * term length,
// rounded to nearest integer. The mispelled 'ismael' will match 'ishmael'.
miniSearch.search('ismael', { fuzzy: 0.2 })

// You can set the default search options upon initialization
miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  searchOptions: {
    boost: { title: 2 },
    fuzzy: 0.2
  }
})
miniSearch.addAll(documents)

// It will now by default perform fuzzy search and boost "title":
miniSearch.search('zen and motorcycles')

Auto suggestions

MiniSearch can suggest search queries given an incomplete query:

miniSearch.autoSuggest('zen ar')
// => [ { suggestion: 'zen archery art', terms: [ 'zen', 'archery', 'art' ], score: 1.73332 },
//      { suggestion: 'zen art', terms: [ 'zen', 'art' ], score: 1.21313 } ]

The autoSuggest method takes the same options as the search method, so you can get suggestions for misspelled words using fuzzy search:

miniSearch.autoSuggest('neromancer', { fuzzy: 0.2 })
// => [ { suggestion: 'neuromancer', terms: [ 'neuromancer' ], score: 1.03998 } ]

Suggestions are ranked by the relevance of the documents that would be returned by that search.

Sometimes, you might need to filter auto suggestions to, say, only a specific category. You can do so by providing a filter option:

miniSearch.autoSuggest('zen ar', {
  filter: (result) => result.category === 'fiction'
})
// => [ { suggestion: 'zen art', terms: [ 'zen', 'art' ], score: 1.21313 } ]

Field extraction

By default, documents are assumed to be plain key-value objects with field names as keys and field values as simple values. In order to support custom field extraction logic (for example for nested fields, or non-string field values that need processing before tokenization), a custom field extractor function can be passed as the extractField option:

// Assuming that our documents look like:
const documents = [
  { id: 1, title: 'Moby Dick', author: { name: 'Herman Melville' }, pubDate: new Date(1851, 9, 18) },
  { id: 2, title: 'Zen and the Art of Motorcycle Maintenance', author: { name: 'Robert Pirsig' }, pubDate: new Date(1974, 3, 1) },
  { id: 3, title: 'Neuromancer', author: { name: 'William Gibson' }, pubDate: new Date(1984, 6, 1) },
  { id: 4, title: 'Zen in the Art of Archery', author: { name: 'Eugen Herrigel' }, pubDate: new Date(1948, 0, 1) },
  // ...and more
]

// We can support nested fields (author.name) and date fields (pubDate) with a
// custom `extractField` function:

let miniSearch = new MiniSearch({
  fields: ['title', 'author.name', 'pubYear'],
  extractField: (document, fieldName) => {
    // If field name is 'pubYear', extract just the year from 'pubDate'
    if (fieldName === 'pubYear') {
      const pubDate = document['pubDate']
      return pubDate && pubDate.getFullYear().toString()
    }

    // Access nested fields
    return fieldName.split('.').reduce((doc, key) => doc && doc[key], document)
  }
})

The default field extractor can be obtained by calling MiniSearch.getDefault('extractField').

Tokenization

By default, documents are tokenized by splitting on Unicode space or punctuation characters. The tokenization logic can be easily changed by passing a custom tokenizer function as the tokenize option:

// Tokenize splitting by hyphen
let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  tokenize: (string, _fieldName) => string.split('-')
})

Upon search, the same tokenization is used by default, but it is possible to pass a tokenize search option in case a different search-time tokenization is necessary:

// Tokenize splitting by hyphen
let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  tokenize: (string) => string.split('-'), // indexing tokenizer
  searchOptions: {
    tokenize: (string) => string.split(/[\s-]+/) // search query tokenizer
  }
})

The default tokenizer can be obtained by calling MiniSearch.getDefault('tokenize').

Term processing

Terms are downcased by default. No stemming is performed, and no stop-word list is applied. To customize how the terms are processed upon indexing, for example to normalize them, filter them, or to apply stemming, the processTerm option can be used. The processTerm function should return the processed term as a string, or a falsy value if the term should be discarded:

let stopWords = new Set(['and', 'or', 'to', 'in', 'a', 'the', /* ...and more */ ])

// Perform custom term processing (here discarding stop words and downcasing)
let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  processTerm: (term, _fieldName) =>
    stopWords.has(term) ? null : term.toLowerCase()
})

By default, the same processing is applied to search queries. In order to apply a different processing to search queries, supply a processTerm search option:

let miniSearch = new MiniSearch({
  fields: ['title', 'text'],
  processTerm: (term) =>
    stopWords.has(term) ? null : term.toLowerCase(), // index term processing
  searchOptions: {
    processTerm: (term) => term.toLowerCase() // search query processing
  }
})

The default term processor can be obtained by calling MiniSearch.getDefault('processTerm').

API Documentation

Refer to the API documentation for details about configuration options and methods.

Browser compatibility

MiniSearch natively supports all modern browsers implementing JavaScript standards, but requires a polyfill when used in Internet Explorer, as it makes use functions like Object.entries, Array.includes, and Array.from, which are standard but not available on older browsers. The package core-js is one such polyfill that can be used to provide those functions.

Contributing

Contributions to MiniSearch are welcome! Please read the contributions guidelines. Reading the design document is also useful to understand the project goals and the technical implementation.

More Repositories

1

jQCloud

jQuery plugin for drawing neat word clouds that actually look like clouds
JavaScript
646
star
2

cubdb

Elixir embedded key/value database
Elixir
555
star
3

immutable

Thread-safe, persistent, immutable collections for the Crystal language
Crystal
203
star
4

guitarstack

Digital guitar effects right in your browser!
JavaScript
200
star
5

react-minisearch

React integration for the MiniSearch client side full-text search library
TypeScript
46
star
6

jquery-machine

Kick-ass finite state machines with jQuery
JavaScript
41
star
7

nerves_thermal_camera

Thermal camera imaging with Elixir, Nerves, Raspberry Pi, and a MLX90640 sensor
Elixir
37
star
8

noiseGen

A jQuery plugin to generate background noise dynamically
JavaScript
31
star
9

cubq

Queue and stack semantics on top of CubDB
Elixir
23
star
10

elixir_bme680

An Elixir library to interface with the BME680 (and BME280) environmental sensor
C
19
star
11

sprockets-chain

Pure JS dependency chain generator for Sprockets
JavaScript
13
star
12

similaria

Scala library implementing a performant and easy to use item-based recommendation engine
Scala
12
star
13

elixir_mlx90640

An Elixir library to interface with the MLX90640 Far Infrared Thermal Sensor Array.
C++
11
star
14

meteor-omnipoll

Quick polls with Meteor
JavaScript
10
star
15

jsterminal

JSterminal, one bookmarklet to rule 'em all
JavaScript
9
star
16

chitchat

An educational object oriented toy language
JavaScript
8
star
17

ablab

Minimal AB-test library for Rails applications
Ruby
7
star
18

submachine

Kick-ass JavaScript finite state machines
CoffeeScript
7
star
19

nerves_rpi4_hid_gadget_poc

Proof of concept of enabling USB-C gadget mode on Raspberry Pi 4 with Nerves
Elixir
6
star
20

Moka

a damn cool static website framework
Ruby
5
star
21

Vimfiles

My messy .vim folder
Vim Script
5
star
22

risp

LISP interpreter in Ruby with macros and interop, because why not?
Ruby
5
star
23

jquery-toCurrency

jQuery plugin to format numbers to currency
JavaScript
5
star
24

wheels-class

Simple and powerful classical inheritance for JavaScript. Fully compatible with CommonJS / Node, AMD and standard browser script tag include.
JavaScript
3
star
25

gwooks

A DSL for quickly creating endpoints for GitHub post-receive webhooks
Ruby
3
star
26

eventspitter

Minimal but versatile JavaScript pub/sub working everywhere
CoffeeScript
3
star
27

dataspec

Describe how your data should look like, validate it and generate examples
JavaScript
2
star
28

loud-accessors

JavaScript attribute accessors (getter and setter) that emit events on change and read
CoffeeScript
2
star
29

miniprofiler-dashboard

A nice UI and dashboard for MiniProfiler
JavaScript
2
star
30

wheels-pubsub

Simple and lightweight PubSub, fully compatible with CommonJS / Node, AMD and standard browser script tag include.
JavaScript
2
star
31

TwitterSimulationModel

An agent-based simulation model of Twitter network
Java
1
star
32

old_lucaongaro.eu

My old personal website
Ruby
1
star
33

lucaongaro.eu

My personal website and blog
CSS
1
star
34

share-experiment

An experiment
JavaScript
1
star
35

pantiltcam

WebRTC controlled pan-tilt night vision camera experiment
HTML
1
star