  • Stars: 249
  • Rank: 162,987 (Top 4 %)
  • Language: JavaScript
  • License: Other
  • Created over 8 years ago
  • Updated over 7 years ago

yunodb

A portable, persistent, electron compatible fulltext search + document store database for node.js. LevelDB underneath.

How it works

yuno is a JSON document store with fulltext search. It's meant for embedding in electron apps, focuses solely on text search, and in most cases should handle millions of documents easily.

yuno is pretty basic - it has three components:

  • The document store, which is just the raw JSON objects stored in leveldb
  • The inverted search index, powered by search-index
  • A customisable natural language processing pipeline that is applied to documents before adding them to the index, greatly improving speed and memory usage compared to the vanilla search-index.
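
To make the three components concrete, here is a toy in-memory sketch of how a document store, an inverted index and a processing pipeline fit together. It is an illustration only, not yuno's implementation: the real thing persists to leveldb and uses search-index, and the names `pipeline`, `store`, `index`, `add` and `search` below are invented for this example.

```javascript
// Toy in-memory illustration; NOT yuno's actual implementation.

// 1. Processing pipeline: lowercase, split on non-alphanumerics, drop stopwords.
const STOPWORDS = new Set(['the', 'a', 'of', 'and'])

function pipeline (text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(token => token.length > 0 && !STOPWORDS.has(token))
}

// 2. Document store: key -> full JSON document.
const store = new Map()

// 3. Inverted index: token -> set of document keys.
const index = new Map()

function add (doc) {
  store.set(doc.id, doc)
  for (const token of pipeline(doc.text)) {
    if (!index.has(token)) index.set(token, new Set())
    index.get(token).add(doc.id)
  }
}

function search (query) {
  const ids = new Set()
  for (const token of pipeline(query)) {
    for (const id of index.get(token) || []) ids.add(id)
  }
  // Return the full stored documents, not just the indexed text.
  return Array.from(ids).map(id => store.get(id))
}

add({ id: 1, text: 'The tortoise and the hare', author: 'Aesop' })
add({ id: 2, text: 'A handbook of beetles', author: 'unknown' })

console.log(search('Tortoise'))
```

Because stopwords are dropped before indexing, the index holds fewer tokens than the raw text, which is the kind of saving the pipeline is meant to give.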

None of this is revolutionary - actually it's standard in fulltext-search database engines. And all the pieces exist already in the node ecosystem. But I couldn't find a node fulltext search and document store that could handle millions of documents, persisted on disk, didn't have crazy memory requirements and could be easily bundled into an electron app.

Like, db, y u no exist already??

Install

npm install --save yunodb

Use

Create / load a database

yuno(options, callback)

e.g.

var yuno = require('yunodb')

var dbopts = {
  location: './.yuno',
  keyField: 'id',
  indexMap: ['text']
}
var db = yuno(dbopts, (err, dbhandle) => {
  if (err) throw err

  // do stuff with the db
  db = dbhandle
})

options configures the two persistent datastores. Possible key-value pairs are:

  • location (String, required) - Base directory in which both datastores will be kept.
  • keyField (String, required) - JSONpath specifying the field in each document to be used as a key in the document store.
  • indexMap (Array | Object, required) - JSONpaths specifying the fields in each document to index for fulltext searching. See index mapping below for details.
  • deletable (Boolean, optional) - Whether documents should be deletable. Setting to true increases index size. Default: false.
  • ngramLength (Integer | Array, optional) - ngram length(s) to use when building the index.

Index mapping

It is quite rare that all fields in a database should be exposed to the user search. More often, we want to allow the user to search certain fields, but retrieve the full document for each result. The indexMap option allows you to specify how to index documents.

There are two ways to tell yuno how to index:

1. Pass an Array of fields

The simple option - an array of fields to index. The contents of each field will be passed through the default Natural Language Processing pipeline before being added to the search index.

2. Pass an Object mapping fields to processors

To fine-tune the processing on a per-field basis, pass an Object where each key is a field to index. Values can be one of:

  • true/false whether to apply the default NLP pipeline
  • function a custom processing function.

Custom processing functions take the field value as a single argument, and their return value (either a string or an array) will be tokenised and added to the index.
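
As a sketch of the Object form, the map below mixes the three kinds of value. The field names (`title`, `isbn`, `authors`) and the `authorNames` helper are hypothetical, chosen only for illustration. The map would be passed as the indexMap option.

```javascript
// Hypothetical document fields, for illustration only.

// Custom processor: takes the field value, returns a string or an array.
function authorNames (authors) {
  return authors.map(author => author.name.toLowerCase())
}

const indexMap = {
  title: true, // apply the default NLP pipeline
  isbn: false, // do not apply the default NLP pipeline
  authors: authorNames // custom processing function
}

const doc = {
  id: '1',
  title: 'Sketch of the Analytical Engine',
  isbn: '0000000000',
  authors: [{ name: 'Ada Lovelace' }, { name: 'Luigi Menabrea' }]
}

// What the custom processor hands back for this document's authors field.
console.log(indexMap.authors(doc.authors))
```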

Add documents

db.add(documents, options, callback)

  • documents, array of JSON-able objects to store
  • options optional, can override the database-wide indexMap option
  • callback, function called on completion with a single argument: an error, if there was one

e.g.

var docs = [
  { id: 1, text: 'hello '},
  { id: 2, text: 'goodbye '},
  { id: 3, text: 'tortoise '}
]

function done (err) {
  if (err) throw err
  console.log('successfully added', docs.length, 'documents')
}

db.add(docs, done)

or using a custom indexMap:

// trim whitespace
function trim (str) { return str.trim() }

db.add(docs, { text: trim }, done)

Delete documents

db.del(documents, callback)

  • documents, document (object), id (string), or array of documents or ids
  • callback, function called on completion with a single argument: an error, if there was one

e.g.

// document
db.del({ id: '1234', otherkey: 'something else' }, done)

// with id
db.del('1234', done)

// array
db.del(['1234', '1235', '1236'], done)

Search

db.search(query, opts, callback)

Returns a cursor that can be used to page through the results. By default the pageSize is 50.

  • query, string search query
  • opts, (optional) options object
  • callback, function to call on completion. Takes two arguments:
    • err error or null
    • results object containing the result metadata and hits

e.g.

var cursor = db.search('tortoise', function(err, results) {
  if (err) throw err

  // first 50 results
  console.log(results)

  cursor.next(function(err, results) {
    // next page in here
  })
})
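
If you prefer promises to callbacks, the search call can be wrapped. This is a minimal sketch, not part of yuno: `searchAsync` is a name made up here, it relies only on the `db.search(query, callback)` form shown above, and it discards the cursor, so it only gives you the first page. `fakeDb` is a stand-in so the snippet runs without a real database, and the shape of its results object is invented.

```javascript
// Wrap the callback-style search in a Promise (first page only).
function searchAsync (db, query) {
  return new Promise((resolve, reject) => {
    db.search(query, (err, results) => {
      if (err) reject(err)
      else resolve(results)
    })
  })
}

// Stand-in for a real database handle; the results shape is invented.
const fakeDb = {
  search (query, callback) {
    setImmediate(() => callback(null, { query: query, hits: [] }))
  }
}

searchAsync(fakeDb, 'tortoise').then(results => console.log(results))
```

With a real handle you would pass `db` in place of `fakeDb`.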

CLI

yuno has a minimal command-line interface that can be used to create a database from a file containing JSON objects.

Install the CLI:

npm install --global yuno

Create a new database:

yuno create <database path> <JSON data>

The JSON data file must contain individual JSON objects, rather than a single array of objects. For example:

{ "id": "1234", "title": "the coleopterist's handbook" }
{ "id": "4321", "title": "bark and ambrosia beetles of south america" }
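
If your data is currently a single JSON array, a short script can rewrite it in this form. This is a sketch: it assumes one object per line, as in the example above, and the `toObjectLines` helper and the `beetles.json` output name are hypothetical.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Serialise each array element as one JSON object per line.
function toObjectLines (docs) {
  return docs.map(doc => JSON.stringify(doc)).join('\n') + '\n'
}

const docs = [
  { id: '1234', title: "the coleopterist's handbook" },
  { id: '4321', title: 'bark and ambrosia beetles of south america' }
]

// Hypothetical output path, written to the temp directory for this sketch.
const outPath = path.join(os.tmpdir(), 'beetles.json')
fs.writeFileSync(outPath, toObjectLines(docs))
console.log('wrote', docs.length, 'objects to', outPath)
```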

You can provide database options as a JSON file using the --opts argument:

yuno create --opts <JSON options> <database path> <JSON data>

Where the options JSON looks like:

{
  "keyField": "id",
  "indexMap": {
    "title": true
  }
}

Contributing

yuno is being built to serve my use-case of embedding pre-made databases in electron apps. If you have another use-case and would like features added, please open an issue to discuss it - I'm happy to add things that will be widely useful.

Contributions are very welcome. Please open an issue to discuss any changes you would like to PR, or mention in an existing issue that you plan to work on it.

Ideas for improving performance are particularly welcome.

License - CC0

https://creativecommons.org/publicdomain/zero/1.0/

yuno is public domain code. Do whatever you want with it. Credit would be appreciated, but it's not required.
