  • Language
    Scala
  • License
    MIT License
  • Created almost 8 years ago
  • Updated about 3 years ago


lucene4s

Light-weight convenience wrapper around Lucene to simplify complex tasks and add Scala sugar.

Setup

lucene4s is published to Sonatype OSS and Maven Central, and currently supports Scala 2.11, 2.12, 2.13, and 3.0.

Configuring the dependency in SBT simply requires:

libraryDependencies += "com.outr" %% "lucene4s" % "1.11.1"

Using

Imports

You may need other imports depending on what you're doing, but the majority of functionality is available by simply importing com.outr.lucene4s._:

import com.outr.lucene4s._

Creating a Lucene Instance

Lucene is the object utilized for doing anything with Lucene, so you first need to instantiate it:

import java.nio.file.Paths

val directory = Paths.get("index")
val lucene = new DirectLucene(Nil, directory = Option(directory))

NOTE: If you omit directory or set it to None (the default), an in-memory index is used.

Creating Fields

For type-safety and convenience we can create the fields we'll be using in the document ahead of time:

val name = lucene.create.field[String]("name")
val address = lucene.create.field[String]("address")

Inserting Documents

Inserting is quite easy using the document builder:

lucene.doc().fields(name("John Doe"), address("123 Somewhere Rd.")).index()
lucene.doc().fields(name("Jane Doe"), address("123 Somewhere Rd.")).index()

Querying Documents

Querying documents is just as easy with the query builder:

val paged = lucene.query().sort(Sort(name)).search()
paged.results.foreach { searchResult =>
  println(s"Name: ${searchResult(name)}, Address: ${searchResult(address)}")
}

This will return a PagedResults instance with the page size set to the limit. There are convenience methods for navigating the pagination and accessing the results.

The above code will output:

Name: John Doe, Address: 123 Somewhere Rd.
Name: Jane Doe, Address: 123 Somewhere Rd.
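
The paging arithmetic behind a result set like PagedResults can be sketched in plain Scala. The Page class and its members below are hypothetical names chosen for illustration and are not the lucene4s API; the sketch only assumes that a page is described by an offset, a limit, and a total hit count.

```scala
// Illustrative only: `Page` is a hypothetical stand-in, not lucene4s' PagedResults.
object PagingSketch {
  final case class Page[T](all: Vector[T], offset: Int, limit: Int) {
    require(limit > 0, "limit must be positive")

    // Total number of hits across every page.
    def total: Int = all.size

    // The slice of hits that belongs to this page.
    def results: Vector[T] = all.slice(offset, offset + limit)

    // Overall page count, rounded up so a partial last page is counted.
    def pages: Int = (total + limit - 1) / limit

    def hasNext: Boolean = offset + limit < total
    def hasPrevious: Boolean = offset > 0

    // Navigation returns None when there is no such page.
    def next: Option[Page[T]] =
      if (hasNext) Some(copy(offset = offset + limit)) else None
    def previous: Option[Page[T]] =
      if (hasPrevious) Some(copy(offset = math.max(0, offset - limit))) else None
  }
}
```

With five hits and a limit of two, this model yields three pages, the last of which holds a single hit.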

Highlighting Results

Though querying is nice, we may want to stylize the output to show the matched results. This is pretty simple:

val paged = lucene.query().sort(Sort(name)).filter(fuzzy(name("jhn"))).highlight().search()
paged.results.foreach { searchResult =>
  val highlighting = searchResult.highlighting(name).head
  println(s"Fragment: ${highlighting.fragment}, Word: ${highlighting.word}")
}

The above code will output:

Fragment: <em>John</em> Doe, Word: John
Fragment: <em>Jane</em> Doe, Word: Jane
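
To see why a fuzzy search for "jhn" can match both names, it helps to compute edit distances by hand. The sketch below is a textbook Levenshtein implementation written for illustration; it is not the code Lucene runs. It assumes that terms are compared in lowercase and that up to two edits are tolerated (the default for Lucene's FuzzyQuery), which lucene4s may or may not override.

```scala
// Illustrative only: a textbook Levenshtein distance, not Lucene's automaton-based matcher.
object EditDistanceSketch {
  def levenshtein(a: String, b: String): Int = {
    // previous(j) holds the distance between the processed prefix of `a` and b.take(j).
    var previous = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val current = new Array[Int](b.length + 1)
      current(0) = i
      for (j <- 1 to b.length) {
        val substitution = previous(j - 1) + (if (a(i - 1) == b(j - 1)) 0 else 1)
        val deletion = previous(j) + 1
        val insertion = current(j - 1) + 1
        current(j) = math.min(substitution, math.min(deletion, insertion))
      }
      previous = current
    }
    previous(b.length)
  }

  // Assumed tolerance: two edits, compared case-insensitively.
  def matches(query: String, term: String, maxEdits: Int = 2): Boolean =
    levenshtein(query.toLowerCase, term.toLowerCase) <= maxEdits
}
```

Under these assumptions "jhn" is one insertion away from "john" and two edits away from "jane", which is consistent with both names appearing in the output above.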

Faceted Searching

See https://github.com/outr/lucene4s/blob/master/implementation/src/test/scala/tests/FacetsSpec.scala

Full-Text Searching

In lucene4s, the Lucene instance holds a fullText field that contains a concatenation of all the fields that are configured as fullTextSearchable. Whether a field is fullTextSearchable defaults to Lucene.defaultFullTextSearchable, which itself defaults to false.

The fullText field is the default field used for searches if it's not specified in the SearchTerm. Let's see an example:

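// Excerpt from a ScalaTest spec: `should be` is a ScalaTest matcher, and
// firstName is assumed to be a field created the same way as the fields above.
val paged = lucene.query().filter(wildcard("doe*")).search()
paged.total should be(4)
paged.results(0)(firstName) should be("John")
paged.results(1)(firstName) should be("Jane")
paged.results(2)(firstName) should be("Baby")
paged.results(3)(firstName) should be("James")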

For a complete example, see: https://github.com/outr/lucene4s/blob/master/implementation/src/test/scala/tests/FullTextSpec.scala
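
Conceptually, the fullText field behaves like a single string built from every field marked as full-text searchable. The sketch below models that idea with plain Scala collections. FieldValue, fullText, and matchesPrefix are hypothetical names, and the whitespace tokenization and lowercase comparison are simplifying assumptions, not a description of how lucene4s or Lucene analyzes text.

```scala
// Illustrative only: models the idea of a concatenated full-text field.
object FullTextSketch {
  final case class FieldValue(name: String, value: String, fullTextSearchable: Boolean)

  // Join the values of all fields that opted in to full-text search.
  def fullText(fields: Seq[FieldValue]): String =
    fields.filter(_.fullTextSearchable).map(_.value).mkString(" ")

  // A trailing-wildcard query such as "doe*" matches any token with that prefix.
  def matchesPrefix(text: String, wildcard: String): Boolean = {
    val prefix = wildcard.stripSuffix("*").toLowerCase
    text.toLowerCase.split("\\s+").exists(_.startsWith(prefix))
  }
}
```

A field that is not flagged as searchable never reaches the concatenated text, so a query without an explicit field cannot match it.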

Keyword Searching

As we saw previously, the fullText field provides us with a concatenation of all fields configured to be fullTextSearchable. In addition, if you create an instance of KeywordIndexing, you can query against a no-duplicates index of keywords for the fullText (although you can override defaults to apply keyword indexing to any field). All we have to do is create an instance referencing the Lucene instance and the name (used for storage purposes):

val keywordIndexing = KeywordIndexing(lucene, "keywords")
val keywords = keywordIndexing.search("do*")
val words = keywords.results.map(_.word).mkString(", ")
println(s"Keywords: $words")

The above code would output:

Keywords: Doe

For the complete example see: https://github.com/outr/lucene4s/blob/master/implementation/src/test/scala/tests/SimpleSpec.scala
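
The idea of a no-duplicates keyword index can be illustrated without Lucene. In this sketch, KeywordSketch and its methods are hypothetical, and splitting on non-letter characters is an assumption made for brevity; the real KeywordIndexing stores its keywords in Lucene and has its own configuration.

```scala
// Illustrative only: a toy keyword index, not the lucene4s KeywordIndexing implementation.
object KeywordSketch {
  // Collect each distinct word once, keeping the first spelling encountered.
  def keywords(texts: Seq[String]): Vector[String] =
    texts
      .flatMap(_.split("[^\\p{L}]+").toSeq)
      .filter(_.nonEmpty)
      .foldLeft(Vector.empty[String]) { (seen, word) =>
        if (seen.exists(_.equalsIgnoreCase(word))) seen else seen :+ word
      }

  // Case-insensitive prefix search, mirroring a query such as "do*".
  def search(index: Vector[String], wildcard: String): Vector[String] = {
    val prefix = wildcard.stripSuffix("*").toLowerCase
    index.filter(_.toLowerCase.startsWith(prefix))
  }
}
```

Indexing "John Doe" and "Jane Doe" stores "Doe" only once, so a search for "do*" returns a single keyword.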

Case Class Support

lucene4s provides a powerful Macro-based system to generate two-way mappings between case classes and Lucene fields at compile-time. This is accomplished through the use of Searchable. The setup is pretty simple.

Setup

First we need to define a case class to model the data in the index:

case class Person(id: Int, firstName: String, lastName: String, age: Int, address: String, city: String, state: String, zip: String)

As you can see, this is a bare-bones case class with nothing special about it.

Next we need to define a Searchable trait that defines the unique identification for update and delete:

trait SearchablePerson extends Searchable[Person] {
  // This is necessary for update and delete to reference the correct document.
  override def idSearchTerms(person: Person): List[SearchTerm] = List(exact(id(person.id)))
  
  /*
    Though at compile-time all fields will be generated from the params in `Person`, for code-completion we can define
    an unimplemented method in order to properly reference the field. This will still compile without this definition,
    but most IDEs will complain.
   */
  def id: Field[Int]
}

As the last part of our setup, we simply need to generate it from our Lucene instance:

val people = lucene.create.searchable[SearchablePerson]
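
To give a feel for what a two-way mapping between a case class and index fields involves, here is a hand-written equivalent for a smaller case class. This is not the code the lucene4s macro emits: MiniPerson, toFields, and fromFields are hypothetical, and a Map[String, Any] stands in for a Lucene document.

```scala
// Illustrative only: a manual version of a case class <-> fields mapping.
object MappingSketch {
  final case class MiniPerson(id: Int, firstName: String, lastName: String)

  // Case class -> named field values (what would be written to the index).
  def toFields(p: MiniPerson): Map[String, Any] =
    Map("id" -> p.id, "firstName" -> p.firstName, "lastName" -> p.lastName)

  // Named field values -> case class (what a search result is converted back into).
  // Returns None when a field is missing or has an unexpected type.
  def fromFields(fields: Map[String, Any]): Option[MiniPerson] =
    (fields.get("id"), fields.get("firstName"), fields.get("lastName")) match {
      case (Some(id: Int), Some(first: String), Some(last: String)) =>
        Some(MiniPerson(id, first, last))
      case _ => None
    }
}
```

Writing this by hand for every case class is repetitive and easy to get out of sync, which is the kind of boilerplate a compile-time generator is meant to remove.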

Inserting

Now that everything is configured, inserting a person is trivial:

people.insert(Person(1, "John", "Doe", 23, "123 Somewhere Rd.", "Lalaland", "California", "12345")).index()

Notice that we still have to call index() at the end for the insert to actually execute. This allows us to do more advanced tasks like adding facets, adding non-Searchable fields, etc. before actually inserting.

Updating

Now let's try updating our Person:

people.update(Person(1, "John", "Doe", 23, "321 Nowhere St.", "Lalaland", "California", "12345")).index()

As you can see here, the signature is quite similar to insert. Internally this will utilize idSearchTerms as we declared previously to apply the update. In this case, as long as we don't change the id (1), calls to update will replace the existing record if one exists.
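
The replace-if-present behavior of update can be modeled with an ordinary map keyed by id. The sketch below is only an analogy: Record, Store, upsert, and delete are hypothetical names, and a real update in lucene4s goes through idSearchTerms and the Lucene index, not an in-memory map.

```scala
// Illustrative only: an in-memory analogy for identity-based update and delete.
object UpsertSketch {
  final case class Record(id: Int, firstName: String, address: String)

  // An immutable store keyed by the identifying term (here, the id).
  final case class Store(byId: Map[Int, Record] = Map.empty) {
    // Replaces the record with the same id, or adds it if none exists.
    def upsert(r: Record): Store = copy(byId = byId.updated(r.id, r))

    // Removes the record with the same id, if any; other fields are ignored.
    def delete(r: Record): Store = copy(byId = byId - r.id)

    def size: Int = byId.size
  }
}
```

Because only the id identifies a record in this model, changing the address and calling upsert leaves exactly one record behind.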

Querying

Querying works very much the same as in the previous examples, except we get our QueryBuilder from our people instance:

val paged = people.query().search()
paged.entries.foreach { person =>
  println(s"Person: $person")
}

Note that instead of calling paged.results we call paged.entries as it represents the conversion to Person. We can still use paged.results if we want access to the SearchResult like before.

Deleting

Deleting is just as easy as inserting and updating:

people.delete(Person(1, "John", "Doe", 23, "321 Nowhere St.", "Lalaland", "California", "12345"))

Additional Information

All Searchable implementations automatically define a docType field that is used to uniquely separate different Searchable instances so you don't have to worry about multiple different instances overlapping.

For more examples see https://github.com/outr/lucene4s/blob/master/implementation/src/test/scala/tests/SearchableSpec.scala

Geospatial Support

One of the great features of Lucene is geospatial querying, and what Lucene wrapper would be complete without it?

Creating a Spatial Field

In order to create a stored, queryable, filterable, and sortable latitude and longitude you need only create a SpatialPoint field:

val location: Field[SpatialPoint] = lucene.create.field[SpatialPoint]("location")

Sorting Nearest a Point

The most common use case is to take an existing latitude and longitude and sort the results so that the documents nearest to that location come first:

val paged = lucene.query().sort(Sort.nearest(location, SpatialPoint(40.7142, -74.0119))).search()

Filtering by Distance

To restrict the results to entries within a certain distance of a location, add a spatialDistance filter:

val newYorkCity = SpatialPoint(40.7142, -74.0119)
val paged = lucene
  .query()
  .sort(Sort.nearest(location, newYorkCity))
  .filter(spatialDistance(location, newYorkCity, 50.miles))
  .search()
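
For intuition about what a 50-mile distance filter combined with a nearest-first sort computes, the following sketch applies the haversine formula to plain latitude/longitude pairs. GeoSketch, Point, and nearestWithin are assumptions made for illustration; Lucene's spatial code uses its own data structures and may differ in detail, and the mean Earth radius of 3958.8 miles is an approximation.

```scala
// Illustrative only: haversine distance on plain coordinates, not Lucene's spatial code.
object GeoSketch {
  final case class Point(latitude: Double, longitude: Double)

  // Approximate mean Earth radius in miles.
  private val EarthRadiusMiles = 3958.8

  // Great-circle distance between two points using the haversine formula.
  def distanceMiles(a: Point, b: Point): Double = {
    val dLat = math.toRadians(b.latitude - a.latitude)
    val dLon = math.toRadians(b.longitude - a.longitude)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.latitude)) * math.cos(math.toRadians(b.latitude)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusMiles * math.asin(math.sqrt(h))
  }

  // Keep only points within `maxMiles` of `origin`, nearest first.
  def nearestWithin(origin: Point, candidates: Seq[Point], maxMiles: Double): Seq[Point] =
    candidates
      .map(p => p -> distanceMiles(origin, p))
      .filter { case (_, d) => d <= maxMiles }
      .sortBy { case (_, d) => d }
      .map { case (p, _) => p }
}
```

In this model, a point in Newark falls inside a 50-mile radius around the New York City coordinates used above, while a point in Philadelphia does not.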
