  • Language
    Ruby
  • License
    MIT License
  • Created about 9 years ago
  • Updated about 1 year ago

search_flip

Full-Featured Elasticsearch Ruby Client with a Chainable DSL

SearchFlip makes it simple to create index classes that correspond to Elasticsearch indices and to manipulate, query and aggregate these indices using a chainable, concise, yet powerful DSL. SearchFlip supports Elasticsearch 2.x, 5.x, 6.x, 7.x and 8.x. See the Feature Support section for version-dependent features.

CommentIndex.search("hello world", default_field: "title").where(visible: true).aggregate(:user_id).sort(id: "desc")

CommentIndex.aggregate(:user_id) do |aggregation|
  aggregation.aggregate(histogram: { date_histogram: { field: "created_at", interval: "month" }})
end

CommentIndex.range(:created_at, gt: Date.today - 1.week, lt: Date.today).where(state: ["approved", "pending"])

Updating from previous SearchFlip versions

Check out UPDATING.md for detailed instructions.

Comparison with other gems

There are already great Ruby gems for working with Elasticsearch, e.g. searchkick and elasticsearch-ruby. However, they don't have a chainable API. Compare for yourself.

# elasticsearch-ruby
Comment.search(
  query: {
    query_string: {
      query: "hello world",
      default_operator: "AND"
    }
  }
)

# searchkick
Comment.search("hello world", where: { available: true }, order: { id: "desc" }, aggs: [:username])

# search_flip
CommentIndex.search("hello world").where(available: true).sort(id: "desc").aggregate(:username)

Finally, SearchFlip comes with a minimal set of dependencies.

Reference Docs

SearchFlip has extensive documentation. See for yourself at http://www.rubydoc.info/github/mrkamel/search_flip

Install

Add this line to your application's Gemfile:

gem 'search_flip'

and then execute

$ bundle

or install it via

$ gem install search_flip

Config

You can change global config options like:

SearchFlip::Config[:environment] = "development"
SearchFlip::Config[:base_url] = "http://127.0.0.1:9200"

Available config options are:

  • index_prefix to have a prefix added to your index names automatically. This can be useful to separate the indices of e.g. testing and development environments.
  • base_url to tell SearchFlip how to connect to your cluster
  • bulk_limit a global limit for bulk requests
  • bulk_max_mb a global limit for the payload of bulk requests
  • auto_refresh tells SearchFlip to automatically refresh an index after import, index, delete and similar operations. This is useful for testing, for example. Defaults to false.

Usage

First, create a separate class for your index and include SearchFlip::Index.

class CommentIndex
  include SearchFlip::Index
end

Then tell the Index about the index name, the corresponding model and how to serialize the model for indexing.

class CommentIndex
  include SearchFlip::Index

  def self.index_name
    "comments"
  end

  def self.model
    Comment
  end

  def self.serialize(comment)
    {
      id: comment.id,
      username: comment.username,
      title: comment.title,
      message: comment.message
    }
  end
end
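
To see what the serialized document looks like without a running cluster, the following sketch calls a serializer of the same shape on a plain Struct. CommentDoc and CommentSerializer are stand-ins invented for this illustration; in a real application the method lives on the index class and receives your model instance.

```ruby
# Hypothetical stand-in for an ActiveRecord model (illustration only).
CommentDoc = Struct.new(:id, :username, :title, :message, keyword_init: true)

# Same body as CommentIndex.serialize above, minus the SearchFlip include.
module CommentSerializer
  def self.serialize(comment)
    {
      id: comment.id,
      username: comment.username,
      title: comment.title,
      message: comment.message
    }
  end
end

comment = CommentDoc.new(id: 1, username: "mrkamel", title: "Hello", message: "Hello world")

# Prints the hash that would be sent to Elasticsearch as the document source.
p CommentSerializer.serialize(comment)
```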

Optionally, you can specify a custom type_name, but note that starting with Elasticsearch 7, types are deprecated.

class CommentIndex
  # ...

  def self.type_name
    "comment"
  end
end

You can additionally specify an index_scope, which is automatically applied to scopes, e.g. ActiveRecord::Relation objects, passed to #import, #index, etc. This can be used to preload associations that are used when serializing records or to restrict the records you want to index.

class CommentIndex
  # ...

  def self.index_scope(scope)
    scope.preload(:user)
  end
end

CommentIndex.import(Comment.all) # => CommentIndex.import(Comment.all.preload(:user))

To specify a custom mapping:

class CommentIndex
  # ...

  def self.mapping
    {
      properties: {
        # ...
      }
    }
  end

  # ...
end

Please note that you need to specify the mapping without a type name, even for Elasticsearch versions before 7, as SearchFlip will add the type name automatically if necessary.

To specify index settings:

def self.index_settings
  {
    settings: {
      number_of_shards: 10,
      number_of_replicas: 2
    }
  }
end

Then you can interact with the index:

CommentIndex.create_index
CommentIndex.index_exists?
CommentIndex.delete_index
CommentIndex.update_mapping
CommentIndex.close_index
CommentIndex.open_index

Index records (automatically uses the Bulk API):

CommentIndex.import(Comment.all)
CommentIndex.import(Comment.first)
CommentIndex.import([Comment.find(1), Comment.find(2)])
CommentIndex.import(Comment.where("created_at > ?", Time.now - 7.days))

Query records:

CommentIndex.total_entries
# => 2838

CommentIndex.search("title:hello").records
# => [#<Comment ...>, #<Comment ...>, ...]

CommentIndex.where(username: "mrkamel").total_entries
# => 13

CommentIndex.aggregate(:username).aggregations(:username)
# => {1=>#<SearchFlip::Result doc_count=37 ...>, 2=>... }
...

Please note that you can check the request that will be sent to Elasticsearch by calling #request on the query:

CommentIndex.search("hello world").sort(id: "desc").aggregate(:username).request
# => {:query=>{:bool=>{:must=>[{:query_string=>{:query=>"hello world", :default_operator=>:AND}}]}}, ...}

Delete records:

# for Elasticsearch >= 2.x and < 5.x, the delete-by-query plugin is required
# for the following query:

CommentIndex.match_all.delete

# or delete manually via the bulk API:

CommentIndex.bulk do |indexer|
  CommentIndex.match_all.find_each do |record|
    indexer.delete record.id
  end
end

When indexing or deleting documents, you can pass options to control the bulk indexing and you can use all options provided by the Bulk API:

CommentIndex.import(Comment.first, { bulk_limit: 1_000 }, op_type: "create", routing: "routing_key")

# or directly

CommentIndex.create(Comment.first, { bulk_max_mb: 100 }, routing: "routing_key")
CommentIndex.update(Comment.first, ...)

Check out the Elasticsearch Bulk API docs for more info, as well as SearchFlip::Bulk for a complete list of the options that control SearchFlip's bulk indexing.
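
For orientation, the Bulk API expects newline-delimited JSON: one action line, followed by a source line for index, create and update actions (delete has no source line). The sketch below builds such a payload by hand with Ruby's standard json library. bulk_payload is a hypothetical helper and not part of SearchFlip, which assembles these requests for you.

```ruby
require "json"

# Hypothetical helper (not a SearchFlip method): turns a list of
# [action, metadata, source] tuples into a Bulk API request body.
def bulk_payload(operations)
  lines = operations.flat_map do |action, meta, source|
    action_line = JSON.generate({ action => meta })

    # delete actions carry no source document
    source ? [action_line, JSON.generate(source)] : [action_line]
  end

  # The Bulk API requires the body to end with a newline.
  lines.join("\n") + "\n"
end

puts bulk_payload([
  [:index, { _id: 1, routing: "routing_key" }, { title: "hello" }],
  [:delete, { _id: 2 }, nil]
])
```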

Working with Elasticsearch Aliases

You can use and manage Elasticsearch Aliases like the following:

class UserIndex
  include SearchFlip::Index

  def self.index_name
    alias_name
  end

  def self.alias_name
    "users"
  end
end

Then, create an index, import the records and add the alias like:

new_user_index = UserIndex.with_settings(index_name: "users-#{SecureRandom.hex}")
new_user_index.create_index
new_user_index.import User.all
new_user_index.connection.update_aliases(actions: [
  add: { index: new_user_index.index_name, alias: new_user_index.alias_name }
])

If the alias already exists, you also have to remove it within the same update_aliases call, listing the remove action first.
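
A sketch of such a swap, assuming update_aliases forwards the hash unchanged to the Elasticsearch aliases API, which accepts remove and add actions in a single request. alias_swap_actions is a hypothetical helper and the index names are placeholders.

```ruby
# Hypothetical helper: builds the actions for moving an alias from
# the old physical index to the new one in a single request.
def alias_swap_actions(alias_name:, old_index:, new_index:)
  {
    actions: [
      { remove: { index: old_index, alias: alias_name } },
      { add: { index: new_index, alias: alias_name } }
    ]
  }
end

p alias_swap_actions(alias_name: "users", old_index: "users-old", new_index: "users-new")
```

The resulting hash could be passed to update_aliases in place of a literal actions hash.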

Please note: with_settings(index_name: '...') returns an anonymous (i.e. temporary) class which inherits from UserIndex and overwrites index_name.
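
The idea of an anonymous subclass that overrides a single class method can be sketched in plain Ruby. This is a toy model for illustration, not SearchFlip's implementation of with_settings; ToyUserIndex and with_index_name are made-up names.

```ruby
# Toy model only; this is not SearchFlip's implementation of with_settings.
class ToyUserIndex
  def self.index_name
    "users"
  end

  # Returns an anonymous subclass whose index_name is overridden.
  def self.with_index_name(name)
    Class.new(self) do
      define_singleton_method(:index_name) { name }
    end
  end
end

temporary = ToyUserIndex.with_index_name("users-abc123")

puts temporary.index_name    # the overridden name
puts ToyUserIndex.index_name # the original class is unchanged
puts temporary.name.inspect  # anonymous classes have no name
puts temporary.superclass    # the subclass still inherits everything else
```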

Chainable Methods

SearchFlip also supports more advanced usage, e.g. post filters, filtered aggregations or nested aggregations, via simple-to-use API methods.

Query/Filter Criteria Methods

SearchFlip provides powerful methods to query/filter Elasticsearch:

  • where

The .where method feels like ActiveRecord's where and adds a bool filter clause to the request:

CommentIndex.where(reviewed: true)
CommentIndex.where(likes: 0 .. 10_000)
CommentIndex.where(state: ["approved", "rejected"])
  • where_not

The .where_not method is like .where, but excluding the matching documents:

CommentIndex.where_not(id: [1, 2, 3])
  • range

Use .range to add a range filter query:

CommentIndex.range(:created_at, gt: Date.today - 1.week, lt: Date.today)
  • filter

Use .filter to add raw filter queries:

CommentIndex.filter(term: { state: "approved" })
  • should

Use .should to add raw should queries:

CommentIndex.should([
  { term: { state: "approved" } },
  { term: { user: "mrkamel" } },
])
  • must

Use .must to add raw must queries:

CommentIndex.must(term: { state: "approved" })
  • must_not

Like must, but excluding the matching documents:

CommentIndex.must_not(term: { state: "approved" })
  • search

Adds a query string query, with AND as default operator:

CommentIndex.search("hello world")
CommentIndex.search("state:approved")
CommentIndex.search("username:a*")
CommentIndex.search("state:approved OR state:rejected")
CommentIndex.search("hello world", default_operator: "OR")
  • exists

Use exists to add an exists query:

CommentIndex.exists(:state)
  • exists_not

Like exists, but excluding the matching documents:

CommentIndex.exists_not(:state)
  • match_all

Simply matches all documents:

CommentIndex.match_all
  • match_none

Matches no documents at all:

CommentIndex.match_none
  • all

Simply returns the criteria as is or an empty criteria when called on the index class directly. Useful for chaining.

CommentIndex.all
  • to_query

Sometimes you want to convert the constraints of a SearchFlip query to a raw query, e.g. to use it in a should clause:

CommentIndex.should([
  CommentIndex.range(:likes_count, gt: 10).to_query,
  CommentIndex.search("search term").to_query
])

It returns all added queries and filters, including post filters, as a raw query:

CommentIndex.where(state: "new").search("text").to_query
# => {:bool=>{:filter=>[{:term=>{:state=>"new"}}], :must=>[{:query_string=>{:query=>"text", ...}}]}}
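
To make the chaining pattern concrete, here is a toy model of an immutable criteria object that collects filter and must clauses and renders them as a bool query. It only illustrates the idea and is not how SearchFlip is implemented; ToyCriteria and the simplified clause handling are assumptions made for this sketch.

```ruby
# Toy model of a chainable, immutable criteria object. This illustrates
# the pattern only; it is not SearchFlip's actual implementation.
class ToyCriteria
  attr_reader :filters, :musts

  def initialize(filters: [], musts: [])
    @filters = filters.freeze
    @musts = musts.freeze
  end

  # Each hash entry becomes a term, terms or range filter
  # (ranges are treated as inclusive in this sketch).
  def where(conditions)
    clauses = conditions.map do |field, value|
      case value
      when Array then { terms: { field => value } }
      when Range then { range: { field => { gte: value.begin, lte: value.end } } }
      else { term: { field => value } }
      end
    end

    self.class.new(filters: filters + clauses, musts: musts)
  end

  # Adds a query string query with AND as default operator.
  def search(query, default_operator: "AND")
    clause = { query_string: { query: query, default_operator: default_operator } }

    self.class.new(filters: filters, musts: musts + [clause])
  end

  # Renders the collected clauses as a raw bool query.
  def to_query
    bool = {}
    bool[:filter] = filters unless filters.empty?
    bool[:must] = musts unless musts.empty?
    { bool: bool }
  end
end

base = ToyCriteria.new
query = base.where(state: "new").search("text")

p query.to_query
p base.to_query # the receiver is left untouched by chaining
```

Because every method returns a new object, intermediate criteria can be stored and reused without one chain affecting another.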

Post Query/Filter Criteria Methods

All query/filter criteria methods (#where, #where_not, #range, etc.) are available in post filter mode as well, i.e. as filters/queries applied after aggregations are calculated. Check out the Elasticsearch docs for further info.

query = CommentIndex.aggregate(:user_id)
query = query.post_where(reviewed: true)
query = query.post_search("username:a*")

Check out PostFilterable for a complete API reference.

Aggregations

SearchFlip lets you elegantly specify aggregations, no matter how deeply nested:

query = OrderIndex.aggregate(:username, order: { revenue: "desc" }) do |aggregation|
  aggregation.aggregate(revenue: { sum: { field: "price" }})
end

Generally, aggregation results returned by Elasticsearch are wrapped in a SearchFlip::Result, which is basically a Hashie::Mash, so you can access them via:

query.aggregations(:username)["mrkamel"].revenue.value

Still, if you want to get the raw aggregations returned by Elasticsearch, access them without supplying any aggregation name to #aggregations:

query.aggregations # => returns the raw aggregation section

query.aggregations["username"]["buckets"].detect { |bucket| bucket["key"] == "mrkamel" }["revenue"]["value"] # => 238.50
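
The difference between the two access styles comes down to indexing the buckets by their key. The following sketch does that by hand on a made-up raw aggregation section; RAW_AGGREGATIONS and buckets_by_key are hypothetical and only mirror the structure of a terms aggregation with a nested sum.

```ruby
# Made-up raw "aggregations" section, shaped like an Elasticsearch
# response for a terms aggregation with a nested sum aggregation.
RAW_AGGREGATIONS = {
  "username" => {
    "buckets" => [
      { "key" => "mrkamel", "doc_count" => 3, "revenue" => { "value" => 238.5 } },
      { "key" => "other", "doc_count" => 1, "revenue" => { "value" => 10.0 } }
    ]
  }
}.freeze

# Hypothetical helper: index the buckets of one aggregation by their key.
def buckets_by_key(raw_aggregations, name)
  raw_aggregations.fetch(name.to_s).fetch("buckets").to_h do |bucket|
    [bucket["key"], bucket]
  end
end

by_user = buckets_by_key(RAW_AGGREGATIONS, :username)
p by_user["mrkamel"]["revenue"]["value"]
```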

Once again, the criteria methods (#where, #range, etc.) are available in aggregations as well:

query = OrderIndex.aggregate(average_price: {}) do |aggregation|
  aggregation = aggregation.match_all
  aggregation = aggregation.where(user_id: current_user.id) if current_user

  aggregation.aggregate(average_price: { avg: { field: "price" }})
end

query.aggregations(:average_price).average_price.value

Even various criteria for top hits aggregations can be specified elegantly:

query = ProductIndex.aggregate(sponsored: { top_hits: {} }) do |aggregation|
  aggregation.sort(:rank).highlight(:title).source([:id, :title])
end

Check out Aggregatable as well as Aggregation for a complete API reference.

Suggestions

query = CommentIndex.suggest(:suggestion, text: "helo", term: { field: "message" })
query.suggestions(:suggestion).first["text"] # => "hello"

Highlighting

CommentIndex.highlight([:title, :message])
CommentIndex.highlight(:title).highlight(:description)
CommentIndex.highlight(:title, require_field_match: false)
CommentIndex.highlight(title: { type: "fvh" })
query = CommentIndex.highlight(:title).search("hello")
query.results[0]._hit.highlight.title # => "<em>hello</em> world"

Other Criteria Methods

There are even more chainable criteria methods to make your life easier. For a full list, check out the reference docs.

  • source

In case you want to restrict the returned fields, simply specify the fields via #source:

CommentIndex.source([:id, :message]).search("hello world")
  • paginate, page, per

SearchFlip supports will_paginate and kaminari compatible pagination. Thus, you can either use #paginate or #page in combination with #per:

CommentIndex.paginate(page: 3, per_page: 50)
CommentIndex.page(3).per(50)
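
Offset pagination in Elasticsearch is expressed through the from and size request parameters, so page-based arguments presumably end up as something like the following. This is a sketch of the arithmetic only; pagination_params is a hypothetical helper, and the exact code path inside SearchFlip is not shown here.

```ruby
# Hypothetical helper: maps will_paginate/kaminari style arguments to
# the from/size parameters of an Elasticsearch search request.
def pagination_params(page:, per_page:)
  page = [page.to_i, 1].max # treat nil, 0 and negative pages as page 1

  { from: (page - 1) * per_page, size: per_page }
end

p pagination_params(page: 3, per_page: 50)
```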
  • profile

Use #profile to enable query profiling:

query = CommentIndex.profile(true)
query.raw_response["profile"] # => { "shards" => ... }
  • preload, eager_load and includes

Uses the well known methods from ActiveRecord to load associated database records when fetching the respective records themselves. Works with other ORMs as well, if supported.

Using #preload:

CommentIndex.preload(:user, :post).records
PostIndex.preload(comments: :user).records

or #eager_load

CommentIndex.eager_load(:user, :post).records
PostIndex.eager_load(comments: :user).records

or #includes

CommentIndex.includes(:user, :post).records
PostIndex.includes(comments: :user).records
  • find_in_batches

Used to fetch and yield records in batches using the Elasticsearch scroll API. The batch size and scroll API timeout can be specified.

CommentIndex.search("hello world").find_in_batches(batch_size: 100) do |batch|
  # ...
end
  • find_results_in_batches

Used like find_in_batches, but yielding the raw results (as SearchFlip::Result objects) instead of database records.

CommentIndex.search("hello world").find_results_in_batches(batch_size: 100) do |batch|
  # ...
end
  • find_each

Like #find_in_batches but yielding one record at a time.

CommentIndex.search("hello world").find_each(batch_size: 100) do |record|
  # ...
end
  • find_each_result

Like #find_results_in_batches, but yielding one result at a time.

CommentIndex.search("hello world").find_each_result(batch_size: 100) do |result|
  # ...
end
  • scroll

You can also use the underlying scroll API directly, i.e. without the higher-level scrolling helpers:

query = CommentIndex.scroll(timeout: "5m")

until query.records.empty?
  # ...

  query = query.scroll(id: query.scroll_id, timeout: "5m")
end
  • failsafe

Use #failsafe to prevent any exceptions from being raised for query string syntax errors or Elasticsearch being unavailable, etc.

CommentIndex.search("invalid/request").execute
# raises SearchFlip::ResponseError

# ...

CommentIndex.search("invalid/request").failsafe(true).execute
# => #<SearchFlip::Response ...>
  • merge

You can merge criteria, i.e. combine the attributes (constraints, settings, etc.) of two individual criteria:

CommentIndex.where(approved: true).merge(CommentIndex.search("hello"))
# equivalent to: CommentIndex.where(approved: true).search("hello")
  • timeout

Specify a timeout to limit query processing time:

CommentIndex.timeout("3s").execute
  • http_timeout

Specify an HTTP timeout for the request that is sent to Elasticsearch:

CommentIndex.http_timeout(3).execute
  • terminate_after

Activate early query termination to stop query processing after the specified number of records has been found:

CommentIndex.terminate_after(10).execute

For further details and a full list of methods, check out the reference docs.

  • custom

You can add a custom clause to the request via custom:

CommentIndex.custom(custom_clause: '...')

This can be useful for Elasticsearch features not yet supported via criteria methods by SearchFlip, custom plugin clauses, etc.

Custom Criteria Methods

To add custom criteria methods, you can add class methods to your index class.

class HotelIndex
  # ...

  def self.where_geo(lat:, lon:, distance:)
    filter(geo_distance: { distance: distance, location: { lat: lat, lon: lon } })
  end
end

HotelIndex.search("bed and breakfast").where_geo(lat: 53.57532, lon: 10.01534, distance: '50km').aggregate(:rating)

Using multiple Elasticsearch clusters

To use multiple Elasticsearch clusters, specify a connection within your indices:

MyConnection = SearchFlip::Connection.new(base_url: "http://elasticsearch.host:9200")

class MyIndex
  include SearchFlip::Index

  def self.connection
    MyConnection
  end
end

This lets you use different clusters per index, e.g. when migrating indices to new versions of Elasticsearch.

You can specify basic auth, additional headers, request timeouts, etc via:

http_client = SearchFlip::HTTPClient.new

# Basic Auth
http_client = http_client.basic_auth(user: "username", pass: "password")

# Raw Auth Header
http_client = http_client.auth("Bearer VGhlIEhUVFAgR2VtLCBST0NLUw")

# Proxy Settings
http_client = http_client.via("proxy.host", 8080)

# Custom headers
http_client = http_client.headers(key: "value")

# Timeouts
http_client = http_client.timeout(20)

SearchFlip::Connection.new(base_url: "...", http_client: http_client)

AWS Elasticsearch / Signed Requests

To use SearchFlip with AWS Elasticsearch and signed requests, you have to add aws-sdk-core to your Gemfile and tell SearchFlip to use the SearchFlip::AwsSigv4Plugin:

require "search_flip/aws_sigv4_plugin"

MyConnection = SearchFlip::Connection.new(
  base_url: "https://your-elasticsearch-cluster.es.amazonaws.com",
  http_client: SearchFlip::HTTPClient.new(
    plugins: [
      SearchFlip::AwsSigv4Plugin.new(
        region: "...",
        access_key_id: "...",
        secret_access_key: "..."
      )
    ]
  )
)

Again, in your index you need to specify this connection:

class MyIndex
  include SearchFlip::Index

  def self.connection
    MyConnection
  end
end

Routing and other index-time options

Override index_options in case you want to use routing or pass other index-time options:

class CommentIndex
  include SearchFlip::Index

  def self.index_options(comment)
    {
      routing: comment.user_id,
      version: comment.version,
      version_type: "external_gte"
    }
  end
end

These options will be passed whenever records get indexed, deleted, etc.

Instrumentation

SearchFlip supports instrumentation for request execution via ActiveSupport::Notifications compatible instrumenters to e.g. allow global performance tracing, etc.

To use instrumentation, configure the instrumenter:

SearchFlip::Config[:instrumenter] = ActiveSupport::Notifications

Subsequently, you can subscribe to notifications for request.search_flip:

ActiveSupport::Notifications.subscribe("request.search_flip") do |name, start, finish, id, payload|
  payload[:index] # the index class
  payload[:request] # the request hash sent to Elasticsearch
  payload[:response] # the SearchFlip::Response object or nil in case of errors
end

A notification is sent for every request that is sent to Elasticsearch.
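
If you do not use ActiveSupport, a compatible instrumenter can be quite small. The sketch below assumes that the configured object only needs an ActiveSupport::Notifications-like instrument(name, payload) method that yields the payload; that interface is an assumption here, so check the reference docs before relying on it. TimingInstrumenter is a made-up class.

```ruby
# Minimal instrumenter sketch. Assumption: the configured object only
# needs an ActiveSupport::Notifications-like #instrument(name, payload) { }.
class TimingInstrumenter
  attr_reader :events

  def initialize
    @events = []
  end

  # Runs the block, records its duration and returns the block's value.
  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield(payload) if block_given?
  ensure
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @events << { name: name, duration: duration, payload: payload }
  end
end

instrumenter = TimingInstrumenter.new
result = instrumenter.instrument("request.search_flip", index: "CommentIndex") { :response }

p result
p instrumenter.events.map { |event| event[:name] }
```

Under the stated assumption, such an object could be assigned to SearchFlip::Config[:instrumenter] instead of ActiveSupport::Notifications.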

Non-ActiveRecord models

SearchFlip ships with built-in support for ActiveRecord models, but using non-ActiveRecord models is very easy. The model must implement a find_each class method and the Index class needs to implement Index.record_id and Index.fetch_records. The default implementations for the index class are as follows:

class MyIndex
  include SearchFlip::Index

  def self.record_id(object)
    object.id
  end

  def self.fetch_records(ids)
    model.where(id: ids)
  end
end

Thus, if your ORM supports .find_each, #id and #where you are already good to go. Otherwise, simply add your custom implementation of those methods that work with whatever ORM you use.
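
As an illustration of that contract, here is a sketch with an in-memory class that offers .find_each, .where(id: ...) and #id. MemoryUser and MemoryUserIndexHelpers are hypothetical names; the helper module repeats the default method bodies from above as plain module functions so the example runs without SearchFlip.

```ruby
# Hypothetical in-memory "model" offering the three methods named
# above: .find_each, .where(id: ...) and #id. Not tied to any real ORM.
class MemoryUser
  STORE = {}

  attr_reader :id, :name

  def initialize(id:, name:)
    @id = id
    @name = name
    STORE[id] = self
  end

  # Yields every stored record, one at a time.
  def self.find_each(&block)
    STORE.values.each(&block)
  end

  # Returns the stored records for the given id or ids, skipping unknown ids.
  def self.where(id:)
    STORE.values_at(*Array(id)).compact
  end
end

# Same bodies as the default implementations shown above.
module MemoryUserIndexHelpers
  def self.model
    MemoryUser
  end

  def self.record_id(object)
    object.id
  end

  def self.fetch_records(ids)
    model.where(id: ids)
  end
end

MemoryUser.new(id: 1, name: "alice")
MemoryUser.new(id: 2, name: "bob")

p MemoryUserIndexHelpers.fetch_records([2, 3]).map(&:name)
```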

JSON

SearchFlip is using the Oj gem to generate JSON. More concretely, SearchFlip is using:

Oj.dump({ key: "value" }, mode: :custom, use_to_json: true, time_format: :xmlschema, bigdecimal_as_decimal: false)

The use_to_json option is used for maximum compatibility, most importantly when using Rails ActiveSupport::TimeWithZone timestamps, which Oj cannot serialize natively. However, use_to_json adds performance overhead. You can change the JSON options via:

SearchFlip::Config[:json_options] = {
  mode: :custom,
  use_to_json: false,
  time_format: :xmlschema,
  bigdecimal_as_decimal: false
}

However, you then have to convert timestamps manually for indexing, e.g. via:

class MyIndex
  # ...

  def self.serialize(model)
    {
      # ...

      created_at: model.created_at.to_time
    }
  end
end

Please check out the oj docs for more details.

Feature Support

  • for Elasticsearch 2.x, the delete-by-query plugin is required to delete records via queries
  • #match_none is only available with Elasticsearch >= 5
  • #track_total_hits is only available with Elasticsearch >= 7

Keeping your Models and Indices in Sync

Besides the most basic approach to get you started, SearchFlip currently doesn't ship with any means to automatically keep your models and indices in sync, because every method is very much bound to the concrete environment and depends on your concrete requirements. In addition, the methods to achieve model/index consistency can get arbitrarily complex and we want to keep this bloat out of the SearchFlip codebase.

class Comment < ActiveRecord::Base
  include SearchFlip::Model

  notifies_index(CommentIndex)
end

It uses after_commit (if applicable, after_save, after_destroy and after_touch otherwise) hooks to synchronously update the index when your model changes.

Semantic Versioning

SearchFlip is using Semantic Versioning: SemVer

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Running the test suite

Running the tests is easy. The test suite uses sqlite, so you only need to install Elasticsearch. You can install Elasticsearch on your own, or you can use docker-compose, for example:

$ cd search_flip
$ sudo ES_IMAGE=elasticsearch:5.4 docker-compose up
$ rspec

That's it.
