• Stars
    star
    780
  • Rank 55,956 (Top 2 %)
  • Language
    Ruby
  • License
    MIT License
  • Created almost 10 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Search engine like fulltext query support for ActiveRecord

SearchCop

Build Status Code Climate Gem Version

search_cop

SearchCop extends your ActiveRecord models to support fulltext search engine like queries via simple query strings and hash-based queries. Assume you have a Book model having various attributes like title, author, stock, price, available. Using SearchCop you can perform:

Book.search("Joanne Rowling Harry Potter")
Book.search("author: Rowling title:'Harry Potter'")
Book.search("price > 10 AND price < 20 -stock:0 (Potter OR Rowling)")
# ...

Thus, you can hand out a search query string to your models and you, your app's admins and/or users will get powerful query features without the need for integrating additional third party search servers, since SearchCop can use fulltext index capabilities of your RDBMS in a database agnostic way (currently MySQL and PostgreSQL fulltext indices are supported) and optimizes the queries to make optimal use of them. Read more below.

Complex hash-based queries are supported as well:

Book.search(author: "Rowling", title: "Harry Potter")
Book.search(or: [{author: "Rowling"}, {author: "Tolkien"}])
Book.search(and: [{price: {gt: 10}}, {not: {stock: 0}}, or: [{title: "Potter"}, {author: "Rowling"}]])
Book.search(or: [{query: "Rowling -Potter"}, {query: "Tolkien -Rings"}])
Book.search(title: {my_custom_sql_query: "Rowl"}})
# ...

Installation

Add this line to your application's Gemfile:

gem 'search_cop'

And then execute:

$ bundle

Or install it yourself as:

$ gem install search_cop

Usage

To enable SearchCop for a model, include SearchCop and specify the attributes you want to expose to search queries within a search_scope:

class Book < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :title, :description, :stock, :price, :created_at, :available
    attributes comment: ["comments.title", "comments.message"]
    attributes author: "author.name"
    # ...
  end

  has_many :comments
  belongs_to :author
end

You can of course as well specify multiple search_scope blocks as you like:

search_scope :admin_search do
  attributes :title, :description, :stock, :price, :created_at, :available

  # ...
end

search_scope :user_search do
  attributes :title, :description

  # ...
end

How does it work

SearchCop parses the query and maps it to an SQL Query in a database agnostic way. Thus, SearchCop is not bound to a specific RDBMS.

Book.search("stock > 0")
# ... WHERE books.stock > 0

Book.search("price > 10 stock > 0")
# ... WHERE books.price > 10 AND books.stock > 0

Book.search("Harry Potter")
# ... WHERE (books.title LIKE '%Harry%' OR books.description LIKE '%Harry%' OR ...) AND (books.title LIKE '%Potter%' OR books.description LIKE '%Potter%' ...)

Book.search("available:yes OR created_at:2014")
# ... WHERE books.available = 1 OR (books.created_at >= '2014-01-01 00:00:00.00000' and books.created_at <= '2014-12-31 23:59:59.99999')

SearchCop is using ActiveSupport's beginning_of_year and end_of_year methods for the values used in building the SQL query for this case.

Of course, these LIKE '%...%' queries won't achieve optimal performance, but check out the section below on SearchCop's fulltext capabilities to understand how the resulting queries can be optimized.

As Book.search(...) returns an ActiveRecord::Relation, you are free to pre- or post-process the search results in every possible way:

Book.where(available: true).search("Harry Potter").order("books.id desc").paginate(page: params[:page])

Security

When you pass a query string to SearchCop, it gets parsed, analyzed and mapped to finally build up an SQL query. To be more precise, when SearchCop parses the query, it creates objects (nodes), which represent the query expressions (And-, Or-, Not-, String-, Date-, etc Nodes). To build the SQL query, SearchCop uses the concept of visitors like e.g. used in Arel, such that, for every node there must be a visitor, which transforms the node to SQL. When there is no visitor, an exception is raised when the query builder tries to "visit" the node. The visitors are responsible for sanitizing the user supplied input. This is primilarly done via quoting (string-, table-name-, column-quoting, etc). SearchCop is using the methods provided by the ActiveRecord connection adapter for sanitizing/quoting to prevent SQL injection. While we can never be 100% safe from security issues, SearchCop takes security issues seriously. Please report responsibly via security at flakks dot com in case you find any security related issues.

json/jsonb/hstore

SearchCop supports json fields for MySQL, as well as json, jsonb and hstore fields for postgres. Currently, field values are always expected to be strings and no arrays are supported. You can specify json attributes via:

search_scope :search do
  attributes user_agent: "context->browser->user_agent"

  # ...
end

where context is a json/jsonb column which e.g. contains:

{
  "browser": {
    "user_agent": "Firefox ..."
  }
}

Fulltext index capabilities

By default, i.e. if you don't tell SearchCop about your fulltext indices, SearchCop will use LIKE '%...%' queries. Unfortunately, unless you create a trigram index (postgres only), these queries can not use SQL indices, such that every row needs to be scanned by your RDBMS when you search for Book.search("Harry Potter") or similar. To avoid the penalty of LIKE queries, SearchCop can exploit the fulltext index capabilities of MySQL and PostgreSQL. To use already existing fulltext indices, simply tell SearchCop to use them via:

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    attributes :title, :author

    options :title, :type => :fulltext
    options :author, :type => :fulltext
  end

  # ...
end

SearchCop will then transparently change its SQL queries for the attributes having fulltext indices to:

Book.search("Harry Potter")
# MySQL: ... WHERE (MATCH(books.title) AGAINST('+Harry' IN BOOLEAN MODE) OR MATCH(books.author) AGAINST('+Harry' IN BOOLEAN MODE)) AND (MATCH(books.title) AGAINST ('+Potter' IN BOOLEAN MODE) OR MATCH(books.author) AGAINST('+Potter' IN BOOLEAN MODE))
# PostgreSQL: ... WHERE (to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Harry') OR to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Harry')) AND (to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Potter') OR to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Potter'))

Obviously, these queries won't always return the same results as wildcard LIKE queries, because we search for words instead of sub-strings. However, fulltext indices will usually of course provide better performance.

Moreover, the query above is not yet perfect. To improve it even more, SearchCop tries to optimize the queries to make optimal use of fulltext indices while still allowing to mix them with non-fulltext attributes. To improve queries even more, you can group attributes and specify a default field to search in, such that SearchCop must no longer search within all fields:

search_scope :search do
  attributes all: [:author, :title]

  options :all, :type => :fulltext, default: true

  # Use default: true to explicitly enable fields as default fields (whitelist approach)
  # Use default: false to explicitly disable fields as default fields (blacklist approach)
end

Now SearchCop can optimize the following, not yet optimal query:

Book.search("Rowling OR Tolkien stock > 1")
# MySQL: ... WHERE ((MATCH(books.author) AGAINST('+Rowling' IN BOOLEAN MODE) OR MATCH(books.title) AGAINST('+Rowling' IN BOOLEAN MODE)) OR (MATCH(books.author) AGAINST('+Tolkien' IN BOOLEAN MODE) OR MATCH(books.title) AGAINST('+Tolkien' IN BOOLEAN MODE))) AND books.stock > 1
# PostgreSQL: ... WHERE ((to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Rowling') OR to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Rowling')) OR (to_tsvector('simple', books.author) @@ to_tsquery('simple', 'Tolkien') OR to_tsvector('simple', books.title) @@ to_tsquery('simple', 'Tolkien'))) AND books.stock > 1

to the following, more performant query:

Book.search("Rowling OR Tolkien stock > 1")
# MySQL: ... WHERE MATCH(books.author, books.title) AGAINST('Rowling Tolkien' IN BOOLEAN MODE) AND books.stock > 1
# PostgreSQL: ... WHERE to_tsvector('simple', books.author || ' ' || books.title) @@ to_tsquery('simple', 'Rowling | Tokien') and books.stock > 1

What is happening here? Well, we specified all as the name of an attribute group that consists of author and title. As we, in addition, specified all to be a fulltext attribute, SearchCop assumes there is a compound fulltext index present on author and title, such that the query is optimized accordingly. Finally, we specified all to be the default attribute to search in, such that SearchCop can ignore other attributes, like e.g. stock, as long as they are not specified within queries directly (like for stock > 0).

Other queries will be optimized in a similar way, such that SearchCop tries to minimize the fultext constraints within a query, namely MATCH() AGAINST() for MySQL and to_tsvector() @@ to_tsquery() for PostgreSQL.

Book.search("(Rowling -Potter) OR Tolkien")
# MySQL: ... WHERE MATCH(books.author, books.title) AGAINST('(+Rowling -Potter) Tolkien' IN BOOLEAN MODE)
# PostgreSQL: ... WHERE to_tsvector('simple', books.author || ' ' || books.title) @@ to_tsquery('simple', '(Rowling & !Potter) | Tolkien')

To create a fulltext index on books.title in MySQL, simply use:

add_index :books, :title, :type => :fulltext

Regarding compound indices, which will e.g. be used for the default field all we already specified above, use:

add_index :books, [:author, :title], :type => :fulltext

Please note that MySQL supports fulltext indices for MyISAM and, as of MySQL version 5.6+, for InnoDB as well. For more details about MySQL fulltext indices visit http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html

Regarding PostgreSQL there are more ways to create a fulltext index. However, one of the easiest ways is:

ActiveRecord::Base.connection.execute "CREATE INDEX fulltext_index_books_on_title ON books USING GIN(to_tsvector('simple', title))"

Moreover, for PostgreSQL you should change the schema format in config/application.rb:

config.active_record.schema_format = :sql

Regarding compound indices for PostgreSQL, use:

ActiveRecord::Base.connection.execute "CREATE INDEX fulltext_index_books_on_title ON books USING GIN(to_tsvector('simple', author || ' ' || title))"

To handle NULL values with PostgreSQL correctly, use COALESCE both at index creation time and when specifying the search_scope:

ActiveRecord::Base.connection.execute "CREATE INDEX fulltext_index_books_on_title ON books USING GIN(to_tsvector('simple', COALESCE(author, '') || ' ' || COALESCE(title, '')))"

plus:

search_scope :search do
  attributes :title

  options :title, :type => :fulltext, coalesce: true
end

To use another PostgreSQL dictionary than simple, you have to create the index accordingly and you need tell SearchCop about it, e.g.:

search_scope :search do
  attributes :title

  options :title, :type => :fulltext, dictionary: "english"
end

For more details about PostgreSQL fulltext indices visit http://www.postgresql.org/docs/9.3/static/textsearch.html

Other indices

In case you expose non-fulltext attributes to search queries (price, stock, etc.), the respective queries, like Book.search("stock > 0"), will profit from the usual non-fulltext indices. Thus, you should add a usual index on every column you expose to search queries plus a fulltext index for every fulltext attribute.

In case you can't use fulltext indices, because you're e.g. still on MySQL 5.5 while using InnoDB or another RDBMS without fulltext support, you can make your RDBMS use usual non-fulltext indices for string columns if you don't need the left wildcard within LIKE queries. Simply supply the following option:

class User < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :username

    options :username, left_wildcard: false
  end

  # ...

such that SearchCop will omit the left most wildcard.

User.search("admin")
# ... WHERE users.username LIKE 'admin%'

Similarly, you can disable the right wildcard as well:

search_scope :search do
  attributes :username

  options :username, right_wildcard: false
end

Default operator

When you define multiple fields on a search scope, SearcCop will use by default the AND operator to concatenate the conditions, e.g:

class User < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :username, :fullname
  end

  # ...
end

So a search like User.search("something") will generate a query with the following conditions:

... WHERE username LIKE '%something%' AND fullname LIKE '%something%'

However, there are cases where using AND as the default operator is not desired, so SearchCop allows you to override it and use OR as the default operator instead. A query like User.search("something", default_operator: :or) will generate the query using OR to concatenate the conditions

... WHERE username LIKE '%something%' OR fullname LIKE '%something%'

Finally, please note that you can apply it to fulltext indices/queries as well.

Associations

If you specify searchable attributes from another model, like

class Book < ActiveRecord::Base
  # ...

  belongs_to :author

  search_scope :search do
    attributes author: "author.name"
  end

  # ...
end

SearchCop will by default eager_load the referenced associations, when you perform Book.search(...). If you don't want the automatic eager_load or need to perform special operations, specify a scope:

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    # ...

    scope { joins(:author).eager_load(:comments) } # etc.
  end

  # ...
end

SearchCop will then skip any association auto loading and will use the scope instead. You can as well use scope together with aliases to perform arbitrarily complex joins and search in the joined models/tables:

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    attributes similar: ["similar_books.title", "similar_books.description"]

    scope do
      joins "left outer join books similar_books on ..."
    end

    aliases similar_books: Book # Tell SearchCop how to map SQL aliases to models
  end

  # ...
end

Assocations of associations can as well be referenced and used:

class Book < ActiveRecord::Base
  # ...

  has_many :comments
  has_many :users, :through => :comments

  search_scope :search do
    attributes user: "users.username"
  end

  # ...
end

Custom table names and associations

SearchCop tries to infer a model's class name and SQL alias from the specified attributes to autodetect datatype definitions, etc. This usually works quite fine. In case you're using custom table names via self.table_name = ... or if a model is associated multiple times, SearchCop however can't infer the class and SQL alias names, e.g.

class Book < ActiveRecord::Base
  # ...

  has_many :users, :through => :comments
  belongs_to :user

  search_scope :search do
    attributes user: ["user.username", "users_books.username"]
  end

  # ...
end

Here, for queries to work you have to use users_books.username, because ActiveRecord assigns a different SQL alias for users within its SQL queries, because the user model is associated multiple times. However, as SearchCop now can't infer the User model from users_books, you have to add:

class Book < ActiveRecord::Base
  # ...

  search_scope :search do
    # ...

    aliases :users_books => :users
  end

  # ...
end

to tell SearchCop about the custom SQL alias and mapping. In addition, you can always do the joins yourself via a scope {} block plus aliases and use your own custom sql aliases to become independent of names auto-assigned by ActiveRecord.

Supported operators

Query string queries support AND/and, OR/or, :, =, !=, <, <=, >, >=, NOT/not/-, (), "..." and '...'. Default operators are AND and matches, OR has precedence over AND. NOT can only be used as infix operator regarding a single attribute.

Hash based queries support and: [...] and or: [...], which take an array of not: {...}, matches: {...}, eq: {...}, not_eq: {...}, lt: {...}, lteq: {...}, gt: {...}, gteq: {...} and query: "..." arguments. Moreover, query: "..." makes it possible to create sub-queries. The other rules for query string queries apply to hash based queries as well.

Custom operators (Hash based queries)

SearchCop also provides the ability to define custom operators by defining a generator in search_scope. They can then be used with the hash based query search. This is useful when you want to use database operators that are not supported by SearchCop.

Please note, when using generators, you are responsible for sanitizing/quoting the values (see example below). Otherwise your generator will allow SQL injection. Thus, please only use generators if you know what you're doing.

For example, if you wanted to perform a LIKE query where a book title starts with a string, you can define the search scope like so:

search_scope :search do
  attributes :title

  generator :starts_with do |column_name, raw_value|
    pattern = "#{raw_value}%"
    "#{column_name} LIKE #{quote pattern}"
  end
end

When you want to perform the search you use it like this:

Book.search(title: { starts_with: "The Great" })

Security Note: The query returned from the generator will be interpolated directly into the query that goes to your database. This opens up a potential SQL Injection point in your app. If you use this feature you'll want to make sure the query you're returning is safe to execute.

Mapping

When searching in boolean, datetime, timestamp, etc. fields, SearchCop performs some mapping. The following queries are equivalent:

Book.search("available:true")
Book.search("available:1")
Book.search("available:yes")

as well as

Book.search("available:false")
Book.search("available:0")
Book.search("available:no")

For datetime and timestamp fields, SearchCop expands certain values to ranges:

Book.search("created_at:2014")
# ... WHERE created_at >= '2014-01-01 00:00:00' AND created_at <= '2014-12-31 23:59:59'

Book.search("created_at:2014-06")
# ... WHERE created_at >= '2014-06-01 00:00:00' AND created_at <= '2014-06-30 23:59:59'

Book.search("created_at:2014-06-15")
# ... WHERE created_at >= '2014-06-15 00:00:00' AND created_at <= '2014-06-15 23:59:59'

Chaining

Chaining of searches is possible. However, chaining does currently not allow SearchCop to optimize the individual queries for fulltext indices.

Book.search("Harry").search("Potter")

will generate

# MySQL: ... WHERE MATCH(...) AGAINST('+Harry' IN BOOLEAN MODE) AND MATCH(...) AGAINST('+Potter' IN BOOLEAN MODE)
# PostgreSQL: ... WHERE to_tsvector(...) @@ to_tsquery('simple', 'Harry') AND to_tsvector(...) @@ to_tsquery('simple', 'Potter')

instead of

# MySQL: ... WHERE MATCH(...) AGAINST('+Harry +Potter' IN BOOLEAN MODE)
# PostgreSQL: ... WHERE to_tsvector(...) @@ to_tsquery('simple', 'Harry & Potter')

Thus, if you use fulltext indices, you better avoid chaining.

Debugging

When using Model#search, SearchCop conveniently prevents certain exceptions from being raised in case the query string passed to it is invalid (parse errors, incompatible datatype errors, etc). Instead, Model#search returns an empty relation. However, if you need to debug certain cases, use Model#unsafe_search, which will raise them.

Book.unsafe_search("stock: None") # => raise SearchCop::IncompatibleDatatype

Reflection

SearchCop provides reflective methods, namely #attributes, #default_attributes, #options and #aliases. You can use these methods to e.g. provide an individual search help widget for your models, that lists the attributes to search in as well as the default ones, etc.

class Product < ActiveRecord::Base
  include SearchCop

  search_scope :search do
    attributes :title, :description

    options :title, default: true
  end
end

Product.search_reflection(:search).attributes
# {"title" => ["products.title"], "description" => ["products.description"]}

Product.search_reflection(:search).default_attributes
# {"title" => ["products.title"]}

# ...

Semantic Versioning

Starting with version 1.0.0, SearchCop uses Semantic Versioning: SemVer

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

More Repositories

1

search_flip

Full-Featured ElasticSearch Ruby Client with a Chainable DSL
Ruby
309
star
2

heartbeat

Use Heartbeat to monitor your Hetzner Failover IP and automatically switch to another server.
Ruby
54
star
3

similarity

Similarity is an optical as well as keyword based image similarity search engine built on top of Lire.
Java
32
star
4

swift_client

Small but powerful client to interact with OpenStack Swift
Ruby
15
star
5

increment_with_sql

Provides increment_with_sql! and decrement_with_sql! for ActiveRecord models
Ruby
7
star
6

spreadshirt_client

Communicate with the spreadshirt API
Ruby
7
star
7

tempfile_for

Create temporary files for in-memory data
Ruby
4
star
8

redstream

Using redis streams to keep your primary database in sync with secondary datastores
Ruby
4
star
9

oauth2_api_client

Small but powerful client around oauth2 and http-rb to interact with APIs
Ruby
4
star
10

run_after_commit

Run code in an ActiveRecord model after it is committed
Ruby
4
star
11

kraps

Kraps allows to process and perform calculations on extremely large datasets in parallel using ruby
Ruby
3
star
12

replicaza

Highly available GTID-only mysql binlog to kafka replicator
Java
3
star
13

kafka_sync

Using Kafka to keep secondary datastores in sync with your primary datastore
Ruby
2
star
14

apriori

Another ruby apriori wrapper
Ruby
2
star
15

bbque

Queue and process ruby job objects in the background
Ruby
2
star
16

grid

Grid
CSS
1
star
17

unicorn-rails

unicorn-rails
Shell
1
star
18

action_backup

micro backup framework in ruby
Ruby
1
star
19

heartbeat53

Monitoring and failover for route53
Ruby
1
star
20

render_object

Map an object to a partial.
Ruby
1
star
21

default_css

Default CSS file
1
star
22

to_ascii

Convert locale dependent characters
Ruby
1
star
23

attachie

Declarative and flexible attachments
Ruby
1
star
24

spella

Multi-language, Multi word, utf-8 spelling correction server for e.g. search engines using a levenshtein automaton and a Trie.
Kotlin
1
star
25

tab_log

Tab delimited logs with Active Record alike interface
Ruby
1
star
26

s3sync

Sync S3 buckets to your filesystem
Ruby
1
star
27

test-unit-around

Use around instead or in combination with test/unit's setup and teardown methods
Ruby
1
star
28

to_tag

Transform words into tags
Ruby
1
star
29

index-server

index server
C
1
star
30

solr_csv_indexer

Simply batch index big csv data into solr.
Ruby
1
star
31

to_hash_key

Generate redis-safe hash keys using SHA1 easily
Ruby
1
star
32

search_cop_logo

Logo for search_cop
1
star
33

distributed_job

Keep track of distributed jobs spanning multiple workers using redis
Ruby
1
star
34

cassandra_store

Easy to use ActiveRecord like ORM for Cassandra
Ruby
1
star
35

rencrypt

CLI to generate/update SSL certificates on hetzner cloud servers using letsencrypt
Ruby
1
star
36

map-reduce-ruby

The easiest way to write distributed, larger than memory map-reduce jobs
Ruby
1
star
37

to_pdf

Convert a HTML string to a PDF using htmldoc.
Ruby
1
star
38

to_permalink

ToPermalink generates permalinks from arbitrary strings
Ruby
1
star
39

routing-pattern

A powerful, but minimal library to conveniently parse and stringify route patterns
JavaScript
1
star
40

capacity-report

Simply send out a filesystem capacity report
Shell
1
star
41

process_manager

A process manager framework for forking, threading and graceful termination
Ruby
1
star
42

attachments

Declarative and flexible attachments
Ruby
1
star
43

in_vertical_groups_of

Easily generate vertical groups
Ruby
1
star
44

default_scss

Default Sass CSS file
1
star
45

s3upload

Bulk upload to s3
Ruby
1
star
46

resolvable_hash

Resolve references within a hash
Ruby
1
star
47

solr_precedence

Fixing Solr operator precedence
Ruby
1
star
48

cassandra_record

Powerful ORM for Cassandra
Ruby
1
star
49

significance

Calculate significance of cooccurring words
Ruby
1
star
50

redlocker

Acquire and keep distributed locks using redis
Ruby
1
star