Lucene/Solr Synonym-Expanding EDisMax Parser
Current version: 5.0.5 (changelog)
Maintainer
License
Summary
Extension of the ExtendedDisMaxQueryParserPlugin that splits queries into a "normal" query and a "synonym" query. This enables proper query-time synonym expansion, with no reindexing required.
This also fixes many of the problems with the way Solr typically handles synonyms through the SynonymFilterFactory.
For more details, read my blog post on the subject.
Getting Started
The following tutorial will set up a working synonym-enabled Solr app using the techproducts sample project from Solr itself.
Step 1
Download the latest JAR file:
Or use Maven, either to copy the JAR into the current directory:
mvn dependency:copy -Dartifact=com.github.healthonnet:hon-lucene-synonyms:5.0.5 -DoutputDirectory=.
or to declare it as a dependency:
<dependency>
<groupId>com.github.healthonnet</groupId>
<artifactId>hon-lucene-synonyms</artifactId>
<version>5.0.5</version>
</dependency>
If you are using an older version of Solr, use the matching JAR from an older release of this plugin (1.3.5 for Solr 3.x and 4.x, 2.0.0 for Solr 5.3.1):
JAR | Supported Solr versions |
hon-lucene-synonyms-1.3.5-solr-3.x.jar | 3.4.0, 3.5.0, and 3.6.x |
hon-lucene-synonyms-1.3.5-solr-4.0.0.jar | 4.0.0 |
hon-lucene-synonyms-1.3.5-solr-4.1.0.jar | 4.1.0 and 4.2.x |
hon-lucene-synonyms-1.3.5-solr-4.3.0.jar | 4.3+ |
hon-lucene-synonyms-2.0.0.jar | 5.3.1 |
Step 2
Download Solr from the Solr home page. For this tutorial, we'll use Solr 6.2.0. You do not need the sources; the tgz or zip file will work fine.
Step 3
Extract the compressed file.
Step 4
Copy the hon-lucene-synonyms-*.jar file into solr-6.2.0/server/solr-webapp/webapp/WEB-INF/lib/:
cp hon-lucene-synonyms-*.jar solr-6.2.0/server/solr-webapp/webapp/WEB-INF/lib/
Note that the JAR can be placed in other locations if Solr is configured accordingly. The following options apply mainly to standalone Solr and Master/Slave setups; the Solr Plugins section of the Solr CWIKI has more details about running plugins in Solr.
- A collection or core can be configured to use the Lib Directives in SolrConfig.
- A Solr server itself can be configured to use the sharedLib directive in solr.xml.
If you want to configure the plugin for SolrCloud, see Adding Custom Plugins in SolrCloud Mode.
Step 5
Download example_synonym_file.txt and copy it to the solr-6.2.0/server/solr/configsets/sample_techproducts_configs/conf/ directory.
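The contents of example_synonym_file.txt are not reproduced here, but Solr synonym files use a simple line format: comma-separated terms are treated as equivalent, "a, b => c" maps the left-hand terms to the right-hand ones, and lines starting with # are comments. The minimal Python sketch below parses that format into a lookup table so you can sanity-check a file before copying it. The sample line is hypothetical, chosen to match the expansions shown in Step 8, and the parser ignores escaping, case folding, and the other options of Solr's real parser.

```python
def parse_synonyms(text):
    """Parse Solr-style synonym lines into {term: set of equivalents}.

    Simplified sketch: handles comments, comma-separated equivalence lines
    and 'a, b => c' explicit mappings; ignores backslash escapes.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by right-hand ones.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalence line: every term maps to every term (as with expand=true).
            sources = targets = [t.strip() for t in line.split(",") if t.strip()]
        for src in sources:
            mapping.setdefault(src, set()).update(targets)
    return mapping


# Hypothetical file contents; the real example_synonym_file.txt may differ.
sample = """
# dogs
dog, hound, pooch, canis familiaris, man's best friend
"""
table = parse_synonyms(sample)
print(sorted(table["dog"]))
# ['canis familiaris', 'dog', 'hound', "man's best friend", 'pooch']
```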
Step 6
Download example_config.xml and copy the <queryParser>...</queryParser> section into solr-6.2.0/server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml, just before the </config> tag at the end.
This section registers the query parser and defines the analyzer it uses to generate synonyms.
Protip: You can customize this analyzer based on your synonym set. For example, if none of your synonyms is longer than two words, you can safely set maxShingleSize to 2.
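The protip suggests that shingles must be at least as long as your longest synonym. Assuming the example analyzer pairs a shingle filter that also outputs unigrams with a synonym filter (check example_config.xml for the actual setup), a multi-word synonym in user input can only be recognized if it is emitted as a single shingle. The Python sketch below imitates that step to show which synonym keys are reachable for a given maxShingleSize; it is an illustration, not Lucene's ShingleFilter, and the synonym keys are hypothetical.

```python
def shingles(tokens, max_size, min_size=2):
    """Return unigrams plus word n-grams of length min_size..max_size.

    Rough imitation of a shingle filter with outputUnigrams=true.
    """
    out = list(tokens)  # unigrams
    for n in range(min_size, max_size + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def matched_synonym_keys(query, synonym_keys, max_size):
    """Return the synonym keys that appear among the query's shingles."""
    grams = set(shingles(query.split(), max_size))
    return sorted(k for k in synonym_keys if k in grams)


# Hypothetical synonym keys, matching the expansions shown in Step 8.
keys = {"dog", "hound", "pooch", "canis familiaris", "man's best friend"}
query = "my man's best friend"
print(matched_synonym_keys(query, keys, max_size=2))  # [] (3-word key missed)
print(matched_synonym_keys(query, keys, max_size=3))  # ["man's best friend"]
```

With this set of keys, a maxShingleSize of 3 is the smallest value that still catches every synonym.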
Step 7
Start up the app by running solr-6.2.0/bin/solr -e techproducts.
Step 8
In your browser, navigate to
You should see a response like this:
<response>
...
<lst name="debug">
<str name="rawquerystring">dog</str>
<str name="querystring">dog</str>
<str name="parsedquery">(DisjunctionMaxQuery((text:dog))^1.0 ((+(DisjunctionMaxQuery((text:canis)) DisjunctionMaxQuery((text:familiaris))))/no_coord^1.0) ((+DisjunctionMaxQuery((text:hound)))/no_coord^1.0) ((+(DisjunctionMaxQuery((text:man's)) DisjunctionMaxQuery((text:best)) DisjunctionMaxQuery((text:friend))))/no_coord^1.0) ((+DisjunctionMaxQuery((text:pooch)))/no_coord^1.0))</str>
<str name="parsedquery_toString">(((text:dog))^1.0 ((+((text:canis) (text:familiaris)))^1.0) ((+(text:hound))^1.0) ((+((text:man's) (text:best) (text:friend)))^1.0) ((+(text:pooch))^1.0))</str>
<lst name="explain"/>
<arr name="queryToHighlight">
<str>org.apache.lucene.search.BooleanClause:((text:dog))^1.0 ((+((text:canis) (text:familiaris)))^1.0) ((+(text:hound))^1.0) ((+((text:man's) (text:best) (text:friend)))^1.0) ((+(text:pooch))^1.0)</str>
</arr>
<arr name="expandedSynonyms">
<str>canis familiaris</str>
<str>dog</str>
<str>hound</str>
<str>man's best friend</str>
<str>pooch</str>
</arr>
<lst name="mainQueryParser">
<str name="QParser">ExtendedDismaxQParser</str>
<null name="altquerystring"/>
<null name="boost_queries"/>
<arr name="parsed_boost_queries"/>
<null name="boostfuncs"/>
</lst>
<lst name="synonymQueryParser">
<str name="QParser">ExtendedDismaxQParser</str>
<null name="altquerystring"/>
<null name="boost_queries"/>
<arr name="parsed_boost_queries"/>
<null name="boostfuncs"/>
</lst>
...
</lst>
</response>
Note that the input query dog has been expanded into dog, hound, pooch, canis familiaris, and man's best friend.
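If you run the query from a script rather than a browser, the expandedSynonyms list in the debug section can be read with Python's standard XML parser. This sketch works on a trimmed copy of the response above and assumes the XML response format with the list nested inside <lst name="debug">, as shown.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the debug response shown above.
RESPONSE = """<response>
  <lst name="debug">
    <str name="rawquerystring">dog</str>
    <arr name="expandedSynonyms">
      <str>canis familiaris</str>
      <str>dog</str>
      <str>hound</str>
      <str>man's best friend</str>
      <str>pooch</str>
    </arr>
  </lst>
</response>"""


def expanded_synonyms(xml_text):
    """Return the strings listed under debug/expandedSynonyms, or []."""
    root = ET.fromstring(xml_text)
    arr = root.find("./lst[@name='debug']/arr[@name='expandedSynonyms']")
    if arr is None:
        return []  # e.g. synonyms not enabled or debugQuery not set
    return [s.text for s in arr.findall("str")]


print(expanded_synonyms(RESPONSE))
```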
Tweaking the results
Boost the non-synonym part to 1.2 and the synonym part to 1.1 by adding synonyms.originalBoost=1.2&synonyms.synonymBoost=1.1:
(((text:dog))^1.2
((+((text:canis) (text:familiaris)))^1.1)
((+(text:hound))^1.1)
((+((text:man's) (text:best) (text:friend)))^1.1)
((+(text:pooch))^1.1))
Apply a minimum "should" match of 75% by adding mm=75%25 (%25 is the URL encoding of %):
(((text:dog))^1.0
((+(((text:canis) (text:familiaris))~1))^1.0)
((+(text:hound))^1.0)
((+(((text:man's) (text:best) (text:friend))~2))^1.0)
((+(text:pooch))^1.0))
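The ~1 and ~2 above follow from how Solr turns a positive percentage mm into a clause count: the number of optional clauses is multiplied by the percentage and rounded down. This Python sketch reproduces that arithmetic for the two multi-word synonyms in this example and shows that 75%25 is just the URL-encoded form of 75%. It covers only the plain positive-percentage case, not the rest of Solr's mm syntax (negative values or conditional expressions such as 3<90%).

```python
from urllib.parse import urlencode


def min_should_match(num_optional_clauses, percent):
    """Positive percentage mm: clauses * percent / 100, rounded down."""
    return num_optional_clauses * percent // 100


for synonym in ["canis familiaris", "man's best friend"]:
    n = len(synonym.split())
    print(synonym, "->", n, "clauses, mm =", min_should_match(n, 75))
# canis familiaris -> 2 clauses, mm = 1
# man's best friend -> 3 clauses, mm = 2

# The % sign must be escaped as %25 when it appears in a URL.
print(urlencode({"mm": "75%"}))  # mm=75%25
```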
Observe how phrase queries are properly handled by using q="dog" instead of q=dog:
(((text:dog))^1.0
((+(text:"canis familiaris"))^1.0)
((+(text:hound))^1.0)
((+(text:"man's best friend"))^1.0)
((+(text:pooch))^1.0))
Gotchas
Keep in mind that you must add defType=synonym_edismax and synonyms=true to enable the parser in the first place.
Also, you must either define qf in the query parameters or defaultSearchField in solr/conf/schema.xml, so that the parser knows which fields to use during synonym expansion.
If you enable debugging (with debugQuery=on), the plugin will output helpful information about how synonyms are being expanded.
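Putting these gotchas together, every request needs defType=synonym_edismax, synonyms=true, and qf (or defaultSearchField in the schema). The helper below is a hypothetical convenience function, not part of the plugin, that assembles such a query string with Python's standard library. The base URL is an assumption: Solr's default port 8983 and the techproducts core created in Step 7. This sketch always sends qf rather than relying on defaultSearchField.

```python
from urllib.parse import urlencode

# Assumption: default Solr port and the core created by `bin/solr -e techproducts`.
BASE_URL = "http://localhost:8983/solr/techproducts/select"


def synonym_query_url(q, qf, debug=False, **extra):
    """Build a select URL with the parameters the synonym parser requires.

    `extra` holds optional parameters such as synonyms.originalBoost;
    pass them with dict unpacking because the names contain dots.
    """
    params = {
        "q": q,
        "qf": qf,                      # fields used during synonym expansion
        "defType": "synonym_edismax",  # select the plugin's parser
        "synonyms": "true",            # expansion is off by default
    }
    if debug:
        params["debugQuery"] = "on"    # adds expandedSynonyms etc. to the output
    params.update(extra)
    return BASE_URL + "?" + urlencode(params)


print(synonym_query_url("dog", "text", debug=True))
# http://localhost:8983/solr/techproducts/select?q=dog&qf=text&defType=synonym_edismax&synonyms=true&debugQuery=on
print(synonym_query_url('"dog"', "text",
                        **{"synonyms.originalBoost": "1.2",
                           "synonyms.synonymBoost": "1.1"}))
```

Note that urlencode escapes the quotes of a phrase query, so q="dog" is sent as q=%22dog%22.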
Query parameters
The following are parameters that you can use to tweak the synonym expansion.
Param | Type | Default | Summary |
synonyms | boolean | false | Enable or disable synonym expansion entirely. True if enabled. |
synonyms.analyzer | String | null | Name of the analyzer defined in solrconfig.xml to use (e.g. in the examples, it's myCoolAnalyzer). This must be set if you define more than one analyzer (e.g. for more than one language). |
synonyms.originalBoost | float | 1.0 | Boost value applied to the original (non-synonym) part of the query. |
synonyms.synonymBoost | float | 1.0 | Boost value applied to the synonym part of the query. |
synonyms.disablePhraseQueries | boolean | false | True if synonym expansion should be disabled when the user input contains a phrase query (i.e. a quoted query). This option is offered because expansion of phrase queries may be considered non-intuitive to users. |
synonyms.constructPhrases | boolean | false | v1.2.2+: True if expanded synonyms should always be treated like phrases (i.e. wrapped in quotes). This option is offered in case your synonyms contain lots of phrases composed of common words (e.g. "man's best friend" for "dog"). Only affects the expanded synonyms; not the original query. See issue #5 for more discussion. |
synonyms.ignoreQueryOperators | boolean | false | v1.3.2+: Set this to true if you treat query operators (e.g. AND and OR) as ordinary words and want synonyms to be added to the query anyway. |
synonyms.bag | boolean | false | v1.3.2+: When false (default), this plugin generates additional synonym queries by using the original query string as a template: dog bite => dog bite, canis familiaris bite, dog chomp, canis familiaris chomp. When true, a simpler "bag of terms" query is created from the synonyms, i.e. dog bite => bite dog chomp canis familiaris. The simpler query performs better but loses positional information. Use it with synonyms.constructPhrases to keep synonym phrases such as "canis familiaris" intact. |
synonyms.ignoreMM | boolean | false | v1.3.5+: When false (default), the mm param is applied to the original query and to the synonym queries. When true mm is ignored for the synonym queries and applied only to the original query. |
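The difference between the default template expansion and synonyms.bag can be sketched in a few lines of Python. This is a simplified illustration, not the plugin's implementation: it assumes single-word synonym keys (the real plugin runs the configured analyzer, shingles included) and a hypothetical synonym map of dog => canis familiaris and bite => chomp, taken from the table above. The construct_phrases flag mimics synonyms.constructPhrases by quoting multi-word synonyms.

```python
from itertools import product


def alternatives(word, synonyms, construct_phrases):
    """The word itself followed by its synonyms, multi-word ones optionally quoted."""
    alts = [word]
    for syn in synonyms.get(word, []):
        alts.append(f'"{syn}"' if construct_phrases and " " in syn else syn)
    return alts


def template_expansion(query, synonyms, construct_phrases=False):
    """Default mode: substitute synonyms into the original query as a template."""
    options = [alternatives(w, synonyms, construct_phrases) for w in query.split()]
    return [" ".join(combo) for combo in product(*options)]


def bag_expansion(query, synonyms, construct_phrases=False):
    """synonyms.bag=true: one flat query of all terms, no positional information."""
    terms = []
    for w in query.split():
        for alt in alternatives(w, synonyms, construct_phrases):
            if alt not in terms:
                terms.append(alt)
    return " ".join(terms)


SYNONYMS = {"dog": ["canis familiaris"], "bite": ["chomp"]}  # hypothetical map
print(template_expansion("dog bite", SYNONYMS))
# ['dog bite', 'dog chomp', 'canis familiaris bite', 'canis familiaris chomp']
print(bag_expansion("dog bite", SYNONYMS, construct_phrases=True))
# dog "canis familiaris" bite chomp
```

The template version yields the same four variants as the table (in a different order), and its size grows multiplicatively with the number of synonyms per word, which is why the bag mode is cheaper.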
Compile it yourself
Download the code and run:
mvn install
Testing
Run the tests using maven:
mvn test
Changelog
- 5.0.5
- Tested with Solr 6.2.0
- Fixed #65 Matches all docs if bf (Boost Function) present @janhoy
- 5.0.4
- Solr 6.0.0 support.
- Now distributed on Maven Central.
- 2.0.0
- BREAKING CHANGE: Updated to support Solr 5.3.1. Removed support for older versions of Solr.
- Note that as of Lucene 5.2.0, when synonyms are parsed, original terms are now correctly marked as type word instead of type synonym (LUCENE-6400).
- v1.3.5
- Added synonyms.ignoreMM option
- v1.3.4
- v1.3.3
- Fixed #33: synonyms are now weighted equally, regardless of how many there are per word.
- Fixed #31: synonyms are no longer given extra weight when using the params bq, bf, and boost.
- debugQuery=on now gives more helpful debug output.
- Fixed #9, #26, #32, and #34. Note that this is a documentation change, not a code change, so to get the benefits of this "fix," you'll need to manually perform Step 6 again.
- v1.3.2
- v1.3.1
- Avoid luceneMatchVersion in config (#20)
- v1.3.0
- Added support for Solr 4.3.0 (#19)
- New way of loading Tokenizers and TokenFilters
- New XML syntax for config in solrconfig.xml
- v1.2.3
- Fixed #16
- Verified support for Solr 4.2.0 with the 4.1.0 branch (unit tests passed)
- Improved automation of unit tests
- v1.2.2
- Added synonyms.constructPhrases option to fix #5
- Added proper handling for phrase slop settings
- v1.2.1
- Added support for Solr 4.1.0 (#4)
- v1.2
- Added support for Solr 4.0.0 (#3)
- v1.1
- v1.0
- Initial release