Elasticsearch Index Termlist

Elasticsearch Index Termlist Plugin

This plugin extends Elasticsearch with a term list capability. It presents a list of the terms in a field of an index and can also list each term's frequency. Term lists can be generated from a single index or from all indexes.

Versions

Elasticsearch   Plugin     Release date
2.3.0           2.3.0.0    March 29, 2016
2.2.0           2.2.0.2    March 22, 2016
1.5.2           1.5.2.0    Jun 5, 2015
1.5.0           1.5.0.0    Apr 9, 2015
1.4.4           1.4.4.0    Mar 15, 2015
1.4.0           1.4.0.2    Feb 19, 2015
1.4.0           1.4.0.1    Jan 14, 2015
1.4.0           1.4.0.0    Nov 18, 2014
1.3.2           1.3.0.0    Aug 21, 2014
1.2.1           1.2.1.0    Jul 3, 2014

Installation

./bin/plugin -install index-termlist -url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-index-termlist/1.5.2.0/elasticsearch-index-termlist-1.5.2.0-plugin.zip

Do not forget to restart the node after installing.

Project docs

The Maven project site is available at GitHub.

Issues

All feedback is welcome! If you find issues, please post them at GitHub.

Introduction

Getting the list of all terms indexed is useful for various purposes, for example

  • term statistics
  • building dictionaries
  • controlling the overall effects of analyzers on the indexed terms
  • automatic query building on indexed terms, e.g. for load tests
  • input to linguistic analysis tools
  • for other post-processing of the indexed terms outside of Elasticsearch

Optionally, the term list can be narrowed down to a field name. The field name is the Lucene field name as found in the Lucene index.

Only terms of field names not starting with underscore are listed. Terms of internal fields like _uid, _all, or _type are always skipped.

Response

For each term, statistics are computed.

{
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "took": 384,
   "numdocs": 51279,
   "numterms": 100,
   "terms": [
      {
         "term": "aacr2",
         "totalfreq": 34699,
         "docfreq": 34697,
         "min": 1,
         "max": 2,
         "mean": 1.0000505458956723,
         "geomean": 1.0000399550985877,
         "sumofsquares": 34703,
         "sumoflogs": 1.3862943611198906,
         "sigma": 0.008475454987021664,
         "variance": 0.00007183333723703039
      }, ...

took - milliseconds required for executing

numdocs - the number of documents examined

numterms - the number of terms returned

terms - the array of term infos

term - the name of the term

totalfreq - the total number of occurrences of this term

docfreq - the number of documents in which this term appears

min - the minimum number of occurrences of this term in a document

max - the maximum number of occurrences of this term in a document

mean - the mean of the term occurrences

geomean - the geometric mean of the term occurrences

sumofsquares - sum of the squares of the term occurrences

sumoflogs - sum of the logarithms of the term occurrences

variance - the variance of the term occurrences

sigma - the standard deviation, equal to sqrt(variance)
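
To make the field definitions above concrete, here is a minimal Python sketch that derives the same kinds of statistics from a list of per-document occurrence counts for a single term. It is an illustration only, not the plugin's code: the function name term_stats is hypothetical, and the use of the population variance (dividing by the number of documents) is an assumption, since the plugin's exact formula is not stated here.

```python
import math


def term_stats(freqs):
    """Summarize per-document occurrence counts of one term.

    `freqs` holds one positive integer per document containing the term.
    Assumption: population variance (divide by n); the plugin may differ.
    """
    n = len(freqs)
    total = sum(freqs)
    mean = total / n
    sum_of_logs = sum(math.log(f) for f in freqs)
    variance = sum((f - mean) ** 2 for f in freqs) / n
    return {
        "totalfreq": total,                    # all occurrences of the term
        "docfreq": n,                          # documents containing the term
        "min": min(freqs),
        "max": max(freqs),
        "mean": mean,
        "geomean": math.exp(sum_of_logs / n),  # exp of the mean logarithm
        "sumofsquares": sum(f * f for f in freqs),
        "sumoflogs": sum_of_logs,
        "variance": variance,
        "sigma": math.sqrt(variance),          # standard deviation
    }


# A term occurring 1, 1, 2 and 4 times in four documents.
stats = term_stats([1, 1, 2, 4])
```

Values reported by a real cluster may deviate from such a direct computation, for example because results are combined across shards.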

Example

Consider the following example

curl -XDELETE 'http://localhost:9200/test/'
curl -XPUT 'http://localhost:9200/test/'
curl -XPUT 'http://localhost:9200/test/test/1' -d '{ "test": "Hello World" }'
curl -XPUT 'http://localhost:9200/test/test/2' -d '{ "test": "Hello Jörg Prante" }'
curl -XPUT 'http://localhost:9200/test/test/3' -d '{ "message": "elastic search" }'

Get term list of index test

curl -XGET 'http://localhost:9200/test/_termlist'

Get term list of index test of field message

curl -XGET 'http://localhost:9200/test/_termlist?field=message'

Get term list of index test with total frequencies but only the first three of the list

curl -XGET 'http://localhost:9200/test/_termlist?size=3'

Get term list of terms starting with hello in index test field test

curl -XGET 'http://localhost:9200/test/_termlist?field=test&term=hello'

A page of 100 terms of a sorted list of terms in your index beginning with 'a'

curl -XGET 'http://localhost:9200/books/_termlist?term=a&sortbyterms&pretty&from=0&size=100' 
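
When such requests are generated from a script, the query string can be assembled programmatically. The following Python sketch only builds the URL and sends nothing. The helper name termlist_url is hypothetical; the parameter names (term, from, size, sortbyterms, pretty) are taken from the curl examples in this section, and treating sortbyterms and pretty as value-less flags mirrors how they appear there.

```python
from urllib.parse import urlencode


def termlist_url(base, index, params=None, flags=()):
    """Build a _termlist request URL (hypothetical helper).

    `params` maps parameter names to values; `flags` are parameters
    that are passed without a value, such as sortbyterms or pretty.
    """
    parts = []
    if params:
        parts.append(urlencode(params))  # escapes values where needed
    parts.extend(flags)
    url = f"{base.rstrip('/')}/{index}/_termlist"
    return f"{url}?{'&'.join(parts)}" if parts else url


# Same request as the curl call above; the parameter order differs,
# which does not matter for a query string.
url = termlist_url(
    "http://localhost:9200",
    "books",
    {"term": "a", "from": 0, "size": 100},
    flags=("sortbyterms", "pretty"),
)
```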

A page of 100 terms of a sorted list of terms in your index beginning with 'frodo', 'frod', 'fro' and 'fr', since backtracingcount is set to 3

curl -XGET 'http://localhost:9200/books/_termlist?term=frodo&sortbyterms&pretty&from=0&size=100&backtracingcount=3'
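
The effect of backtracingcount in the last example can be pictured as repeatedly dropping the final character of the search term. The Python sketch below reproduces the prefixes named in that example. It is an assumption inferred from the description, not the plugin's actual implementation, and the function name backtracking_prefixes is hypothetical.

```python
def backtracking_prefixes(term, count):
    """Return `term` followed by up to `count` progressively shorter prefixes.

    Assumption: each backtracking step removes one trailing character,
    and empty prefixes are never produced.
    """
    return [term[: len(term) - i] for i in range(count + 1) if len(term) - i > 0]


prefixes = backtracking_prefixes("frodo", 3)
# prefixes == ["frodo", "frod", "fro", "fr"]
```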

Caution

The term list is built internally as an unsorted, compact set of strings which is not streamed to the client. Be aware that if the index contains many unique terms, this procedure consumes a lot of heap memory and may lead to out-of-memory situations that can render your Elasticsearch cluster unusable until it is restarted.

License

Elasticsearch Term List Plugin

Copyright (C) 2011-2015 Jörg Prante

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
