Repository Details
A batch-based text search and filtering pipeline built on Apache Spark. It takes a large set of documents and a set of user-defined queries and, for each query, ranks the documents by relevance to that query and filters out overly similar documents.
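The rank-then-filter flow described above can be illustrated with a minimal single-machine sketch in plain Python. It does not use Spark and is not the repository's implementation: the function names (`tokenize`, `relevance`, `jaccard`, `rank_and_filter`, `run_pipeline`), the term-count relevance score, the Jaccard similarity measure, and the `top_k` and `max_similarity` parameters are all assumptions chosen for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def relevance(query_terms, doc_terms):
    """Hypothetical score: total occurrences of query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def jaccard(a, b):
    """Jaccard similarity between the token sets of two documents."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_and_filter(docs, query, top_k=10, max_similarity=0.8):
    """Rank docs for one query, then drop docs too similar to a kept one."""
    q = tokenize(query)
    scored = [(relevance(q, tokenize(d)), d) for d in docs]
    # Keep only matching documents, best score first (ties keep input order).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    kept = []
    for score, doc in ranked:
        tokens = tokenize(doc)
        # Assumed rule: keep a document only if it is not overly similar
        # to any higher-ranked document that was already kept.
        if all(jaccard(tokens, tokenize(k)) <= max_similarity for _, k in kept):
            kept.append((score, doc))
        if len(kept) == top_k:
            break
    return kept


def run_pipeline(docs, queries, **kwargs):
    """Process every query in one batch, returning results per query."""
    return {q: rank_and_filter(docs, q, **kwargs) for q in queries}
```

In this sketch a lower-ranked document is discarded when it is too similar to one already kept, so the best-scoring representative of each group of near-duplicates survives. A Spark version would presumably distribute the scoring step across the document set, but how the repository does this is not stated here.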