TFIDF
tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf–idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
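The weighting described above can be sketched in a few lines of plain Ruby. This is only an illustration and not this gem's API: the method name tf_idf is hypothetical, documents are assumed to be pre-tokenized arrays of words, term frequency is assumed to be the raw count divided by document length, and inverse document frequency is assumed to be the natural log of (number of documents / number of documents containing the term). Other tf–idf variants exist and the gem may use a different one.

```ruby
# Hypothetical helper for illustration only; not part of the tfidf gem.
# documents: an array of documents, each an array of word tokens.
# Returns one hash per document, mapping each term to its tf-idf score.
def tf_idf(documents)
  total = documents.size.to_f

  # Document frequency: how many documents contain each term at least once.
  doc_freq = Hash.new(0)
  documents.each do |doc|
    doc.uniq.each { |term| doc_freq[term] += 1 }
  end

  documents.map do |doc|
    # Raw count of each term within this document.
    counts = Hash.new(0)
    doc.each { |term| counts[term] += 1 }

    counts.map do |term, count|
      tf  = count.to_f / doc.size              # assumed: length-normalized count
      idf = Math.log(total / doc_freq[term])   # assumed: natural log, no smoothing
      [term, tf * idf]
    end.to_h
  end
end

docs = [
  %w[the cat sat on the mat],
  %w[the dog sat on the log],
  %w[the cat chased the dog]
]

scores = tf_idf(docs)
```

Under these assumptions, a word such as "the" that occurs in every document gets an inverse document frequency of zero, so its score is zero however often it appears, while a word that occurs in only one document (such as "mat") scores highest within that document.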
Example Web App
https://james-moriarty-tf-idf.herokuapp.com/corpuses
To run it locally:
cd app
bundle
rackup
open http://localhost:9292/corpuses
Installation
Add this line to your application's Gemfile:
gem 'tfidf', :github => 'jamesmoriarty/tf-idf'
And then execute:
$ bundle
Contributing
- Fork it ( https://github.com/[my-github-username]/tf-idf/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request