python-rake

Note on Upgrades

Some users have reported issues importing the stoplists after upgrading to 1.1.*. If you run into import issues after upgrading, try a full uninstall and reinstall:

pip uninstall python-rake
pip install python-rake

A Python module implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons. Initially by @aneesha, packaged by @tomaspinho.
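
To make the scoring idea concrete, here is a minimal from-scratch sketch of the RAKE scoring described in the paper. It is not this package's code. The punctuation set used to split fragments and the `\w+` tokenizer are assumptions chosen for illustration. Candidate phrases are runs of non-stop words, each word is scored as degree / frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict


def rake_scores(text, stop_words):
    """Illustrative RAKE sketch: return (phrase, score) pairs, best first."""
    stop = {w.lower() for w in stop_words}

    # 1. Split into fragments at punctuation (assumed delimiter set), then
    #    break each fragment into candidate phrases at stop words.
    phrases = []
    for fragment in re.split(r"[.!?,;:\n]+", text.lower()):
        current = []
        for word in re.findall(r"\w+", fragment):
            if word in stop:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # 2. Word score = deg(w) / freq(w), where deg(w) sums the lengths of
    #    the candidate phrases containing w (co-occurrences plus itself).
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    # 3. Phrase score = sum of its member word scores.
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rake_scores("Compatibility of systems of linear constraints.", ["of"]))
# [('linear constraints', 4.0), ('compatibility', 1.0), ('systems', 1.0)]
```

Multi-word phrases score higher because each member word's degree grows with the phrase length, which is why RAKE tends to favor longer keyphrases.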

The source code is released under the MIT License.

Installation

pip install python-rake  # or pip3

Usage

For external stopword files (.txt, .csv, etc.), pass the file path as a string. Stop words can be on the same line or on different lines, but they must be separated by non-word characters. This should support all languages because it is Unicode-based, but please validate the results and report any issues with non-Western languages, as they haven't been thoroughly tested.

import RAKE
rake = RAKE.Rake(<path_to_your_stopwords_file>)
rake.run(<text>)

To change how the stopwords file is read in, pass your own regex as shown below. The default regex, which splits on the non-word characters described above, is [\W\n]+.

RAKE.Rake(<path_to_your_stopwords_file>, regex='<your regex>')
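
Why the regex matters: the separator decides what counts as a single stop word. The sketch below uses a hypothetical helper, `load_stop_words`, that is not part of this package; it just applies `re.split` to raw file contents. With the default [\W\n]+, a stop word like "don't" is split into "don" and "t", while a newline-only separator keeps it whole.

```python
import re

DEFAULT_REGEX = r"[\W\n]+"  # the default separator named in this README


def load_stop_words(contents, regex=DEFAULT_REGEX):
    """Hypothetical helper: split raw stopword-file contents into a set.

    Illustrative only; it shows the effect of the separator regex and is
    not the package's own loader.
    """
    return {word.lower() for word in re.split(regex, contents) if word}


contents = "a\nan\nthe\ndon't\n"
print(sorted(load_stop_words(contents)))          # ['a', 'an', 'don', 't', 'the']
print(sorted(load_stop_words(contents, r"\n+")))  # ['a', 'an', "don't", 'the']
```

If your stoplist contains contractions or hyphenated terms, a one-word-per-line file with a newline-based regex is the safer choice.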

For stop words supplied as a list:

import RAKE
rake = RAKE.Rake(<list>)  # takes stop words as a list of strings
rake.run(<text>)

SmartStopList(), FoxStopList(), NLTKStopList() and MySQLStopList() return the expected stoplists as lists and can be used as shown below. GoogleSearchStopList() returns what were thought to be stop words in Google search back when large numbers of search suggestions were available. RanksNLStopList() and RanksNLLongStopList() return the in-house stoplists developed by Ranks NL, a webmaster suite.

import RAKE
rake = RAKE.Rake(RAKE.SmartStopList())
rake.run(<text>)

Additional flags:

The Rake.run method also accepts minCharacters, maxWords and minFrequency arguments to better tune your output. minCharacters is the minimum number of characters allowed in a keyword. maxWords is the maximum number of words a phrase may contain to be considered a keyword. minFrequency is the minimum number of occurrences a keyword must have to be considered a keyword. The following example shows the default values:

import RAKE
rake = RAKE.Rake(RAKE.SmartStopList())
rake.run(<text>, minCharacters=1, maxWords=5, minFrequency=1)
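
To illustrate what the three thresholds do, here is a hypothetical filter over candidate phrases, with one list entry per occurrence. It is a sketch, not the package's implementation. The snake_case parameter names are stand-ins for minCharacters, maxWords and minFrequency, and it assumes minCharacters is measured on the whole phrase string.

```python
from collections import Counter


def filter_candidates(candidates, min_characters=1, max_words=5, min_frequency=1):
    """Hypothetical filter mirroring minCharacters, maxWords and minFrequency.

    Assumes the character minimum applies to the whole phrase string.
    Defaults match the default values listed in this README.
    """
    counts = Counter(candidates)  # occurrences of each candidate phrase
    kept = []
    for phrase, count in counts.items():
        if len(phrase) < min_characters:     # too few characters
            continue
        if len(phrase.split()) > max_words:  # too many words
            continue
        if count < min_frequency:            # not frequent enough
            continue
        kept.append(phrase)
    return kept


candidates = ["linear constraints", "systems", "linear constraints",
              "x", "one two three four five six"]
print(filter_candidates(candidates))
# ['linear constraints', 'systems', 'x']
print(filter_candidates(candidates, min_characters=2, min_frequency=2))
# ['linear constraints']
```

Raising minFrequency is the quickest way to drop one-off noise in longer documents, while maxWords caps how long a keyphrase can get.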

Additional stoplists, including stoplists for other languages, can be found at https://github.com/trec-kba/many-stop-words/tree/master/orig, at http://www.ranks.nl/stopwords, at https://sites.google.com/site/kevinbouge/stopwords-lists and in the NLTK stopwords package.

Releases

I will push releases to PyPI periodically, but if a feature in master hasn't been released yet and you need it, just ping me.

Credit

This is a maintained fork of the original Python RAKE project, which can be found here: https://github.com/aneesha/RAKE. The Fox stopwords list was originally created by Christopher Fox: http://dl.acm.org/citation.cfm?id=378888. The SMART stopwords list was originally created by Gerard Salton and Chris Buckley for the experimental SMART information retrieval system at Cornell University. The MySQL stopwords list is (surprisingly) from MySQL, which is owned and maintained by Oracle and released under the GPL2 license. The NLTK stopwords list was created by the NLTK project under the Apache license; the project is here: https://github.com/nltk/nltk. The Ranks NL stopword lists were created by Ranks NL, who also compiled the Google Search stopword list and said via email that we could include them in this package if we credited them.