  • Stars: 752
  • Rank 59,934 (Top 2 %)
  • Language: Python
  • Created almost 11 years ago
  • Updated over 2 years ago


Repository Details

Python implementation of TextRank algorithm for automatic keyword extraction and summarization using Levenshtein distance as relation between text units. This project is based on the paper "TextRank: Bringing Order into Text" by Rada Mihalcea and Paul Tarau. https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf

TextRank

This is a Python implementation of TextRank for automatic keyword and sentence extraction (summarization), as described in https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf. However, this implementation uses Levenshtein distance as the relation between text units.
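To make the idea concrete, here is a minimal, dependency-free sketch of ranking text units connected by Levenshtein distance. It is not the repository's code: the function names `levenshtein` and `rank_units`, the damping factor of 0.85, the fixed iteration count, and the choice to use the raw distance as the edge weight are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ch_a != ch_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def rank_units(units, damping=0.85, iterations=50):
    """Score text units with weighted PageRank over a complete graph.

    Assumption: the raw Levenshtein distance is the edge weight.
    """
    units = list(dict.fromkeys(units))  # unique nodes, original order kept
    weights = {u: {} for u in units}
    for u, v in combinations(units, 2):
        weights[u][v] = weights[v][u] = levenshtein(u, v)

    scores = {u: 1.0 / len(units) for u in units}
    for _ in range(iterations):
        updated = {}
        for u in units:
            # Each neighbour v passes on its score in proportion to the
            # weight of the edge (v, u) relative to all of v's edges.
            incoming = sum(
                scores[v] * weights[v][u] / sum(weights[v].values())
                for v in weights[u]
            )
            updated[u] = (1 - damping) / len(units) + damping * incoming
        scores = updated
    return scores


print(levenshtein("kitten", "sitting"))  # 3
scores = rank_units(["cat", "cart", "dog"])
print(sorted(scores, key=scores.get, reverse=True))
```

The same ranking step applies whether the units are words (keyword extraction) or sentences (summarization); only the list passed to `rank_units` changes.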

This implementation carries out automatic keyword and sentence extraction on 10 articles taken from http://theonion.com:

  • 100 word summary
  • Number of keywords extracted is relative to the size of the text (a third of the number of nodes in the graph)
  • Adjacent keywords in the text are concatenated into keyphrases
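The last two points can be sketched as follows. This is an illustrative reading of the description above, not the repository's implementation: the helper names `top_third` and `merge_adjacent`, the example scores, and the rule of keeping at least one keyword are assumptions.

```python
def top_third(scores):
    """Keep the highest-scoring third of the graph's nodes as keywords."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: always keep at least one keyword, even for tiny graphs.
    return set(ranked[: max(1, len(ranked) // 3)])


def merge_adjacent(tokens, keywords):
    """Join runs of consecutive keywords in the text into keyphrases."""
    phrases, run = [], []
    for token in list(tokens) + [None]:  # None is a sentinel that ends the last run
        if token in keywords:
            run.append(token)
        elif run:
            phrase = " ".join(run)
            if phrase not in phrases:  # skip repeated phrases
                phrases.append(phrase)
            run = []
    return phrases


def extract_keyphrases(tokens, scores):
    return merge_adjacent(tokens, top_third(scores))


# Hypothetical scores for a six-node graph: 6 // 3 = 2 keywords are kept.
scores = {
    "linear": 0.9, "constraints": 0.8, "systems": 0.3,
    "set": 0.2, "of": 0.1, "natural": 0.05,
}
tokens = "systems of linear constraints over the natural set".split()
print(extract_keyphrases(tokens, scores))  # ['linear constraints']
```

Because "linear" and "constraints" are both selected and sit next to each other in the text, they are reported as the single keyphrase "linear constraints" rather than as two separate keywords.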

Usage

To install the library, run the setup.py script in the repository's root directory. Alternatively, if you have pip available, you can install the library directly from GitHub:

pip install git+https://github.com/davidadamojr/TextRank.git

The library requires NLTK resources. Run the textrank initialize command to fetch the required data. Once the download has finished, you can run the following commands:

textrank extract_summary <filename>
textrank extract_phrases <filename>

Contributing

Install the library as "editable" within a virtual environment.

pip install -e .

Dependencies

Dependencies are installed automatically with pip but can be installed separately.

More Repositories

1. cliptext_chrome (JavaScript, 13 stars)
   The Cliptext chrome extension creates a context menu item that converts selected text in your browser into an image that can be automatically shared on Twitter to avoid the 140 character limit.

2. cliptext (PHP, 9 stars)
   The Cliptext web app converts text into an image that can be automatically shared on Twitter to avoid the 140 character limit.

3. diary_of_programming_puzzles (Python, 4 stars)
   A collection of my solutions to some programming puzzles and common software engineering coding interview questions.

4. alchemy-news-api (JavaScript, 4 stars)
   An Alchemy News API library for Node.js

5. microhaskell (Java, 3 stars)
   This is an implementation of a fictional programming language called "MicroHaskell"; a tiny subset of the Haskell programming language implemented in Java.

6. nextmeal (JavaScript, 1 star)

7. sensor_spy (Java, 1 star)

8. microblog (Python, 1 star)

9. data_science_from_scratch (Python, 1 star)

10. kit (Java, 1 star)
    KeepInTouch (kit) is an Android app that lets you set daily, weekly, biweekly and monthly reminders to call people. The app serves as a good project for learning about ContentProviders and Android Alarms/AlarmManager.

11. TinySearch (Perl, 1 star)
    This is an (intelligent) search engine implementation in Perl. It searches a particular domain (http://www.unt.edu) for relevant pages and serves to demonstrate key concepts in information retrieval and web search. It consists of a crawler, indexer and graphical user interface. A demo can be found at http://students.cse.unt.edu/~dta0022/searchengine/ A PDF document explaining how the search engine works can be found at http://students.cse.unt.edu/~dta0022/searchengine/report.pdf