• Stars
    star
    242
  • Rank 167,048 (Top 4 %)
  • Language
    PHP
  • License
    MIT License
  • Created over 8 years ago
  • Updated 11 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

โšก ๐Ÿ˜ TextRank (resource-efficient and low-cost automatic text summarisation) for PHP

TextRank

This source code is an implementation of TextRank algorithm in PHP programming language, under MIT licence.

TextRank vs. ChatGPT

GPTs like ChatGPT are supervised language models that understand the context and generate new content from the given input using vast resources while TextRank is a cost-efficient/low-cost text extraction algorithm. TextRank algorithm also can be used as a pre-processor to a GPT model to reduce the text size to save on resource consumption.

TextRank or Automatic summarization

Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a representative subset of the data, which contains the information of the entire set. Summarization technologies are used in a large number of sectors in industry today. - Wikipedia

The algorithm of this implementation is:

  • Extracts sentences,
  • Removes stopwords,
  • Adds integer values to words by finding and counting the matching words,
  • Weights the values of the words,
  • Normalizes values to get the scores,
  • Sorts by scores

Install to use it in your project

cd your-project-folder
composer require php-science/textrank

Install for contributing

cd git-project-folder
docker-compose build
docker-compose up -d
composer install
composer test

Examples

use PhpScience\TextRank\Tool\StopWords\English;

// String contains a long text, see the /res/sample1.txt file.
$text = "Lorem ipsum...";

$api = new TextRankFacade();
// English implementation for stopwords/junk words:
$stopWords = new English();
$api->setStopWords($stopWords);

// Array of the most important keywords:
$result = $api->getOnlyKeyWords($text); 

// Array of the sentences from the most important part of the text:
$result = $api->getHighlights($text); 

// Array of the most important sentences from the text:
$result = $api->summarizeTextBasic($text);

More examples:

Authors, Contributors

Name GitHub user
David Belicza @DavidBelicza
Riccardo Marton @riccardomarton
Syndesi @Syndesi
vincentsch @vincentsch
Andrew Welch @khalwat
Andrey Astashov @mvcaaa
Leo Toneff @bragle
Willy Arisky @willyarisky
Robert-Jan Keizer @KeizerDev
Morty @evil1morty
Sezer Fidancฤฑ @SezerFidanci