textrankr
Reorder sentences using the TextRank algorithm.
- Designed mainly for Korean, but not limited to it.
- Check out lexrankr, which is another awesome summarizer!
- Python 2 is no longer supported (if you need it, use version 0.3).
Installation
pip install textrankr
Tokenizers
Tokenizers are not included; you have to implement one yourself.
Example:
from typing import List
class MyTokenizer:
    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = text.split()
        return tokens
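As a variation on the example above, here is a minimal sketch of a tokenizer that also lowercases the text and drops punctuation. The class name `RegexTokenizer` is hypothetical and not part of textrankr; the sketch only assumes that a tokenizer is a callable taking a `str` and returning a `List[str]`, as in the example above.

```python
import re
from typing import List


class RegexTokenizer:
    """Hypothetical tokenizer: lowercases text and keeps runs of word characters."""

    def __call__(self, text: str) -> List[str]:
        # \w+ matches letters, digits, and underscores (Unicode-aware in Python 3),
        # so punctuation is dropped instead of sticking to neighboring words.
        tokens: List[str] = re.findall(r"\w+", text.lower())
        return tokens
```

Compared with a plain `text.split()`, this treats "Hello," and "hello" as the same token, which may make overlapping sentences easier to detect.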
For Korean, one option is to use KoNLPy. Using phrases, as in the example below, is not strictly tokenization, but it seems to give better results.
from typing import List
from konlpy.tag import Okt
class OktTokenizer:
    okt: Okt = Okt()
    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = self.okt.phrases(text)
        return tokens
Usage
from typing import List
from textrankr import TextRank
mytokenizer: MyTokenizer = MyTokenizer()
textrank: TextRank = TextRank(mytokenizer)
k: int = 3  # number of sentences in the resulting summary
summarized: str = textrank.summarize(your_text_here, k)
print(summarized)  # prints the summary as a single string
# With verbose=False, summarize() returns a list of sentences instead
summaries: List[str] = textrank.summarize(your_text_here, k, verbose=False)
for summary in summaries:
    print(summary)
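To give a feel for how the tokenizer affects the result, here is a minimal, self-contained sketch of TextRank-style sentence ranking. It is not textrankr's implementation: the function name `rank_sentences`, the Jaccard similarity, the damping factor of 0.85, the fixed iteration count, and the choice to return sentences in their original order are all assumptions made for illustration. It also expects the text to be split into sentences already.

```python
from typing import Callable, List, Set


def rank_sentences(
    sentences: List[str],
    tokenizer: Callable[[str], List[str]],
    k: int,
    damping: float = 0.85,
    iterations: int = 50,
) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    n = len(sentences)
    if n == 0:
        return []
    bags: List[Set[str]] = [set(tokenizer(s)) for s in sentences]

    # Edge weight (assumed here): Jaccard similarity between two token sets.
    def similarity(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    weights = [
        [similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    # Power iteration for weighted PageRank over the sentence graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]

    # Pick the k best-scoring indices, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Because the graph edges come entirely from shared tokens, a tokenizer that produces more meaningful units (for example, phrases rather than whitespace-separated words) changes which sentences look related and therefore which ones rank highest.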
Test
Run the tests with Docker.
docker build -t textrankr -f Dockerfile .
docker run --rm -it textrankr