theeluwin/textrankr

Stars: 208
Rank: 189,015 (Top 4 %)
Language: Python
License: MIT License
Created over 8 years ago
Updated almost 4 years ago

TextRank for Korean.

textrankr

Reorder sentences using TextRank algorithm.

Designed mostly for Korean, but not limited to it.
Check out lexrankr, which is another awesome summarizer!
Python 2 is no longer supported (if you need it, use version 0.3).
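
To make "reorder sentences using TextRank" concrete, here is a minimal, standard-library sketch of the general TextRank idea: sentences become graph nodes, token overlap becomes edge weight, and a PageRank-style iteration scores each sentence. This is not textrankr's actual implementation. The names `split_sentences`, `similarity`, `rank_sentences` and `summarize`, the naive sentence splitter, the use of token sets, and the damping and iteration defaults are all assumptions made for illustration.

```python
import math
import re
from typing import Callable, List, Set


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: Set[str], b: Set[str]) -> float:
    # Overlap measure in the style of the TextRank paper: shared tokens
    # normalized by the summed log sizes (here computed on token sets).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def rank_sentences(
    sentences: List[str],
    tokenize: Callable[[str], List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> List[float]:
    bags = [set(tokenize(s)) for s in sentences]
    n = len(bags)
    # Symmetric weight matrix with no self-loops.
    weights = [
        [similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each neighbor passes on its score
        # in proportion to the edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text: str, tokenize: Callable[[str], List[str]], k: int) -> List[str]:
    sentences = split_sentences(text)
    scores = rank_sentences(sentences, tokenize)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Design choice of this sketch: return the chosen sentences in document order.
    return [sentences[i] for i in sorted(top)]
```

Sentences that share tokens with many other sentences end up with higher scores, while isolated sentences stay at the baseline `1 - damping`. The quality of the ranking therefore depends heavily on the tokenizer passed in.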

Installation

pip install textrankr

Tokenizers

No tokenizer is included. You have to implement one yourself.

Example:

from typing import List

class MyTokenizer:
    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = text.split()
        return tokens

For Korean, one option is to use KoNLPy. Using phrases as in the example below is, strictly speaking, not tokenization, but it seems to give better results.

from typing import List
from konlpy.tag import Okt

class OktTokenizer:
    okt: Okt = Okt()

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = self.okt.phrases(text)
        return tokens
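
Both examples above follow the same shape: a callable that takes a string and returns a list of strings. As a further sketch in that shape, a regex-based tokenizer could normalize case and drop punctuation before ranking. The class name `RegexTokenizer` and the choice of pattern are hypothetical and not part of textrankr.

```python
import re
from typing import List


class RegexTokenizer:
    """Hypothetical tokenizer: lowercases text and keeps runs of word characters."""

    # In Python 3, \w is Unicode-aware for str patterns, so Hangul is matched too.
    pattern = re.compile(r"\w+")

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = self.pattern.findall(text.lower())
        return tokens
```

Compared with a plain `text.split()`, this keeps "mice" and "mice." from being treated as different tokens, which can matter when sentence similarity is based on token overlap.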

Usage

from typing import List
from textrankr import TextRank

mytokenizer: MyTokenizer = MyTokenizer()
textrank: TextRank = TextRank(mytokenizer)

k: int = 3  # number of sentences in the resulting summary

# your_text_here is a placeholder for the input string to summarize
summarized: str = textrank.summarize(your_text_here, k)
print(summarized)  # gives you some text

# with verbose=False, it returns a list of sentences instead of a single string
summaries: List[str] = textrank.summarize(your_text_here, k, verbose=False)
for summary in summaries:
    print(summary)

Test

Use Docker.

docker build -t textrankr -f Dockerfile .
docker run --rm -it textrankr

pytorch-sgns

Skipgram Negative Sampling implemented in PyTorch

NotoSansKR-Hestia

A lightweight Noto Sans Korean font.

lexrankr

LexRank for Korean.

sci-news-sum-kr-50

A dataset of 50 Naver News articles selected from the IT/science category, with the sentences that make up the summary tagged.

session-aware-bert4rec

Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 short.

kata

pocket-galaxy

django + vue (vuetify) boilerplate for internal work. Runs out of the box.

Love2Live

CVAE based School Idol image generation. Published in proc. of SSCC 2nd, 2017.

CMYK2RGB

You'll need this when converting Adobe Photoshop's CMYK to RGB.

infinite-monkey-sort

Simple integer sorting algorithm based on infinite monkey theorem.

ProxyRCA

Official repository for "Proxy-based Item Representation for Attribute and Context-aware Recommendation", WSDM 2024.

bear

Implementation of BEAR and SlashBurn.

wpe

Word Pair Encoding (WPE) for semi-automatic meaningful-keywords generation.

pytorch-quadratum

Additional torchvision image transforms for practical usage.

docker-ubuntu-konlpy

Docker image of latest Ubuntu for KoNLPy on Python 3.

docker-pytorch-ko

Docker image of latest PyTorch-CUDA for Koreans...

Kara

Kara the Coda 2 plugin for dealing with annoying blanks.

dnwc

Disables Naver Webtoon comments.

sscc-1st

Source code for paper submitted by @theeluwin at Proceedings of SSCC 1st.

basehangul-lua

Human-readable binary encoding for Lua

kaggle-hnm-preprocess

Materials for the IDS lab's UROP course, fall semester 2022. A collection of preprocessing code for the Kaggle H&M challenge.