gaussic/tf-idf-keyword

Stars: 150
Language: Python
License: MIT License
Created over 7 years ago, updated over 5 years ago

Author: gaussic

Chinese keyword extraction based on TF-IDF computed over a specific corpus.

TF-IDF-Based Chinese Keyword Extraction

Requirements

Python 3 is the default environment, and the jieba Chinese word segmenter is required:

$ pip install jieba

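IDF (Inverse Document Frequency) Generation

Usage:

$ python gen_idf.py -i <inputdir> -o <outputfile>

-i <inputdir>: corpus directory; the program scans every file in this directory
-o <outputfile>: file to which the IDF values are saved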
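A minimal sketch of what IDF generation involves, assuming whitespace-tokenized input by default (pass jieba.lcut for raw Chinese text) and the plain formula idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. gen_idf.py may use a different smoothing, tokenizer, directory traversal, or output format; iter_corpus, compute_idf and save_idf are hypothetical names, not the script's API, and the recursive directory walk is itself an assumption.

```python
import math
import os
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List


def iter_corpus(input_dir: str,
                tokenize: Callable[[str], List[str]] = str.split) -> Iterator[List[str]]:
    """Yield one token list per file found (recursively) under input_dir."""
    for root, _dirs, files in os.walk(input_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                yield tokenize(f.read())


def compute_idf(docs: Iterable[List[str]]) -> Dict[str, float]:
    """Return idf(t) = log(N / df(t)), where df(t) counts documents containing t."""
    df: Counter = Counter()
    n_docs = 0
    for tokens in docs:
        n_docs += 1
        df.update(set(tokens))  # a term counts at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def save_idf(idf: Dict[str, float], output_file: str) -> None:
    """Write one 'term idf' pair per line (hypothetical file format)."""
    with open(output_file, "w", encoding="utf-8") as f:
        for term, value in sorted(idf.items()):
            f.write(f"{term} {value:.6f}\n")
```

For example, save_idf(compute_idf(iter_corpus("corpus", tokenize=jieba.lcut)), "idf.txt") would build an IDF file from a hypothetical corpus directory. The set() call is what turns raw term counts into document frequency: a word repeated many times in one file still counts once for that file.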

TF-IDF Keyword Extraction

Usage:

$ python tfidf.py -i <idffile> -d <document> -t <topK>

-i <idffile>: path to the IDF file
-d <document>: path to the document to process
-t <topK>: number of top-ranked keywords to return
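The extraction step scores each term t in a document d as tf(t, d) × idf(t), where tf(t, d) is the count of t in d divided by the number of tokens in d, and keeps the K highest scores. The sketch below assumes the IDF values are already loaded into a dict and the document is already tokenized. Unseen terms fall back to the median IDF of the known vocabulary, which mirrors jieba.analyse's own TF-IDF extractor but is only an assumption about tfidf.py; extract_keywords is a hypothetical name.

```python
import statistics
from collections import Counter
from typing import Dict, List, Optional, Tuple


def extract_keywords(tokens: List[str],
                     idf: Dict[str, float],
                     top_k: int = 20,
                     default_idf: Optional[float] = None) -> List[Tuple[str, float]]:
    """Rank terms by tf(t, d) * idf(t) and return the top_k (term, score) pairs."""
    if not tokens:
        return []
    if default_idf is None:
        # Assumption: unseen terms get the median IDF of the known vocabulary.
        default_idf = statistics.median(idf.values()) if idf else 0.0
    counts = Counter(tokens)
    total = len(tokens)
    scores = {term: (count / total) * idf.get(term, default_idf)
              for term, count in counts.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

With jieba installed, extract_keywords(jieba.lcut(text), idf, top_k=20) would return a ranked keyword list. A real pipeline would usually also drop stopwords and punctuation before ranking; that filtering is left out here.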

Example

$ python tfidf.py -i idf.txt -d test.txt -t 20

Output:

核
处理器
服务器
系统核心
封装
系列
插槽
核心
主频
产品
伊斯坦布尔
英特尔
功耗
多处理器
低仅
折合
浮点运算
性能
构建
吹起

Note: the idf.txt provided in this repo was trained on a news dataset from the Tsinghua University NLP group.
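To reuse the bundled idf.txt from your own Python code, a loader along these lines could work. It assumes each line holds a term and its IDF value separated by whitespace (the layout jieba's own IDF files use); the format of this repo's idf.txt is not documented here, so check the file first. parse_idf_lines and load_idf are hypothetical names.

```python
from typing import Dict, Iterable


def parse_idf_lines(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'term idf' lines into a dict, skipping blank or malformed lines."""
    idf: Dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or unexpected column count
        term, value = parts
        try:
            idf[term] = float(value)
        except ValueError:
            continue  # non-numeric IDF column
    return idf


def load_idf(path: str) -> Dict[str, float]:
    """Load an IDF file, assumed to be UTF-8 with one 'term idf' pair per line."""
    with open(path, encoding="utf-8") as f:
        return parse_idf_lines(f)
```

Splitting the parser from the file I/O keeps the parsing logic testable on in-memory lines.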