  • Stars: 505
  • Rank: 86,750 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated almost 2 years ago

Repository Details

cnsenti (Chinese Sentiment) is a Chinese sentiment analysis library. It supports emotion analysis and positive/negative sentiment analysis of text, including counting the number of words for each emotion in a text.

cnsenti is no longer maintained. All of its functionality has been merged into the cntext library; stars on cntext are welcome!

cnsenti

Chinese documentation

The cnsenti library performs sentiment analysis and emotion analysis on Chinese text.

Features

  • The default sentiment dictionary is Hownet.
  • The default emotion dictionary is the DLUT (Dalian University of Technology) emotion dictionary, which supports 7 emotion categories, such as happy, sad and hate.
  • Custom sentiment dictionaries (pos and neg) can be imported from txt files.

Notes

This code uses the emotional vocabulary ontology of Dalian University of Technology for its emotion analysis. If you publish a paper, please observe the user license agreement:

  1. The emotional vocabulary ontology was independently compiled and annotated by the Information Retrieval Laboratory of Dalian University of Technology, and can be used by universities, research institutes and individuals at home and abroad for academic research purposes.
  2. If any institute or individual needs to use it for commercial purposes, please send an email to [email protected] for negotiation.
  3. If you find any errors or inappropriate content in this resource, you are welcome to send your comments to [email protected], and we will resolve them as quickly as possible.
  4. If you use this resource to publish papers or obtain scientific research results, please add a statement such as "using the emotional vocabulary ontology of the Information Retrieval Laboratory of Dalian University of Technology" to the paper.
  5. Add the citation "徐琳宏,林鸿飞,潘宇,等.情感词汇本体的构造[J]. 情报学报, 2008, 27(2): 180-185."
  6. Any user who obtains the resource through copying or other informal downloads should also abide by the license agreement. The Information Retrieval Laboratory of Dalian University of Technology has the final right to interpret and modify the license agreement.

Installation

Method 1: install from PyPI

pip install cnsenti

Method 2: install via the Tsinghua University (TUNA) PyPI mirror

pip install cnsenti -i https://pypi.tuna.tsinghua.edu.cn/simple/
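
After installing with either method, it can be useful to confirm that the package is importable before running the examples. The sketch below uses only the Python standard library; the helper name is_installed is hypothetical and not part of cnsenti.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # 'cnsenti' is the package name used in the pip commands above.
    print("cnsenti installed:", is_installed("cnsenti"))
```

If this prints False, the pip command probably ran against a different Python environment than the one running the script.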

Quick Start

Count the positive and negative sentiment words in a Chinese text:

from cnsenti import Sentiment

senti = Sentiment()
test_text= '我好开心啊,非常非常非常高兴!今天我得了一百分,我很兴奋开心,愉快,开心'
result = senti.sentiment_count(test_text)
print(result)

Run

{'words': 24, 
'sentences': 2, 
'pos': 4, 
'neg': 0}

Count the words for each emotion in a Chinese text:

from cnsenti import Emotion

emotion = Emotion()
test_text = '我好开心啊,非常非常非常高兴!今天我得了一百分,我很兴奋开心,愉快,开心'
result = emotion.emotion_count(test_text)
print(result)

Run

{'words': 22, 
'sentences': 2, 
'好': 0, 
'乐': 4, 
'哀': 0, 
'怒': 0, 
'惧': 0, 
'恶': 0, 
'惊': 0}

Documents

cnsenti includes two classes: Emotion and Sentiment.

  • The Emotion class provides the emotion_count(text) method.
  • The Sentiment class provides the sentiment_count(text) and sentiment_calculate(text) methods.

3.1 emotion_count(text)

emotion_count(text) counts the words in the text that belong to each emotion category. It uses the Dalian University of Technology (DLUT) emotion ontology dictionary, which supports 7 emotion categories (好 good, 乐 happy, 哀 sad, 怒 angry, 惧 fear, 恶 disgust, 惊 shock).

from cnsenti import Emotion

emotion = Emotion()
test_text = '我好开心啊,非常非常非常高兴!今天我得了一百分,我很兴奋开心,愉快,开心'
result = emotion.emotion_count(test_text)
print(result)

Run

{'words': 22, 
'sentences': 2, 
'好': 0, 
'乐': 4, 
'哀': 0, 
'怒': 0, 
'惧': 0, 
'恶': 0, 
'惊': 0}

Detail

  • words: the number of words in the Chinese text
  • sentences: the number of sentences in the Chinese text
  • 好 good, 乐 happy, 哀 sad, 怒 angry, 惧 fear, 恶 disgust, 惊 shock: the number of words for each emotion
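
Because the result is a plain dict, it is easy to post-process. As a minimal sketch, the snippet below picks the most frequent emotion from the result shown above. The dict is copied from the example output rather than produced by calling cnsenti, and the helper name dominant_emotion is hypothetical.

```python
# Result dict copied from the emotion_count example output above.
result = {'words': 22, 'sentences': 2,
          '好': 0, '乐': 4, '哀': 0, '怒': 0, '惧': 0, '恶': 0, '惊': 0}

# The seven emotion keys; 'words' and 'sentences' are text statistics, not emotions.
EMOTION_KEYS = ['好', '乐', '哀', '怒', '惧', '恶', '惊']


def dominant_emotion(counts):
    """Return (emotion, count) for the most frequent emotion, or None if all are zero."""
    best = max(EMOTION_KEYS, key=lambda key: counts.get(key, 0))
    if counts.get(best, 0) == 0:
        return None
    return best, counts[best]


print(dominant_emotion(result))  # ('乐', 4)
```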

3.2 sentiment_count(text)

sentiment_count(text) belongs to the Sentiment class and counts the positive and negative words in a Chinese text. The Hownet dictionary is used by default; the Sentiment class also supports custom positive and negative sentiment dictionaries in txt files, as described later.

Here we use the default Hownet dictionary to count the sentiment words in a Chinese text.

from cnsenti import Sentiment

senti = Sentiment()
test_text = '我好开心啊,非常非常非常高兴!今天我得了一百分,我很兴奋开心,愉快,开心'
result = senti.sentiment_count(test_text)
print(result)

Run

{'words': 24, 
'sentences': 2, 
'pos': 4, 
'neg': 0}

Detail

  • words: the number of words in the Chinese text

  • sentences: the number of sentences in the Chinese text

  • pos: the number of positive words in the Chinese text

  • neg: the number of negative words in the Chinese text
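
The raw counts can be turned into simple summary measures. The sketch below is one possible post-processing step, not something cnsenti provides: the helper names polarity and density are hypothetical, and the input dict is copied from the example output above.

```python
# Result dict copied from the sentiment_count example output above.
result = {'words': 24, 'sentences': 2, 'pos': 4, 'neg': 0}


def polarity(counts):
    """Net polarity in [-1, 1]: (pos - neg) / (pos + neg); 0.0 if no sentiment words."""
    pos, neg = counts['pos'], counts['neg']
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def density(counts):
    """Share of all words that are sentiment words; 0.0 for an empty text."""
    if counts['words'] == 0:
        return 0.0
    return (counts['pos'] + counts['neg']) / counts['words']


print(polarity(result))  # 1.0
print(density(result))
```

Normalizing by the word count makes texts of different lengths comparable, which the raw pos and neg counts are not.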

3.3 sentiment_calculate(text)

sentiment_calculate(text) belongs to the Sentiment class and calculates the sentiment of a Chinese text more precisely. Whereas sentiment_count only counts the positive and negative sentiment words in the text, sentiment_calculate also considers:

  • whether a sentiment word is preceded by an intensity adverb that modifies its strength
  • whether a sentiment word is preceded by a negation word that reverses its polarity

For example:

from cnsenti import Sentiment

senti = Sentiment()
test_text = '我好开心啊,非常非常非常高兴!今天我得了一百分,我很兴奋开心,愉快,开心'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)

Run

sentiment_count 
{'words': 22, 
'sentences': 2, 
'pos': 4, 
'neg': 0}

sentiment_calculate 
{'sentences': 2, 
'words': 22, 
'pos': 27.0, 
'neg': 0.0}
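
To illustrate the difference between counting and weighted scoring, here is a toy sketch of the two adjustments listed above (intensity adverbs and negation). It is not cnsenti's actual algorithm: the word lists, the multipliers and the function names are all assumptions made for illustration, and the input is already split into tokens.

```python
# Toy lexicons: hypothetical values for illustration, not cnsenti's dictionaries.
POSITIVE = {'开心', '高兴'}
NEGATIVE = {'难过'}
DEGREE = {'非常': 2.0, '很': 1.5}   # intensity adverbs with assumed multipliers
NEGATION = {'不', '没'}


def count_words(tokens):
    """Plain counting: every sentiment word contributes 1."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return {'pos': pos, 'neg': neg}


def weighted_score(tokens):
    """Weighted scoring: modifiers seen since the last sentiment word adjust the next one."""
    pos = neg = 0.0
    weight, flipped = 1.0, False
    for t in tokens:
        if t in DEGREE:
            weight *= DEGREE[t]          # intensity adverb strengthens the next word
        elif t in NEGATION:
            flipped = not flipped        # negation reverses the next word's polarity
        elif t in POSITIVE or t in NEGATIVE:
            is_positive = (t in POSITIVE) != flipped
            if is_positive:
                pos += weight
            else:
                neg += weight
            weight, flipped = 1.0, False  # reset modifiers after each sentiment word
    return {'pos': pos, 'neg': neg}


tokens = ['我', '非常', '开心', '但', '他', '不', '高兴']
print(count_words(tokens))     # {'pos': 2, 'neg': 0}
print(weighted_score(tokens))  # {'pos': 2.0, 'neg': 1.0}
```

Plain counting sees two positive words, while the weighted version doubles the first one and flips the negated one to negative.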

3.4 custom dictionary

Let's first look at a sentence that carries sentiment but contains no emotional adjectives:

from cnsenti import Sentiment
senti = Sentiment()      # default built-in dictionary; no custom txt files
test_text = '这家公司是行业的引领者,是中流砥柱。'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)

Run

sentiment_count {'words': 10, 'sentences': 1, 'pos': 0, 'neg': 0}
sentiment_calculate {'sentences': 1, 'words': 10, 'pos': 0, 'neg': 0}

As expected, pos=0 even though the sentence is positive: cnsenti's built-in sentiment dictionary (Hownet) contains only adjectives, which limits its applicability in many scenarios.

3.4.1 Format of a custom dictionary

cnsenti supports importing custom dictionaries, but currently only the Sentiment class supports this, for custom positive and negative sentiment dictionaries. A custom dictionary must meet the following requirements:

  • It must be a txt file.
  • UTF-8 encoding is recommended.
  • The file contains only one word per line.
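
A dictionary file in this format can be produced with a few lines of standard-library Python. The sketch below writes one word per line as UTF-8 and reads the file back to check it. The helper names and the file name pos_custom.txt are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile


def write_dictionary(path, words):
    """Write one word per line as UTF-8 text."""
    with open(path, 'w', encoding='utf-8') as f:
        for word in words:
            f.write(word + '\n')


def read_dictionary(path):
    """Read the file back, stripping whitespace and skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    pos_path = os.path.join(tmp, 'pos_custom.txt')  # hypothetical file name
    write_dictionary(pos_path, ['中流砥柱', '引领者'])
    loaded = read_dictionary(pos_path)

print(loaded)  # ['中流砥柱', '引领者']
```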

3.4.2 Sentiment custom dictionary parameters

senti = Sentiment(pos='正面词自定义.txt',  
                  neg='负面词自定义.txt', 
                  merge=True,  
                  encoding='utf-8')

  • pos: path to the positive sentiment dictionary txt file
  • neg: path to the negative sentiment dictionary txt file
  • merge: Boolean. With merge=True, cnsenti merges the custom dictionary with its built-in dictionary; with merge=False, cnsenti uses only the custom dictionary.
  • encoding: the encoding of the pos and neg txt files (utf-8 in this example)
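
The effect of the merge parameter can be pictured with plain sets. This is only a conceptual sketch of the behaviour described above, not cnsenti's internal code; the function build_lexicon and the sample built-in words are assumptions.

```python
def build_lexicon(builtin_words, custom_words, merge=True):
    """merge=True: union of built-in and custom words; merge=False: custom words only."""
    custom = set(custom_words)
    return (set(builtin_words) | custom) if merge else custom


builtin_pos = {'好', '开心'}           # stand-in for the built-in positive words
custom_pos = {'中流砥柱', '引领者'}    # words loaded from a custom txt file

merged = build_lexicon(builtin_pos, custom_pos, merge=True)
custom_only = build_lexicon(builtin_pos, custom_pos, merge=False)

print(sorted(merged))
print(sorted(custom_only))
```

With merge=False, a word such as 开心 would no longer count as positive unless the custom file lists it too.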

3.4.3 Custom dictionary use case

I put this example in the test folder. The code and the custom dictionaries are in the same folder, so I use relative paths for the custom dictionaries.

|test
   |---代码.py           #code
   |---正面词自定义.txt   #pos custom dictionary txt
   |---负面词自定义.txt   #neg custom dictionary txt

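Contents of 正面词自定义.txt (the positive custom dictionary):

中流砥柱
引领者

from cnsenti import Sentiment

senti = Sentiment(pos='正面词自定义.txt',  # relative path to the positive dictionary txt file
                  neg='负面词自定义.txt',  # relative path to the negative dictionary txt file
                  merge=True,             # merge cnsenti's built-in dictionary with the custom dictionaries
                  encoding='utf-8')      # both txt files are UTF-8 encoded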

test_text = '这家公司是行业的引领者,是中流砥柱。今年的业绩非常好。'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)

Run

sentiment_count {'words': 16, 'sentences': 2, 'pos': 2, 'neg': 0}
sentiment_calculate {'sentences': 2, 'words': 16, 'pos': 5, 'neg': 0}

In the call above, we passed in both the positive and the negative custom dictionary and used merge mode (merge=True), so the sentiment calculation uses cnsenti's built-in dictionary together with the newly imported custom dictionaries.

Notes:

The library I designed currently supports only two categories (pos and neg). If your research question is a binary classification problem, such as good vs. bad, beautiful vs. ugly, or friendly vs. hostile, you can define two txt files and assign them to pos and neg respectively. With that setup, you can use the cnsenti library for your research question.

