Must-read Papers on Sememe Computation
Contributed by Fanchao Qi.
Introduction
A sememe is defined as the minimum semantic unit in linguistics. Some linguists believe that the meanings of all words can be decomposed into a limited set of sememes.
Sememes can help us better understand human languages. Several studies have shown that neural NLP models benefit from incorporating sememes.
HowNet is the most famous sememe-based knowledge base. It predefines a set of about 2,000 sememes and uses them to annotate over 100,000 Chinese and English words.
OpenHowNet, developed by THUNLP, open-sources the core data of HowNet and provides convenient data access APIs.
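To make the idea of decomposing word meanings into sememes concrete, here is a minimal Python sketch. It assumes a tiny, hand-written lexicon: the words and sememe labels are hypothetical and are not actual HowNet annotations. It also uses a plain Jaccard overlap of sememe sets as a similarity score. This is an illustrative stand-in, not the OpenHowNet API and not the HowNet-based word similarity method of Liu and Li (2002) listed below.

```python
# Hypothetical toy lexicon: each word maps to a flat set of sememes.
# NOTE: these annotations are invented for illustration and are NOT copied
# from HowNet, whose real annotations are sense-specific and structured.
TOY_SEMEMES: dict[str, frozenset[str]] = {
    "father": frozenset({"human", "family", "male"}),
    "mother": frozenset({"human", "family", "female"}),
    "boy": frozenset({"human", "male", "child"}),
    "apple": frozenset({"fruit"}),
}


def sememe_overlap(word_a: str, word_b: str, lexicon=TOY_SEMEMES) -> float:
    """Return the Jaccard overlap (0.0 to 1.0) of two words' sememe sets."""
    a, b = lexicon[word_a], lexicon[word_b]
    union = a | b
    if not union:  # both words have no sememes; treat as unrelated
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # "father" and "mother" share {human, family} out of 4 distinct sememes.
    for pair in [("father", "mother"), ("father", "boy"), ("father", "apple")]:
        print(pair, round(sememe_overlap(*pair), 2))
```

Flat sets keep the sketch short; the papers below go further, for example by modeling multiple senses per word or the tree structure of sememe annotations.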
Papers
Introduction
- Introduction to HowNet. Zhendong Dong and Qiang Dong. [pdf (Chinese)]
This paper gives an overall introduction to HowNet, including its features, philosophy and construction method.
- HowNet - a hybrid language and knowledge resource. Zhendong Dong and Qiang Dong. NLP-KE 2003. [pdf]
This paper gives a brief introduction to HowNet.
- KDML — Knowledge Database Mark-up Language. Zhendong Dong and Qiang Dong. [pdf (Chinese)]
This paper gives a detailed introduction to Knowledge Database Mark-up Language, the mark-up language used in HowNet.
- Sememe knowledge computation: a review of recent advances in application and expansion of sememe knowledge bases. Fanchao Qi, Ruobing Xie, Yuan Zang, Zhiyuan Liu, Maosong Sun. Frontiers of Computer Science 2021. [pdf]
This paper summarizes the recent advances in application and expansion of sememe knowledge bases.
Applications of Sememes
- Sememe Knowledge and Auxiliary Information Enhanced Approach for Sarcasm Detection. Zhiyuan Wen, Lin Gui, Qianlong Wang, Mingyue Guo, Xiaoqi Yu, Jiachen Du, Ruifeng Xu. Information Processing & Management 2022. [pdf]
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution. Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun. ACL-IJCNLP 2021. [pdf] [code]
- LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching. Boer Lyu, Lu Chen, Su Zhu, Kai Yu. AAAI-21. [pdf] [code]
- Conceptualized and Contextualized Gaussian Embedding. Chen Qian, Fuli Feng, Lijie Wen, Tat-Seng Chua. AAAI-21. [pdf]
- Chinese Lexical Simplification. Jipeng Qiang, Xinyu Lu, Yun Li, Yunhao Yuan, Xindong Wu. TASLP 2021. [pdf] [code]
- Word-level Textual Adversarial Attacking as Combinatorial Optimization. Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun. ACL 2020. [pdf] [code]
- Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet. Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, Maosong Sun. COLING 2020. [pdf] [code]
- End to End Chinese Lexical Fusion Recognition with Sememe Knowledge. Yijiang Liu, Meishan Zhang, Donghong Ji. COLING 2020. [pdf] [code]
- AliMe KG: Domain Knowledge Graph Construction and Application in E-commerce. Feng-Lin Li, Hehong Chen, Guohai Xu, Tian Qiu, Feng Ji, Ji Zhang, Haiqing Chen. CIKM 2020. [pdf]
- Improving Sequence Modeling Ability of Recurrent Neural Networks via Sememes. Yujia Qin, Fanchao Qi, Sicong Ouyang, Zhiyuan Liu, Cheng Yang, Yasheng Wang, Qun Liu, Maosong Sun. TASLP 2020. [pdf] [code]
- Incorporating Sememes into Chinese Definition Modeling. Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang. TASLP 2020. [pdf] [code]
- Enhancing Transformer with Sememe Knowledge. Yuhui Zhang, Chenghao Yang, Zhengping Zhou, Zhiyuan Liu. Rep4NLP 2020. [pdf]
- Multi-channel Reverse Dictionary Model. Lei Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun. AAAI-20. [pdf] [code]
- K-BERT: Enabling Language Representation with Knowledge Graph. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng and Ping Wang. AAAI-20. [pdf] [code]
- Leveraging Human Prior Knowledge to Learn Sense Representations. Tong Zhang, Wei Ye, Xiangyu Xi, Longyin Zhang, Shikun Zhang, Wen Zhao. ECAI 2020. [pdf]
- Lexical and Compositional Stream Learning for Event Detection with Sememe Knowledge. Jiale Yuan, Xin Xin, Ping Guo. ICIST 2020. [pdf]
- Modeling Semantic Compositionality with Sememe Knowledge. Fanchao Qi, Junjie Huang, Chenghao Yang, Zhiyuan Liu, Xiao Chen, Qun Liu and Maosong Sun. ACL 2019. [pdf] [code]
- Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge. Ziran Li, Ning Ding, Zhiyuan Liu, Haitao Zheng and Ying Shen. ACL 2019. [pdf] [code]
- Unsupervised Neural Aspect Extraction with Sememes. Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He and Dong Yu. IJCAI 2019. [pdf]
- Semantic Hilbert Space for Text Representation Learning. Benyou Wang, Qiuchi Li, Massimo Melucci, Dawei Song. WWW 2019. [pdf] [code]
- Semantic Representation Learning Based on HowNet. Jingwen Zhu, Yuji Yang, Bin Xu and Juanzi Li. JCIP 2019. [pdf (Chinese)]
- A Word Representation Method Based on HowNet. Yang Chen and Zhiyong Luo. Acta Scientiarum Naturalium Universitatis Pekinensis 2019. [pdf (Chinese)]
- Evaluating Semantic Rationality of a Sentence: A Sememe-Word-Matching Neural Network based on HowNet. Shu Liu, Jingjing Xu and Xuancheng Ren. NLPCC 2019. [pdf]
- Language Modeling with Sparse Product of Sememe Experts. Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin and Leyu Lin. EMNLP 2018. [pdf] [code]
- Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention. Xiangkai Zeng, Cheng Yang, Cunchao Tu, Zhiyuan Liu and Maosong Sun. AAAI-18. [pdf] [code]
- Improved Word Representation Learning with Sememes. Yilin Niu, Ruobing Xie, Zhiyuan Liu and Maosong Sun. ACL 2017. [pdf] [code]
- Embedding for Words and Word Senses Based on Human Annotated Knowledge Base: A Case Study on HowNet. Maosong Sun and Xinxiong Chen. JCIP 2016. [pdf (Chinese)]
- Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon. Xianghua Fu, Guo Li, Yanyan Guo and Zhiqiang Wang. Knowledge-Based Systems 2013. [pdf]
- Employing Morphological Structures and Sememes for Chinese Event Extraction. Peifeng Li and Guodong Zhou. COLING 2012. [pdf]
- Method of discriminant for Chinese sentence sentiment orientation based on HowNet. Lei Dang and Lei Zhang. Application Research of Computers 2010. [pdf (Chinese)]
- HowNet Based Chinese Question Automatic Classification. Jingguang Sun, Dongfeng Cai, Dexin Lv, and Yanju Dong. JCIP 2007. [pdf (Chinese)]
- Word Sense Disambiguation through Sememe Labeling. Xiangyu Duan, Jun Zhao and Bo Xu. IJCAI 2007. [pdf]
- Analogy Generation with HowNet. Tony Veale. IJCAI 2005. [pdf]
- Semantic orientation computing based on HowNet. Yanlan Zhu, Jin Min, Yaqian Zhou, Xuanjing Huang and Lide Wu. JCIP 2005. [pdf (Chinese)]
- Chinese word sense disambiguation using HowNet. Yuntao Zhang, Ling Gong and Yongcheng Wang. ICNC 2005. [pdf]
- Word Similarity Computing Based on HowNet. Qun Liu and Sujian Li. International Journal of Computational Linguistics & Chinese Language Processing 2002. [pdf (Chinese)]
Expansion of Sememe Knowledge Bases
- Bridging the Gap Between BabelNet and HowNet: Unsupervised Sense Alignment and Sememe Prediction. Xiang Zhang, Ning Shi, Bradley Hauer, Grzegorz Kondrak. EACL 2023. [pdf] [code]
- Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction. Boer Lyu, Lu Chen, Kai Yu. Findings of ACL: EMNLP 2021. [pdf] [code]
- Automatic Construction of Sememe Knowledge Bases via Dictionaries. Fanchao Qi, Yanyi Chen, Fengyu Wang, Zhiyuan Liu, Maosong Sun. Findings of ACL: ACL-IJCNLP 2021. [pdf] [code]
- Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets. Fanchao Qi, Liang Chang, Maosong Sun, Sicong Ouyang and Zhiyuan Liu. AAAI-20. [pdf] [code]
- Sememe Tree Prediction for English-Chinese Word Pairs. Baoju Liu, Xuejun Shang, Liqing Liu, Yuanpeng Tan, Lei Hou, Juanzi Li. CCKS 2020. [pdf]
- Incorporating synonym for lexical sememe prediction: An attention-based model. Xiaojun Kang, Bing Li, Hong Yao, Qingzhong Liang, Shengwen Li, Junfang Gong, Xinchuan Li. Applied Sciences 2020. [pdf]
- Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence. Jiaju Du, Fanchao Qi, Maosong Sun and Zhiyuan Liu. arXiv 2020. [pdf]
- OpenHowNet: An Open Sememe-based Lexical Knowledge Base. Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Qiang Dong, Maosong Sun, Zhendong Dong. arXiv 2019. [pdf] [code]
- Cross-lingual Lexical Sememe Prediction. Fanchao Qi, Yankai Lin, Maosong Sun, Hao Zhu, Ruobing Xie and Zhiyuan Liu. EMNLP 2018. [pdf] [code]
- Lexical Sememe Prediction with RNN and Modern Chinese Dictionary. Mei Bai, Pin Lv, Xu Long. ICNC-FSKD 2018. [pdf]
- Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions. Wei Li, Xuancheng Ren, Damai Dai, Yunfang Wu, Houfeng Wang and Xu Sun. arXiv 2018. [pdf] [code]
- Extended HowNet 2.0: An Entity-Relation Common-Sense Representation Model. Wei-Yun Ma and Yueh-Yin Shih. LREC 2018. [pdf]
- Incorporating Chinese Characters of Words for Lexical Sememe Prediction. Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin and Leyu Lin. ACL 2018. [pdf] [code]
- Lexical Sememe Prediction via Word Embeddings and Matrix Factorization. Ruobing Xie, Xingchi Yuan, Zhiyuan Liu and Maosong Sun. IJCAI 2017. [pdf] [code]
- E-HowNet and Automatic Construction of a Lexical Ontology. Wei-Te Chen, Su-Chu Lin, Shu-Ling Huang, You-Shan Chung and Keh-Jiann Chen. COLING 2010. [pdf]
- Extended-HowNet: A Representational Framework for Concepts. Keh-Jiann Chen, Shu-Ling Huang, Yueh-Yin Shih and Yi-Jun Chen. OntoLex 2005. [pdf]