  • Stars: 206 (Rank 190,504, Top 4%)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago; updated almost 2 years ago

HiAGM: Hierarchy-Aware Global Model for Hierarchical Text Classification

This repository implements hierarchy-aware structure encoders for mutual interaction between the label space and text features. The work was accepted as a long paper, 'Hierarchy-Aware Global Model for Hierarchical Text Classification', at ACL 2020. The repository also provides the dataset splits for NYTimes (New York Times) and WoS (Web of Science).

Hierarchy-Aware Global Model

The hierarchy-aware global model augments a conventional text classification model with prior knowledge of the predefined hierarchical label structure. The project folder consists of the following parts:

  • config: config files (json format)
  • data: data dir, could be changed in config file (with sample data)
  • data_modules: Dataset / DataLoader / Collator / Vocab
  • helper: Configure / Hierarchy_Statistic / Logger / Utils
  • models: StructureModel / EmbeddingLayer / TextEncoder / TextPropagation (HiAGM-TP) / Multi-Label Attention (HiAGM-LA)
  • train_modules: Criterions / EvaluationMetrics / Trainer

Hierarchy-Aware Structure Encoder

  • Bidirectional TreeLSTM: weighted_tree_lstm.py & tree.py
  • Hierarchy-GCN: graphcnn.py
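
The following is a rough, hypothetical sketch of the Hierarchy-GCN idea, not the actual code in graphcnn.py: each label node aggregates gated messages along top-down and bottom-up edges weighted by the parent-child prior probabilities. All class and variable names here are assumptions.

```python
import torch
import torch.nn as nn

class HierarchyGCNSketch(nn.Module):
    """One illustrative hierarchy-GCN layer: every label node receives
    messages from its parents (top-down) and children (bottom-up), with
    edges weighted by parent-child prior probabilities."""

    def __init__(self, dim, prior):
        super().__init__()
        # prior: (num_labels, num_labels) tensor; prior[p, c] approximates
        # P(child c | parent p) as estimated from the training set.
        self.register_buffer("top_down", prior)
        self.register_buffer("bottom_up", prior.t().contiguous())
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, 1)

    def forward(self, label_feats):
        # label_feats: (num_labels, dim), one feature vector per label node.
        h = self.transform(label_feats)
        msg = self.top_down @ h + self.bottom_up @ h  # weighted neighbor messages
        g = torch.sigmoid(self.gate(label_feats))     # per-node update gate
        return torch.relu(label_feats + g * msg)
```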

Setup

  • Python >= 3.6
  • torch >= 0.4.1
  • numpy >= 1.17.4

Preprocess

data_modules.preprocess

  • transforms each sample into a JSON-format line: {'token': List[str], 'label': List[str]}
  • cleans stopwords
  • RCV1-V2: for preprocessing code, refer to the reuters_loader repository.
  • NYTimes & WoS: data.preprocess_nyt & data.preprocess_wos. Please download the original datasets and then use these scripts to preprocess them for HTC. (A minimal sketch of the target format follows this list.)
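
As a minimal sketch of the target sample format described above; the tokenizer, stopword list, and label names here are placeholder assumptions, not the repository's actual preprocessing:

```python
import json

STOPWORDS = {"the", "a", "an", "of", "to", "and"}  # placeholder list

def to_sample(text, labels):
    """Turn one raw document into the {'token': ..., 'label': ...} format."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return {"token": tokens, "label": labels}

with open("train.json", "w") as f:
    sample = to_sample("The hierarchy of topic labels", ["Top", "Top/Science"])
    f.write(json.dumps(sample) + "\n")  # one JSON object per line
```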

Prior Probability

  • helper.hierarchical_statistic
  • Note that you must first change the Root.child list to match your label hierarchy
  • calculates the prior probability of each parent-child pair on the train dataset (a rough sketch follows this list)
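
A hedged sketch of what such a statistic amounts to; the real logic lives in helper.hierarchical_statistic, and the hierarchy format below is an assumption:

```python
import json
from collections import Counter

def parent_child_priors(train_path, hierarchy):
    """Estimate P(child | parent) from label counts in the train split.

    hierarchy: dict mapping a parent label to its child labels, e.g.
    {'Root': ['Top'], 'Top': ['Top/Science', 'Top/Sports']}.
    """
    parent_count, pair_count = Counter(), Counter()
    with open(train_path) as f:
        for line in f:
            labels = set(json.loads(line)["label"])
            for parent, children in hierarchy.items():
                if parent == "Root" or parent in labels:
                    parent_count[parent] += 1
                    for child in children:
                        if child in labels:
                            pair_count[(parent, child)] += 1
    return {(p, c): n / parent_count[p] for (p, c), n in pair_count.items()}
```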

Train

python train.py config/gcn-rcv1-v2.json

  • optimizer -> train.set_optimizer: torch.optim.Adam by default
  • learning-rate decay schedule callback -> train_modules.trainer.update_lr
  • early-stop callback -> train.py
  • hyper-parameters are set in config.train (a rough sketch of these pieces follows this list)
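
A rough sketch of those three training-control pieces under assumed names and defaults; the actual hyper-parameter values come from config.train, and `evaluate` is a hypothetical callback:

```python
import torch

def set_optimizer(model, lr=1e-3):
    """Default optimizer (Adam), as noted above."""
    return torch.optim.Adam(model.parameters(), lr=lr)

def update_lr(optimizer, decay=0.5):
    """Multiplicative learning-rate decay, applied when dev loss plateaus."""
    for group in optimizer.param_groups:
        group["lr"] *= decay

def fit(model, optimizer, evaluate, max_epochs=100, patience=5):
    """Early stopping on dev loss; `evaluate` is assumed to run one epoch
    of training and return the resulting dev loss."""
    best, bad_epochs = float("inf"), 0
    for _ in range(max_epochs):
        dev_loss = evaluate(model)
        if dev_loss < best:
            best, bad_epochs = dev_loss, 0
        else:
            bad_epochs += 1
            update_lr(optimizer)   # decay on plateau
            if bad_epochs >= patience:
                break              # early stop
    return best
```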

Citation

Please cite our ACL 2020 paper:

@inproceedings{jie2020hierarchy,
  title={Hierarchy-Aware Global Model for Hierarchical Text Classification},
  author={Zhou, Jie and Ma, Chunping and Long, Dingkun and Xu, Guangwei and Ding, Ning and Zhang, Haoyu and Xie, Pengjun and Liu, Gongshen},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2020}
}

More Repositories

 1. ACE: [ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction (Python, 299 stars)
 2. EcomGPT: An Instruction-tuned Large Language Model for E-commerce (Python, 221 stars)
 3. SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding (Python, 204 stars)
 4. KB-NER: Winner system (DAMO-NLP) of the SemEval 2022 MultiCoNER shared task on 10 out of 13 tracks (Python, 177 stars)
 5. Multi-CPR: [SIGIR 2022] A Multi Domain Chinese Dataset for Passage Retrieval (Python, 164 stars)
 6. CLNER: [ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning (Python, 91 stars)
 7. MultilangStructureKD: [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling (Python, 71 stars)
 8. MuVER: [EMNLP 2021] Improving First-Stage Entity Retrieval with Multi-View Entity Representations (Python, 30 stars)
 9. ProtoRE: Code for 'Prototypical Representation Learning for Relation Extraction' (Python, 30 stars)
10. RankingGPT: Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" (Python, 28 stars)
11. DAAT-CWS: Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation (Python, 22 stars)
12. AISHELL-NER: [ICASSP 2022] Named Entity Recognition from Chinese Speech (21 stars)
13. HLATR: Hybrid List Aware Transformer Reranking (18 stars)
14. AIN: Code for the EMNLP 2020 paper "AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network" (Python, 18 stars)
15. MANNER: [ACL 2023] A Variational Memory-Augmented Model for Cross Domain Few-Shot Named Entity Recognition (Python, 17 stars)
16. EBM-Net: Code for the EMNLP 2020 paper "Predicting Clinical Trial Results by Implicit Evidence Integration" (Python, 14 stars)
17. CDQA: Chinese Dynamic Question Answering Benchmark (Python, 13 stars)
18. StructuralKD: [ACL-IJCNLP 2021] Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor (Python, 9 stars)
19. MarCo-Dialog (Python, 3 stars)
20. IBKD: Official repository for the IBKD knowledge distillation method, as described in the paper (Python, 3 stars)
21. Vec-RA-ODQA: Source code for the paper "Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts" (Python, 2 stars)
22. Key-Point-Analysis (Python, 1 star)