  • Stars: 124
  • Rank: 278,885 (Top 6%)
  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated over 5 years ago

Repository Details

LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts (AAAI 2019)

LiveBot

This repository contains the code and datasets for the paper "LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts".

What Are Live Video Comments?

Live video commenting, also called "video barrage" ("弹幕" in Chinese or "Danmaku" in Japanese), is an emerging feature of online video sites that lets viewers' real-time comments fly across the screen like bullets or scroll along the right side of the screen.

Requirements

  • Ubuntu 16.04
  • Python 3.5
  • PyTorch 0.4.1
  • scikit-learn (sklearn) >= 0.19.1
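
A quick way to compare an installed environment against the list above is sketched below. The helpers `parse_version` and `meets_minimum` are hypothetical names introduced for illustration and are not part of the repository. The comparison looks only at the leading numeric components of each version string.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '0.4.1.post2' into a tuple of ints, here (0, 4, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'post2'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    try:
        import torch
        print("PyTorch:", torch.__version__)
    except ImportError:
        print("PyTorch: not installed")
    try:
        import sklearn
        print("scikit-learn >= 0.19.1:", meets_minimum(sklearn.__version__, "0.19.1"))
    except ImportError:
        print("scikit-learn: not installed")
```

Running the script prints the detected versions. It does not install or change anything.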

Datasets

  • The processed dataset can be used directly with our code to reproduce the results reported in the paper. Download it from Google Drive or Baidu Pan and put it in the folder /data.

  • The raw dataset consists of the videos and the corresponding live comments, downloaded directly from the Bilibili video website. It is available on Google Drive or Baidu Pan. Processing it with the scripts in the folder /data turns it into the processed dataset above.

LiveBot Model

  • Step 1: Download the processed dataset above

  • Step 2: Train a model

    python3 codes/transformer.py -mode train -dir CKPT_DIR
    
  • Step 3: Restore the checkpoint and evaluate the model

    python3 codes/transformer.py -mode test -restore CKPT_DIR/checkpoint.pt -dir CKPT_DIR
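
The two commands above can also be chained from Python. The following is a minimal sketch, not part of the repository: `build_command` and `train_then_test` are hypothetical helpers, they use only the `-mode`, `-dir` and `-restore` flags shown in Steps 2 and 3, and they assume that training writes `checkpoint.pt` into the checkpoint directory, as the Step 3 command suggests.

```python
import subprocess
import sys


def build_command(mode, ckpt_dir, restore=None):
    """Assemble the argument list for codes/transformer.py from the flags shown above."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    # sys.executable keeps the run inside the current Python environment.
    command = [sys.executable, "codes/transformer.py", "-mode", mode]
    if restore is not None:
        command += ["-restore", restore]
    command += ["-dir", ckpt_dir]
    return command


def train_then_test(ckpt_dir):
    """Train a model, then evaluate the checkpoint it is assumed to have written."""
    subprocess.run(build_command("train", ckpt_dir), check=True)
    subprocess.run(
        build_command("test", ckpt_dir, restore=ckpt_dir + "/checkpoint.pt"),
        check=True,
    )
```

Calling `train_then_test("CKPT_DIR")` from the repository root runs both steps in order and stops with an exception if either command exits with a non-zero status.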
    

Process a raw dataset (Optional)

  • Step 1: Extract the frames from the videos and the comments from the .ass files.
    python3 data/extract.py
    
  • Step 2: Convert the extracted images and text into the format required by our model.
    python3 data/preprocess.py
    
  • Step 3: Construct the candidate set for the evaluation of the model.
    python3 data/add_candidate.py
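
Step 1 reads the comments from .ass subtitle files. As a rough illustration of what that involves, the sketch below pulls the start time and text out of `Dialogue:` lines. It is not the code in data/extract.py. The function names are hypothetical, and the sketch assumes the common event layout (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text), whereas a real file declares its own field order in a `Format:` line.

```python
import re


def parse_ass_time(stamp):
    """Convert an ASS timestamp of the form 'H:MM:SS.cc' into seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dialogue_lines(lines):
    """Yield (start_seconds, text) for every 'Dialogue:' event line."""
    for line in lines:
        if not line.startswith("Dialogue:"):
            continue
        # Ten comma-separated fields are assumed; the last one (Text) may
        # itself contain commas, so split at most nine times.
        fields = line[len("Dialogue:"):].strip().split(",", 9)
        if len(fields) < 10:
            continue
        start, text = fields[1], fields[9]
        # Drop override tags such as {\move(...)} that control on-screen motion.
        text = re.sub(r"\{[^}]*\}", "", text).strip()
        yield parse_ass_time(start), text
```

Passing an open file object to `parse_dialogue_lines` yields one pair per comment, which can then be sorted by start time to align the comments with the extracted frames.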
    

Note

  • More details regarding the model and the dataset can be found in our paper.

  • The code is currently non-deterministic because of various GPU ops, so your evaluation results may be slightly better or worse than those reported in the paper.
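
Run-to-run variation can usually be reduced, though not removed, by fixing the random seeds before training. The sketch below is an assumption about how one might do this and is not taken from the repository. `seed_everything` is a hypothetical helper, and some GPU ops remain non-deterministic even with these settings.

```python
import random


def seed_everything(seed):
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
    except ImportError:
        return  # nothing more to seed without PyTorch
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    # Ask cuDNN for deterministic kernels and disable autotuning, which can
    # pick different algorithms from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Calling `seed_everything(1234)` once at the start of a run, before the model and data loaders are created, is the usual placement. The seed value itself is arbitrary.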

Citation

We hope the code and datasets are useful for future research. If you use them in your research, please cite our paper:

@inproceedings{livebot,
  author    = {Shuming Ma and
               Lei Cui and
               Damai Dai and
               Furu Wei and
               Xu Sun},
  title     = {LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts},
  booktitle = {{AAAI} 2019},
  year      = {2019}
}

More Repositories

1

pkuseg-python

The pkuseg toolkit for multi-domain Chinese word segmentation
Python
6,430
star
2

SGM

Sequence Generation Model for Multi-label Classification (COLING 2018)
Python
429
star
3

Chinese-Literature-NER-RE-Dataset

A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text
399
star
4

Global-Encoding

Global Encoding for Abstractive Summarization (ACL 2018)
Python
273
star
5

Graph-to-seq-comment-generation

Code for the paper "Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model"
Python
174
star
6

SU4MLC

Code for the article "Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification" (EMNLP 2018)
Python
154
star
7

DPGAN

Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text (EMNLP2018)
Python
144
star
8

superAE

Code for "Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization"
Python
136
star
9

AdaMod

Adaptive and Momental Bounds for Adaptive Learning Rate Methods.
Python
125
star
10

text-autoaugment

[EMNLP 2021] Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
Python
124
star
11

label-words-are-anchors

Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Python
115
star
12

meProp

meProp: Sparsified Back Propagation for Accelerated Deep Learning (ICML 2017)
C#
110
star
13

Unpaired-Sentiment-Translation

Code for "Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach" (ACL 2018)
Python
107
star
14

WEAN

Code for "Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation" (NAACL 2018)
Python
93
star
15

label-embedding-network

Label Embedding Network
Python
90
star
16

Prime

A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance.
Python
85
star
17

AAPR

Automatic Academic Paper Rating: Data and Model (ACL 2018)
Python
72
star
18

Skeleton-Based-Generation-Model

Code for "A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation" (EMNLP 2018)
Python
64
star
19

Explicit-Sparse-Transformer

code for Explicit Sparse Transformer
Python
55
star
20

SMAE

This is the code for "Learning Sentiment Memories for Sentiment Modification without Parallel Data".
Python
55
star
21

LancoSum

A toolkit for abstractive summarization that makes it easy to implement the baseline and our proposed models, which can achieve SOTA performance.
Python
50
star
22

Seq2Set

Code for the paper "A Deep Reinforced Sequence-to-Set Model for Multi-Label Classification"
Python
50
star
23

AMM

The code for "An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation" (EMNLP 2018)
Python
49
star
24

bag-of-words

Code for "Bag-of-Words as Target for Neural Machine Translation"
Python
45
star
25

AdaNorm

Code for "Understanding and Improving Layer Normalization"
Python
43
star
26

SRB

Code for "Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization"
Python
41
star
27

DynamicKD

Code for EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models"
Python
38
star
28

simNet

Code for "simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions" (EMNLP 2018)
Python
37
star
29

Embedding-Poisoning

Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-HLT 2021)
Python
34
star
30

well-classified-examples-are-underestimated

Code for the AAAI 2022 publication "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"
Jupyter Notebook
32
star
31

IAIS

[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
Python
30
star
32

Chinese-Dependency-Treebank-with-Ellipsis

An Ellipsis-aware Chinese Dependency Treebank for Web Text
Python
26
star
33

DeconvDec

Code for "Deconvolution-Based Global Decoding for Neural Machine Translation" (COLING 2018).
Python
26
star
34

HSSC

Code for "A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification" (IJCAI 2018)
Python
23
star
35

tcm_prescription_generation

Code for "Exploration on Generating Traditional Chinese Medicine Prescriptions from Symptoms with an End-to-End Approach"
Python
23
star
36

clip-openness

[ACL 2023] Delving into the Openness of CLIP
Python
22
star
37

CGM

Code for IJCAI 2021 main conference paper "Long-term, Short-term and Sudden Event: Trading Volume Movement Prediction with Graph-based Multi-view Modeling"
Python
21
star
38

SOS

Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021)
Jupyter Notebook
21
star
39

codable-watermarking-for-llm

Repository for Towards Codable Watermarking for Large Language Models
Python
20
star
40

CMAC

The dataset and code for the paper "Cross-Modal Commentator: Automatic Machine Commenting Based on Cross-Modal Information"
Python
20
star
41

RAP

Code for the paper "RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models" (EMNLP 2021)
Python
19
star
42

MUKI

[Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
Python
19
star
43

ChineseNER

Code for "Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media: A Unified Model"
Python
18
star
44

meSimp

Codes for "Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method"
C#
18
star
45

LexicalAT

Codes for paper "LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification"
Python
17
star
46

Pivot

Code for "Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation" (ACL 2019)
Python
17
star
47

Avg-Avg

[Findings of EMNLP 2022] Holistic Sentence Embeddings for Better Out-of-Distribution Detection
Python
16
star
48

RMSC

Data and code for paper "Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations"
Python
14
star
49

Decode-CRF

Conditional Random Fields with Decode-based Learning
C#
14
star
50

agent-backdoor-attacks

Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents"
14
star
51

nndep

Transition-based Dependency Parser with neural networks and hybrid oracle
C#
14
star
52

SAPO

C# code for "Towards Easier and Faster Sequence Labeling for Natural Language Processing: A Search-based Probabilistic Online Learning Framework (SAPO)" (Information Sciences)
C#
13
star
53

SACT

Code for the article "Automatic Temperature Control for Neural Machine Translation" (EMNLP 2018)
Python
13
star
54

Augmented_Data_for_FST

The augmented data of the paper "Parallel Data Augmentation for Formality Style Transfer" (ACL 2020).
12
star
55

ACA4NMT

Code of a novel model for NMT
Python
11
star
56

CascadeBERT

Code for CascadeBERT, Findings of EMNLP 2021
Python
11
star
57

DCKD

Code and data for Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22)
Python
10
star
58

Multi-Order-LSTM

Code for "Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?"
Python
9
star
59

SemPre

Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings? (AAAI 2021)
Python
9
star
60

DAN

[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Python
9
star
61

FedMNMT

[Findings of ACL 2023] Communication Efficient Federated Learning for Multilingual Machine Translation with Adapter
Python
9
star
62

CVST

Code for paper "Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling"
7
star
63

Early-Exit

Code for the paper: A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models.
Python
7
star
64

Multi-Task-Learning

Online Multi-Task Learning Toolkit based on C#; code for "Large-Scale Personalized Human Activity Recognition using Online Multi-Task Learning" (TKDE)
C#
6
star
65

NLP_Code_Index

codes and papers from @lancopku
5
star
66

CRF-ADF

CRF Toolkit based on C#; supports ADF (Adaptive stochastic gradient Descent based on Feature-frequency information, ACL 2012)
C#
4
star
67

GKD

Python
4
star
68

Sememe_prediction

Code for paper "Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions"
Python
3
star
69

LPVDN

Python code for paper - Learning Robust Representation for Clustering through Locality Preserving Variational Discriminative Network
Python
3
star
70

Attention-Augmentation

Python
2
star
71

GNOME

Code of the EACL 2023 Paper: Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features
1
star
72

MR-VPC

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality
1
star