  • Stars: 221
  • Rank 179,773 (Top 4 %)
  • Language: Python
  • Created about 1 year ago
  • Updated about 1 year ago

Repository Details

An Instruction-tuned Large Language Model for E-commerce

An Instruction-Following Large Language Model For E-commerce

Repo for EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce

  • We propose EcomInstruct, the first E-commerce instruction dataset, with a total of 2.5 million instruction examples.
  • EcomInstruct scales up data size and task diversity by constructing atomic tasks from basic E-commerce data types, such as product information and user reviews. Atomic tasks are intermediate tasks implicitly involved in solving a final task; we also call them Chain-of-Task tasks.
  • We developed EcomGPT by training the backbone model BLOOMZ on EcomInstruct. Benefiting from the fundamental semantic understanding acquired from the Chain-of-Task tasks, EcomGPT exhibits excellent zero-shot generalization.

💡 Performance

We perform a human evaluation of EcomGPT and ChatGPT on 12 held-out E-commerce datasets. EcomGPT outperforms or ties ChatGPT on all 12 datasets.

🛠 Dependencies

pip install -r requirement.txt

💻 Model

EcomGPT (7b1) is available on ModelScope.

📚 Dataset (EcomInstruct)

As a first step, we open-source 12 evaluation datasets. To keep evaluation efficient, each evaluation dataset is downsampled to 500 instances.

Dataset    Lang.  Task                           Metric
Lenove     EN     Named Entity Recognition       F1, Rouge
Lenove     EN     Entity Span Detection          Rouge
Reddit     EN     Extractive QA                  Rouge
ABSA       EN     Review Topic Classification    F1, Rouge
MEPAVE     ZH     Attribute Value Recognition    F1, Rouge
MEPAVE     ZH     Attribute Value Detection      Rouge
Multi-CPR  ZH     Product Select                 Rouge
Multi-CPR  ZH     Product Align                  F1, Rouge
OpenBG     ZH     Title Attribute Matching       F1, Rouge
OpenBG     ZH     Fine-grain Product Classify    F1, Rouge
OpenBG     ZH     Coarse-grain Product Classify  F1, Rouge
OpenBG     ZH     Title Generate                 Rouge
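
For readers unfamiliar with the two metric families named in the table, the following is a minimal sketch of an overlap-based F1 and a ROUGE-L F-measure. It is a generic illustration, not the scoring code used by this repository: the function names are hypothetical, inputs are assumed to be already tokenized lists (whitespace splitting would not be suitable for the ZH datasets), and the repository's own evaluation script may define both metrics differently.

```python
from collections import Counter


def overlap_f1(predicted, reference):
    """F1 over two lists of items (tokens, labels, entities), using multiset overlap."""
    if not predicted or not reference:
        return 0.0
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(predicted, reference):
    """ROUGE-L F-measure (equal weight on precision and recall) from the LCS length."""
    lcs = lcs_length(predicted, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(predicted)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

The two scores differ when order matters: `overlap_f1` ignores item order, while `rouge_l_f1` rewards only items that appear in the same relative order in both sequences.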

The dataset files satisfy the following file hierarchy:

.
├── [Dataset Name]
│   └── tasks
│       └── [task name]
│           ├── meta-info.json
│           └── test.json
...
└── Reddit_QA
    └── tasks
        └── EN-Reddit_QA-Extract-Extract_QA
            ├── meta-info.json
            └── test.json
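
As a rough illustration of how this layout can be traversed, the sketch below lists every task directory that contains both `meta-info.json` and `test.json`. The helper name `find_tasks` and the `./data` path are hypothetical, and the sketch deliberately does not parse the JSON files, since their schema is not described here.

```python
from pathlib import Path


def find_tasks(base_dir):
    """Yield (dataset, task, meta_path, test_path) for each complete task directory."""
    base = Path(base_dir)
    # The pattern mirrors the hierarchy: [Dataset Name]/tasks/[task name]
    for task_dir in sorted(base.glob("*/tasks/*")):
        meta = task_dir / "meta-info.json"
        test = task_dir / "test.json"
        if task_dir.is_dir() and meta.is_file() and test.is_file():
            dataset = task_dir.parent.parent.name
            yield dataset, task_dir.name, meta, test


if __name__ == "__main__":
    # "./data" is a placeholder for the base dataset directory.
    for dataset, task, meta, test in find_tasks("./data"):
        print(f"{dataset}: {task}")
```

Directories that are missing either file are skipped silently, which makes incomplete downloads easy to spot by comparing the output against the 12 expected tasks.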

🔍 Evaluation

One can evaluate the performance of EcomGPT with the following command:

python eval.py -tf ./test_tasks.txt -m [model name or path] -sn [result file name] -bdd [base dataset dir]
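
When the evaluation is launched from another script, the command above can be assembled programmatically. The sketch below only builds the argument list from the four flags shown; the helper name `build_eval_command` and all example values (model path, result name, dataset directory) are placeholders, not values documented by the repository.

```python
import shlex


def build_eval_command(task_file, model, result_name, base_dataset_dir):
    """Return the eval.py invocation as an argument list, using the flags shown above."""
    return [
        "python", "eval.py",
        "-tf", task_file,          # file listing the tasks to evaluate
        "-m", model,               # model name or path
        "-sn", result_name,        # result file name
        "-bdd", base_dataset_dir,  # base dataset directory
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = build_eval_command("./test_tasks.txt", "./EcomGPT", "ecomgpt_results", "./data")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.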

🔥 TODO

  • Open Source Weight of EcomGPT ✅

📄 Citation

If you find this work useful, consider giving this repository a star and citing our paper as follows:

@article{li2023ecomgpt,
  title={EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce},
  author={Li, Yangning and Ma, Shirong and Wang, Xiaobin and Huang, Shen and Jiang, Chengyue and Zheng, Hai-Tao and Xie, Pengjun and Huang, Fei and Jiang, Yong},
  journal={arXiv preprint arXiv:2308.06966},
  year={2023}
}

More Repositories

1

ACE

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction
Python
299
star
2

HiAGM

Hierarchy-Aware Global Model for Hierarchical Text Classification
Python
206
star
3

SeqGPT

SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
Python
204
star
4

KB-NER

Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.
Python
177
star
5

Multi-CPR

[SIGIR 2022] Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval
Python
164
star
6

CLNER

[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning
Python
91
star
7

MultilangStructureKD

[ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Python
71
star
8

MuVER

[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations
Python
30
star
9

ProtoRE

Code for 'Prototypical Representation Learning for Relation Extraction'.
Python
30
star
10

RankingGPT

Code for paper 《RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement》
Python
28
star
11

DAAT-CWS

Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation
Python
22
star
12

AISHELL-NER

[ICASSP 2022] AISHELL-NER: Named Entity Recognition from Chinese Speech
21
star
13

HLATR

Hybrid List Aware Transformer Reranking
18
star
14

AIN

Code for our EMNLP 2020 Paper "AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network"
Python
18
star
15

MANNER

[ACL 2023] MANNER: A Variational Memory-Augmented Model for Cross Domain Few-Shot Named Entity Recognition
Python
17
star
16

EBM-Net

Codes for the EMNLP'2020 paper "Predicting Clinical Trial Results by Implicit Evidence Integration".
Python
14
star
17

CDQA

CDQA: Chinese Dynamic Question Answering Benchmark
Python
13
star
18

StructuralKD

[ACL-IJCNLP 2021] Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
Python
9
star
19

MarCo-Dialog

Python
3
star
20

IBKD

This is the official repository for the IBKD knowledge distillation method, as described in the paper .
Python
3
star
21

Vec-RA-ODQA

Source code of paper "Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts"
Python
2
star
22

Key-Point-Analysis

Python
1
star