MatchZoo-py

PyTorch version of MatchZoo.

Facilitating the design, comparison and sharing of deep text matching models.
MatchZoo is a general-purpose text matching toolkit designed to help users quickly implement, compare, and share the latest deep text matching models.

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, covering tasks such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.

Tasks                     | Text 1   | Text 2     | Objective
--------------------------|----------|------------|-----------------------
Paraphrase Identification | string 1 | string 2   | classification
Textual Entailment        | text     | hypothesis | classification
Question Answer           | question | answer     | classification/ranking
Conversation              | dialog   | response   | classification/ranking
Information Retrieval     | query    | document   | ranking

Get Started in 60 Seconds

To train a deep text matching model (the example below uses ArcI), first use MatchZoo's customized loss functions and evaluation metrics to define a task:

import torch
import matchzoo as mz

ranking_task = mz.tasks.Ranking(losses=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
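
For intuition about the two metrics attached to the task above, here is a minimal pure-Python sketch of NDCG@k and average precision for a single query. The function names `dcg_at_k`, `ndcg_at_k`, and `average_precision` are hypothetical and are not part of MatchZoo. The gain formula `2**label - 1` is one common convention and may differ in detail from the library's implementation. Mean average precision is the mean of `average_precision` over all queries.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def _labels_ranked_by_score(labels, scores):
    """Return the labels reordered from highest to lowest predicted score."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return [label for _, label in pairs]


def ndcg_at_k(labels, scores, k):
    """DCG of the predicted ordering divided by DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(_labels_ranked_by_score(labels, scores), k) / ideal_dcg


def average_precision(labels, scores, threshold=0):
    """Mean of precision@rank taken at each rank holding a relevant item."""
    hits, total = 0, 0.0
    ranked = _labels_ranked_by_score(labels, scores)
    for rank, label in enumerate(ranked, start=1):
        if label > threshold:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# One query with three candidate answers; the only relevant one is scored lowest.
labels = [1, 0, 0]
scores = [0.1, 0.9, 0.5]
print(ndcg_at_k(labels, scores, k=3), average_precision(labels, scores))
```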

Prepare input data:

train_pack = mz.datasets.wiki_qa.load_data('train', task=ranking_task)
valid_pack = mz.datasets.wiki_qa.load_data('dev', task=ranking_task)

Preprocess your input data in three lines of code, keeping track of the parameters to be passed into the model:

preprocessor = mz.models.ArcI.get_default_preprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
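
The fit/transform split above matters because the vocabulary is learned from the training data only and then reused for validation data. The sketch below illustrates that idea with a toy class. `ToyPreprocessor` is hypothetical and is not MatchZoo's preprocessor; reserving index 0 for padding and index 1 for out-of-vocabulary terms is an assumption made for this illustration.

```python
class ToyPreprocessor:
    """Hypothetical stand-in for a fit/transform text preprocessor."""

    PAD, OOV = 0, 1  # assumed reserved indices, for illustration only

    def __init__(self):
        self.term_index = {}
        self.context = {}

    def fit(self, texts):
        # Build the vocabulary from the training texts only.
        terms = sorted({tok for text in texts for tok in text.lower().split()})
        self.term_index = {term: i + 2 for i, term in enumerate(terms)}
        # The embedding layer needs one row per index, including PAD and OOV.
        self.context['embedding_input_dim'] = len(self.term_index) + 2
        return self

    def transform(self, texts):
        # Map tokens to indices; unseen tokens fall back to OOV.
        return [
            [self.term_index.get(tok, self.OOV) for tok in text.lower().split()]
            for text in texts
        ]

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)


pre = ToyPreprocessor()
train_ids = pre.fit_transform(["how are glaciers formed", "glaciers form from snow"])
valid_ids = pre.transform(["how are volcanoes formed"])  # "volcanoes" is unseen
print(pre.context['embedding_input_dim'], valid_ids)
```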

Generate pair-wise training data on-the-fly:

trainset = mz.dataloader.Dataset(
    data_pack=train_processed,
    mode='pair',
    num_dup=1,
    num_neg=4,
    batch_size=32
)
validset = mz.dataloader.Dataset(
    data_pack=valid_processed,
    mode='point',
    batch_size=32
)
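
As a rough illustration of what pair-wise mode with `num_dup` and `num_neg` implies, the sketch below groups each positive document with `num_neg` lower-labelled documents from the same query and repeats that `num_dup` times. The function `make_pairwise_groups` and the `(query_id, doc_id, label)` tuple layout are hypothetical, and sampling negatives with replacement is an assumption; MatchZoo's own sampling may differ.

```python
import random


def make_pairwise_groups(relations, num_dup=1, num_neg=1, seed=0):
    """Return groups of [positive, negative_1, ..., negative_num_neg].

    relations is a list of (query_id, doc_id, label) tuples.
    """
    rng = random.Random(seed)
    by_query = {}
    for qid, did, label in relations:
        by_query.setdefault(qid, []).append((did, label))

    groups = []
    for qid, docs in by_query.items():
        for did, label in docs:
            # Candidates for negatives: documents of the same query with a lower label.
            lower = [d for d, other_label in docs if other_label < label]
            if not lower:
                continue  # nothing to contrast this document against
            for _ in range(num_dup):
                negatives = rng.choices(lower, k=num_neg)  # with replacement
                groups.append([(qid, did)] + [(qid, n) for n in negatives])
    return groups


relations = [
    ("q1", "d1", 1), ("q1", "d2", 0), ("q1", "d3", 0),
    ("q2", "d4", 0),  # q2 has no positive document, so it yields no groups
]
groups = make_pairwise_groups(relations, num_dup=2, num_neg=4)
print(len(groups), groups[0])
```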

Define padding callback and generate data loader:

padding_callback = mz.models.ArcI.get_default_padding_callback()

trainloader = mz.dataloader.DataLoader(
    dataset=trainset,
    stage='train',
    callback=padding_callback
)
validloader = mz.dataloader.DataLoader(
    dataset=validset,
    stage='dev',
    callback=padding_callback
)
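
A padding callback is needed because the texts in a batch have different lengths, while a tensor requires equal-length rows. The sketch below shows the general idea in plain Python. `pad_batch` is a hypothetical helper, and padding on the right with 0 is an assumption; the callback returned by `get_default_padding_callback()` may pad or truncate differently.

```python
def pad_batch(sequences, fixed_length=None, pad_value=0):
    """Right-pad (or truncate) every sequence to a common length."""
    if fixed_length is None:
        # Default to the longest sequence in the batch.
        fixed_length = max((len(seq) for seq in sequences), default=0)
    padded = []
    for seq in sequences:
        kept = list(seq[:fixed_length])
        padded.append(kept + [pad_value] * (fixed_length - len(kept)))
    return padded


batch = pad_batch([[7, 2, 6], [5]])
print(batch)
```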

Initialize the model and set its hyper-parameters:

model = mz.models.ArcI()
model.params['task'] = ranking_task
model.params['embedding_output_dim'] = 100
model.params['embedding_input_dim'] = preprocessor.context['embedding_input_dim']
model.guess_and_fill_missing_params()
model.build()

Trainer is used to control the training flow:

optimizer = torch.optim.Adam(model.parameters())

trainer = mz.trainers.Trainer(
    model=model,
    optimizer=optimizer,
    trainloader=trainloader,
    validloader=validloader,
    epochs=10
)

trainer.run()
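
During training, the optimizer minimizes the loss attached to the task. As a rough intuition for a rank cross-entropy loss over one positive and `num_neg` negatives, the sketch below applies a softmax across the group's scores and returns the negative log-probability assigned to the positive. The function `rank_cross_entropy` is hypothetical and describes the general technique under that assumption, not necessarily the exact computation in `mz.losses.RankCrossEntropyLoss`.

```python
import math


def rank_cross_entropy(pos_score, neg_scores):
    """Negative log-softmax of the positive score within its group."""
    scores = [pos_score] + list(neg_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - pos_score


# With num_neg=4 and identical scores, the positive gets probability 1/5.
uninformed = rank_cross_entropy(0.0, [0.0] * 4)
confident = rank_cross_entropy(5.0, [0.0] * 4)
print(uninformed, confident)
```

The loss shrinks as the positive score rises above the negatives, which is the behaviour the training loop rewards.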

References

Tutorials

English Documentation

If you're interested in cutting-edge research progress, please take a look at awesome neural models for semantic match.

Install

MatchZoo-py depends on PyTorch. There are two ways to install it:

Install MatchZoo-py from PyPI:

pip install matchzoo-py

Install MatchZoo-py from the GitHub source:

git clone https://github.com/NTMC-Community/MatchZoo-py.git
cd MatchZoo-py
python setup.py install
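
After either install route, a quick way to check that the package is visible to the current interpreter is to look up its import name without importing it. This is a small standard-library sketch; `is_installed` is a hypothetical helper, and the import name `matchzoo` is taken from the `import matchzoo as mz` line in the quick start.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


print("matchzoo importable:", is_installed("matchzoo"))
print("torch importable:", is_installed("torch"))
```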

Models

Citation

If you use MatchZoo in your research, please use the following BibTeX entry.

@inproceedings{Guo:2019:MLP:3331184.3331403,
 author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
 title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
 booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR'19},
 year = {2019},
 isbn = {978-1-4503-6172-9},
 location = {Paris, France},
 pages = {1297--1300},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3331184.3331403},
 doi = {10.1145/3331184.3331403},
 acmid = {3331403},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {matchzoo, neural network, text matching},
} 

Development Team

  • Yixing Fan (faneshion)
    • Core Dev
    • ASST PROF, ICT
  • Jiangui Chen (Chriskuei)
    • Core Dev
    • PhD. ICT
  • Yinqiong Cai (caiyinqiong)
    • Core Dev
    • M.S. ICT
  • Liang Pang (pl8787)
    • Core Dev
    • ASST PROF, ICT
  • Lixin Su (lixinsu)
    • Dev
    • PhD. ICT
  • Ruibin Xiong (ChrisRBXiong)
    • Dev
    • M.S. ICT
  • Yuyang Ding (dyuyang)
    • Dev
    • M.S. ICT
  • Junfeng Tian (rgtjf)
    • Dev
    • M.S. ECNU
  • Qinghua Wang (wqh17101)
    • Documentation
    • B.S. Shandong Univ.

Contribution

Please make sure to read the Contributing Guide before creating a pull request. If you have a MatchZoo-related paper, project, component, or tool, send a pull request to this awesome list!

Thank you to all the people who already contributed to MatchZoo!

Bo Wang, Zeyi Wang, Liu Yang, Zizhen Wang, Zhou Yang, Jianpeng Hou, Lijuan Chen, Yukun Zheng, Niuguo Cheng, Dai Zhuyun, Aneesh Joshi, Zeno Gantner, Kai Huang, stanpcf, ChangQF, Mike Kellogg

Project Organizers

  • Jiafeng Guo
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Yanyan Lan
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Xueqi Cheng
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage

License

Apache-2.0

Copyright (c) 2019-present, Yixing Fan (faneshion)