Medical Question Answering Dataset of 47,457 QA pairs created from 12 NIH websites
-------------------------------------------
MedQuAD: Medical Question Answering Dataset  
-------------------------------------------

MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.  

We included additional annotations in the XML files that can be used for diverse IR and NLP tasks, such as the question type, the question focus, its synonyms, its UMLS Concept Unique Identifier (CUI), and its Semantic Type. 
We added the category of the question focus (Disease, Drug or Other) in the 4 MedlinePlus collections. All other collections are about diseases.  
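
As an illustration of how these annotations might be read, here is a minimal Python sketch using only the standard library. The element and attribute names (Document, Focus, FocusAnnotations, Category, CUI, SemanticType, Synonym, QAPair, Question, Answer, qid, qtype, url) and the sample record are assumptions made for this example and are not specified in this README; please check them against the actual XML files before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. The element/attribute names and all values
# are assumptions for illustration, not taken from the README.
SAMPLE = """<Document id="0000001" source="EXAMPLE" url="https://example.org/topic">
  <Focus>Example Condition</Focus>
  <FocusAnnotations>
    <Category>Disease</Category>
    <UMLS>
      <CUIs><CUI>C0000000</CUI></CUIs>
      <SemanticTypes><SemanticType>T047</SemanticType></SemanticTypes>
    </UMLS>
    <Synonyms><Synonym>Example Disorder</Synonym></Synonyms>
  </FocusAnnotations>
  <QAPairs>
    <QAPair pid="1">
      <Question qid="0000001-1" qtype="treatment">What are the treatments for Example Condition?</Question>
      <Answer>Placeholder answer text.</Answer>
    </QAPair>
    <QAPair pid="2">
      <Question qid="0000001-2" qtype="symptoms">What are the symptoms of Example Condition?</Question>
      <Answer></Answer>
    </QAPair>
  </QAPairs>
</Document>"""


def parse_document(xml_text):
    """Return the focus annotations and QA pairs of one document as a dict."""
    root = ET.fromstring(xml_text)
    record = {
        "url": root.get("url"),
        "focus": (root.findtext("Focus") or "").strip(),
        "category": root.findtext("FocusAnnotations/Category"),
        "cuis": [e.text for e in root.iterfind(".//CUI")],
        "semantic_types": [e.text for e in root.iterfind(".//SemanticType")],
        "synonyms": [e.text for e in root.iterfind(".//Synonym")],
        "qa_pairs": [],
    }
    for pair in root.iterfind(".//QAPair"):
        question = pair.find("Question")
        answer = (pair.findtext("Answer") or "").strip()
        record["qa_pairs"].append({
            "qid": question.get("qid"),
            "qtype": question.get("qtype"),
            "question": (question.text or "").strip(),
            # An empty answer is stored as None, e.g. for a subset whose
            # answers were removed.
            "answer": answer or None,
        })
    return record


doc = parse_document(SAMPLE)
```

Mapping empty answers to None makes it easy to skip, or later fill in, the QA pairs whose answer text is not distributed with the collection.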
 
The paper cited below describes the collection, the construction method as well as its use and evaluation within a medical question answering system.   

N.B. We removed the answers from 3 subsets to respect the MedlinePlus copyright (https://medlineplus.gov/copyright.html):  
(1) A.D.A.M. Medical Encyclopedia, (2) MedlinePlus Drug information, and (3) MedlinePlus Herbal medicine and supplement information. 
-- We kept all the other information including the URLs in case you want to crawl the answers. Please contact me if you have any questions.  
 
-------------------
QA Test Collection  
-------------------

We used the test questions of the TREC-2017 LiveQA medical task:  https://github.com/abachaa/LiveQA_MedicalTask_TREC2017/tree/master/TestDataset. 

As described in our BMC paper, we manually judged the answers that the IR and QA systems retrieved from the MedQuAD collection. 
We used the same judgment scores as the LiveQA Track: 1-Incorrect, 2-Related, 3-Incomplete, and 4-Excellent. 
-- Format of the qrels file: Question_ID judgment Answer_ID 
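
A minimal Python sketch of reading judgments in this three-column format is shown below. It assumes the fields are separated by whitespace and that the judgment field begins with the numeric score (so both "4" and "4-Excellent" would be accepted); the sample lines and identifiers are hypothetical, not taken from the actual qrels file.

```python
from collections import defaultdict

# Judgment scale as listed above.
LABELS = {1: "Incorrect", 2: "Related", 3: "Incomplete", 4: "Excellent"}

# Hypothetical qrels lines in the order: Question_ID judgment Answer_ID.
SAMPLE_QRELS = """\
Q1 4 A_0001
Q1 2 A_0002
Q2 1 A_0003
Q2 3-Incomplete A_0004
"""


def parse_qrels(text):
    """Map each question ID to a dict of {answer ID: numeric judgment}."""
    qrels = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        question_id, judgment, answer_id = parts
        # Assumption: the score is the leading number of the judgment field.
        qrels[question_id][answer_id] = int(judgment.split("-")[0])
    return dict(qrels)


def best_label_per_question(qrels):
    """Return the label of the highest-scored judged answer per question."""
    return {q: LABELS[max(scores.values())] for q, scores in qrels.items()}


qrels = parse_qrels(SAMPLE_QRELS)
best = best_label_per_question(qrels)
```

Keeping the judgments in a nested dictionary makes it straightforward to look up the score of any answer a system returns for a given test question.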

The QA test collection contains 2,479 judged answers that can be used to evaluate the performance of IR & QA systems on the LiveQA-Med test questions:  https://github.com/abachaa/MedQuAD/blob/master/QA-TestSet-LiveQA-Med-Qrels-2479-Answers.zip

----------
Reference  
---------- 

If you use the MedQuAD dataset and/or the collection of 2,479 judged answers, please cite the following paper: "A Question-Entailment Approach to Question Answering". Asma Ben Abacha and Dina Demner-Fushman. BMC Bioinformatics, 2019.    

	@ARTICLE{BenAbacha-BMC-2019,
		author  = {Asma {Ben Abacha} and Dina Demner{-}Fushman},
		title   = {A Question-Entailment Approach to Question Answering},
		journal = {{BMC} Bioinform.},
		volume  = {20},
		number  = {1},
		pages   = {511:1--511:23},
		year    = {2019},
		url     = {https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4}
	}
		   
