  • Stars: 358
  • Rank: 118,855 (Top 3%)
  • Language: Python
  • License: Apache License 2.0
  • Created: over 1 year ago
  • Updated: 8 months ago

Repository Details

Visual Med-Alpaca is an open-source, multi-modal foundation model designed specifically for the biomedical domain, built on LLaMA-7B.

Visual Med-Alpaca: A Parameter-Efficient Biomedical LLM with Visual Capabilities [BLOG]

Chang Shu¹*, Baian Chen²*, Fangyu Liu¹, Zihao Fu¹, Ehsan Shareghi³, Nigel Collier¹

¹University of Cambridge      ²Ruiping Health      ³Monash University

Abstract

Introducing Visual Med-Alpaca, an open-source, parameter-efficient biomedical foundation model that can be integrated with medical "visual experts" for multimodal biomedical tasks. Built upon the LLaMA-7B architecture (Touvron et al., 2023), this model is trained using an instruction set curated collaboratively by GPT-3.5-Turbo and human experts. Leveraging a few hours of instruction-tuning and the inclusion of plug-and-play visual modules, Visual Med-Alpaca can perform a diverse range of tasks, from interpreting radiological images to addressing complex clinical inquiries. The model can be replicated with ease, requiring only a single consumer GPU.

Demo

Please fill out this form to access the online demo. Warning: for academic use only; do not apply to real clinical scenarios!

Overview

Domain-specific foundation models play a critical role in the biomedical field, as the language used in biomedical texts is highly specialized, often encompassing domain-specific concepts and relationships not found in general domain text corpora such as Wikipedia and Books. Empirical evidence demonstrates that pretraining on substantial amounts of biomedical text significantly improves language models' performance on various biomedical text mining tasks, compared to existing publicly available pretrained language models (PLMs) (Lee et al., 2019; Gururangan et al., 2020; Gu et al., 2021).

Modern large language models (LLMs) necessitate an unprecedented level of computational resources for full-model fine-tuning. The cost of fine-tuning even a 7-billion-parameter LLM exclusively on PubMed is prohibitively expensive for the majority of academic institutions. Pretraining models on extensive medical image datasets to attain multimodal capabilities incurs even higher costs. Consequently, researchers are exploring more cost-effective techniques such as adapters, instruction-tuning, and prompt augmentation to develop models that can be trained and deployed on consumer-level graphics cards while maintaining adequate performance. Bridging text and vision for multimodal applications can be similarly expensive to train (Alayrac et al., 2022). Moreover, to the best of our knowledge, there is no publicly available multimodal generative foundation model specifically designed for biomedical applications.

In response to these challenges, we introduce Visual Med-Alpaca, an open-source, parameter-efficient biomedical foundation model that features a plug-and-play visual extension framework. To develop Visual Med-Alpaca, we first create a biomedical instruction set by extracting medical questions from various medical datasets within the BigBIO repository (Fries et al., 2022). We then prompt GPT-3.5-Turbo to synthesize answers for these questions, and perform multiple rounds of human filtering and editing to refine the question-answer pairs, resulting in a high-quality instruction set of 54k data points. Next, we expand Med-Alpaca into Visual Med-Alpaca by connecting the textual model with "visual medical experts," which are specialized medical computer vision models. For instance, for radiology-domain applications, we train an in-house radiology image captioning model called Med-GIT (see below for details). When given an input image, a classifier determines whether, and if so which, medical visual expert is responsible for the image. The designated expert then converts the image into a text prompt, and the prompt manager merges the converted visual information with the textual query, enabling Med-Alpaca to generate an appropriate response.

Ongoing work. A paramount objective for the future is to thoroughly assess the medical proficiency and potential shortcomings of Visual Med-Alpaca, encompassing issues such as misleading medical advice and incorrect medical information. Moving beyond traditional benchmarking and manual evaluation methods, we aim to focus on different user groups, including doctors and patients, and evaluate all facets of the model through a user-centered approach. This comprehensive assessment will enable us to ensure the reliability and effectiveness of Visual Med-Alpaca in addressing various biomedical tasks and catering to the diverse needs of its users.

It is also important to note that Visual Med-Alpaca is strictly intended for academic research purposes and not legally approved for medical use in any country.

Resources:

Please submit a request to access the checkpoints, tokenizer, and a Hugging Face-hosted demo. We apologize for the inconvenience; this is due to safety concerns and ethical requirements at Cambridge University.

Model Architecture and Training Pipeline

Visual Med-Alpaca bridges the textual and visual modalities through the prompt augmentation method. First, the image input is fed into a type classifier to identify the appropriate module for converting visual information into an intermediate text format, which is then appended to the text inputs for subsequent reasoning procedures. For instance, medical plots are transformed into intermediate linearized tables through the use of the DePlot module. The prompt manager then merges the textual information extracted from images and text inputs into the prompt for Med-Alpaca, a large language model tuned for the biomedical domain, which generates the response.
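
To make this flow concrete, here is a minimal Python sketch of the routing and prompt-merging logic; all function names and the keyword-based routing are hypothetical illustrations, not the repository's actual API:

```python
# Hypothetical sketch of the prompt-augmentation pipeline described above.
# Function names and routing logic are illustrative stand-ins.

def classify_image(image_path: str) -> str:
    """Type classifier: decide which visual expert handles the image."""
    # A real classifier inspects pixels; this stub keys off the filename.
    if any(k in image_path for k in ("xray", "ct", "mri")):
        return "radiology"   # handled by Med-GIT
    if any(k in image_path for k in ("plot", "chart")):
        return "plot"        # handled by DePlot
    return "none"

def image_to_text(image_path: str, expert: str) -> str:
    """Visual expert: convert the image to an intermediate text form."""
    if expert == "radiology":
        return f"[Med-GIT caption of {image_path}]"
    if expert == "plot":
        return f"[DePlot linearized table of {image_path}]"
    return ""

def build_prompt(query: str, visual_context: str) -> str:
    """Prompt manager: merge converted visual information with the query."""
    if visual_context:
        return f"Context: {visual_context}\nQuestion: {query}\nAnswer:"
    return f"Question: {query}\nAnswer:"

# The merged prompt is then fed to Med-Alpaca for response generation.
expert = classify_image("chest_xray.png")
prompt = build_prompt("What is seen in the X-ray?",
                      image_to_text("chest_xray.png", expert))
print(prompt)
```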

To incorporate biomedical knowledge and the visual modality into the foundation model LLaMA-7B, we carried out fine-tuning using two distinct datasets. First, we performed standard fine-tuning and low-rank adaptation (LoRA) fine-tuning on the LLaMA-7B model using a model-generated dataset comprising 54,000 biomedical examples for instruction-tuning. Second, we fine-tuned the Microsoft GIT model on the Radiology Objects in Context (ROCO) dataset to incorporate the visual modality.
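
For the LoRA variant, the adapter setup might look like the following sketch using Hugging Face transformers and peft; the adapter settings (r=8, alpha=16, q_proj/v_proj targets) are Alpaca-LoRA defaults assumed here, and the checkpoint path is a placeholder:

```python
# Sketch: attaching LoRA adapters to LLaMA-7B with peft.
# Adapter hyperparameters follow Alpaca-LoRA defaults (an assumption);
# the model identifier is a placeholder for the actual LLaMA weights.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b-hf")

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of the 7B weights train
```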

Domain Adaptation: Self-Instruct in the Biomedical Domain

We collect inquiries from various medical question-and-answer datasets (MEDIQA RQE, MedQA, MedDialog, MEDIQA QA, PubMedQA) to increase the diversity and thoroughness of the dataset and to improve the accuracy and comprehensiveness of the results.

We synthesize answers to these questions with GPT-3.5-Turbo in a self-instruct fashion. GPT-3.5-Turbo's natural language processing capabilities enable it to understand and generate human-like responses to a wide range of questions, making it a reliable tool for generating structured and informative answers.

Question-answer pairs were then filtered and edited manually. A total of 54,000 turns were carefully selected with balance and diversity in mind.
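
A minimal sketch of the answer-synthesis step, assuming the modern openai Python client; the sampling settings and output format are illustrative, not the exact configuration used:

```python
# Sketch of self-instruct answer synthesis: questions harvested from
# BigBIO datasets are answered by GPT-3.5-Turbo, then human-filtered.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def synthesize_answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
        temperature=0.7,
    )
    return resp.choices[0].message.content

question = "What does an elevated troponin level indicate?"
pair = {"instruction": question, "output": synthesize_answer(question)}
```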

Visual Experts: Radiology Image Captioning, DePlot, and More

Visual input constitutes a vital component of the medical domain, supplying indispensable information in healthcare environments. Healthcare professionals extensively depend on visual cues for diagnosis, monitoring, and treatment of patients. Medical imaging technologies, such as X-rays, CT scans, and MRIs, offer unparalleled insight into internal organs, detecting diseases and abnormalities that may be invisible to the naked eye. Additionally, scientific figures and medical records, including plots, charts, and tables, are prevalent in the medical field.

We propose linking visual experts with Med-Alpaca, as foundation model chaining presents a modular and highly adaptable framework for incorporating a diverse array of visual modules. Within this framework, any multimodal task can be divided into two essential stages: (1) the conversion of images to text, and (2) cognitive reasoning based on the derived text. In our context, visual experts (i.e., visual foundation models) transform medical images into an intermediate text representation. This converted data is then used to prompt a pretrained LLM, leveraging the inherent few-shot reasoning capabilities of LLMs to generate appropriate responses.

Currently, our platform supports two distinct visual experts: Med-GIT and DePlot, chosen due to the widespread presence of radiology images and plots within the medical domain. The system's architecture is also designed to enable seamless integration of alternative medical visual experts, and we plan to incorporate additional medical visual foundation models as visual experts in the near future.
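
For the plot pathway, a DePlot call via Hugging Face transformers might look like the following sketch (usage follows the google/deplot model card; the file name is illustrative):

```python
# Sketch: converting a medical plot into a linearized table with DePlot.
# The resulting table text is what gets appended to the Med-Alpaca prompt.
from PIL import Image
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

processor = Pix2StructProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

image = Image.open("treatment_plot.png").convert("RGB")  # illustrative file
inputs = processor(
    images=image,
    text="Generate underlying data table of the figure below:",
    return_tensors="pt",
)
table_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(table_ids[0], skip_special_tokens=True))
```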

The Med-GIT model is a GIT (Generative Image-to-text Transformer for Vision and Language) fine-tuned on the ROCO dataset for specialized radiology image captioning. The training procedure is outlined in detail in our publicly accessible GitHub repository.
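
A captioning call might look like the following sketch; "microsoft/git-base" stands in for the fine-tuned Med-GIT weights, which require the access request described above:

```python
# Sketch: radiology image captioning with a GIT checkpoint via transformers.
# "microsoft/git-base" is a stand-in for the ROCO-tuned Med-GIT weights.
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")

image = Image.open("chest_xray.png").convert("RGB")  # illustrative file
pixel_values = processor(images=image, return_tensors="pt").pixel_values

caption_ids = model.generate(pixel_values=pixel_values, max_length=64)
print(processor.batch_decode(caption_ids, skip_special_tokens=True)[0])
```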

Case Study

Input 1: What are the chemicals that treat hair loss? Image: No image.

  • Visual Med-Alpaca: There are several chemicals that can be used to treat hair loss, including: 1. Minoxidil: a medication that is applied to the scalp to promote hair growth. 2. Finasteride: a medication that is taken orally to slow down the production of dihydrotestosterone (DHT), a hormone that can cause hair loss. 3. Corticosteroids: topical medications that can be applied to the scalp to reduce inflammation and promote hair growth. 4. Anthralin: a medication that is applied to the scalp to promote hair growth by stimulating the production of blood cells that nourish the hair follicles. 5. Ketoconazole: a medication that is applied to the scalp to treat fungal infections that can cause hair loss. It is important to consult with a healthcare professional before using any of these chemicals to treat hair loss, as they can have potential side effects and may not be suitable for everyone.
Other methods:
  • Alpaca-LoRA: The chemicals that treat hair loss include minoxidil, finasteride, and ketoconazole.
  • LLama-Adapter: Chemicals that treat hair loss include minoxidil, finasteride, and ketoconazole. Minoxidil is a topical solution that is applied directly to the scalp and is used to stimulate hair growth. Finasteride is a pill that is taken orally and is used to block the production of the hormone dihydrotestosterone, which can cause hair loss. Ketoconazole is an antifungal medication that is used to treat scalp infections that can cause hair loss.
  • ChatGPT (GPT-3.5-Turbo): There are several chemicals that are used to treat hair loss, including: 1. Minoxidil: This is an over-the-counter medication that is applied topically to the scalp. It works by increasing blood flow to the hair follicles, which can promote hair growth. 2. Finasteride: This is a prescription medication that is taken orally. It works by blocking the conversion of testosterone to dihydrotestosterone (DHT), which is a hormone that can cause hair loss. 3. Ketoconazole: This is an antifungal medication that is sometimes used to treat hair loss. It works by reducing inflammation and blocking the production of DHT. 4. Corticosteroids: These are anti-inflammatory medications that can be injected into the scalp. They are sometimes used to treat hair loss caused by autoimmune disorders or inflammation. 5. Spironolactone: This is a medication that is often used to treat high blood pressure, but it can also be effective in treating hair loss in women. It works by blocking the production of androgens, which can contribute to hair loss. It's important to note that the effectiveness of these treatments can vary depending on the cause and severity of hair loss. It's best to consult with a healthcare provider or a dermatologist to determine the most appropriate treatment for you.

Input 2: What is seen in the X-ray and what should be done?
Image: [chest X-ray]

  • Visual Med-Alpaca: Bilateral infiltrates and bilateral pleural effusion seen in a chest X-ray may indicate a respiratory infection or other respiratory conditions such as pneumonia, pulmonary edema, or pleural effusion. It is important to consult a medical professional for a proper diagnosis and treatment plan.
Other methods:
  • VQA Medical Model: the decubitus
  • Automatic Radiology Report Generation (Image-only): near complete opacification of the left hemi thorax with only a small amount of residual aerated lung in the left upper lobe .
  • ChatGPT (GPT-3.5-Turbo) + Automatic Radiology Report Generation (Image-only): Opacification of the lung on an X-ray indicates that there is an abnormality in the lung tissue. In this case, the near-complete opacification of the left hemithorax with only a small amount of residual aerated lung in the left upper lobe suggests that there is likely a significant pathology present in the left lung. Possible causes of this finding include pneumonia, lung abscess, tumor or mass, or pleural effusion (fluid accumulation around the lung). It is important to determine the underlying cause of this finding as soon as possible. A follow-up diagnostic workup may include a CT scan of the chest, sputum culture, bronchoscopy, and/or a biopsy of any suspicious areas in the lung. The appropriate treatment will depend on the underlying cause, but it may include antibiotics, drainage of any fluid collections, or surgery to remove any masses or tumors. Prompt evaluation and treatment are necessary to prevent complications and improve outcomes.

Input 3: How effective is this treatment on the papule?
Image: [medical plot]

  • Visual Med-Alpaca: Based on the information provided, it appears that the treatment has been effective in reducing the size of the papule. The before and after images show a significant reduction in the size of the papule, indicating that the treatment has been successful. However, it is important to note that the control group may not have received any treatment, which could have also contributed to the reduction in size. It would be best to consult with a medical professional for a more accurate assessment of the effectiveness of the treatment.
Other methods:
  • DePlot + GPT-3.5-Turbo: The table does not provide enough information to answer this question.
  • MatCha: 150
  • ChatGPT (GPT-3.5-Turbo, Text-only): I'm sorry, but I need more information to provide an accurate answer. There are many different types of papules, which are small, raised bumps on the skin, and each type may require different treatments. Additionally, the effectiveness of any particular treatment can depend on various factors, such as the severity of the condition, the individual's overall health, and the specifics of the treatment itself. If you can provide more information about the specific type of papule you are referring to and the treatment in question, I may be able to provide a more helpful answer.

Future Work

One of the most crucial pieces of ongoing work is the systematic evaluation of Visual Med-Alpaca, as well as of other NLP models within the biomedical field. Given the varying structure and types of medical data, it is essential to assess the efficacy of NLP models and their generalizability across different datasets.

We also expect that pretraining on medical data can enhance the performance of NLP models in the biomedical field, helping with the identification of and reasoning about disease phenotypes and drug mechanisms, and with the representation of clinical concepts.

Adding genome and protein modalities may also help LLMs achieve better reasoning. Given that genetic and protein information is critical for understanding disease processes, LLMs can aid in the analysis of large volumes of genomic data, making it possible to identify novel mutations involved in various disease processes. Incorporating genomic information into LLMs will therefore enable a wider range of applications within the biomedical field.

Implementation Details

We follow the hyperparameters reported in the GitHub repos of Alpaca-LoRA and Alpaca (a sketch mapping these values onto training arguments follows the table):

Model                Batch size   Learning rate   Epochs   Max length   Weight decay
Med-Alpaca-7B        128          2e-5            3        512          0
Med-Alpaca-7B-LoRA   128          1e-4            3        512          -
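
For concreteness, the Med-Alpaca-7B row might translate into Hugging Face TrainingArguments as sketched below; only the effective batch size (128), learning rate, epochs, and weight decay come from the table, while the per-device/gradient-accumulation split across 4 GPUs is an assumption:

```python
# Sketch: the Med-Alpaca-7B row expressed as TrainingArguments.
# The per-device batch and accumulation steps are assumed, not reported.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="med-alpaca-7b",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # 4 GPUs x 4 x 8 = effective batch of 128
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.0,
)
```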

Hardware and Training Time:

Model                CPU count   GPU count   GPU type                     Train time
Med-Alpaca-7B        128         4           NVIDIA A100-SXM4-80GB        2.51 hours
Med-Alpaca-7B-LoRA   8           1           NVIDIA GeForce RTX 3090 Ti   6.55 hours

Disclaimers

Visual Med-Alpaca is intended for academic research purposes only. Any commercial or clinical use of the model is strictly prohibited. This decision is based on the License Agreement inherited from LLaMA, on which the model is built. Additionally, Visual Med-Alpaca is not legally approved for medical use in any country. Users should be aware of the model's limitations in terms of medical knowledge and the possibility of misinformation. Therefore, any reliance on Visual Med-Alpaca for medical decision-making is at the user's own risk.

Note: The developers and owners of the model, the Language Technology Lab at Cambridge University, do not assume any liability for the accuracy or completeness of the information provided by Visual Med-Alpaca, nor will they be responsible for any potential harm caused by the misuse of the model.

Acknowledgement

We are deeply grateful for the contributions made by open-source projects: LLaMA, Stanford Alpaca, Alpaca-LoRA, DePlot, BigBio, ROCO, Visual-ChatGPT, GenerativeImage2Text.

More Repositories

1. MTL-Bioinformatics-2016 (Python, 223 stars)
2. sapbert (Python, 167 stars): [NAACL'21 & ACL'21] SapBERT: Self-alignment pretraining for BERT & XL-BEL: Cross-Lingual Biomedical Entity Linking.
3. BioNLP-2016 (Python, 121 stars)
4. xcopa (97 stars): XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning.
5. visual-spatial-reasoning (Python, 92 stars): [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
6. mirror-bert (Python, 75 stars): [EMNLP'21] Mirror-BERT: Converting pretrained language models to universal text encoders without labels.
7. composable-sft (Python, 68 stars): A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings.
8. cometa (Jupyter Notebook, 46 stars): Corpus of Online Medical EnTities: the cometA corpus.
9. autopeft (Python, 42 stars): AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL).
10. parameter-factorization (Python, 39 stars): Factorization of the neural parameter space for zero-shot multilingual and multi-task transfer.
11. ContrastiveBLI (Python, 32 stars): Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
12. PairS (Python, 32 stars): Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv:2403.16950).
13. link-prediction_with_deep-learning (Python, 28 stars)
14. eva (Python, 25 stars): [AAAI'21] Code release for "Visual Pivoting for (Unsupervised) Entity Alignment".
15. mop (Python, 24 stars): Code for the paper Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT.
16. python4cl (Jupyter Notebook, 23 stars): Introductory Python course for computational linguistics.
17. adversarial-postspec (Python, 23 stars): Auxiliary GAN for word-embedding post-specialisation.
18. ClaPS (Python, 16 stars): Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning (Zhou et al.; EMNLP 2023 Findings).
19. SIPHS (15 stars)
20. ACL2022_tutorial_multilingual_dialogue (14 stars): Materials for the "Natural Language Processing for Multilingual Task-Oriented Dialogue" tutorial at ACL 2022.
21. multi3woz (Python, 14 stars): Official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems (TACL 2023).
22. BLICEr (Python, 13 stars): Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
23. ECNMT (Python, 13 stars): Emergent Communication Pretraining for Few-Shot Machine Translation.
24. multilabel-nn (Python, 12 stars): Initializing neural networks for hierarchical multi-label text classification.
25. medlama (Python, 12 stars)
26. post-specialisation (Python, 12 stars): Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources.
27. MirrorWiC (Python, 11 stars): [CoNLL'21] MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models.
28. e2e_tod_toolkit (Python, 10 stars): A codebase for an end-to-end task-oriented dialogue toolkit.
29. sw_study (Roff, 9 stars)
30. nn_for_LBD (Python, 9 stars): Repository for the paper "Neural networks for open and closed Literature-based Discovery".
31. chat (Python, 9 stars)
32. lionlbd (JavaScript, 9 stars): Source code for the LION LBD tool.
33. prompt4bli (Python, 9 stars): On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
34. zepo (Python, 8 stars): Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al.).
35. cancer-hallmark-cnn (Python, 7 stars): Cancer hallmark CNN.
36. HELIN (Python, 7 stars): Demo entity-linking API for the HDR text analytics team.
37. COD (6 stars)
38. iso-study (Python, 6 stars): Datasets and comparable Wikipedia samples used in our study on near-isomorphism between monolingual word embeddings.
39. hyperlex (5 stars): HyperLex: a gold-standard resource for measuring and evaluating how well semantic models capture graded or soft lexical entailment.
40. ensembled-sicl (Python, 4 stars)
41. RepEval-2016 (Python, 4 stars)
42. POSQA (Python, 4 stars): Official repo of the EMNLP Findings 2023 paper POSQA: Probe the World Models of LLMs with Size Comparisons.
43. bio-verbnet (4 stars): Materials for BioVerbNet.
44. panlex-bli (4 stars): Bilingual lexicon induction (BLI) training and test sets extracted from PanLex, used in the work of Vulić et al. (EMNLP 2019).
45. bio-simverb (Python, 4 stars)
46. bioverbnet (3 stars): BioVerbNet: A large semantic-syntactic classification of verbs in biomedicine.
47. retrofitted-bio-embeddings (Python, 3 stars): Bio word embeddings retrofitted to verb clusters.
48. mling_sdgms (Python, 3 stars)
49. response_reranking (Python, 3 stars): Code for Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems (LREC-COLING 2024).
50. sqatin (Python, 2 stars): Code for the paper "SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU" (NAACL 2024, main conference).
51. fs-wrep (2 stars): Pretrained function-specific vectors (Gerz et al., ACL 2020).
52. xling-postspec (Python, 2 stars): Cross-lingual Semantic Specialization via Lexical Relation Induction.
53. biocaster_2021 (Java, 2 stars): Public repo for code and resources of BioCaster 2021: http://www.biocaster.org
54. sail-bli (Python, 1 star): Self-Augmented In-Context Learning for Unsupervised Word Translation (ACL 2024). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
55. bmip-2017-practical (1 star): BMIP 2017 practical.
56. deductive_reasoning_probing (Jupyter Notebook, 1 star)
57. uniprotidmap (Python, 1 star): UniProt ID mappings.
58. bmip-2018 (Python, 1 star): Resources for the BMIP ticked practical.