  • Stars: 5,717
  • Rank: 7,130 (Top 0.2%)
  • Language: Python
  • License: GNU General Publi...
  • Created: over 1 year ago
  • Updated: 8 months ago


Repository Details

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀

Official implementation of 'LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention' and 'LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model'.


This repo proposes LLaMA-Adapter (V2), a lightweight adaptation method for fine-tuning instruction-following and multi-modal LLaMA models 🔥.

Try out the web demos 🤗: LLaMA-Adapter on Hugging Face Spaces, LLaMA-Adapter V2, and ImageBind-LLM.

News

  • [2023.06.08] We release the demo of ImageBind-LLM 🔥🔥🔥.
  • [2023.06.06] We release Point-Bind 🔥🔥🔥 to extend ImageBind with 3D point clouds, which achieves 3D instruction-following capacity for imagebind_LLM.
  • [2023.06.05] We support the integration of LLaMA-Adapter (both V1 and V2) and LangChain. Check out the Notebook.
  • [2023.05.29] We release the code of ImageBind-LLM at imagebind_LLM 🔥🔥🔥.
  • [2023.05.23] We release the demos and multi-modal code of LLaMA-Adapter V2!
  • [2023.05.05] We release the paper and code of our new work Personalize Segment Anything 🔥🔥🔥, which efficiently fine-tunes Segment Anything within 10 seconds and improves DreamBooth for better text-to-image generation.
  • [2023.04.30] We noticed that GPT-4 evaluation has a strong positional bias in favor of the first response. We will update the paper soon to document this bias. Great thanks to Canwen Xu.
  • [2023.04.28] We release LLaMA-Adapter V2, a multi-modal instruction model. Check out our paper, demos and code!
  • [2023.03.28] The paper and training code for LLaMA-Adapter V1 are released. 📌

Demos (LLaMA-Adapter V2)

-> 📺 YouTube Video

-> Chatbot System

-> Visual Instruction Model

Overview

Efficiency Comparison:

Model          Parameters  Storage Space  Training Time
Alpaca         7B          13 GB          3 Hours
LLaMA-Adapter  1.2M        4.7 MB         1 Hour

By inserting adapters into LLaMA's transformer, our method introduces only 1.2M learnable parameters and turns LLaMA into an instruction-following model within 1 hour. To stabilize training in the early stages, we propose a novel Zero-init Attention with a zero gating mechanism that adaptively incorporates the instructional signals. After fine-tuning, LLaMA-Adapter generates high-quality instruction-following responses, comparable to the fully fine-tuned Stanford Alpaca and Alpaca-LoRA.
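
To make the mechanism concrete, here is a minimal PyTorch sketch of the zero-init gated attention idea, written as an illustration rather than the repository's exact implementation: learnable adaptation prompts are attended to by the ordinary queries, and their contribution is scaled by a gate initialized to zero, so training starts from unmodified LLaMA behavior. Module names and shapes are assumptions for the sketch.

# Minimal sketch of zero-init gated attention (illustrative; not the repo's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroInitAdapterAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, adapter_len: int = 10):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        # Learnable adaptation prompt; these prompts (plus the gates) are the only
        # new parameters, which is where the ~1.2M figure comes from.
        self.adapter_prompt = nn.Parameter(torch.randn(adapter_len, dim) * 0.02)
        # Per-head gate initialized to zero: at step 0 the layer behaves like vanilla attention.
        self.gate = nn.Parameter(torch.zeros(1, n_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        split = lambda z: z.view(b, -1, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.wq(x)), split(self.wk(x)), split(self.wv(x))
        scale = self.head_dim ** 0.5
        # Ordinary self-attention over the input tokens (causal masking omitted for brevity).
        token_attn = F.softmax(q @ k.transpose(-2, -1) / scale, dim=-1) @ v
        # Attention from the same queries onto the adaptation prompt, scaled by the
        # zero-initialized gate so the instructional signal is folded in gradually.
        p = self.adapter_prompt.unsqueeze(0).expand(b, -1, -1)
        pk, pv = split(self.wk(p)), split(self.wv(p))
        prompt_attn = F.softmax(q @ pk.transpose(-2, -1) / scale, dim=-1) @ pv
        out = token_attn + self.gate * prompt_attn
        return self.wo(out.transpose(1, 2).reshape(b, t, d))

In such a setup, only the adaptation prompts and gates would be trained; all original LLaMA weights stay frozen.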

Our approach extends readily to multi-modal input instructions. The same reasoning framework used by the image-conditioned LLaMA-Adapter for ScienceQA is shared by other modalities, such as audio and video; a sketch of the conditioning idea follows.
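
As a companion sketch for the multi-modal case (again an assumption-laden illustration, not the repository's code): a global image feature from a frozen visual encoder such as CLIP is projected into LLaMA's embedding space and added to the adaptation prompts before they enter the gated attention above; for text-only inputs the conditioning term is simply dropped.

# Illustrative image-conditioned adaptation prompts (hypothetical module; names are assumptions).
import torch
import torch.nn as nn

class VisualAdapterPrompt(nn.Module):
    def __init__(self, visual_dim: int, llama_dim: int, adapter_len: int = 10):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(adapter_len, llama_dim) * 0.02)
        # Projects e.g. a frozen CLIP global feature into LLaMA's hidden width.
        self.proj = nn.Linear(visual_dim, llama_dim)

    def forward(self, visual_feat: torch.Tensor) -> torch.Tensor:
        # visual_feat: (batch, visual_dim) global image feature from a frozen encoder.
        cond = self.proj(visual_feat).unsqueeze(1)   # (batch, 1, llama_dim)
        return self.prompt.unsqueeze(0) + cond       # (batch, adapter_len, llama_dim)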

Setup

Here is a from-scratch script for LLaMA-Adapter V1.

conda create -n llama_adapter -y python=3.8
conda activate llama_adapter

# install pytorch
conda install pytorch cudatoolkit -c pytorch -y

# install dependencies and llama-adapter
pip install -r requirements.txt
pip install -e .
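
After installation, a quick environment check (generic PyTorch, not specific to this repo) confirms that the GPU build is visible before you download any weights:

# Generic sanity check for the freshly created environment.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))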

Note: To setup other models, please refer to llama_adapter_v2_chat65b, llama_adapter_v2_multimodal and imagebind_LLM for more details.

Inference

Please request access to the pre-trained LLaMA from this form (official) or download the LLaMA-7B from Hugging Face (unofficial). Then, obtain the weights of our LLaMA-Adapter from here. We denote the paths to the downloaded LLaMA weights and adapter weights as TARGET_FOLDER and ADAPTER_PATH, respectively.

Here is an example of generating instruction-following sentences with the 7B LLaMA model and our LLaMA-Adapter:

torchrun --nproc_per_node 1 example.py \
         --ckpt_dir $TARGET_FOLDER/model_size \
         --tokenizer_path $TARGET_FOLDER/tokenizer.model \
         --adapter_path $ADAPTER_PATH
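
Internally, the generation script wraps each instruction in an Alpaca-style prompt before tokenization. The sketch below assumes the standard Stanford Alpaca template; the exact wording lives in example.py and may differ slightly:

# Assumed Alpaca-style prompt formatting (check example.py for the exact template).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def build_prompt(instruction: str) -> str:
    return PROMPT_TEMPLATE.format(instruction=instruction)

print(build_prompt("Tell me about alpacas."))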

Training

We release the simple fine-tuning code of LLaMA-Adapter on the LLaMA-7B model here, enabling effortless reproduction with minimal dependencies. We will soon release the fine-tuning code for LLaMA-65B and the multi-modal LLaMA-Adapter.

Please download the 52K instruction-following training data from Stanford Alpaca and put it under DATA_PATH. Then run:

cd alpaca_finetuning_v1

torchrun --nproc_per_node 8 finetuning.py \
         --model Llama7B_adapter \
         --llama_model_path $TARGET_FOLDER/ \
         --data_path $DATA_PATH/alpaca_data.json \
         --adapter_layer 30 \
         --adapter_len 10 \
         --max_seq_len 512 \
         --batch_size 4 \
         --epochs 5 \
         --warmup_epochs 2 \
         --blr 9e-3 \
         --weight_decay 0.02 \
         --output_dir ./checkpoint/
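
The 1.2M parameter count quoted in the overview follows directly from these flags, assuming LLaMA-7B's hidden size of 4,096: the adaptation prompts contribute adapter_len × adapter_layer × hidden_dim parameters, and the zero-init gates add a comparatively negligible amount.

# Back-of-the-envelope count of learnable parameters for the settings above
# (assumes LLaMA-7B hidden size 4096; gates and any bias terms are ignored).
adapter_len, adapter_layer, hidden_dim = 10, 30, 4096
prompt_params = adapter_len * adapter_layer * hidden_dim
print(f"adaptation prompts: {prompt_params:,} parameters")  # 1,228,800 ≈ 1.2M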

Comparison with Other Methods

Instruction 1:

How can I develop my critical thinking skills? Give me three suggestions.

  • LLaMA-Adapter V2:
  1. Read widely: Reading books, articles, and other sources of information can help you develop your critical thinking skills. By exposing yourself to different perspectives and ideas, you can learn to analyze and evaluate information more effectively.

  2. Practice mindfulness: Mindfulness is the practice of being present in the moment and paying attention to your thoughts and feelings. By being more aware of your thoughts and emotions, you can better understand and evaluate them. This can help you develop a more objective and rational approach to thinking.

  3. Engage in open-minded discussions: Join a discussion group or debate club where you can engage in open-minded discussions with others. This can help you develop your critical thinking skills by challenging your assumptions and encouraging you to consider different perspectives.

  • LLaMA-Adapter V1:
  1. Read widely and critically.
  2. Engage in discussions with people who have different perspectives.
  3. Ask questions and seek out answers.

Instruction 2:

As a superhero, how would you explain your origin story to a curious child?

  • LLaMA-Adapter V2:

As a superhero, I have a very interesting and exciting origin story to share with you. Once upon a time, there was a young boy named John who lived in a small town. John was always curious and wanted to help others, so he spent his days exploring the world around him and learning about different cultures and people. One day, while he was out exploring, he stumbled upon a mysterious cave. Inside the cave, John found a strange device that had the power to give him incredible abilities. With these abilities, John could fly, shoot lasers from his eyes, and even turn invisible.

John knew that he had to use these powers for good, so he decided to become a superhero and use his abilities to protect his town and the people who lived there. He started to train every day, learning how to use his powers and how to fight against evil. Eventually, John became the superhero known as "The Invisible Man" and he used his powers to help others and make the world a better place.

That's my origin story, and I'm sure you'll love it!

  • LLaMA-Adapter V1:

Once upon a time, there was a young boy who had a special power. He used his power to help people in need and protect the innocent. He became known as the superhero, and he used his powers to fight evil and protect the world.

Instruction 3:

Tell me about alpacas.

  • LLaMA-Adapter: Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are typically found in herds of 10-20 animals and are known for their gentle nature and their ability to thrive in harsh climates. Alpacas are typically white or grey in color and have long, silky coats. They are also known for their ability to produce high-quality wool, which is used in the production of clothing and other products.
Other Methods:
  • Alpaca-LoRA: Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are known for their soft, luxurious fleece, which is used to make clothing, blankets, and other items. Alpacas are herbivores and graze on grasses and other plants. They are social animals and live in herds of up to 20 individuals.

  • Stanford Alpaca: Alpacas are small, fluffy animals related to camels and llamas. They are native to Peru and Bolivia, and were first domesticated around 5,000 years ago. They are kept mainly for their fine, soft fleece, which is used to make knitwear and other garments. Alpacas are herd animals and live in small family groups, led by an older male. They are highly social animals and can form strong bonds with their owners.

  • text-davinci-003: Alpacas are a domesticated species of South American camelid, similar to llamas. They are native to the Andes Mountains in South America and are kept as livestock for their soft, luxurious wool. Alpacas are social animals and live in herds of up to 20 individuals. They are typically kept in small herds of two to five animals, and are relatively easy to care for. Alpacas are herbivores and feed on grass, hay, and other vegetation. They are also known for their gentle and friendly nature, making them popular as pets.


Instruction 4:

Tell me about the president of Mexico in 2019.

  • LLaMA-Adapter: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1st, 2018. He is a member of the left-wing party Morena and is known for his populist policies. He has focused on tackling corruption, reducing poverty, and improving the country's infrastructure. He has also been a vocal critic of the United States' immigration policies.
Other Methods:
  • Alpaca-LoRA: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1, 2018. He is a member of the National Regeneration Movement (MORENA) political party and is the first left-wing president of Mexico since 1946. He is known for his anti-corruption and anti-neoliberal policies, as well as his commitment to improving the living conditions of the Mexican people.

  • Stanford Alpaca: In 2019, Andres Manuel Lopez Obrador was sworn in as President of Mexico. He is the country’s 58th president and the first from the left-wing Morena party. He was elected in 2018 on a promise to combat corruption and reduce poverty and has faced criticism for his handling of the economy, crime and the country’s relationship with the United States. He is also known for his ambitious social welfare programs and has been praised for raising the minimum wage and providing aid to low-income families. He is seen as a champion of democracy and human rights in Mexico.

  • text-davinci-003: The president of Mexico in 2019 is Andrés Manuel López Obrador, commonly known as AMLO. He is a Mexican politician who has been the President of Mexico since December 1, 2018. He is a member of the National Regeneration Movement (MORENA) party and previously served as the Head of Government of Mexico City from 2000 to 2005. He is a left-wing populist and has promised to reduce corruption and inequality in Mexico. He has also promised to increase public spending on social programs and infrastructure projects.


Instruction 5:

Write a Python program that prints the first 10 Fibonacci numbers.

  • LLaMA-Adapter:
def fibonacci():
    a = 0
    b = 1
    for i in range(10):
        a, b = b, a + b
        print(a, b)
    print("Fibonacci numbers: ", a, b)

fibonacci()
Other Methods:
  • Alpaca-LoRA:
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))
  • Stanford Alpaca:
numbers = [0, 1]

for i in range(2, 11):
    numbers.append(numbers[i-2] + numbers[i-1])

print(numbers)
# Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
  • text-davinci-003:
# Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones.

# Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

# Program:

# Initialize the first two numbers of the Fibonacci sequence
a = 0
b = 1

# Print the first 10 numbers of the Fibonacci sequence
print(a)
print(b)

for i in range(2,10):
    c = a + b
    a = b
    b = c
    print(c)

Instruction 6:

Write a program that prints the numbers from 1 to 100. But for multiples of three print 'Fizz' instead of the number and for the multiples of five print 'Buzz'. For numbers which are multiples of both three and five print 'FizzBuzz'.

  • LLaMA-Adapter:
def print_numbers(n):
    for i in range(1, 101):
        if i % 3 == 0 and i % 5 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)
Other Methods:
  • Alpaca-LoRA:
for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
  • Stanford Alpaca:
for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print('FizzBuzz')
    elif i % 3 == 0:
        print('Fizz')
    elif i % 5 == 0:
        print('Buzz')
    else:
        print(i)
  • text-davinci-003:
for (let i = 1; i <= 100; i++) {
  if (i % 3 === 0 && i % 5 === 0) {
    console.log("FizzBuzz");
  } else if (i % 3 === 0) {
    console.log("Fizz");
  } else if (i % 5 === 0) {
    console.log("Buzz");
  } else {
    console.log(i);
  }
}

Contributors

Peng Gao, Jiaming Han, Renrui Zhang, Aojun Zhou

Hiring Announcement

🔥 We are hiring interns, postdocs, and full-time researchers at the General Vision Group, Shanghai AI Lab, with a focus on multi-modality and vision foundation models. If you are interested, please contact [email protected].

Citation

If you find our LLaMA-Adapter code and paper useful, please kindly cite:

@article{zhang2023llamaadapter,
  title = {LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention},
  author={Zhang, Renrui and Han, Jiaming and Liu, Chris and Gao, Peng and Zhou, Aojun and Hu, Xiangfei and Yan, Shilin and Lu, Pan and Li, Hongsheng and Qiao, Yu},
  journal={arXiv preprint arXiv:2303.16199},
  year={2023}
}

If you find our LLaMA-Adapter V2 code and paper useful, please kindly cite:

@article{gao2023llamaadapterv2,
  title = {LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model},
  author={Gao, Peng and Han, Jiaming and Zhang, Renrui and Lin, Ziyi and Geng, Shijie and Zhou, Aojun and Zhang, Wei and Lu, Pan and He, Conghui and Yue, Xiangyu and Li, Hongsheng and Qiao, Yu},
  journal={arXiv preprint arXiv:2304.15010},
  year={2023}
}

Acknowledgement

This repo benefits from LLaMA, Stanford Alpaca, and Alpaca-LoRA. Thanks for their wonderful work.

More Repositories

1. InternVL (Python, 5,753 stars): [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o (an open-source multimodal dialogue model approaching GPT-4o performance)
2. DragGAN (Python, 4,996 stars): Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment; code and models fully open-sourced; supports Windows, macOS, and Linux)
3. InternGPT (Python, 3,198 stars): InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
4. Ask-Anything (Python, 2,984 stars): [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
5. InternImage (Python, 2,502 stars): [CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
6. InternVideo (Python, 1,392 stars): [ECCV2024] Video Foundation Models & Data for Multimodal Understanding
7. VisionLLM (Python, 874 stars): VisionLLM Series
8. VideoMamba (Python, 787 stars): [ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
9. OmniQuant (Python, 691 stars): [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
10. VideoMAEv2 (Python, 486 stars): [CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
11. DCNv4 (Python, 463 stars): [CVPR 2024] Deformable Convolution v4
12. all-seeing (Python, 452 stars): [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
13. GITM (445 stars): Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
14. Multi-Modality-Arena (Python, 428 stars): Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
15. Vision-RWKV (Python, 352 stars): Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
16. CaFo (Python, 344 stars): [CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
17. PonderV2 (Python, 311 stars): PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
18. LAMM (Python, 296 stars): [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
19. UniFormerV2 (Python, 280 stars): [ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
20. unmasked_teacher (Python, 276 stars): [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
21. OmniCorpus (Python, 259 stars): OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
22. HumanBench (Python, 231 stars): Official implementation of HumanBench (CVPR 2023)
23. Instruct2Act (Python, 223 stars): Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
24. EfficientQAT (Python, 198 stars): EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
25. gv-benchmark (Python, 189 stars): General Vision Benchmark, GV-B, a project from OpenGVLab
26. ControlLLM (Python, 181 stars): ControlLLM: Augment Language Models with Tools by Searching on Graphs
27. InternVideo2 (152 stars)
28. UniHCP (Python, 149 stars): Official PyTorch implementation of UniHCP
29. efficient-video-recognition (Python, 114 stars)
30. SAM-Med2D (Jupyter Notebook, 114 stars): Official implementation of SAM-Med2D
31. EgoVideo (Jupyter Notebook, 103 stars): [CVPR 2024 Champions] Solutions for EgoVis Challenges in CVPR 2024
32. DiffRate (Jupyter Notebook, 86 stars): [ICCV 23] An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging techniques, while incorporating a differentiable compression rate.
33. MMT-Bench (Python, 85 stars): [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
34. Awesome-DragGAN (75 stars): Awesome-DragGAN: A curated list of papers, tutorials, and repositories related to DragGAN
35. MM-NIAH (Python, 70 stars): The official implementation of the paper "Needle In A Multimodal Haystack"
36. M3I-Pretraining (69 stars)
37. STM-Evaluation (Python, 69 stars)
38. MUTR (Python, 65 stars): [AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
39. LCL (Python, 63 stars): Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
40. ChartAst (Python, 60 stars): ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.
41. LORIS (Python, 54 stars): Long-Term Rhythmic Video Soundtracker, ICML 2023
42. DDPS (Python, 53 stars): Official implementation of "Denoising Diffusion Semantic Segmentation with Mask Prior Modeling"
43. Awesome-LLM4Tool (52 stars): A curated list of papers, repositories, tutorials, and anything else related to large language models for tools
44. PIIP (Python, 51 stars): NeurIPS 2024 Spotlight ⭐️ Parameter-Inverted Image Pyramid Networks (PIIP)
45. InternVL-MMDetSeg (Jupyter Notebook, 50 stars): Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
46. GUI-Odyssey (Python, 47 stars): GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents, consisting of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
47. Siamese-Image-Modeling (Python, 33 stars): [CVPR 2023] Implementation of Siamese Image Modeling for Self-Supervised Vision Representation Learning
48. De-focus-Attention-Networks (Python, 28 stars): Learning 1D Causal Visual Representation with De-focus Attention Networks
49. Multitask-Model-Selector (Python, 27 stars): Implementation of "Foundation Model is Efficient Multimodal Multitask Model Selector"
50. Official-ConvMAE-Det (Python, 13 stars)
51. perception_test_iccv2023 (Python, 13 stars): Champion solutions repository for the Perception Test challenges in the ICCV 2023 workshop.
52. opengvlab.github.io (12 stars)
53. MovieMind (9 stars)
54. EmbodiedGPT (5 stars)
55. DriveMLM (3 stars)
56. .github (2 stars)