[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models

An end-to-end, closed-loop, language-based autonomous driving framework that interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.

[Project Page] [Paper] [Dataset(hugging face)] [Model Zoo]

[Dataset(OpenXlab)] [Model Zoo(OpenXLab)]


News


Hao Shao, Yuxuan Hu, Letian Wang, Steven L. Waslander, Yu Liu, Hongsheng Li.

This repository contains code for the paper LMDrive: Closed-Loop End-to-End Driving with Large Language Models. This work proposes a novel language-guided, end-to-end, closed-loop autonomous driving framework.

Demo Video

demo_video.mp4

Contents

  1. Setup
  2. Model Weights
  3. Dataset
    1. Overview
    2. Data Generation
    3. Data Pre-processing
    4. Data Parsing
  4. Training
    1. Vision encoder pre-training
    2. Instruction finetuning
  5. Evaluation
  6. Citation
  7. Acknowledgements

Setup

Our project is built on three parts: (1) the vision encoder (corresponding repo: timm); (2) the vision LLM (corresponding repo: LAVIS); (3) data collection and the agent controller (corresponding repos: InterFuser, Leaderboard, ScenarioRunner).

Install anaconda

wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
source ~/.bashrc

Clone the repo and build the environment

git clone https://github.com/opendilab/LMDrive.git
cd LMDrive
conda create -n lmdrive python=3.8
conda activate lmdrive
cd vision_encoder
pip3 install -r requirements.txt
python setup.py develop # if you have installed timm before, please uninstall it
cd ../LAVIS
pip3 install -r requirements.txt
python setup.py develop # if you have installed LAVIS before, please uninstall it

pip install flash-attn --no-build-isolation # optional

Download and setup CARLA 0.9.10.1

chmod +x setup_carla.sh
./setup_carla.sh
pip install carla

If you encounter some problems related to Carla, please refer to Carla Issues and InterFuser Issues first.
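
Once CARLA and the carla Python package are set up, a quick connectivity check can help rule out installation problems before data collection or evaluation. A minimal sketch, assuming a server is already running on the default port 2000 (server launch commands appear in the Data Generation section):

# carla_check.py -- minimal sanity check for the CARLA client/server setup (sketch)
import carla

client = carla.Client("localhost", 2000)   # port assumed to match ./CarlaUE4.sh --world-port=2000
client.set_timeout(10.0)                   # fail quickly if no server is listening
world = client.get_world()
print("Connected to CARLA; current map:", world.get_map().name)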

LMDrive Weights

If you would like any other details to be included in the Model Zoo, please open an issue :)

Version                      | Size | Checkpoint                  | Vision Encoder | LLM Base       | DS (LangAuto) | DS (LangAuto-Short)
LMDrive-1.0 (LLaVA-v1.5-7B)  | 7B   | LMDrive-llava-v1.5-7b-v1.0  | R50            | LLaVA-v1.5-7B  | 36.2          | 50.6
LMDrive-1.0 (Vicuna-v1.5-7B) | 7B   | LMDrive-vicuna-v1.5-7b-v1.0 | R50            | Vicuna-v1.5-7B | 33.5          | 45.3
LMDrive-1.0 (LLaMA-7B)       | 7B   | LMDrive-llama-7b-v1.0       | R50            | LLaMA-7B       | 31.3          | 42.8

DS denotes the driving score

Dataset

We aim to develop an intelligent driving agent that can generate driving actions based on three sources of input: 1) sensor data (multi-view camera and LiDAR), so that the agent can generate actions that are aware of and compliant with the current scene; 2) navigation instructions (e.g., lane changing, turning), so that the agent can drive to meet requirements expressed in natural language (instructions from humans or navigation software); and 3) human notice instructions, so that the agent can interact with humans and adapt to their suggestions and preferences (e.g., paying attention to adversarial events, dealing with long-tail events).

We provide a dataset with about 64K data clips, where each clip includes one navigation instruction, several notice instructions, a sequence of multi-modal multi-view sensor data, and control signals. Each clip spans 2 to 20 seconds. The dataset used in our paper can be downloaded here. If you want to create your own dataset, please follow the steps outlined below.

Overview

The data is generated with leaderboard/team_code/auto_pilot.py in 8 CARLA towns using the routes and scenarios files provided at leaderboard/data on CARLA 0.9.10.1. The dataset is collected at a high frequency (~10 Hz).

Once you have downloaded our dataset or collected your own, organize the data as follows, where $DATASET_ROOT is the root directory in which your dataset is stored:

├── $DATASET_ROOT
│   ├── dataset_index.txt  # for vision encoder pre-training
│   ├── navigation_instruction_list.txt  # for instruction finetuning
│   ├── notice_instruction_list.json  # for instruction finetuning
│   ├── routes_town06_long_w7_11_28_18_28_35  # data folder
│   ├── routes_town01_short_w2_11_16_08_27_10
│   ├── routes_town02_short_w2_11_16_22_55_25
│   └── routes_town01_short_w2_11_16_11_44_08
│       ├── rgb_full
│       ├── lidar
│       └── ...

The navigation and notice instruction list files can be generated with our data parsing scripts (see Data Parsing below). Each subfolder in the dataset you've collected should be structured as follows:

- routes_town(town_id)_{tiny,short,long}_w(weather_id)_timestamp: corresponding to different towns and routes files
    - routes_X: contains data for an individual route
        - rgb_full: a large multi-view camera image at 400x1200 resolution, which can be split into four images (left, center, right, rear)
        - lidar: 3d point cloud in .npy format. It only includes the LiDAR points captured within 1/20 second, covering 180 degrees of horizontal view; if you want a full 360-degree view, merge it with the data from lidar_odd (a minimal merge sketch follows this list)
        - lidar_odd: 3d point cloud in .npy format, covering the remaining 180 degrees of horizontal view
        - birdview: top-down segmentation images; LAV and LBC use this type of data for training
        - topdown: similar to birdview, but captured by a down-facing camera
        - 3d_bbs: 3d bounding boxes for different agents
        - affordances: different types of affordances
        - actors_data: the positions, velocities and other metadata of surrounding vehicles and traffic lights
        - measurements: the ego agent's position, velocity, future waypoints, and other metadata
        - measurements_full: merges measurements and actors_data
        - measurements_all.json: merges the files in measurements_full into a single file
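
As noted for the lidar folder above, each frame's .npy file covers only 180 degrees of horizontal view. A minimal sketch for obtaining a 360-degree point cloud by merging lidar and lidar_odd with NumPy (the route folder and frame filename below are placeholders; the two halves are captured at slightly different times, so strict use may require ego-motion compensation):

# merge_lidar.py -- sketch: combine the two 180-degree sweeps into one point cloud
# (route folder and frame filename are placeholders; no ego-motion compensation applied)
import numpy as np

route_dir = "routes_town01_short_w2_11_16_08_27_10"       # placeholder route folder
front_half = np.load(f"{route_dir}/lidar/0000.npy")       # 180-degree sweep
other_half = np.load(f"{route_dir}/lidar_odd/0000.npy")   # remaining 180 degrees

full_sweep = np.concatenate([front_half, other_half], axis=0)
print("merged point cloud shape:", full_sweep.shape)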

The $DATASET_ROOT directory must contain a file named dataset_index.txt, which can be generated by our data pre-processing script. It should list the training and evaluation data in the following format:

<relative_route_path_dir> <num_data_frames_in_this_dir>
routes_town06_long_w7_11_28_18_28_35/ 1062
routes_town01_short_w2_11_16_08_27_10/ 1785
routes_town01_short_w2_11_16_09_55_05/ 918
routes_town02_short_w2_11_16_22_55_25/ 134
routes_town01_short_w2_11_16_11_44_08/ 569

Here, <relative_route_path_dir> is a path relative to $DATASET_ROOT; the training code concatenates $DATASET_ROOT and <relative_route_path_dir> to create the full path for loading the data. In this example, 1062 is the number of frames in the routes_town06_long_w7_11_28_18_28_35/rgb_full directory (equivalently, routes_town06_long_w7_11_28_18_28_35/lidar, etc.).
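
The official index is produced by the pre-processing scripts described below. For illustration only, a rough sketch that builds an index in this format by counting the files in each route's rgb_full folder (the assumption that one file corresponds to one frame is ours):

# build_dataset_index.py -- illustrative sketch; prefer tools/data_preprocessing/get_list_file.py
# (assumes one file per frame under <route>/rgb_full)
import os
import sys

dataset_root = sys.argv[1]  # $DATASET_ROOT
with open(os.path.join(dataset_root, "dataset_index.txt"), "w") as index_file:
    for route_dir in sorted(os.listdir(dataset_root)):
        rgb_dir = os.path.join(dataset_root, route_dir, "rgb_full")
        if os.path.isdir(rgb_dir):
            index_file.write(f"{route_dir}/ {len(os.listdir(rgb_dir))}\n")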

Data Generation

Data Generation with multiple CARLA Servers

In addition to the dataset, we also provide all the scripts used for generating data; these can be modified as required for different CARLA versions. The dataset is collected by a rule-based expert agent under various weather conditions and in different towns.

Running CARLA Servers
# Start 4 CARLA servers: ip [localhost], ports [2000, 2002, 2004, 2006].
# Adjust the number of CARLA servers to your setup; more servers collect more data.
# If you use N servers, you collect data N times on each route, except that the weather
# and traffic scenarios are randomized each time.

cd carla
CUDA_VISIBLE_DEVICES=0 ./CarlaUE4.sh --world-port=2000 -opengl &
CUDA_VISIBLE_DEVICES=1 ./CarlaUE4.sh --world-port=2002 -opengl &
CUDA_VISIBLE_DEVICES=2 ./CarlaUE4.sh --world-port=2004 -opengl &
CUDA_VISIBLE_DEVICES=3 ./CarlaUE4.sh --world-port=2006 -opengl &

Instructions for setting up Docker are available here. Pull the Docker image of CARLA 0.9.10.1: docker pull carlasim/carla:0.9.10.1.

Docker 18:

docker run -it --rm -p 2000-2002:2000-2002 --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 carlasim/carla:0.9.10.1 ./CarlaUE4.sh --world-port=2000 -opengl

Docker 19:

docker run -it --rm --net=host --gpus '"device=0"' carlasim/carla:0.9.10.1 ./CarlaUE4.sh --world-port=2000 -opengl

If the Docker container doesn't start properly, add another environment variable: -e SDL_AUDIODRIVER=dsp.

Run the Autopilot

Generate scripts for collecting data in batches.

cd dataset
python init_dir.py
cd ..
cd data_collection
python generate_yamls.py # You can modify FPS, waypoints distribution strength ...

# If you do not use 4 servers, modify the following Python scripts accordingly
python generate_bashs.py
python generate_batch_collect.py 
cd ..

Run the batch-run scripts for the towns and route types you need to collect.

bash data_collection/batch_run/run_route_routes_town01_long.sh
bash data_collection/batch_run/run_route_routes_town01_short.sh
...
bash data_collection/batch_run/run_route_routes_town07_tiny.sh
...
bash data_collection/batch_run/run_route_routes_town10_tiny.sh

Note: our scripts use a random weather condition for data collection.
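
To launch many of these batch scripts one after another, a small helper like the following can be used (a sketch; it assumes the scripts can be run sequentially from the repository root):

# run_all_batches.py -- sketch: run every batch-run collection script sequentially
# (assumes execution from the repository root and that sequential runs are acceptable)
import glob
import subprocess

for script in sorted(glob.glob("data_collection/batch_run/run_route_routes_town*.sh")):
    print("Collecting with", script)
    subprocess.run(["bash", script], check=True)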

Data Generation with a single CARLA Server

With a single CARLA server, roll out the autopilot to start data generation.

carla/CarlaUE4.sh --world-port=2000 -opengl
./leaderboard/scripts/run_evaluation.sh

The expert agent used for data generation is defined in leaderboard/team_code/auto_pilot.py. Different variables which need to be set are specified in leaderboard/scripts/run_evaluation.sh.

Data Pre-processing

We provide Python scripts for pre-processing the collected data in tools/data_preprocessing; some of them are optional. Please execute them in the following order (a combined sketch that chains these steps is shown after the list):

  1. python get_list_file.py $DATASET_ROOT: obtain the dataset_list.txt.
  2. python batch_merge_data.py $DATASET_ROOT: merge several scattered data files into one file to reduce IO time when training. [Optional]
  3. python batch_rm_rgb_data.py $DATASET_ROOT: delete redundant files after they have been merged into the new files. [Optional]
  4. python batch_stat_blocked_data.py $DATASET_ROOT: find frames where the ego vehicle is blocked for a long time. Removing them improves the data distribution and decreases the overall data size.
  5. python batch_rm_blocked_data.py $DATASET_ROOT: delete the blocked frames.
  6. python batch_recollect_data.py $DATASET_ROOT: reorganize the remaining frames so that frame ids stay continuous after removals.
  7. python batch_merge_measurements.py $DATASET_ROOT: merge the measurement files of all frames in one route folder to reduce IO time.
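
For convenience, the steps above can be chained in a small driver script. A sketch, assuming each script accepts $DATASET_ROOT as its only argument (as in the commands above):

# preprocess_all.py -- sketch: run the pre-processing steps above in order
# (assumes each script takes the dataset root as its only argument; optional steps included)
import subprocess
import sys

dataset_root = sys.argv[1]  # $DATASET_ROOT
steps = [
    "get_list_file.py",
    "batch_merge_data.py",         # optional
    "batch_rm_rgb_data.py",        # optional
    "batch_stat_blocked_data.py",
    "batch_rm_blocked_data.py",
    "batch_recollect_data.py",
    "batch_merge_measurements.py",
]
for step in steps:
    subprocess.run(["python", f"tools/data_preprocessing/{step}", dataset_root], check=True)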

Data Parsing

After collecting and pre-processing the data, we need to parse the navigation and notice instruction data with the Python scripts in tools/data_parsing.

The script for parsing navigation instructions:

python3 parse_instruction.py $DATASET_ROOT

The parsed navigation clips will be saved in $DATASET_ROOT/navigation_instruction_list.txt, under the root directory of the dataset.

The script for parsing notice instructions:

python3 parse_notice.py $DATASET_ROOT

The parsed notice clips will be saved in $DATASET_ROOT/notice_instruction_list.txt.

The script for parsing misleading instructions:

python3 parse_misleading.py $DATASET_ROOT

The parsed misleading clips will be saved in $DATASET_ROOT/misleading_data.txt.

Training

LMDrive's training consists of two stages: 1) the vision encoder pre-training stage, which generates visual tokens from sensor inputs; and 2) the instruction-finetuning stage, which aligns instructions and visual tokens with the control signals.

LMDrive is trained on 8 A100 GPUs with 80GB memory (the first stage can be trained on GPUs with 32GB memory). To train on fewer GPUs, you can reduce the batch size and the learning rate while maintaining their proportion. If you do not collect the dataset yourself, please download the multi-modal dataset with instructions collected in the CARLA simulator that we use in the paper here or from OpenXLab (uploading). You can download only part of it to verify our framework or your improvements.
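
For example, keeping the batch-size/learning-rate ratio fixed amounts to scaling both linearly with the number of GPUs. A tiny sketch (the baseline numbers below are placeholders, not the values from our configs):

# lr_scaling.py -- sketch: scale batch size and learning rate together when using fewer GPUs
# (baseline numbers are placeholders, not the values used in our configs)
baseline_gpus, batch_per_gpu, baseline_lr = 8, 16, 5e-4

gpus = 4                                   # e.g. training on half the GPUs
scale = gpus / baseline_gpus
total_batch = gpus * batch_per_gpu         # total batch size shrinks with the GPU count
scaled_lr = baseline_lr * scale            # keep the lr / batch-size ratio unchanged
print(f"total batch size: {total_batch}, learning rate: {scaled_lr:.1e}")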

Vision encoder pre-training

Pre-training the vision encoder takes around 2~3 days on 8x A100 (80G). Once training is completed, you can find the checkpoint of the vision encoder in the output/ directory.

cd vision_encoder
bash scripts/train.sh

Some options to note:

  • GPU_NUM: the number of GPUs you want to use. By default, it is set to 8.
  • DATASET_ROOT: the root directory for storing the dataset.
  • --model: the architecture of the vision model. You can choose memfuser_baseline_e1d3_r26, which replaces ResNet50 with ResNet26. It's also possible to create new model variants in vision_encoder/timm/models/memfuser.py.
  • --train-towns/train-weathers: the data filter for the training dataset. The corresponding options val-towns/val-weathers filter the validation dataset.

Instruction finetuning

Instruction finetuning takes around 2~3 days on 8x A100 (80G). Once training is completed, you can find the checkpoint of the adapter and Q-Former in the lavis/output/ directory.

cd LAVIS
bash run.sh 8 lavis/projects/lmdrive/notice_llava15_visual_encoder_r50_seq40.yaml # 8 is the GPU number

Some options in the config.yaml to note:

  • preception_model: the model architecture of the vision encoder.
  • preception_model_ckpt: the checkpoint path of the vision encoder.
  • llm_model: the checkpoint path of the llm (Vicuna/LLaVA).
  • use_notice_prompt: whether to use notice instruction data when training.
  • split_section_num_for_visual_encoder: the number of sections the frames are divided into during the forward encoding of visual features. Higher values can save more memory, and it needs to be a factor of token_max_length.
  • datasets:
    • storage: the root directory for storing the dataset.
    • towns/weathers: the data filter for training/evaluating.
    • token_max_length: the maximum number of frames; if a clip has more frames than this value, it is truncated.
    • sample_interval: the interval at which frames are sampled.

Evaluation

Start a CARLA server (described above) and run the required agent. The required routes and scenarios files are provided in leaderboard/data, and the corresponding variables need to be set in leaderboard/scripts/run_evaluation.sh.

Some options need to be updated in the leaderboard/team_code/lmdrive_config.py:

  • preception_model: the model architecture of the vision encoder.
  • preception_model_ckpt: the checkpoint path of the vision encoder (obtained in the vision encoder pretraining stage).
  • llm_model: the checkpoint path of the llm (LLaMA/Vicuna/LLaVA).
  • lmdrive_ckpt: the checkpoint path of LMDrive (obtained in the instruction finetuning stage).
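
For illustration, the relevant entries might look like the following (a sketch only: all paths are placeholders, the vision-encoder model name is an assumption, and the surrounding structure of lmdrive_config.py should be taken from the file in the repository):

# excerpt-style sketch of the fields to update in leaderboard/team_code/lmdrive_config.py
# (paths are placeholders; the vision-encoder model name is an assumption)
preception_model = "memfuser_baseline_e1d3_r50"        # vision encoder architecture
preception_model_ckpt = "/path/to/vision_encoder.pth"  # from the vision encoder pre-training stage
llm_model = "/path/to/llava-v1.5-7b"                   # LLaMA/Vicuna/LLaVA checkpoint path
lmdrive_ckpt = "/path/to/lmdrive_checkpoint.pth"       # from the instruction finetuning stage
agent_use_notice = False                               # set to True for LangAuto-Notice (see below)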

Update leaderboard/scripts/run_evaluation.sh to include the following code for evaluating the model on the LangAuto benchmark.

export CARLA_ROOT=/path/to/carla/root
export TEAM_AGENT=leaderboard/team_code/lmdrive_agent.py
export TEAM_CONFIG=leaderboard/team_code/lmdrive_config.py
export CHECKPOINT_ENDPOINT=results/lmdrive_result.json
export SCENARIOS=leaderboard/data/official/all_towns_traffic_scenarios_public.json
export ROUTES=leaderboard/data/LangAuto/long.xml
CUDA_VISIBLE_DEVICES=0 ./leaderboard/scripts/run_evaluation.sh

To evaluate the agent on the LangAuto-Short benchmark, replace the long.json and long.xml files with short.json and short.xml.

For LangAuto-Tiny benchmark evaluation, replace the long.json and long.xml files with tiny.json and tiny.xml:

export SCENARIOS=leaderboard/data/LangAuto/tiny.json
export ROUTES=leaderboard/data/LangAuto/tiny.xml

LangAuto-Notice

Set agent_use_notice to True in lmdrive_config.py.

Citation

If you find our repo, dataset or paper useful, please cite us as

@misc{shao2023lmdrive,
      title={LMDrive: Closed-Loop End-to-End Driving with Large Language Models}, 
      author={Hao Shao and Yuxuan Hu and Letian Wang and Steven L. Waslander and Yu Liu and Hongsheng Li},
      year={2023},
      eprint={2312.07488},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

This implementation is based on code from several repositories.

License

All code within this repository is under Apache License 2.0.
