• Stars
    star
    370
  • Rank 111,946 (Top 3 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created 6 months ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

An end-to-end, closed-loop, language-based autonomous driving framework, which interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.

[Project Page] [Paper] [Dataset(hugging face)] [Model Zoo]

[Dataset(OpenXlab)] [Model Zoo(OpenXLab)]

Hits Code License Data License

News


Hao Shao, Yuxuan Hu, Letian Wang, Steven L. Waslander, Yu Liu, Hongsheng Li.

This repository contains code for the paper LMDrive: Closed-Loop End-to-End Driving with Large Language Models. This work proposes a novel language-guided, end-to-end, closed-loop autonomous driving framework.

Demo Video

demo_video.mp4

Contents

  1. Setup
  2. Model Weights
  3. Dataset
    1. Overview
    2. Data Generation
    3. Data Pre-procession
    4. Data Parsing
  4. Training
    1. Vision encoder pre-training
    2. Instruction finetuning
  5. Evaluation
  6. Citation
  7. Acknowledgements

Setup

Our project is built on three parts: (1) vision encoder (corresponding repo: timm); (2) vision LLM (corresponding repo: LAVIS); (3) data collection, agent controller (corresponding repo: InterFuser, Leaderboard, ScenarioRunner).

Install anaconda

wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
source ~/.bashrc

Clone the repo and build the environment

git clone https://github.com/opendilab/LMDrive.git
cd LMDrive
conda create -n lmdrive python=3.8
conda activate lmdrive
cd vision_encoder
pip3 install -r requirements.txt
python setup.py develop # if you have installed timm before, please uninstall it
cd ../LAVIS
pip3 install -r requirements.txt
python setup.py develop # if you have installed LAVIS before, please uninstall it

pip install flash-attn --no-build-isolation # optional

Download and setup CARLA 0.9.10.1

chmod +x setup_carla.sh
./setup_carla.sh
pip install carla

If you encounter some problems related to Carla, please refer to Carla Issues and InterFuser Issues first.

LMDrive Weights

If you are interested in including any other details in Model Zoo, please open an issue :)

Version Size Checkpoint VisionEncoder LLM-base DS (LangAuto) DS (LangAuto-short)
LMDrive-1.0 (LLaVA-v1.5-7B) 7B LMDrive-llava-v1.5-7b-v1.0 R50 LLaVA-v1.5-7B 36.2 50.6
LMDrive-1.0 (Vicuna-v1.5-7B) 7B LMDrive-vicuna-v1.5-7b-v1.0 R50 Vicuna-v1.5-7B 33.5 45.3
LMDrive-1.0 (LLaMA-7B) 7B LMDrive-llama-7b-v1.0 R50 LLaMA-7B 31.3 42.8

DS denotes the driving score

Dataset

We aim to develop an intelligent driving agent that can generate driving actions based on three sources of input: 1) sensor data (multi-view camera and LiDAR), so that the agent can generate actions that are aware of and compliant with the current scene; 2) navigation instructions (e.g. lane changing, turning), so that the agent can drive to meet the requirement in natural language (instruction from humans or navigation software); and 3) human notice instruction, so that the agent can interact with humans and adapt to human's suggestions and preferences (e.g. pay attention to adversarial events, deal with long-tail events, etc).

We provide a dataset with about 64K data clips, where each clip includes one navigation instruction, several notice instructions, a sequence of multi-modal multi-view sensor data, and control signals. The duration of the clip spans from 2 to 20 seconds. The dataset used in our paper can be downloaded here. If you want to create your own dataset, please follow the steps we've outlined below.

Overview

The data is generated with leaderboard/team_code/auto_pilot.py in 8 CARLA towns using the routes and scenarios files provided at leaderboard/data on CARLA 0.9.10.1 . The dataset is collected at a high frequency (~10Hz).

Once you have downloaded our dataset or collected your own dataset, it's necessary to systematically organize the data as follows. DATASET_ROOT is the root directory where your dataset is stored.

├── $DATASET_ROOT
│   └── dataset_index.txt  # for vision encoder pretraining
│   └── navigation_instruction_list.txt  # for instruction finetuning
│   └── notice_instruction_list.json  # for instruction finetuning
│   └── routes_town06_long_w7_11_28_18_28_35  #  data folder
│   └── routes_town01_short_w2_11_16_08_27_10
│   └── routes_town02_short_w2_11_16_22_55_25
│   └── routes_town01_short_w2_11_16_11_44_08 
      ├── rgb_full
      ├── lidar
      └── ...

The navigation_instruction_list.txt and notice_instruction_list.txt can be generated with our scripts by the data parsing scripts. Each subfolder in the dataset you've collected should be structured as follows:

- routes_town(town_id)_{tiny,short,long}_w(weather_id)_timestamp: corresponding to different towns and routes files
    - routes_X: contains data for an individual route
        - rgb_full: a big multi-view camera image at 400x1200 resolution, which can be split into four images (left, center, right, rear)
        - lidar: 3d point cloud in .npy format. It only includes the LiDAR points captured in 1/20 second, covering 180 degrees of horizontal view. So if you want to utilize 360 degrees of view, you need to merge it with the data from lidar_odd.
        - lidar_odd: 3d point cloud in .npy format.
        - birdview: topdown segmentation images, LAV and LBC used this type of data for training
        - topdown: similar to birdview but it's captured by the down-facing camera
        - 3d_bbs: 3d bounding boxes for different agents
        - affordances: different types of affordances
        - actors_data: contains the positions, velocities and other metadata of surrounding vehicles and the traffic lights
        - measurements: contains ego agent's position, velocity, future waypoints, and other metadata
        - measurements_full: merges measurement and actors_data
        - measurements_all.json: merges the files in measurement_full into a single file

The $DATASET_ROOT directory must contain a file named dataset_index.txt, which can be generated by our data pre-processing script. It should list the training and evaluation data in the following format:

<relative_route_path_dir> <num_data_frames_in_this_dir>
routes_town06_long_w7_11_28_18_28_35/ 1062
routes_town01_short_w2_11_16_08_27_10/ 1785
routes_town01_short_w2_11_16_09_55_05/ 918
routes_town02_short_w2_11_16_22_55_25/ 134
routes_town01_short_w2_11_16_11_44_08/ 569

Here, <relative_route_path_dir> should be a relative path to the $DATASET_ROOT. The training code will concatenate the $DATASET_ROOT and <relative_route_path_dir> to create the full path for loading the data. In this format, 1062 represents the number of frames in the routes_town06_long_w7_11_28_18_28_35/rgb_full directory or routes_town06_long_w7_11_28_18_28_35/lidar, etc.

Data Generation

Data Generation with multiple CARLA Servers

In addition to the dataset, we have also provided all the scripts used for generating data and these can be modified as required for different CARLA versions. The dataset is collected by a rule-based expert agent in different weathers and towns.

Running CARLA Servers
# Start 4 carla servers: ip [localhost], port [2000, 2002, 2004, 2006]. You can adjust the number of CARLA servers according to your situation and more servers can collect more data. If you use N servers to collect data, it means you have collected data N times on each route, except that the weather and traffic scenarios are random each time.

cd carla
CUDA_VISIBLE_DEVICES=0 ./CarlaUE4.sh --world-port=2000 -opengl &
CUDA_VISIBLE_DEVICES=1 ./CarlaUE4.sh --world-port=2002 -opengl &
CUDA_VISIBLE_DEVICES=2 ./CarlaUE4.sh --world-port=2004 -opengl &
CUDA_VISIBLE_DEVICES=3 ./CarlaUE4.sh --world-port=2006 -opengl &

Instructions for setting up docker are available here. Pull the docker image of CARLA 0.9.10.1 docker pull carlasim/carla:0.9.10.1.

Docker 18:

docker run -it --rm -p 2000-2002:2000-2002 --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 carlasim/carla:0.9.10.1 ./CarlaUE4.sh --world-port=2000 -opengl

Docker 19:

docker run -it --rm --net=host --gpus '"device=0"' carlasim/carla:0.9.10.1 ./CarlaUE4.sh --world-port=2000 -opengl

If the docker container doesn't start properly then add another environment variable -e SDL_AUDIODRIVER=dsp.

Run the Autopilot

Generate scripts for collecting data in batches.

cd dataset
python init_dir.py
cd ..
cd data_collection
python generate_yamls.py # You can modify FPS, waypoints distribution strength ...

# If you do not use 4 servers, the following Python scripts are needed to modify
python generate_bashs.py
python generate_batch_collect.py 
cd ..

Run batch-run scripts of the town and route type that you need to collect.

bash data_collection/batch_run/run_route_routes_town01_long.sh
bash data_collection/batch_run/run_route_routes_town01_short.sh
...
bash data_collection/batch_run/run_route_routes_town07_tiny.sh
...
bash data_collection/batch_run/run_route_routes_town10_tiny.sh

Note: Our scripts will use a random weather condition for data collection

Data Generation with a single CARLA Server

With a single CARLA server, roll out the autopilot to start data generation.

carla/CarlaUE4.sh --world-port=2000 -opengl
./leaderboard/scripts/run_evaluation.sh

The expert agent used for data generation is defined in leaderboard/team_code/auto_pilot.py. Different variables which need to be set are specified in leaderboard/scripts/run_evaluation.sh.

Data Pre-procession

We provide some Python scripts for pre-processing the collected data in tools/data_preprocessing, some of them are optional. Please execute them in the order:

  1. python get_list_file.py $DATASET_ROOT: obtain the dataset_list.txt.
  2. python batch_merge_data.py $DATASET_ROOT: merge several scattered data files into one file to reduce IO time when training. [Optional]
  3. python batch_rm_rgb_data.py $DATASET_ROOT: delete redundant files after we have merged them into new files. [Optional]
  4. python batch_stat_blocked_data.py $DATASET_ROOT: find the frames that the ego-vehicle is blocked for a long time. By removing them, we can enhance data distribution and decrease the overall data size.
  5. python batch_rm_blocked_data.py $DATASET_ROOT: delete the blocked frames.
  6. python batch_recollect_data.py $DATASET_ROOT: since we have removed some frames, we need to reorganize them to ensure that the frame ids are continuous.
  7. python batch_merge_measurements.py $DATASET_ROOT: merge the measurement files from all frames in one route folder to reduce IO time

Data Parsing

After collecting and pre-processing the data, we need to parse the navigation instructions and notice instructions data with some Python scripts in tools/data_parsing.

The script for parsing navigation instructions:

python3 parse_instruction.py $DATSET_ROOT 

The parsed navigation clips will be saved in $DATSET_ROOT/navigation_instruction_list.txt, under the root directory of the dataset.

The script for parsing notice instructions:

python3 parse_notice.py $DATSET_ROOT 

The parsed notice clips will be saved in $DATSET_ROOT/notice_instruction_list.txt.

The script for parsing misleading instructions:

python3 parse_misleading.py $DATSET_ROOT 

The parsed misleading clips will be saved in $DATSET_ROOT/misleading_data.txt.

Training

LMDrive's training consists of two stages: 1) the vision encoder pre-training stage, to generate visual tokens from sensor inputs; and 2) the instruction-finetuning stage, to align the instruction/vision and control signal.

LMDrive is trained on 8 A100 GPUs with 80GB memory (the first stage can be trained on GPUS with 32G memory). To train on fewer GPUs, you can reduce the batch-size and the learning-rate while maintaining their proportion. Please download the multi-modal dataset with instructions collected in the CARLA simulator we use in the paper here or openxlab (uploading)], if you do not collect the dataset by yourself. You can only download part of them to verify our framework or your improvement.

Vision encoder pre-training

Pretrain takes around 2~3 days for the visual encoder on 8x A100 (80G). Once the training is completed, you can locate the checkpoint of the vision encoder in the output/ directory.

cd vision_encoder
bash scripts/train.sh

Some options to note:

  • GPU_NUM: the number of GPUs you want to use. By default, it is set to 8.
  • DATASET_ROOT: the root directory for storing the dataset.
  • --model: the structure of visual model. You can choose memfuser_baseline_e1d3_r26 which replaces ResNet50 with ResNet26. It's also possible to create new model variants in visual_encoder/timm/models/memfuser.py
  • --train-towns/train-weathers: the data filter for the training dataset. Similarly, there are corresponding options, val-towns/val-weathers to filter the validation dataset accordingly.

Instruction finetuning

Instruction finetuning takes around 2~3 days for the visual encoder on 8x A100 (80G). Once the training is completed, you can locate the checkpoint of the adapter and qformer in the lavis/output/ directory.

cd LAVIS
bash run.sh 8 lavis/projects/lmdrive/notice_llava15_visual_encoder_r50_seq40.yaml # 8 is the GPU number

Some options in the config.yaml to note:

  • preception_model: the model architecture of the vision encoder.
  • preception_model_ckpt: the checkpoint path of the vision encoder.
  • llm_model: the checkpoint path of the llm (Vicuna/LLaVA).
  • use_notice_prompt: whether to use notice instruction data when training.
  • split_section_num_for_visual_encoder: the number of sections the frames are divided into during the forward encoding of visual features. Higher values can save more memory, and it needs to be a factor of token_max_length.
  • datasets:
    • storage: the root directory for storing the dataset.
    • towns/weathers: the data filter for training/evaluating.
    • token_max_length: the maximum number of frames, if the number of frames exceeds this value, they will be truncated.
    • sample_interval: the interval at which frames are sampled.

Evaluation

Start a CARLA server (described above) and run the required agent. The adequate routes and scenarios files are provided in leaderboard/data and the required variables need to be set in leaderboard/scripts/run_evaluation.sh.

Some options need to be updated in the leaderboard/team_code/lmdrive_config.py:

  • preception_model: the model architecture of the vision encoder.
  • preception_model_ckpt: the checkpoint path of the vision encoder (obtained in the vision encoder pretraining stage).
  • llm_model: the checkpoint path of the llm (LLaMA/Vicuna/LLaVA).
  • lmdrive_ckpt: the checkpoint path of the lmdrive (obtained in the instruction finetuing stage).

Update leaderboard/scripts/run_evaluation.sh to include the following code for evaluating the model on Town05 Long Benchmark.

export CARLA_ROOT=/path/to/carla/root
export TEAM_AGENT=leaderboard/team_code/lmdrive_agent.py
export TEAM_CONFIG=leaderboard/team_code/lmdrive_config.py
export CHECKPOINT_ENDPOINT=results/lmdrive_result.json
export SCENARIOS=leaderboard/data/official/all_towns_traffic_scenarios_public.json
export ROUTES=leaderboard/data/LangAuto/long.xml
CUDA_VISIBLE_DEVICES=0 ./leaderboard/scripts/run_evaluation.sh

Here, the long.json and long.xml files are replaced with short.json and short.xml for the evaluation of the agent in the LangAuto-Short benchmark.

For LangAuto-Tiny benchmark evaluation, replace the long.json and long.xml files with tiny.json and tiny.xml:

export SCENARIOS=leaderboard/data/LangAuto/tiny.json
export ROUTES=leaderboard/data/LangAuto/tiny.xml

LangAuto-Notice

Set the agent_use_notice as True in the lmdriver_config.py.

Citation

If you find our repo, dataset or paper useful, please cite us as

@misc{shao2023lmdrive,
      title={LMDrive: Closed-Loop End-to-End Driving with Large Language Models}, 
      author={Hao Shao and Yuxuan Hu and Letian Wang and Steven L. Waslander and Yu Liu and Hongsheng Li},
      year={2023},
      eprint={2312.07488},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

This implementation is based on code from several repositories.

License

All code within this repository is under Apache License 2.0.

More Repositories

1

awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)
2,642
star
2

DI-engine

OpenDILab Decision AI Engine
Python
2,616
star
3

PPOxFamily

PPO x Family DRL Tutorial Course(决策智能入门级公开课:8节课帮你盘清算法理论,理顺代码逻辑,玩转决策AI应用实践 )
Python
1,632
star
4

DI-star

An artificial intelligence platform for the StarCraft II with large-scale distributed training and grand-master agents.
Python
1,149
star
5

LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
Python
885
star
6

awesome-model-based-RL

A curated list of awesome model based RL resources (continually updated)
661
star
7

awesome-decision-transformer

A curated list of Decision Transformer resources (continually updated)
536
star
8

DI-drive

Decision Intelligence Platform for Autonomous Driving simulation.
Python
510
star
9

awesome-diffusion-model-in-rl

A curated list of Diffusion Model in RL resources (continually updated)
509
star
10

LLMRiddles

Open-Source Reproduction/Demo of the LLM Riddles Game
Python
448
star
11

InterFuser

[CoRL 2022] InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
Python
435
star
12

GoBigger

[ICLR 2023] Come & try Decision-Intelligence version of "Agar"! Gobigger could also help you with multi-agent decision intelligence study.
Python
431
star
13

DI-sheep

羊了个羊 + 深度强化学习(Deep Reinforcement Learning + 3 Tiles Game)
Python
362
star
14

awesome-multi-modal-reinforcement-learning

A curated list of Multi-Modal Reinforcement Learning resources (continually updated)
295
star
15

awesome-exploration-rl

A curated list of awesome exploration RL resources (continually updated)
262
star
16

awesome-end-to-end-autonomous-driving

A curated list of awesome End-to-End Autonomous Driving resources (continually updated)
252
star
17

DI-engine-docs

DI-engine docs (Chinese and English)
Python
243
star
18

DI-orchestrator

OpenDILab RL Kubernetes Custom Resource and Operator Lib
Go
217
star
19

treevalue

Here are the most awesome tree structure computing solutions, make your life easier. (这里有目前性能最优的树形结构计算解决方案)
Python
215
star
20

DI-smartcross

Decision Intelligence platform for Traffic Crossing Signal Control
Python
207
star
21

SO2

[AAAI2024] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Python
205
star
22

DI-hpc

OpenDILab RL HPC OP Lib
Python
197
star
23

GoBigger-Challenge-2021

Interested in multi-agents? The 1st Go-Bigger Multi-Agent Decision Intelligence Challenge is coming and a big bonus is waiting for you!
Python
192
star
24

DI-treetensor

Let DI-treetensor help you simplify the structure processing!(树形运算一不小心就逻辑混乱?DI-treetensor快速帮你搞定)
Python
176
star
25

ACE

[AAAI 2023] Official PyTorch implementation of paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".
Python
174
star
26

awesome-AI-based-protein-design

A collection of research papers for AI-based protein design
170
star
27

Gobigger-Explore

Still struggling with the high threshold or looking for the appropriate baseline? Come here and new starters can also play with your own multi-agents!
Python
167
star
28

DI-store

OpenDILab RL Object Store
Go
164
star
29

LightTuner

Python
146
star
30

DI-bioseq

Decision Intelligence platform for Biological Sequence Searching
Python
97
star
31

DOS

[CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning
Python
94
star
32

DI-1024

1024 + 深度强化学习(Deep Reinforcement Learning + 1024 Game)
JavaScript
91
star
33

DI-toolkit

A simple toolkit package for opendilab
Python
89
star
34

DIgging

Decision Intelligence for digging best parameters in target environment.
Python
81
star
35

awesome-driving-behavior-prediction

A collection of research papers for Driving Behavior Prediction
63
star
36

DI-adventure

Decision Intelligence Adventure for Beginners
Python
54
star
37

SmartRefine

[CVPR 2024] SmartRefine: An Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Python
39
star
38

CodeMorpheus

CodeMorpheus: Generate code self-portraits with one click(一键生成代码自画像,决策型 AI + 生成式 AI)
Python
28
star
39

PsyDI

PsyDI: A MBTI agent that helps you understand your personality type through a relaxed multi-modal interaction.
TypeScript
25
star
40

OpenPaL

Building open-ended embodied agent in battle royale FPS game
19
star
41

huggingface_ding

Auxiliary code for pulling, loading reinforcement learning models based on DI-engine from the Huggingface Hub, or pushing them onto Huggingface Hub with auto-created model card.
Python
17
star
42

.github

The first decision intelligence platform covering the most complete algorithms in academia and industry
10
star