• Stars
    star
    957
  • Rank 46,493 (Top 1.0 %)
  • Language
    Python
  • License
    MIT License
  • Created about 3 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving

Paper | Supplementary | Talk | Poster | Slides

PWC

This repository contains the code for the PAMI 2022 paper TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving. This work is a journal extension of the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. The code for the CVPR 2021 paper is available in the cvpr2021 branch.

If you find our code or papers useful, please cite:

@article{Chitta2022PAMI,
  author = {Chitta, Kashyap and
            Prakash, Aditya and
            Jaeger, Bernhard and
            Yu, Zehao and
            Renz, Katrin and
            Geiger, Andreas},
  title = {TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving},
  journal = {Pattern Analysis and Machine Intelligence (PAMI)},
  year = {2022},
}
@inproceedings{Prakash2021CVPR,
  author = {Prakash, Aditya and
            Chitta, Kashyap and
            Geiger, Andreas},
  title = {Multi-Modal Fusion Transformer for End-to-End Autonomous Driving},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
}

Also, check out the code for other recent work on CARLA from our group:

Contents

  1. Setup
  2. Dataset and Training
  3. Evaluation

Setup

Clone the repo, setup CARLA 0.9.10.1, and build the conda environment:

git clone https://github.com/autonomousvision/transfuser.git
cd transfuser
git checkout 2022
chmod +x setup_carla.sh
./setup_carla.sh
conda env create -f environment.yml
conda activate tfuse
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
pip install mmcv-full==1.5.3 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.11.0/index.html

Dataset and Training

Our dataset is generated via a privileged agent which we call the autopilot (/team_code_autopilot/autopilot.py) in 8 CARLA towns using the routes and scenario files provided in this folder. See the tools/dataset folder for detailed documentation regarding the training routes and scenarios. You can download the dataset (210GB) by running:

chmod +x download_data.sh
./download_data.sh

The dataset is structured as follows:

- Scenario
    - Town
        - Route
            - rgb: camera images
            - depth: corresponding depth images
            - semantics: corresponding segmentation images
            - lidar: 3d point cloud in .npy format
            - topdown: topdown segmentation maps
            - label_raw: 3d bounding boxes for vehicles
            - measurements: contains ego-agent's position, velocity and other metadata

Data generation

In addition to the dataset itself, we have provided the scripts for data generation with our autopilot agent. To generate data, the first step is to launch a CARLA server:

./CarlaUE4.sh --world-port=2000 -opengl

For more information on running CARLA servers (e.g. on a machine without a display), see the official documentation. Once the server is running, use the script below for generating training data:

./leaderboard/scripts/datagen.sh <carla root> <working directory of this repo (*/transfuser/)>

The main variables to set for this script are SCENARIOS and ROUTES.

Training script

The code for training via imitation learning is provided in train.py.
A minimal example of running the training script on a single machine:

cd team_code_transfuser
python train.py --batch_size 10 --logdir /path/to/logdir --root_dir /path/to/dataset_root/ --parallel_training 0

The training script has many more useful features documented at the start of the main function. One of them is parallel training. The script has to be started differently when training on a multi-gpu node:

cd team_code_transfuser
CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=1 torchrun --nnodes=1 --nproc_per_node=2 --max_restarts=0 --rdzv_id=1234576890 --rdzv_backend=c10d train.py --logdir /path/to/logdir --root_dir /path/to/dataset_root/ --parallel_training 1

Enumerate the GPUs you want to train on with CUDA_VISIBLE_DEVICES. Set the variable OMP_NUM_THREADS to the number of cpus available on your system. Set OPENBLAS_NUM_THREADS=1 if you want to avoid threads spawning other threads. Set --nproc_per_node to the number of available GPUs on your node.

The evaluation agent file is build to evaluate models trained with multiple GPUs. If you want to evaluate a model trained with a single GPU you need to remove this line.

Evaluation

Longest6 benchmark

We make some minor modifications to the CARLA leaderboard code for the Longest6 benchmark, which are documented here. See the leaderboard/data/longest6 folder for a description of Longest6 and how to evaluate on it.

Pretrained agents

Pre-trained agent files for all 4 methods can be downloaded from AWS:

mkdir model_ckpt
wget https://s3.eu-central-1.amazonaws.com/avg-projects/transfuser/models_2022.zip -P model_ckpt
unzip model_ckpt/models_2022.zip -d model_ckpt/
rm model_ckpt/models_2022.zip

Running an agent

To evaluate a model, we first launch a CARLA server:

./CarlaUE4.sh --world-port=2000 -opengl

Once the CARLA server is running, evaluate an agent with the script:

./leaderboard/scripts/local_evaluation.sh <carla root> <working directory of this repo (*/transfuser/)>

By editing the arguments in local_evaluation.sh, we can benchmark performance on the Longest6 routes. You can evaluate both privileged agents (such as [autopilot.py]) and sensor-based models. To evaluate the sensor-based models use submission_agent.py as the TEAM_AGENT and point to the folder you downloaded the model weights into for the TEAM_CONFIG. The code is automatically configured to use the correct method based on the args.txt file in the model folder.

You can look at qualitative examples of the expected driving behavior of TransFuser on the Longest6 routes here.

Parsing longest6 results

To compute additional statistics from the results of evaluation runs we provide a parser script tools/result_parser.py.

${WORK_DIR}/tools/result_parser.py --xml ${WORK_DIR}/leaderboard/data/longest6/longest6.xml --results /path/to/folder/with/json_results/ --save_dir /path/to/output --town_maps ${WORK_DIR}/leaderboard/data/town_maps_xodr

It will generate a results.csv file containing the average results of the run as well as additional statistics. It also generates town maps and marks the locations where infractions occurred.

Submitting to the CARLA leaderboard

To submit to the CARLA leaderboard you need docker installed on your system. Edit the paths at the start of make_docker.sh. Create the folder team_code_transfuser/model_ckpt/transfuser. Copy the model.pth files and args.txt that you want to evaluate to team_code_transfuser/model_ckpt/transfuser. If you want to evaluate an ensemble simply copy multiple .pth files into the folder, the code will load all of them and ensemble the predictions.

cd leaderboard
cd scripts
./make_docker.sh

The script will create a docker image with the name transfuser-agent. Follow the instructions on the leaderboard to make an account and install alpha.

alpha login
alpha benchmark:submit  --split 3 transfuser-agent:latest

The command will upload the docker image to the cloud and evaluate it.

More Repositories

1

sdfstudio

A Unified Framework for Surface Reconstruction
Python
1,861
star
2

occupancy_networks

This repository contains the code for the paper "Occupancy Networks - Learning 3D Reconstruction in Function Space"
Python
1,454
star
3

giraffe

This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"
Python
1,227
star
4

stylegan-t

[ICML'23] StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Python
1,122
star
5

stylegan-xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Python
939
star
6

projected-gan

[NeurIPS'21] Projected GANs Converge Faster
Python
876
star
7

unimatch

[TPAMI'23] Unifying Flow, Stereo and Depth Estimation
Python
855
star
8

convolutional_occupancy_networks

[ECCV'20] Convolutional Occupancy Networks
Python
792
star
9

differentiable_volumetric_rendering

This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"
Python
782
star
10

mip-splatting

[CVPR'24 Oral] Mip-Splatting: Alias-free 3D Gaussian Splatting
Python
700
star
11

monosdf

[NeurIPS'22] MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction
Python
535
star
12

shape_as_points

[NeurIPS'21] Shape As Points: A Differentiable Poisson Solver
Python
518
star
13

unisurf

[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction
Python
410
star
14

graf

Official code release for "GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis"
Jupyter Notebook
393
star
15

tuplan_garage

[CoRL'23] Parting with Misconceptions about Learning-based Vehicle Motion Planning
Python
370
star
16

kitti360Scripts

This repository contains utility scripts for the KITTI-360 dataset.
Python
353
star
17

neat

[ICCV'21] NEAT: Neural Attention Fields for End-to-End Autonomous Driving
Python
301
star
18

gaussian-opacity-fields

Gaussian Opacity Fields for Efficient and Compact Surface Reconstruction in Unbounded Scenes
Python
285
star
19

occupancy_flow

This repository contains the code for the ICCV 2019 paper "Occupancy Flow - 4D Reconstruction by Learning Particle Dynamics"
Python
207
star
20

plant

[CoRL'22] PlanT: Explainable Planning Transformers via Object-Level Representations
Python
192
star
21

factor-fields

[SIGGRAPH 2023] We provide a unified formula for neural fields (Factor Fields) and a novel dictionary factorization (Dictionary Fields)
Jupyter Notebook
183
star
22

voxgraf

Official code release for VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids
Python
123
star
23

carla_garage

[ICCV'23] Hidden Biases of End-to-End Driving Models
Python
121
star
24

texture_fields

This repository contains code for the paper 'Texture Fields: Learning Texture Representations in Function Space'.
Python
113
star
25

sledge

SLEDGE: Synthesizing Simulation Environments for Driving Agents with Generative Models
105
star
26

kitti360LabelTool

JavaScript
103
star
27

counterfactual_generative_networks

[ICLR'21] Counterfactual Generative Networks
Python
102
star
28

gta

[ICLR'24] GTA: A Geometry-Aware Attention Mechanism for Multi-view Transformers
Python
95
star
29

murf

[CVPR'24] MuRF: Multi-Baseline Radiance Fields
Python
84
star
30

controllable_image_synthesis

Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis, CVPR 2020
Python
69
star
31

king

[ECCV'22] KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients
Python
61
star
32

handheld_svbrdf_geometry

On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner, CVPR2020
Python
57
star
33

navsim

NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation
Python
52
star
34

connecting_the_dots

This repository contains the code for the paper "Connecting the Dots: Learning Representations for Active Monocular Depth Estimation" https://avg.is.tuebingen.mpg.de/publications/riegler2019cvpr
Python
51
star
35

frequency_bias

Official code for "On the Frequency Bias of Generative Models", NeurIPS 2021
Python
39
star
36

data_aggregation

This repository contains the code for the CVPR 2020 paper "Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving"
Python
38
star
37

good

[ICLR'23] GOOD: Exploring Geometric Cues for Detecting Objects in an Open World
Python
36
star
38

campari

[3DV'21] CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields
Python
29
star
39

autonomousvision.github.io

Blog of the Autonomous Vision Group at MPI-IS Tübingen and University of Tübingen.
HTML
19
star
40

visual_abstractions

6
star
41

slides

Slide repository of the Autonomous Vision Group at MPI-IS Tübingen and University of Tübingen.
CSS
2
star
42

similarity_reconstruction

This code is based on the paper Exploiting Object Similarity in 3D Reconstruction.
C++
1
star
43

slow_flow

This code is based on the paper Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data.
C++
1
star