LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation
Table of Contents:
- Introduction
- Project Structure
- Installation
- Datasets
- Train
- Resuming training
- Test
- Results
- Citation
- Tips
Introduction
This project contains the code for: LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation by Yu Wang et al. (Note: the code was tested with python=3.6, cuda=9.0, and PyTorch 0.4.1; PyTorch 0.4.1+ is also supported.)
The extensive computational burden limits the usage of CNNs in mobile devices for dense estimation tasks such as semantic segmentation. In this paper, we present a lightweight network to address this problem, namely **LEDNet**, which employs an asymmetric encoder-decoder architecture for the task of real-time semantic segmentation. More specifically, the encoder adopts a ResNet as the backbone network, where two new operations, channel split and shuffle, are utilized in each residual block to greatly reduce the computation cost while maintaining high segmentation accuracy. On the other hand, an attention pyramid network (APN) is employed in the decoder to further lighten the overall network complexity. Our model has less than 1M parameters and is able to run at over 71 FPS on a single GTX 1080Ti GPU. Comprehensive experiments demonstrate that our approach achieves state-of-the-art results in terms of the speed/accuracy trade-off on the Cityscapes dataset, making it an effective method for real-time semantic segmentation tasks.
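The core encoder trick can be summarised as: split the input channels into two halves, run each half through a cheap factorized-convolution branch, concatenate, add the residual, and shuffle the channels so the two branches exchange information. Below is a minimal PyTorch sketch of that split-and-shuffle pattern; the branch layout and layer sizes are simplified for illustration and are not the exact residual block found in train/lednet.py.

```python
# Minimal sketch of the channel split-and-shuffle idea used in the encoder's
# residual blocks. Layer sizes and branch structure are simplified and do not
# reproduce the paper's block exactly.
import torch
import torch.nn as nn
import torch.nn.functional as F


def channel_shuffle(x, groups):
    # Reorder channels so information mixes across the two split branches.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


class SplitShuffleBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        # Each branch only sees half of the channels, which roughly halves the
        # computation compared to a full-width residual block.
        self.branch1 = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, kernel_size=(1, 3), padding=(0, 1)),
            nn.BatchNorm2d(half),
        )
        self.branch2 = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, kernel_size=(3, 1), padding=(1, 0)),
            nn.BatchNorm2d(half),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)               # channel split
        out = torch.cat([self.branch1(x1), self.branch2(x2)], dim=1)
        out = F.relu(out + x)                     # residual connection
        return channel_shuffle(out, groups=2)     # channel shuffle


if __name__ == "__main__":
    block = SplitShuffleBlock(64)
    print(block(torch.randn(1, 64, 128, 256)).shape)  # torch.Size([1, 64, 128, 256])
```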
Project Structure
├── datasets                      # contains all datasets for the project
│   ├── cityscapes                # Cityscapes dataset
│   │   ├── gtCoarse              # coarse Cityscapes annotations
│   │   ├── gtFine                # fine Cityscapes annotations
│   │   └── leftImg8bit           # Cityscapes training images
│   └── cityscapesscripts         # Cityscapes dataset label conversion scripts
├── utils
│   ├── dataset.py                # dataloader for the Cityscapes dataset
│   ├── iouEval.py                # computes 'iou mean' and 'iou per class'
│   ├── transform.py              # data preprocessing
│   ├── visualize.py              # visualization with Visdom
│   └── loss.py                   # loss function
├── checkpoint
│   └── xxx.pth                   # encoder models pretrained on ImageNet
├── save
│   └── xxx.pth                   # models trained from scratch
├── imagenet-pretrain
│   ├── lednet_imagenet.py        # model definition for ImageNet pre-training
│   └── main.py                   # ImageNet pre-training script
├── train
│   ├── lednet.py                 # model definition for semantic segmentation
│   └── main.py                   # model training script
└── test
    ├── dataset.py
    ├── lednet.py                 # model definition
    ├── lednet_no_bn.py           # model definition with the BN layers removed
    ├── eval_cityscapes_color.py  # generate color (RGB) segmentation results
    ├── eval_cityscapes_server.py # generate results for upload to the official server
    ├── eval_forward_time.py      # measure model inference time
    ├── eval_iou.py
    ├── iouEval.py
    └── transform.py
Installation
- Python 3.6.x. Anaconda3 is recommended.
- Set up the Python environment:
  pip3 install -r requirements.txt
  Env: PyTorch 0.4.1; cuda 9.0; cudnn 7.1; python 3.6
- Clone this repository:
  git clone https://github.com/xiaoyufenfei/LEDNet.git
  cd LEDNet-master
- Install Visdom.
- Install torchsummary
- Download the dataset by following the Datasets below.
- Note: For training, we currently support Cityscapes; we aim to add the CamVid, VOC, and ADE20K datasets.
Datasets
- You can download Cityscapes from here. Note: please download leftImg8bit_trainvaltest.zip (11GB), gtFine_trainvaltest (241MB), and gtCoarse (1.3GB).
- You can download CityscapesScripts and convert the dataset to 19 categories. It should have the basic structure below (a small loader sketch follows the tree).
├── leftImg8bit
│   ├── train
│   ├── val
│   └── test
├── gtFine
│   ├── train
│   ├── val
│   └── test
└── gtCoarse
    ├── train
    ├── train_extra
    └── val
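As a minimal illustration of this layout, the sketch below pairs each leftImg8bit image with its gtFine label. It assumes the 19-class conversion has already produced *_gtFine_labelTrainIds.png files; the project's actual dataloader is utils/dataset.py, and the root path here is only an example.

```python
# Minimal sketch of pairing images with labels given the layout above.
# Assumes cityscapesScripts already produced *_gtFine_labelTrainIds.png files.
import os
from glob import glob

from PIL import Image


def cityscapes_pairs(root, split="train"):
    """Yield (image_path, label_path) pairs for one split."""
    pattern = os.path.join(root, "leftImg8bit", split, "*", "*_leftImg8bit.png")
    for img_path in sorted(glob(pattern)):
        label_path = img_path.replace("leftImg8bit", "gtFine", 1).replace(
            "_leftImg8bit.png", "_gtFine_labelTrainIds.png"
        )
        yield img_path, label_path


if __name__ == "__main__":
    root = "./datasets/cityscapes"      # adjust to your dataset location
    for img_path, label_path in cityscapes_pairs(root):
        image = Image.open(img_path).convert("RGB")
        label = Image.open(label_path)  # single-channel trainId map
        print(img_path, image.size, label.size)
        break
```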
Training LEDNet
- For help on the optional arguments you can run:
  python main.py -h
- By default, we assume you have downloaded the cityscapes dataset in the ./data/cityscapes dir.
- To train LEDNet, use the train/main.py script and pass the parameters listed in main.py as flags, or change them manually:
  python main.py --savedir logs --model lednet --datadir path/root_directory/ --num-epochs xx --batch-size xx ...
Resuming training (if decoder part broken)
- For help on the optional arguments you can run:
python main.py -h
python main.py --savedir logs --name lednet --datadir path/root_directory/ --num-epochs xx --batch-size xx --decoder --state "../save/logs/model_best_enc.pth.tar"...
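What --state does here is reload a previously saved checkpoint before training continues. The sketch below is a rough illustration only; the keys 'epoch'/'state_dict' and the Net class name are assumptions based on a common PyTorch convention, not the exact format written by train/main.py.

```python
# Rough sketch of resuming from a saved checkpoint.
import torch

checkpoint_path = "../save/logs/model_best_enc.pth.tar"
checkpoint = torch.load(checkpoint_path, map_location="cpu")

start_epoch = checkpoint.get("epoch", 0)
state_dict = checkpoint.get("state_dict", checkpoint)  # fall back to a raw state dict

# model = Net(num_classes=20)                       # model class from train/lednet.py (name assumed)
# model.load_state_dict(state_dict, strict=False)   # encoder-only checkpoints load partially
print("resuming from epoch", start_epoch, "with", len(state_dict), "tensors")
```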
Testing
- The trained models from the training process can be found here. These may not be the best ones; you can train a model from scratch yourself, or fine-tune the decoder with an encoder pre-trained on ImageNet.
  For more details, refer to ./test/README.md.
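For the reported FPS numbers, inference time is measured with test/eval_forward_time.py. The sketch below shows the usual way such a measurement is done (warm-up iterations plus torch.cuda.synchronize() around the timed loop); the input resolution and iteration counts are illustrative assumptions, not the script's exact settings.

```python
# Minimal sketch of measuring inference time / FPS on GPU.
import time

import torch


def measure_fps(model, input_size=(1, 3, 512, 1024), warmup=10, iters=100):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):            # warm-up so CUDA init is not timed
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()        # flush queued kernels before timing
        start = time.time()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()        # make sure all kernels have finished
    return iters / (time.time() - start)

# Example: fps = measure_fps(Net(num_classes=20))   # Net from train/lednet.py (name assumed)
```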
Results
- Please refer to our article for more details.
| Method | Dataset | Fine | Coarse | IoU (class) | IoU (category) | FPS |
|---|---|---|---|---|---|---|
| LEDNet | cityscapes | yes | yes | 70.6% | 87.1% | 70+ |
Qualitative segmentation result examples:
Citation
If you find this code useful for your research, please use the following BibTeX entry.
@article{wang2019lednet,
title={LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation},
author={Wang, Yu and Zhou, Quan and Liu, Jia and Xiong, Jian and Gao, Guangwei and Wu, Xiaofu and Latecki, Longin Jan},
journal={arXiv preprint arXiv:1905.02423},
year={2019}
}
Tips
- Limited by GPU resources, the project results still need further improvement...
- It is recommended to pre-train the encoder on ImageNet and then fine-tune the decoder part; this gives better results (a small sketch follows below).
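As a hedged illustration of this tip: load ImageNet-pretrained encoder weights, freeze the encoder, and train only the decoder. The `encoder`/`decoder` attribute names, the checkpoint path, and the optimizer settings below are assumptions to adapt to train/lednet.py and train/main.py, not the repository's exact code.

```python
# Hedged sketch: load a pretrained encoder, freeze it, fine-tune the decoder.
import torch


def load_encoder_and_freeze(model, encoder_ckpt="./checkpoint/lednet_encoder_imagenet.pth"):
    state = torch.load(encoder_ckpt, map_location="cpu")
    state = state.get("state_dict", state)        # unwrap a checkpoint dict if needed
    model.encoder.load_state_dict(state, strict=False)
    for p in model.encoder.parameters():          # freeze encoder weights
        p.requires_grad = False
    # give the optimizer only the decoder parameters that remain trainable
    return torch.optim.Adam(model.decoder.parameters(), lr=5e-4)
```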