Polyp-PVT
by Bo Dong, Wenhai Wang, Jinpeng Li, Deng-Ping Fan.
This repo is the official implementation of "Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers".
1. Introduction
Polyp-PVT was initially described in an arXiv preprint.
Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features; and 2) designing an effective mechanism for fusing these features. Different from existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three novel modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features. With the help of the SAM, we extend the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities.
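The exact designs of the CFM, CIM, and SAM are given in the paper and in this repo's model code; the sketch below only illustrates the data flow described above. The encoder, channel widths, and module bodies are simplified placeholders, not the actual implementation.

```python
# Minimal, simplified sketch of how the three modules could be wired together.
# The real CFM/CIM/SAM are more elaborate (see this repo's model code); all
# module bodies below are placeholder convs chosen only to show the data flow.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DummyPVTEncoder(nn.Module):
    """Stand-in for the PVT backbone: returns 4 feature maps at strides 4/8/16/32."""
    def __init__(self, channels=(64, 128, 320, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        in_ch = 3
        for i, out_ch in enumerate(channels):
            stride = 4 if i == 0 else 2
            self.stages.append(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1))
            in_ch = out_ch

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # [f1 (low-level), f2, f3, f4 (high-level)]


class PolypPVTSketch(nn.Module):
    def __init__(self, mid_ch=32):
        super().__init__()
        self.encoder = DummyPVTEncoder()
        # 1x1 convs to project encoder features to a common channel width
        self.reduce = nn.ModuleList([nn.Conv2d(c, mid_ch, 1) for c in (64, 128, 320, 512)])
        # CIM: captures polyp cues hidden in the low-level feature f1
        self.cim = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
        # CFM: cascaded fusion of the three high-level features f2..f4
        self.cfm = nn.Sequential(nn.Conv2d(3 * mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
        # SAM: spreads high-level semantics/position over the whole polyp area
        self.sam = nn.Sequential(nn.Conv2d(2 * mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(mid_ch, 1, 1)

    def forward(self, x):
        f1, f2, f3, f4 = [r(f) for r, f in zip(self.reduce, self.encoder(x))]
        low = self.cim(f1)
        # bring the high-level features to a common resolution before fusion
        size = f2.shape[2:]
        high = self.cfm(torch.cat([
            f2,
            F.interpolate(f3, size=size, mode='bilinear', align_corners=False),
            F.interpolate(f4, size=size, mode='bilinear', align_corners=False)], dim=1))
        # SAM fuses the cross-level features at the low-level resolution
        high = F.interpolate(high, size=low.shape[2:], mode='bilinear', align_corners=False)
        fused = self.sam(torch.cat([low, high], dim=1))
        pred = self.head(fused)
        return F.interpolate(pred, size=x.shape[2:], mode='bilinear', align_corners=False)


if __name__ == "__main__":
    model = PolypPVTSketch()
    print(model(torch.randn(1, 3, 352, 352)).shape)  # torch.Size([1, 1, 352, 352])
```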
Polyp-PVT achieves strong performance on image-level polyp segmentation (0.808 mean Dice and 0.727 mean IoU on ColonDB) and video polyp segmentation (0.880 mean Dice and 0.802 mean IoU on CVC-300-TV), surpassing previous models by a large margin.
2. Framework Overview
3. Results
3.1 Image-level Polyp Segmentation
3.2 Image-level Polyp Segmentation Comparison Results:
We also provide the results of baseline methods. You can download them from Google Drive/Baidu Drive [code:qw9i]; the package includes our results and those of the compared models.
3.3 Video Polyp Segmentation
3.4 Video Polyp Segmentation Comparison Results:
We also provide the results of baseline methods. You can download them from Google Drive/Baidu Drive [code:rtvt]; the package includes our results and those of the compared models.
4. Usage:
4.1 Recommended environment:
Python 3.8
PyTorch 1.7.1
torchvision 0.8.2
4.2 Data preparation:
Download the training and testing datasets from Google Drive/Baidu Drive [code:dr1h] and move them into ./dataset/.
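After unpacking, a quick sanity check that the data actually landed under ./dataset/ can save a failed training run. The sketch below makes no assumption about the exact sub-folder names; it only reports what is present, so compare the output against the paths used by the training and testing scripts.

```python
# List what ended up under ./dataset/ after unpacking the downloaded archives.
from pathlib import Path

root = Path("./dataset")
if not root.is_dir():
    raise SystemExit("./dataset not found -- download and unpack the data first")
for sub in sorted(root.iterdir()):
    if sub.is_dir():
        n_files = sum(1 for p in sub.rglob("*") if p.is_file())
        print(f"{sub}: {n_files} files")
```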
4.3 Pretrained model:
Download the pretrained model from Google Drive/Baidu Drive [code:w4vk] and put it in the './pretrained_pth' folder for initialization.
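If initialization fails, it is usually because the checkpoint is missing or has an unexpected layout. The sketch below only inspects the downloaded file; the file name used here is an assumption, so substitute whatever you actually placed in './pretrained_pth'.

```python
# Inspect the downloaded backbone checkpoint before training.
import torch

ckpt_path = "./pretrained_pth/pvt_v2_b2.pth"  # assumed file name -- use yours
state_dict = torch.load(ckpt_path, map_location="cpu")
# some checkpoints wrap the weights under a "state_dict" or "model" key
for key in ("state_dict", "model"):
    if isinstance(state_dict, dict) and key in state_dict:
        state_dict = state_dict[key]
print(f"{len(state_dict)} tensors, for example:")
for name in list(state_dict)[:5]:
    print(" ", name, tuple(state_dict[name].shape))
```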
4.4 Training:
Clone the repository:
git clone https://github.com/DengPingFan/Polyp-PVT.git
cd Polyp-PVT
bash train.sh
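train.sh wraps the repo's Python training script, where the optimization objective is defined. As background, polyp segmentation networks in this line of work (e.g. PraNet, which this repo builds on) commonly train with a weighted BCE + weighted IoU "structure loss"; the sketch below reproduces that idea for reference only, and the repo's training script remains the authoritative definition.

```python
# Illustrative weighted BCE + weighted IoU "structure loss" (PraNet-style).
# This is background material, not necessarily the exact loss used by train.sh.
import torch
import torch.nn.functional as F


def structure_loss(pred, mask):
    """pred: raw logits (B,1,H,W); mask: binary ground truth (B,1,H,W)."""
    # emphasize pixels near the polyp boundary
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()
```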
4.5 Testing:
cd Polyp-PVT
bash test.sh
4.6 Evaluating your trained model:
Matlab: Please refer to the work of MICCAI 2020 (link).
Python: Please refer to the work of ACM MM 2021 (link).
Please note that the results reported in our paper are evaluated with the Matlab version.
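For a quick sanity check of your own predictions without the Matlab toolbox, mean Dice and mean IoU can be computed directly from the saved maps. The official numbers in the paper come from the Matlab toolbox above; the flat folder layout assumed below (matching *.png names for predictions and ground truth) may differ from yours.

```python
# Rough mean Dice / mean IoU over a folder of prediction maps and GT masks.
import numpy as np
from pathlib import Path
from PIL import Image


def dice_iou(pred, gt, eps=1e-8):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou


def evaluate(pred_dir, gt_dir, thresh=0.5):
    dices, ious = [], []
    for gt_path in sorted(Path(gt_dir).glob("*.png")):
        gt = np.array(Image.open(gt_path).convert("L")) > 127
        pred_img = Image.open(Path(pred_dir) / gt_path.name).convert("L")
        pred = np.array(pred_img.resize(gt.shape[::-1])) / 255.0  # match GT size
        d, i = dice_iou(pred > thresh, gt)
        dices.append(d)
        ious.append(i)
    print(f"mean Dice: {np.mean(dices):.3f}, mean IoU: {np.mean(ious):.3f}")
```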
4.7 Well trained model:
You can download the trained model from Google Drive/Baidu Drive [code:9rpy] and put it in the './model_pth' directory.
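A minimal single-image inference sketch with a downloaded checkpoint is given below. The import path, class name, checkpoint file name, input size, and normalization statistics are all assumptions for illustration; the repo's testing script (test.sh) is the authoritative way to reproduce the reported maps.

```python
# Single-image inference sketch with a checkpoint from ./model_pth.
import numpy as np
import torch
from PIL import Image

from lib.pvt import PolypPVT  # assumed import path -- check the testing script

model = PolypPVT()
model.load_state_dict(torch.load("./model_pth/PolypPVT.pth", map_location="cpu"))
model.eval()

img = Image.open("example.png").convert("RGB").resize((352, 352))  # assumed input size
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float().unsqueeze(0) / 255.0
# normalization statistics are assumed (ImageNet mean/std, as is common)
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

with torch.no_grad():
    out = model((x - mean) / std)
    out = out[-1] if isinstance(out, (list, tuple)) else out  # model may return several maps
pred = torch.sigmoid(out)[0, 0].numpy()
Image.fromarray((pred * 255).astype(np.uint8)).save("example_pred.png")
```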
4.8 Pre-computed maps:
Google Drive/Baidu Drive [code:x3jc]
5. Citation:
@article{dong2023PolypPVT,
  title={Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers},
  author={Dong, Bo and Wang, Wenhai and Fan, Deng-Ping and Li, Jinpeng and Fu, Huazhu and Shao, Ling},
  journal={CAAI AIR},
  year={2023}
}
6. Acknowledgement
We are very grateful to the authors of the excellent works PraNet, EAGRNet, and MSEG, which provided the basis for our framework.
7. FAQ:
If you have suggestions for improving usability or any other advice, please feel free to contact me directly ([email protected]).
8. License
The source code is free for research and education use only. Any commercial use requires formal permission first.