Deep Learning for Polyp Detection and Classification in Colonoscopy

This repository was created from the following review paper: A. Nogueira-Rodríguez; R. Domínguez-Carbajales; H. López-Fernández; Á. Iglesias; J. Cubiella; F. Fdez-Riverola; M. Reboiro-Jato; D. Glez-Peña (2020) Deep Neural Networks approaches for detecting and classifying colorectal polyps. Neurocomputing.

Please cite it if you find it useful for your research.

This repository collects the most relevant studies applying Deep Learning for Polyp Detection and Classification in Colonoscopy from a technical point of view, focusing on the low-level details of the implementation of the DL models. First, each study is categorized into one of three types: (i) polyp detection and localization (through bounding boxes or binary masks, i.e. segmentation), (ii) polyp classification, and (iii) simultaneous polyp detection and classification (i.e. studies that use a single model such as YOLO or SSD to perform simultaneous polyp detection and classification). Second, a summary of the public datasets available as well as the private datasets used in the studies is provided. The third section focuses on technical aspects such as the Deep Learning architectures, the data augmentation techniques and the libraries and frameworks used. Finally, the fourth section summarizes the performance metrics reported by each study.

Suggestions are welcome; please check the contribution guidelines before submitting a pull request.

Research

Polyp Detection and Localization

Study	Date	Endoscopy type	Imaging technology	Localization type	Multiple polyp	Real time
Tajbakhsh et al. 2014, Tajbakhsh et al. 2015	Sept. 2014 / Apr. 2015	Conventional	N/A	Bounding box	No	Yes
Zhu R. et al. 2015	Oct. 2015	Conventional	N/A	Bounding box (16x16 patches)	Yes	No
Park and Sargent 2016	March 2016	Conventional	NBI, WL	Bounding box	No	No
Yu et al. 2017	Jan. 2017	Conventional	NBI, WL	Bounding box	No	No
Zhang R. et al. 2017	Jan. 2017	Conventional	NBI, WL	No	No	No
Yuan and Meng 2017	Feb. 2017	WCE	N/A	No	No	No
Brandao et al. 2018	Feb. 2018	Conventional/WCE	N/A	Binary mask	Yes	No
Zhang R. et al. 2018	May 2018	Conventional	WL	Bounding box	No	No
Misawa et al. 2018	June 2018	Conventional	WL	No	Yes	No
Zheng Y. et al. 2018	July 2018	Conventional	NBI, WL	Bounding box	Yes	Yes
Shin Y. et al. 2018	July 2018	Conventional	WL	Bounding box	Yes	No
Urban et al. 2018	Sep. 2018	Conventional	NBI, WL	Bounding box	No	Yes
Mohammed et al. 2018, GitHub	Sep. 2018	Conventional	WL	Binary mask	Yes	Yes
Wang et al. 2018, Wang et al. 2018	Oct. 2018	Conventional	N/A	Binary mask	Yes	Yes
Qadir et al. 2019	Apr. 2019	Conventional	NBI, WL	Bounding box	Yes	No
Blanes-Vidal et al. 2019	March 2019	WCE	N/A	Bounding box	Yes	No
Zhang X. et al. 2019	March 2019	Conventional	N/A	Bounding box	Yes	Yes
Misawa et al. 2019	June 2019	Conventional	N/A	No	Yes	No
Zhu X. et al. 2019	June 2019	Conventional	N/A	No	No	Yes
Ahmad et al. 2019	June 2019	Conventional	WL	Bounding box	Yes	Yes
Sornapudi et al. 2019	June 2019	Conventional/WCE	N/A	Binary mask	Yes	No
Wittenberg et al. 2019	Sept. 2019	Conventional	WL	Binary mask	Yes	No
Yuan Y. et al. 2019	Sept. 2019	WCE	N/A	No	No	No
Ma Y. et al. 2019	Oct. 2019	Conventional	N/A	Bounding box	Yes	No
Tashk et al. 2019	Dec. 2019	Conventional	N/A	Binary mask	No	No
Jia X. et al. 2020	Jan. 2020	Conventional	N/A	Binary mask	Yes	No
Ma Y. et al. 2020	May 2020	Conventional	N/A	Bounding box	Yes	No
Young Lee J. et al. 2020	May 2020	Conventional	N/A	Bounding box	Yes	Yes
Wang W. et al. 2020	July 2020	Conventional	WL	No	No	No
Li T. et al. 2020	Oct. 2020	Conventional	N/A	No	No	No
Sánchez-Peralta et al. 2020	Nov. 2020	Conventional	NBI, WL	Binary mask	No	No
Podlasek J. et al. 2020	Dec. 2020	Conventional	N/A	Bounding box	No	Yes
Qadir et al. 2021	Feb. 2021	Conventional	WL	Bounding box	Yes	Yes
Xu J. et al. 2021	Feb. 2021	Conventional	WL	Bounding box	Yes	Yes
Misawa et al. 2021	Apr. 2021	Conventional	WL	No	Yes	Yes
Livovsky et al. 2021	June 2021	Conventional	N/A	Bounding box	Yes	Yes
Pacal et al. 2021	July 2021	Conventional	WL	Bounding box	Yes	Yes
Liu et al. 2021	July 2021	Conventional	N/A	Bounding box	Yes	Yes
Nogueira-Rodríguez et al. 2021	Aug. 2021	Conventional	NBI, WL	Bounding box	Yes	Yes
Yoshida et al. 2021	Aug. 2021	Conventional	WL, LCI	Bounding box	Yes	Yes
Ma Y. et al. 2021	Sep. 2021	Conventional	WL	Bounding box	Yes	No
Pacal et al. 2022	Nov. 2021	Conventional	WL	Bounding box	Yes	Yes
Nogueira-Rodríguez et al. 2022	April 2022	Conventional	NBI, WL	Bounding box	Yes	Yes
Nogueira-Rodríguez et al. 2023	March 2023	Conventional	NBI, WL	Bounding box	Yes	Yes

Polyp Classification

Study	Date	Endoscopy type	Imaging technology	Classes	Real time
Ribeiro et al. 2016	Oct. 2016	Conventional	WL	Neoplastic vs. Non-neoplastic	No
Zhang R. et al. 2017	Jan. 2017	Conventional	NBI, WL	Adenoma vs. hyperplastic; Resectable vs. non-resectable; Adenoma vs. hyperplastic vs. serrated	No
Byrne et al. 2017	Oct. 2017	Conventional	NBI	Adenoma vs. hyperplastic	Yes
Komeda et al. 2017	Dec. 2017	Conventional	NBI, WL, Chromoendoscopy	Adenoma vs. non-adenoma	No
Chen et al. 2018	Feb. 2018	Conventional	NBI	Neoplastic vs. hyperplastic	No
Lui et al. 2019	Apr. 2019	Conventional	NBI, WL	Endoscopically curable lesions vs. endoscopically incurable lesions	No
Kandel et al. 2019	June 2019	Conventional	N/A	Adenoma vs. hyperplastic vs. serrated (sessile serrated adenoma/traditional serrated adenoma)	No
Zachariah et al. 2019	Oct. 2019	Conventional	NBI, WL	Adenoma vs. serrated	Yes
Bour et al. 2019	Dec. 2019	Conventional	N/A	Paris classification: not dangerous (types Ip, Is, IIa, and IIb) vs. dangerous (type IIc) vs. cancer (type III)	No
Patino-Barrientos et al. 2020	Jan. 2020	Conventional	WL	Kudo's classification: malignant (types I, II, III, and IV) vs. non-malignant (type V)	No
Cheng Tao Pu et al. 2020	Feb. 2020	Conventional	NBI, BLI	Modified Sano's (MS) classification: MS I (Hyperplastic) vs. MS II (Low-grade tubular adenomas) vs. MS IIo (Nondysplastic or low-grade sessile serrated adenoma/polyp [SSA/P]) vs. MS IIIa (Tubulovillous adenomas or villous adenomas or any high-grade colorectal lesion) vs. MS IIIb (Invasive colorectal cancers)	Yes
Young Joo Yang et al. 2020	May 2020	Conventional	WL	7-class: CRC T1 vs. CRC T2 vs. CRC T3 vs. CRC T4 vs. high-grade dysplasia (HGD) vs. tubular adenoma with or without low grade dysplasia (TA) vs. non-neoplastic lesions; 4-class: advanced CRC (T2, T3, and T4) vs. early CRC/HGD (CRC T1 and HGD) vs. TA vs. non-neoplastic lesions; Advanced colorectal lesions (HGD and T1, T2, T3, and T4 lesions) vs. non-advanced colorectal lesions (TA and non-neoplastic lesions); Neoplastic lesions (TA, HGD, and stages T1, T2, T3, and T4) vs. non-neoplastic lesions	No
Yoshida et al. 2021	Aug. 2021	Conventional	WL, LCI	Neoplastic vs. hyperplastic	Yes

Simultaneous Polyp Detection and Classification

Study	Date	Endoscopy type	Imaging technology	Localization type	Multiple polyp	Classes	Real time
Tian Y. et al. 2019¹	Apr. 2019	Conventional	N/A	Bounding box	Yes	Modified Sano's (MS) classification: MS I (Hyperplastic) vs. MS II (Low-grade tubular adenomas) vs. MS IIo (Nondysplastic or low-grade sessile serrated adenoma/polyp [SSA/P]) vs. MS IIIa (Tubulovillous adenomas or villous adenomas or any high-grade colorectal lesion) vs. MS IIIb (Invasive colorectal cancers)	No
Liu X. et al. 2019	Oct. 2019	Conventional	WL	Bounding box	Yes	Polyp vs. adenoma	No
Ozawa. et al. 2020²	Feb. 2020	Conventional	NBI, WL	Bounding box	Yes	Adenoma vs. hyperplastic vs. sessile serrated adenoma/polyp (SSAP) vs. cancer vs. other types (Peutz-Jeghers, juvenile, or inflammation polyps)	Yes
Li K. et al. 2021³	Aug. 2021	Conventional	N/A	Bounding box	Yes	Adenoma vs. hyperplastic	Yes

¹ The Tian Y. et al. 2019 work is based on a single model (RetinaNet) that performs simultaneous polyp detection and classification. However, the paper only reports detection results using the ETIS-Larib dataset and therefore these results are included in the Polyp Detection and Localization section.
² The Ozawa. et al. 2020 work is based on a single model (Single Shot MultiBox Detector, SSD) that performs simultaneous polyp detection and classification. Nevertheless, since the detection and classification results are reported independently, they are included in the sections Polyp Detection and Localization and Polyp Classification, respectively.
³ The Li K. et al. 2021 work is based on several single models that perform simultaneous polyp detection and classification. As they report different types of results (frame-based polyp localization, polyp-based classification, and simultaneous frame-based polyp detection and classification), they are included in the three results sections.

Datasets

Public Datasets

Dataset	References	Description	Format	Resolution (w x h)	Ground truth	Used in
CVC-ClinicDB	Bernal et al. 2015 https://polyp.grand-challenge.org/CVCClinicDB/	612 sequential WL images with polyps extracted from 31 sequences (23 patients) with 31 different polyps.	Image	384 × 288	Polyp locations (binary mask)	Brandao et al. 2018, Zheng Y. et al. 2018, Shin Y. et al. 2018, Wang et al. 2018, Qadir et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019, Jia X. et al. 2020, Ma Y. et al. 2020, Young Lee J. et al. 2020, Podlasek J. et al. 2020, Qadir et al. 2021, Xu J. et al. 2021, Pacal et al. 2021, Liu et al. 2021, Nogueira-Rodríguez et al. 2022
CVC-ColonDB	Bernal et al. 2012 Vázquez et al. 2017	300 sequential WL images with polyps extracted from 13 sequences (13 patients).	Image	574 × 500	Polyp locations (binary mask)	Tajbakhsh et al. 2015, Brandao et al. 2018, Zheng Y. et al. 2018, Sornapudi et al. 2019, Jia X. et al. 2020, Podlasek J. et al. 2020, Qadir et al. 2021, Xu J. et al. 2021, Pacal et al. 2021, Li K. et al. 2021, Nogueira-Rodríguez et al. 2022
CVC-EndoSceneStill	Vázquez et al. 2017	912 WL images with polyps extracted from 44 videos (CVC-ClinicDB + CVC-ColonDB).	Image	574 × 500, 384 × 288	Locations for polyp, background, lumen and specular lights (binary mask)	Sánchez-Peralta et al. 2020
CVC-PolypHD	Bernal et al. 2012 Vázquez et al. 2017 Bernal et al. 2021 https://giana.grand-challenge.org	56 WL images.	Image	1920 × 1080	Polyp locations (binary mask)	Sornapudi et al. 2019, Nogueira-Rodríguez et al. 2022
ETIS-Larib	Silva et al. 2014 https://polyp.grand-challenge.org/ETISLarib/	196 WL images with polyps extracted from 34 sequences with 44 different polyps.	Image	1225 × 966	Polyp locations (binary mask)	Brandao et al. 2018, Zheng Y. et al. 2018, Shin Y. et al. 2018, Tian Y. et al. 2019, Ahmad et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019, Jia X. et al. 2020, Podlasek J. et al. 2020, Qadir et al. 2021, Xu J. et al. 2021, Pacal et al. 2021, Liu et al. 2021, Pacal et al. 2022, Nogueira-Rodríguez et al. 2022
Kvasir-SEG / HyperKvasir	Pogorelov et al. 2017 Jha et al. 2020 Borgli et al. 2020 https://datasets.simula.no/kvasir-seg https://datasets.simula.no/hyper-kvasir/	1 000 polyp images	Image	Various resolutions	Polyp locations (binary mask and bounding box)	Sánchez-Peralta et al. 2020, Podlasek J. et al. 2020, Nogueira-Rodríguez et al. 2022
ASU-Mayo Clinic Colonoscopy Video	Tajbakhsh et al. 2016 https://polyp.grand-challenge.org/AsuMayo/	38 small SD and HD video sequences: 20 training videos annotated with ground truth and 18 testing videos without ground truth annotations. WL and NBI.	Video	688 × 550	Polyp locations (binary mask)	Yu et al. 2017, Brandao et al. 2018, Zhang R. et al. 2018, Ahmad et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019, Mohammed et al. 2018, Li K. et al. 2021
CVC-ClinicVideoDB	Angermann et al. 2017 Bernal et al. 2018 Bernal et al. 2021 https://giana.grand-challenge.org	38 short and long sequences: 18 SD videos for training.	Video	768 × 576	Polyp locations (binary mask)	Shin Y. et al. 2018, Qadir et al. 2019, Ma Y. et al. 2020, Xu J. et al. 2021, Nogueira-Rodríguez et al. 2022
Colonoscopic Dataset	Mesejo et al. 2016 http://www.depeca.uah.es/colonoscopy_dataset/	76 short videos (both NBI and WL).	Video	768 × 576	Polyp classification (Hyperplastic vs. adenoma vs. serrated)	Zhang R. et al. 2017, Li K. et al. 2021
PICCOLO	Sánchez-Peralta et al. 2020 https://www.biobancovasco.org/en/Sample-and-data-catalog/Databases/PD178-PICCOLO-EN.html	3 433 images (2 131 WL and 1 302 NBI) from 76 lesions from 40 patients.	Image	854 × 480, 1920 × 1080	Polyp locations (binary mask) Polyp classification, including: Paris and NICE classifications, Adenocarcinoma vs. Adenoma vs. Hyperplastic, and histological stratification	Sánchez-Peralta et al. 2020, Pacal et al. 2022, Nogueira-Rodríguez et al. 2022
LDPolypVideo	Ma Y. et al. 2021 https://github.com/dashishi/LDPolypVideo-Benchmark	160 videos (40 187 frames: 33 876 polyp images and 6 311 non-polyp images) with 200 labeled polyps. 103 videos (861 400 frames: 371 400 polyp images and 490 000 non-polyp images) without full annotations.	Video	768 x 576 (videos), 560 × 480 (images)	Polyp locations (bounding box)	Ma Y. et al. 2021, Nogueira-Rodríguez et al. 2022
KUMC dataset	Li K. et al. 2021 https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FCBUOR	80 colonoscopy video sequences. It also aggregates the CVC-ColonDB, ASU-Mayo Clinic Colonoscopy Video, and Colonoscopic Dataset datasets.	Image	Various resolutions	Polyp locations (bounding box) Polyp classification: Adenoma vs. Hyperplastic	Li K. et al. 2021, Nogueira-Rodríguez et al. 2022
CP-CHILD-A, CP-CHILD-B	Wang W. et al. 2020 https://figshare.com/articles/dataset/CP-CHILD_zip/12554042	CP-CHILD-A contains 1 000 polyp images and 7 000 non-polyp images. CP-CHILD-B contains 400 polyp images and 1 100 normal or other pathological images.	Image	256 × 256	Polyp detection: polyp vs. non-polyp annotations	Wang W. et al. 2020
SUN	Misawa et al. 2021 http://amed8k.sundatabase.org/	49 136 images with polyps from 100 different polyps. 109 554 non-polyp images from 13 video sequences.	Image	N/A	Polyp locations (bounding box)	Misawa et al. 2021, Pacal et al. 2022, Nogueira-Rodríguez et al. 2022
Colorectal Polyp Image Cohort (PIBAdb)	https://www.iisgaliciasur.es/home/biobanco/colorectal-polyp-image-cohort-pibadb/?lang=en	~31 400 polyp images (~22 600 WL and ~8 800 NBI) from 1 176 different polyps. ~17 300 non-polyp images (including ~2 800 normal-mucosa images and ~500 clean-mucosa images)	Video and image	768 × 576	Polyp locations (bounding box) Polyp classification: Adenoma vs. Hyperplastic vs. Sessile Serrated Adenoma vs. Traditional Serrated Adenoma vs. Non Epithelial Neoplastic vs. Invasive	Nogueira-Rodríguez et al. 2022, Nogueira-Rodríguez et al. 2023
ENDOTEST	Fitting et al. 2022	Validation dataset: 24 polyp and their corresponding non-polyp video sequences (22 856 images: 12 161 with polyps and 10 695 without polyps) Performance dataset: 10 full length colonoscopy videos with 24 different polyps (230 898 images).	Video and image	N/A	Polyp locations (bounding box)
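
Several of the datasets above provide ground truth as binary masks, while many of the studies listed earlier localize polyps with bounding boxes. The following is a minimal sketch of how a bounding box could be derived from a binary mask. It is not taken from any of the cited studies. The function name `mask_to_bbox`, the `(x_min, y_min, x_max, y_max)` convention with inclusive pixel coordinates, and the toy mask are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
BBox = Tuple[int, int, int, int]


def mask_to_bbox(mask: List[List[int]]) -> Optional[BBox]:
    """Return the tightest box enclosing all non-zero pixels, or None if the mask is empty."""
    # Indices of rows that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Indices of columns that contain at least one foreground pixel.
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


# Hypothetical 5 x 6 (h x w) mask with a single foreground region.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(mask_to_bbox(toy_mask))  # (1, 1, 4, 3)
```

This sketch returns one box around every foreground pixel, so an image with several polyps would yield a single enclosing box. Producing one box per polyp would require labeling connected components first, which is left out here.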

Private Datasets

Study	Patients	No. Images	No. Videos	No. Unique Polyps	Purpose	Comments
Tajbakhsh et al. 2015	N/A	35 000 With polyps: 7 000 Without polyps: 28 000	40 short videos (20 positive and 20 negative)	N/A	Polyp localization	-
Zhu R. et al. 2015	N/A	180	-	N/A	Polyp localization	-
Park and Sargent 2016	N/A	652 With polyps: 92	35 (20’ to 40’)	N/A	Polyp localization	-
Ribeiro et al. 2016	66 to 86	85 to 126	-	N/A	Polyp classification (neoplastic vs non-neoplastic)	8 datasets by combining: (i) with or without staining mucosa, (ii) 4 acquisition modes (without CVC, i-Scan1, i-Scan2, i-Scan3).
Zhang R. et al. 2017, Zheng Y. et al. 2018	N/A	1930 Without polyps: 1 104 Hyperplastic: 263 Adenomatous: 563	-	215 polyps (65 hyperplastic and 150 adenomatous)	Polyp classification (hyperplastic vs. adenomatous)	PWH Database. Images taken under either WL or NBI endoscopy.
Yuan and Meng 2017	35	4 000 Normal WCE images: 3 000 (1 000 bubbles, 1 000 turbid, and 1 000 clear) Polyp images: 1 000	-	N/A	Polyp detection	-
Byrne et al. 2017	N/A	N/A	388	N/A	Polyp classification (hyperplastic vs. adenomatous)
Komeda et al. 2017	N/A	1 800 Adenomatous: 1200 Non-adenomatous: 600	-	N/A	Polyp classification (adenomatous vs. non-adenomatous)	-
Chen et al. 2018	N/A	2 441 Training: - Neoplastic: 1476 - Hyperplastic: 681 Testing: - Neoplastic: 188 - Hyperplastic: 96	-	N/A	Polyp classification (hyperplastic vs. neoplastic)	-
Misawa et al. 2018	73	N/A	546 (155 positive and 391 negative)	155	Polyp detection	-
Urban et al. 2018	> 2000	8 641	-	4 088	Polyp localization	Used as training dataset.
Urban et al. 2018	N/A	1 330 With polyps: 672 Without polyps: 658	-	672	Polyp localization	Used as independent dataset for testing.
Urban et al. 2018	9	44 947 With polyps: 13 292 Without polyps: 31 655	9	45	Polyp localization	Used as independent dataset for testing.
Urban et al. 2018	11	N/A	11	73	Polyp localization	Used as independent dataset for testing with “deliberately more challenging colonoscopy videos”.
Wang et al. 2018	1 290	5 545 With polyps: 3 634 Without polyps: 1 911	-	N/A	Polyp localization	Used as training dataset.
Wang et al. 2018	1 138	27 113 With polyps: 5 541 Without polyps: 21 572	-	1 495	Polyp localization	Used as testing dataset.
Wang et al. 2018	110	-	138	138	Polyp localization	Used as testing dataset.
Wang et al. 2018	54	-	54	0	Polyp localization	Used as testing dataset.
Lui et al. 2019	N/A	8 000 Curable lesions: 4 000 Incurable lesions: 4 000	-	Curable lesions: 159 Incurable lesions: 493	Polyp classification (endoscopically curable vs. incurable lesions)	Used as training dataset. This study is focused on larger endoscopic lesions with risk of submucosal invasion and lymphovascular permeation.
Lui et al. 2019	N/A	567	-	Curable: 56 Incurable: 20	Polyp classification (endoscopically curable vs. incurable lesions)	Used as testing dataset. This study is focused on larger endoscopic lesions with risk of submucosal invasion and lymphovascular permeation.
Tian Y. et al. 2019	218	871 MS I: 102 MS II: 346 MS IIo: 281 MS IIIa: 79 MS IIIb: 63	-	N/A	Polyp classification (5 classes)	-
Blanes-Vidal et al. 2019	255	11 300 With polyps: 4 800 Without polyps: 6 500	N/A	331 polyps (OC) and 375 (CCE)	Polyp localization	CCE: Colorectal capsule endoscopy. OC: conventional optical colonoscopy.
Zhang X. et al. 2019	215	404	-	N/A	Polyp localization	-
Misawa et al. 2019	N/A	3 017 088	-	930	Polyp detection	Used as training set.
Misawa et al. 2019	64 (47 with polyps and 17 without polyps)	N/A	N/A	87	Polyp detection	Used as testing set.
Kandel et al. 2019	552	N/A	-	963	Polyp classification (hyperplastic, serrated adenomas (sessile/traditional), adenomas)
Zachariah et al. 2019	N/A	5 278 Adenoma: 3 310 Serrated: 1 968	-	5 278	Polyp classification (adenoma vs. serrated)	Used as training set.
Zachariah et al. 2019	N/A	634	-	N/A	Polyp classification (adenoma vs. serrated)	Used as testing set.
Zhu X. et al. 2019	283	1 991	-	N/A	Polyp detection	Adenomatous polyps.
Ahmad et al. 2019	N/A	83 716 With polyps: 14 634 Without polyps: 69 082	17	83	Polyp localization	White Light Images.
Sornapudi et al. 2019	N/A	55	N/A	67	Polyp localization	Wireless Capsule Endoscopy videos. Used as testing set.
Sornapudi et al. 2019	N/A	1 800 With polyps: 530 Without polyps: 1 270	18	N/A	Polyp localization	Wireless Capsule Endoscopy videos. Used as training set.
Wittenberg et al. 2019	N/A	2 484	-	2 513	Polyp localization	-
Yuan Y. et al. 2019	80	7 200 Polyp images: 1 200 Normal images (mucosa, bubbles, and turbid): 6 000	80	N/A	Polyp detection	-
Ma Y. et al. 2019	1 661	3 428	-	N/A	Polyp localization	-
Liu X. et al. 2019	2 000	8 000 Polyp: 872 Adenoma: 1 210	-	N/A	Polyp localization and classification (polyp vs. adenoma)	-
Bour et al. 2019	N/A	785 Not dangerous: 699 Dangerous: 25 Cancer: 61	-	N/A	Polyp classification (not dangerous vs. dangerous vs. cancer)	-
Patino-Barrientos et al. 2020	142	600 Type I: 47 Type II: 90 Type III: 183 Type IV: 187 Type V: 93	-	N/A	Polyp classification (malignant vs. non-malignant)	-
Cheng Tao Pu et al. 2020	N/A	1 235 MS I: 103 MS II: 429 MS IIo: 293 MS IIIa: 295 MS IIIb: 115	-	N/A	Polyp classification (5 classes)	Australian (AU) dataset (NBI). Used as training set.
Cheng Tao Pu et al. 2020	N/A	20 MS I: 3 MS II: 5 MS IIo: 2 MS IIIa: 7 MS IIIb: 3	-	N/A	Polyp classification (5 classes)	Japan (JP) dataset (NBI). Used as testing set.
Cheng Tao Pu et al. 2020	N/A	49 MS I: 9 MS II: 10 MS IIo: 10 MS IIIa: 11 MS IIIb: 9	-	N/A	Polyp classification (5 classes)	Japan (JP) dataset (BLI). Used as testing set.
Ozawa. et al. 2020	3 417 (3 021 with polyps and 396 without polyps)	20 431 WL: 17 566 - Adenoma: 9 310 - Hyperplastic: 2 002 - SSAP: 116 - Cancer: 1 468 - Other types: 657 - Normal mucosa: 4 013 NBI: 2 865 - Adenoma: 2 085 - Hyperplastic: 519 - SSAP: 23 - Cancer: 131 - Other types: 107 - Normal mucosa: 0	-	4 752 Adenoma: 3 513 Hyperplastic: 1 058 SSAP: 22 Cancer: 68 Other types: 91	Polyp localization and classification (Adenoma vs. hyperplastic vs. SSAP vs. cancer vs. other types)	Used as training set.
Ozawa. et al. 2020	174	7 077 WL: 6 748 - Adenoma: 639 - Hyperplastic: 145 - SSAP: 33 - Cancer: 30 - Other types: 27 - Normal mucosa: 5 874 NBI: 329 - Adenoma: 208 - Hyperplastic: 69 - SSAP: 8 - Cancer: 3 - Other types: 10 - Normal mucosa: 31	-	309 Adenoma: 218 Hyperplastic: 63 SSAP: 7 Cancer: 4 Other types: 17	Polyp localization and classification (Adenoma vs. hyperplastic vs. SSAP vs. cancer vs. other types)	Used as testing set.
Young Lee J. et al. 2020	103	8 075	181	N/A	Polyp localization	Used as training set.
Young Lee J. et al. 2020	203	420	N/A	322 hyperplastic or sessile serrated adenomas	Polyp localization	Used as training set.
Young Lee J. et al. 2020	7	108 778 - With polyps: 7 022 - Without polyps: 101 756	7	26	Polyp localization	Used as testing set.
Young Joo Yang et al. 2020	1 339	3 828 - Tubular adenoma: 1 316 - Non-neoplastic: 896 - High-grade dysplasia: 621	-	N/A	Polyp classification	Used as training/test set.
Young Joo Yang et al. 2020	240	240 - Tubular adenoma: 116 - Non-neoplastic: 113 - Early CRC/High-grade dysplasia: 8 - Advanced CRC: 3	-	N/A	Polyp classification	External validation dataset.
Li T. et al. 2020	-	7 384 - With polyps: 509 - Without polyps: 6 875	23	N/A	Polyp detection	Colonoscopy videos obtained from YouTube, VideoGIE, and Vimeo.
Podlasek J. et al. 2020	123	79 284	157	N/A	Polyp localization	Used as development (train/validation split) dataset.
Podlasek J. et al. 2020	-	2 678	-	N/A	Polyp localization	Used as development (train/validation split) dataset.
Podlasek J. et al. 2020	34	-	42	N/A	Polyp localization	Used as testing dataset.
Xu J. et al. 2021	262	1 482	-	1 683	Polyp localization	RenjiImageDB. Used as testing set.
Xu J. et al. 2021	14	8 837 With polyps: 3 294 Without polyps: 5 543	14	15	Polyp localization	RenjiVideoDB. Used as testing set.
Misawa et al. 2021	N/A	56 668 With polyps: 55 644 Without polyps: 1024	N/A	N/A	Polyp localization	Used as development (train/validation split) dataset.
Livovsky et al. 2021	2 487	With polyps: 204 687 (189 994 video frames + 14 693 still images) Without polyps: 80 M (80 M video frames + 158 646 still images)	3 611	8 471	Polyp localization	Used as training set.
Livovsky et al. 2021	1 181	33 M video frames	1 393	3 680	Polyp localization	Used as testing set.
Nogueira-Rodríguez et al. 2021	330	28 576 White-light: 21 046 NBI: 7 530	-	941	Polyp localization	-
Yoshida et al. 2021	25	N/A	N/A	100: LED endoscope: 53 (25 neoplastic and 28 hyperplastic) LASER endoscope: 47 (30 neoplastic and 17 hyperplastic)	Polyp localization and classification (neoplastic vs. hyperplastic)	Testing set to evaluate the CAD EYE (Fujifilm) system.

Deep Learning Models and Architectures

Deep Learning Architectures

Off-the-shelf Architectures

Study	Task	Models	Framework	TL	Layers fine-tuned	Layers replaced	Output layer
Ribeiro et al. 2016	Classification	AlexNet, GoogLeNet, Fast CNN, Medium CNN, Slow CNN, VGG16, VGG19	-	ImageNet	N/A	Layers after last CNN layer	SVM
Zhang R. et al. 2017	Detection and classification	CaffeNet	-	ImageNet and Places205	N/A	Tested connecting classifier to each convolutional layer (5 convolutional layers)	SVM (Poly, Linear, RBF, and Tanh)
Chen et al. 2018	Classification	Inception v3	-	ImageNet	N/A	Last layer	FCL
Tian Y. et al. 2019	Localization and Classification	RetinaNet (based on ResNet-50)	N/A	ImageNet	N/A	Last layer	N/A
Misawa et al. 2018, Misawa et al. 2019	Detection	C3D	-	N/A	N/A	N/A	N/A
Zheng Y. et al. 2018	Localization	-	YOLOv1	PASCAL VOC 2007 and 2012	All	-	-
Shin Y. et al. 2018	Localization	Inception ResNet-v2	Faster R-CNN with post-learning schemes	COCO	All	-	RPN and detector layers
Urban et al. 2018	Localization	ResNet-50, VGG16, VGG19	-	ImageNet; also without TL	All	Last layer	FCL
Wang et al. 2018	Localization	VGG16	SegNet	N/A	N/A	N/A	N/A
Wittenberg et al. 2019	Localization	ResNet101	Mask R-CNN	COCO	All (incrementally)	Last layer	FCL
Yuan Y. et al. 2019	Detection	DenseNet	Tensorflow	-	All	-	FCL
Ma Y. et al. 2019	Localization	SSD Inception v2	Tensorflow	N/A	N/A	-	-
Liu X. et al. 2019	Localization and classification	Faster R-CNN with Inception Resnet v2	Tensorflow	COCO	All	-	-
Zachariah et al. 2019	Classification	Inception ResNet-v2	Tensorflow	ImageNet	N/A	Last layer	Graded scale transformation with sigmoid activation
Bour et al. 2019	Classification	ResNet-50, ResNet-101, Xception, VGG19, Inception v3	Keras (Tensorflow)	Yes	N/A	Last layer	N/A
Patino-Barrientos et al. 2020	Classification	VGG16	Keras (Tensorflow)	ImageNet	None, Last three	Last layer	Dense with sigmoid activation
Ozawa. et al. 2020	Localization and Classification	SSD (Single Shot MultiBox Detector)	Caffe	N/A	All	-	-
Ma Y. et al. 2020	Localization	YOLOv3, RetinaNet	N/A	ImageNet	N/A	N/A	N/A
Young Lee J. et al. 2020	Localization	YOLOv2	N/A	N/A	N/A	N/A	N/A
Young Joo Yang et al. 2020	Classification	ResNet-152, Inception-ResNet-v2	PyTorch	ImageNet	All	N/A	N/A
Wang W. et al. 2020	Detection	VGG16, VGG19, ResNet-101, ResNet-152	PyTorch	-	All	Last layer	Fully Connected Layer or Global Average Pooling
Li T. et al. 2020	Detection	AlexNet	Caffe	ImageNet	N/A	N/A	N/A
Sánchez-Peralta et al. 2020	Localization	Backbone: VGG-16 or Densenet121, Encoder-decoder: U-Net or LinkNet	Keras (Tensorflow)	No	-	N/A	N/A
Podlasek J. et al. 2020	Localization	EfficientNet B4, RetinaNet	N/A	No	-	N/A	N/A
Misawa et al. 2021	Localization	YOLOv3	N/A	Yes	N/A	-	FCL
Livovsky et al. 2021	Localization	LSTM-SSD	N/A	No	-	-	-
Nogueira-Rodríguez et al. 2021, Nogueira-Rodríguez et al. 2022, Nogueira-Rodríguez et al. 2023	Localization	YOLOv3	MXNet	PASCAL VOC 2007 and 2012	All	-	FCL
Ma Y. et al. 2021	Localization	RetinaNet, Faster RCNN, YOLOv3, and CenterNet	N/A	ImageNet	-	-	N/A

Custom Architectures

Study	Task	Based on	Highlights
Tajbakhsh et al. 2014, Tajbakhsh et al. 2015	Localization	None	Combination of classic computer vision techniques (detection and location) with DL (correction of prediction). The ML method proposes candidate polyps. Then, three sets of multi-scale patches around the candidate are generated (color, shape and temporal). Each set of patches is fed to a corresponding CNN. Each CNN has 2 convolutional layers, 2 fully connected layers, and an output layer. The maximum score for each set of patches is computed and averaged.
Zhu R. et al. 2015	Localization	LeNet-5	CNN fed with 32x32 images taken from patches generated via a sliding window of 16 pixels over the original images. The LeNet-5 network inspires the CNN architecture. ReLU used as activation function. Last two layers replaced with a cost-sensitive SVM. Positively selected patches are combined to generate the final output.
Park and Sargent 2016	Localization	None	Based on a previous work with no DL techniques. An initial quality assessment and preprocessing step filters and cleans images, and proposes candidate regions of interest (RoI). CNN replaces previous feature extractor. Three convolutional layers with two interspersed subsampling layers followed by a fully connected layer. A final step uses a Conditional Random Field (CRF) for RoI classification.
Yu et al. 2017	Localization	None	Two 3D-FCN are used: - An offline network trained with a training dataset. - An online network initialized with the offline weights and updated every 60 frames with the video frames. Only the last two layers are updated. The last 16 frames are used for predicting each frame. The architecture has two convolutional layers followed by a pooling layer each, followed by two groups of two convolutional layers followed by a pooling layer each, and finishes with two convolutional layers converted from fully connected layers. The outputs of the two networks are combined to generate the final output.
Yuan and Meng 2017	Detection	Stacked Sparse AutoEncoder (SSAE)	A modification of a Sparse AutoEncoder to include an image manifold constraint, named Stacked Sparse AutoEncoder with Image Manifold Constraint (SSAEIM). SSAEIM is built by stacking three SAEIM layers followed by an output layer. Image manifold information is used on each layer.
Byrne et al. 2017	Classification	Inception v3	Last layer replaced with a fully connected layer. A credibility score is calculated for each frame with the current frame prediction and the credibility score of the previous frame.
Komeda et al. 2017	Classification	None	Two convolutional layers followed by a pooling layer each, followed by a final fully connected output layer.
Brandao et al. 2018, Ahmad et al. 2019	Localization	AlexNet, GoogLeNet, ResNet-50, ResNet-101, ResNet-152, VGG	Networks pre-trained with PASCAL VOC and ImageNet datasets were converted into fully convolutional networks by replacing the fully connected and scoring layers with a convolution layer. A final deconvolution layer produces an output with the same size as the input. A regularization operation is added between every convolutional and activation layer. VGG, ResNet-101 and ResNet-152 were also tested using shape-from-shading features.
Zhang R. et al. 2018	Localization	YOLO	Custom architecture RYCO that consists of two networks: 1. A regression-based deep learning with residual learning (ResYOLO) detection model to locate polyps in a frame. 2. A Discriminative Correlation Filter (DCF) based method called Efficient Convolution Operators (ECO) to track the detected polyps. The ResYOLO network detects new polyps in a frame, starting the polyp tracking. During tracking, both ResYOLO and ECO tracker are used to determine the polyp location. Tracking stops when a confidence score calculated using the last frames falls below a threshold value.
Urban et al. 2018	Detection	None	Two custom CNNs are proposed. The first CNN is built just with convolutional, maximum pooling and fully connected layers. The second CNN also includes batch normalization layers and inception modules.
Urban et al. 2018	Localization	YOLO	The 5 CNNs used for detection (two custom, VGG16, VGG19 and ResNet-50) are modified by replacing the fully connected layers with convolutional layers. The last layer has 5 filter maps that have their outputs spaced over a grid over the input image. Each grid cell predicts its confidence with a sigmoid unit, the position of the polyp relative to the grid cell center, and its size. The final output is the weighted sum of all the adjusted positions and size predictions, weighted with the confidences.
Mohammed et al. 2018	Detection	Y-Net	The framework consists of two fully convolutional encoder networks which are connected to a single decoder network that matches the encoder network resolution at each down-sampling operation. The networks are trained with encoder-specific adaptive learning rates that update the parameters of the randomly initialized encoder network with a larger step size than the encoder with pre-trained weights. The features of the two encoders are merged with the decoder network at each down-sampling path through sum-skip connections.
Lui et al. 2019	Classification	ResNet	Network with 5 convolutional layers and 2 fully connected layers but based on a pre-trained ResNet CNN backbone.
Qadir et al. 2019	Localization	None	A framework for false positive (FP) reduction is proposed. The framework adds an FP reduction unit to an RPN network. This unit exploits temporal dependencies between frames (forward and backward) to correct the output. Faster R-CNN and SSD RPNs were tested.
Blanes-Vidal et al. 2019	Localization	R-CNN with AlexNet	Several modifications done to AlexNet: - Last fully connected layer replaced to output two classes. - 5 convolutional and 3 fully connected layers were fine-tuned. - Max-Pooling kernels, ReLU activation function and dropout used to avoid overfitting and build robustness to intra-class deformations. - Stochastic gradient descent with momentum used as the optimization algorithm.
Zhang X. et al. 2019	Localization	SSD	SSD was modified to add three new pooling layers (Second-Max Pooling, Second-Min Pooling and Min-Pooling) and a new deconvolution layer whose features are concatenated to those from the Max-Pooling layer that are fed into the detection layer. Model was pre-trained on the ILSVRC CLS-LOC dataset.
Kandel et al. 2019	Classification	CapsNet	A convolutional layer followed by 7 convolutional capsule layers and finalized with a global average pool by capsule type.
Sornapudi et al. 2019	Localization	Mask R-CNN	The region proposal network (RPN) uses a Feature Pyramid Network with a ResNet backbone. ResNet-50 and ResNet-101 were used, improved by extracting features from 5 different levels of layers. ResNet networks were initialized with COCO and ImageNet. Additionally, 76 random balloon images from Flickr were used to fine-tune networks initialized with COCO. The regions proposed by the RPN were filtered before the ROIAlign layer. The ROIAlign layer is followed by a pixel probability mask network, comprised of 4 convolutional layers followed by a transposed convolutional layer and a final convolutional layer with a sigmoid activation function that generates the final output. All convolutional layers except final are built with ReLU activation function.
Tashk et al. 2019	Localization	U-Net	The U-Net architecture was modified to use as input any image or video formats associated with optical colonoscopy modalities.
Patino-Barrientos et al. 2020	Classification	None	The model is composed of four convolutional layers, each followed by a max pooling layer. After that, the model has a dropout layer to reduce overfitting, followed by a final dense layer with sigmoid activation that outputs the probability of the current polyp being malignant. The model was trained using the RMSprop optimizer with a learning rate of 1×10⁻⁴.
Jia X. et al. 2020	Localization	ResNet-50, Feature Pyramid Network, and Faster R-CNN	Authors propose a two-stage framework, where the polyp proposal stage (stage I) is constructed as a region-level polyp detector that is capable of guiding the pixel-level learning in the polyp segmentation stage (stage II), aiming to accurately segment the area the polyp occupies in the image. This framework has a backbone network composed of a ResNet-50 followed by a Feature Pyramid Network, producing a set of feature maps that are used by the two-stage framework. The polyp proposal stage was created as an extension of Faster R-CNN, which performs as a region-level polyp detector to recognize the lesion area as a whole. Then, the polyp segmentation stage is built in a fully convolutional fashion for pixelwise segmentation. This two-stage framework has a feature sharing strategy in which the learned semantics of polyp proposals of stage I are transferred to the segmentation task of stage II.
Qadir et al. 2021	Localization	Resnet34 and MDeNet	Authors propose a modified version of MDeNet, which they proposed in Qadir et al. 2019. See section "2.3. F-CNN models for polyp detection" of Qadir et al. 2021 for more details.
Xu J. et al. 2021	Localization	YOLOv3	Authors present a framework based on YOLOv3 to improve detection. This framework adds: (i) a False Positive Relearning Module (FPRM) to make the detector network learn more about the features of FPs for higher precision; (ii) an Image Style Transfer Module (ISTM) to enhance the features of polyps for higher sensitivity; (iii) an Inter-Frame Similarity Correlation unit (ISCU) to integrate spatiotemporal information, which is combined with the image detector network to improve performance in video detection in order to reduce FPs.
Pacal et al. 2021	Localization	YOLOv4	Authors propose several models based on YOLOv4. To create their "Proposed Model1 (Small)" they first replaced the whole structure with Cross Stage Partial Networks (CSPNet), then substituted the Mish activation function for the Leaky ReLU activation function, and also substituted the Distance Intersection over Union (DIoU) loss for the Complete Intersection over Union (CIoU) loss.
Liu et al. 2021	Localization	Resnet101 and Domain adaptive Faster R-CNN	Authors propose a consolidated domain adaptive framework with a training free style transfer process, a hierarchical network, and a centre besiegement loss for accurate cross-domain polyp detection and localization.
Pacal et al. 2022	Localization	YOLOv3, YOLOv4	Authors propose modified versions of YOLOv3 and YOLOv4 by integrating Cross Stage Partial Network (CSPNet). With the aim of improving the detection performance, they also use the Sigmoid-weighted Linear Unit (SiLU) activation function and the Complete Intersection over Union (CIoU) loss function.
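
The localization output described for Urban et al. 2018 (a grid of cells, each predicting a sigmoid confidence, a position relative to the cell center and a size, combined as a confidence-weighted sum) can be illustrated with the sketch below. It is not the authors' implementation: the function names, the `(logit, dx, dy, w, h)` cell layout, the normalized image coordinates and the normalization of the weights by their sum are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def weighted_box(cells):
    """Combine per-cell predictions into one box (hypothetical helper).

    cells[row][col] is a tuple (logit, dx, dy, w, h):
      logit  -> raw confidence, squashed with a sigmoid
      dx, dy -> offset of the polyp center from the cell center, in cell units
      w, h   -> box size, normalized to the image size (assumption)
    Returns (cx, cy, w, h) in normalized image coordinates, or None if
    every cell has zero confidence.
    """
    grid_h, grid_w = len(cells), len(cells[0])
    total = 0.0
    acc = [0.0, 0.0, 0.0, 0.0]
    for row in range(grid_h):
        for col in range(grid_w):
            logit, dx, dy, w, h = cells[row][col]
            conf = sigmoid(logit)
            # Adjust the position from cell-relative to image-relative.
            cx = (col + 0.5 + dx) / grid_w
            cy = (row + 0.5 + dy) / grid_h
            for i, value in enumerate((cx, cy, w, h)):
                acc[i] += conf * value
            total += conf
    if total == 0.0:
        return None
    return tuple(value / total for value in acc)
```

With a single confident cell the result is simply that cell's adjusted prediction; with several confident cells the output is pulled towards the ones with higher confidence.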

Data Augmentation Strategies

	Rotation	Flipping (Mirroring)	Shearing	Crop	Random brightness	Translation (Shifting)	Scale	Zooming	Gaussian smoothing	Blurring	Saturation adjustment	Gaussian distortion	Resize	Random contrast	Exposure adjustment	Color augmentations in HSV	Mosaic	Mix-up	Histogram equalization	Skew	Random erasing	Color distribution adjust	Clipping	Sharpening	Cutmix	Color jittering	Random image expansion
Num. Studies	28	26	12	9	9	8	8	6	4	4	3	3	3	3	2	2	2	2	1	1	1	1	1	1	1	1	1
Tajbakhsh et al. 2015	X			X		X	X						X
Park and Sargent 2016	X					X
Ribeiro et al. 2016	X	X
Yu et al. 2017	X					X
Byrne et al. 2017		X		X									X
Brandao et al. 2018		X
Zhang R. et al. 2018	X	X			X				X					X
Zheng Y. et al. 2018	X
Shin Y. et al. 2018	X	X	X		X			X	X
Urban et al. 2018	X	X	X
Mohammed et al. 2018	X	X	X		X		X		X
Qadir et al. 2019	X	X	X					X
Tian Y. et al. 2019	X	X	X			X	X
Blanes-Vidal et al. 2019	X	X		X
Zhang X. et al. 2019	X
Zhu X. et al. 2019	X										X				X
Sornapudi et al. 2019	X	X	X				X		X										X
Wittenberg et al. 2019	X	X
Yuan Y. et al. 2019	X	X				X						X										X
Ma Y. et al. 2019	X			X	X
Bour et al. 2019	X	X	X		X			X			X	X								X	X
Patino-Barrientos et al. 2020	X	X	X			X		X
Cheng Tao Pu et al. 2020	X	X		X
Ma Y. et al. 2020	X		X					X		X													X
Young Lee J. et al. 2020					X					X				X										X
Young Joo Yang et al. 2020		X
Wang W. et al. 2020	X				X									X
Li T. et al. 2020	X			X		X						X
Podlasek J. et al. 2020	X	X
Qadir et al. 2021	X	X						X								X
Xu J. et al. 2021		X					X																			X
Misawa et al. 2021		X			X																			X
Livovsky et al. 2021		X		X
Pacal et al. 2021a	X	X	X	X	X	X	X			X	X				X	X	X							X	X
Liu et al. 2021	X	X	X		X			X	X
Nogueira-Rodríguez et al. 2021		X		X									X														X
Ma Y. et al. 2021	X	X	X				X			X
Pacal et al. 2021B		X	X				X										X	X
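
The two most frequent strategies in the table above are rotation and flipping. For localization studies, the bounding box annotation has to be transformed together with the image. The following is a minimal sketch of that idea using plain Python lists; the function names and the `(x, y, w, h)` box convention (top-left corner plus size, in pixels) are assumptions, and the studies themselves rely on the augmentation utilities of their frameworks, usually with arbitrary rotation angles.

```python
def hflip(image, box):
    """Mirror an image (list of rows) horizontally and update its box."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    x, y, w, h = box
    # The box keeps its size; only its horizontal position is mirrored.
    return flipped, (width - x - w, y, w, h)


def rot90_cw(image, box):
    """Rotate an image 90 degrees clockwise and update its box."""
    height = len(image)
    # Reversing the rows and transposing gives a clockwise rotation.
    rotated = [list(col) for col in zip(*image[::-1])]
    x, y, w, h = box
    # Old columns become new rows; old rows become mirrored new columns.
    return rotated, (height - y - h, x, h, w)


# A 2x3 image with a single "polyp" pixel in the top-right corner.
image = [[0, 0, 1],
         [0, 0, 0]]
box = (2, 0, 1, 1)

flipped, flipped_box = hflip(image, box)
rotated, rotated_box = rot90_cw(image, box)
```

In both cases the transformed box still covers the marked pixel, which is the property an augmentation pipeline for detection needs to preserve.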

Frameworks and Libraries

Framework/Library	# Studies	Used by
TensorFlow	10	Chen et al. 2018, Shin Y. et al. 2018, Mohammed et al. 2018, Yuan Y. et al. 2019, Ma Y. et al. 2019, Liu X. et al. 2019, Zachariah et al. 2019, Bour et al. 2019, Patino-Barrientos et al. 2020, Sánchez-Peralta et al. 2020
Caffe	8	Zhu X. et al. 2019, Yu et al. 2017, Brandao et al. 2018, Wang et al. 2018, Zhang X. et al. 2019, Ozawa. et al. 2020, Jia X. et al. 2020, Li T. et al. 2020
Keras	8	Urban et al. 2018, Mohammed et al. 2018, Sornapudi et al. 2019, Wittenberg et al. 2019, Bour et al. 2019, Patino-Barrientos et al. 2020, Sánchez-Peralta et al. 2020, Xu J. et al. 2021
PyTorch	5	Young Joo Yang et al. 2020, Wang W. et al. 2020, Pacal et al. 2021, Liu et al. 2021, Pacal et al. 2022
MXNet	3	Nogueira-Rodríguez et al. 2021, Nogueira-Rodríguez et al. 2022, Nogueira-Rodríguez et al. 2023
C3D	2	Misawa et al. 2018, Misawa et al. 2019
DarkNet	2	Pacal et al. 2021, Pacal et al. 2022
MatConvNet (MATLAB)	1	Ribeiro et al. 2016

Performance

Note: Some performance metrics are not directly reported in the papers, but were derived using raw data or confusion matrices provided by them.
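
As an illustration of how such derived values can be obtained, the F1 and F2 scores in the tables below follow from the reported precision and recall through the standard F-beta formula. The sketch below assumes that this standard formula was used; the function name is hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score from precision and recall (both in the range 0..1).

    beta = 1 gives F1; beta = 2 gives F2, which weights recall higher.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Tajbakhsh et al. 2015: recall 70%, precision 63%.
f1 = round(f_beta(0.63, 0.70), 2)          # 0.66
f2 = round(f_beta(0.63, 0.70, beta=2), 2)  # 0.68
```

Both values match the F1 and F2 listed for that study in the detection and localization table.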

Polyp Detection and Localization

Performance metrics on public and private datasets of all polyp detection and localization studies.

The type of performance metric is given in parentheses: i = image-based, bb = bounding-box-based, p = polyp-based, pa = patch-based, and pi = pixel-based.
The training dataset is given in curly brackets, where "P" stands for private.
The test dataset used to compute the performance metric is given in square brackets, where "P" stands for private.
For instance, [{P}] means that the development and test splits of the same private dataset were used for training and testing, respectively.
Performances marked with an * are reported on training datasets (e.g. k-fold cross-validation).
AP stands for Average Precision.
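
For readers who want to process the table programmatically, the notation above can be parsed with a small helper. This is only a sketch: `parse_annotation` is a hypothetical function, and it assumes that annotations appear in the flattened form used in this text, e.g. `~90% (bb) _{{CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]}`.

```python
import re

ANNOTATION = re.compile(r"_\{(.*)\}$")
METRIC_TYPE = re.compile(r"\((i|bb|p|pa|pi)\)")


def parse_annotation(text):
    """Split one table entry into metric type, training and test datasets."""
    match = ANNOTATION.search(text.strip())
    if match is None:
        raise ValueError("no dataset annotation found")
    inner = match.group(1)

    metric = METRIC_TYPE.search(text)
    train = re.search(r"\{([^{}]*)\}", inner)    # curly brackets: training
    test = re.search(r"\[([^\[\]]*)\]", inner)   # square brackets: test

    train_names = None
    if train is not None:
        # "{[X]}" nests the test brackets inside the curly ones; strip them.
        names = train.group(1).strip(" []")
        train_names = [name.strip() for name in names.split("+")]

    return {
        "metric_type": metric.group(1) if metric else None,
        "train": train_names,
        "test": test.group(1).strip() if test else None,
        "on_training": "*" in inner,  # e.g. k-fold cross-validation
    }


entry = parse_annotation("~90% (bb) _{{CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]}")
```

Here `entry["train"]` is `["CVC-ClinicDB", "ASU-Mayo"]`, `entry["test"]` is `"ETIS-Larib"` and `entry["metric_type"]` is `"bb"`.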

Note: Since February 2022, the former frame-based (f) type has been split into image-based and bounding-box-based, which more accurately reflects the type of evaluation done. Please note that our review paper uses frame-based, which includes both.

Study	Recall (sensitivity)	Precision (PPV)	Specificity	Others	Manually selected images?
Tajbakhsh et al. 2015	70% (bb) _{[P]}	63% (bb) _{[P]}	90% (bb) _{[P]}	F1: 0.66, F2: 0.68 (bb) _{[P]}	No
Zhu R. et al. 2015	79.44% (pa) _{[P]}	N/A	79.54% (pa) _{[P]}	Acc: 79.53% (pa) _{[P]}	Yes
Park and Sargent 2016	86% (bb) _{{P} *}	-	85% (bb) _{{P} *}	AUC: 0.86 (bb) _{{P} *}	Yes (on training)
Yu et al. 2017	71% (bb) _{[ASU-Mayo]}	88.1% (bb) _{[ASU-Mayo]}	N/A	F1: 0.786, F2: 0.739 (bb) _{[ASU-Mayo]}	No
Zhang R. et al. 2017	97.6% (i) _{[P]}	99.4% (i) _{[P]}	N/A	F1: 0.98, F2: 0.98, AUC: 1.00 (i) _{[P]}	Yes
Yuan and Meng 2017	98% (i) _{{P} *}	97% (i) _{{P} *}	99% (i) _{{P} *}	F1: 0.98, F2: 0.98 (i) [P]	Yes
Brandao et al. 2018	~90% (bb) _{{CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]} ~90% (bb) _{{CVC-ClinicDB + ASU-Mayo} [CVC-ColonDB]}	~73% (bb) _{{CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]} ~80% (bb) _{{CVC-ClinicDB + ASU-Mayo} [CVC-ColonDB]}	N/A	F1: ~0.81, F2: ~0.86 (bb) _{{CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]} F1: ~0.85, F2: ~0.88 (bb) _{{CVC-ClinicDB + ASU-Mayo} [CVC-ColonDB]}	Yes
Zhang R. et al. 2018	71.6% (bb) _{[ASU-Mayo]}	88.6% (bb) _{[ASU-Mayo]}	97% (bb) _{[ASU-Mayo]}	F1: 0.792, F2: 0.744 (bb) _{[ASU-Mayo]}	No
Misawa et al. 2018	90% (i) _{[P]} 94% (p) _{[P]}	55.1% (i) _{[P]} 48% (p) _{[P]}	63.3% (i) _{[P]} 40% (p) _{[P]}	F1: 0.68 (i) 0.63 (p), F2: 0.79 (i) 0.78 (p) _{[P]} Acc: 76.5% (i) 60% (p) _{[P]}	No
Zheng Y. et al. 2018	74% (bb) _{{CVC-ClinicDB + CVC-ColonDB} [ETIS-Larib]}	77.4% (bb) _{{CVC-ClinicDB + CVC-ColonDB} [ETIS-Larib]}	N/A	F1: 0.757, F2: 0.747 (bb) _{{CVC-ClinicDB + CVC-ColonDB} [ETIS-Larib]}	Yes
Shin Y. et al. 2018	80.3% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 84.2% (bb) _{{CVC-ClinicDB} [ASU-Mayo]} 84.3% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	86.5% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 82.7% (bb) _{{CVC-ClinicDB} [ASU-Mayo]} 89.7% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	N/A	F1: 0.833, F2: 0.815 (bb) _{{CVC-ClinicDB} [ETIS-Larib]} F1: 0.834, F2: 0.839 (bb) _{{CVC-ClinicDB} [ASU-Mayo]} F1: 0.869, F2: 0.853 (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	Yes (ETIS-Larib) No (ASU-Mayo, CVC-ClinicVideoDB)
Urban et al. 2018	93% (bb) _{{P1} [P2]} 100% (p) _{{P1} [P2]} 93% (p) _{{P1} [P3]}	74% (bb) _{{P1} [P2]} 35% (p) _{{P1} [P2]} 60% (p) _{{P1} [P3]}	93% (bb) _{{P1} [P2]}	F1: 0.82, F2: 0.88 (bb) _{{P1} [P2]} F1: 0.52, F2: 0.73 (p) _{{P1} [P2]} F1: 0.73, F2: 0.84 (p) _{{P1} [P3]}	No
Wang et al. 2018	88.24% (bb) _{{P} [CVC-ClinicDB]} 94.38% (bb) _{{P} [P (dataset A)]} 91.64% (bb), 100% (p) _{{P} [P (dataset C)]}	93.14% (bb) _{{P} [CVC-ClinicDB]} 95.76% (bb) _{{P} [P (dataset A)]}	95.40% (bb) _{{P} [P (dataset D)]}	F1: 0.91, F2: 0.89 (bb) _{{P} [CVC-ClinicDB]} F1: 0.95, F2: 0.95, AUC: 0.984 (bb) _{{P} [P (dataset A)]}	Yes (dataset A, CVC-ClinicDB) No (dataset C/D)
Mohammed et al. 2018	84.4% (bb) _{[ASU-Mayo]}	87.4% (bb) _{[ASU-Mayo]}	N/A	F1: 0.859, F2: 0.85 (bb) _{[ASU-Mayo]}	No
Qadir et al. 2019	81.51% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	87.51% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	84.26% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	F1: 0.844, F2: 0.83 (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	No
Tian Y. et al. 2019	64.42% (bb) _{{P} [ETIS-Larib]}	73.6% (bb) _{{P} [ETIS-Larib]}	-	F1: 0.687, F2: 0.66 (bb) _{{P} [ETIS-Larib]}	Yes
Blanes-Vidal et al. 2019	97.1% (bb) _{[P]}	91.4% (bb) _{[P]}	93.3% (bb) _{[P]}	Acc: 96.4%, F1: 0.94, F2: 0.95 (bb) _{[P]}	N/A (not clear in the paper)
Zhang X. et al. 2019	76.37% (bb) _{[P]}	93.92% (bb) _{[P]}	N/A	F1: 0.84, F2: 0.79 (bb) _{[P]}	Yes
Misawa et al. 2019	86% (p) _{{P1} [P2]}	N/A	74% (i) _{{P1} [P2]}	-	No
Zhu X. et al. 2019	88.5% (i) _{{P1} [P2]}	N/A	96.4% (i) _{{P1} [P2]}	-	No
Ahmad et al. 2019	91.6% (bb) _{{P} [ETIS-Larib]} 84.5% (bb) _{{ETIS-Larib + P} [P]}	75.3% (bb) _{{P} [ETIS-Larib]}	92.5% (bb) _{{ETIS-Larib + P} [P]}	F1: 0.83, F2: 0.88 (bb) _{{P} [ETIS-Larib]}	Yes (ETIS-Larib) No (private)
Sornapudi et al. 2019	91.64% (bb) _{{CVC-ClinicDB} [CVC-ColonDB]} 78.12% (bb) _{{CVC-ClinicDB} [CVC-PolypHD]} 80.29% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 95.52% (bb) _{[P]}	89.94% (bb) _{{CVC-ClinicDB} [CVC-ColonDB]} 83.33% (bb) _{{CVC-ClinicDB} [CVC-PolypHD]} 72.93% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 98.46% (bb) _{[P]}	N/A	F1: 0.9073, F2: 0.9127 (bb) _{{CVC-ClinicDB} [CVC-ColonDB]} F1: 0.8065, F2: 0.7911 (bb) _{{CVC-ClinicDB} [CVC-PolypHD]} F1: 0.7643, F2: 0.7870 (bb) _{{CVC-ClinicDB} [ETIS-Larib]} F1: 0.966, F2: 0.961 (bb) _{[P]}	Yes (CVC-ClinicDB, CVC-ColonDB, ETIS-Larib) No (WCE video)
Wittenberg et al. 2019	86% (bb) _{{P} [CVC-ClinicDB]} 83% (bb) _{{P} [ETIS-Larib]} 93% (bb) _{[P]}	80% (bb) _{{P} [CVC-ClinicDB]} 74% (bb) _{{P} [ETIS-Larib]} 86% (bb) _{[P]}	N/A	F1: 0.82, F2: 0.85 (bb) _{{P} [CVC-ClinicDB]} F1: 0.79, F2: 0.81 (bb) _{{P} [ETIS-Larib]} F1: 0.89, F2: 0.92 (bb) _{[P]}	Yes
Yuan Y. et al. 2019	90.21% (i) _{[P]}	74.51% (i) _{[P]}	94.07% (i) _{[P]}	Accuracy: 93.19%, F1: 0.81, F2: 0.86 (i) _{[P]}	Yes
Ma Y. et al. 2019	93.67% (bb) _{[P]}	N/A	98.36% (bb) _{[P]}	Accuracy: 96.04%, AP: 94.92% (bb) _{[P]}	Yes
Tashk et al. 2019	82.7% (pi) _{{[CVC-ClinicDB]}} 90.9% (pi) _{{[ETIS-Larib]}} 82.4% (pi) _{{[CVC-ColonDB]}}	70.2% (pi) _{{[CVC-ClinicDB]}} 70.2% (pi) _{{[ETIS-Larib]}} 62% (pi) _{{[CVC-ColonDB]}}	-	Accuracy: 99.02%, F1: 0.76, F2: 0.798 (pi) _{{[CVC-ClinicDB]}} Accuracy: 99.6%, F1: 0.7923, F2: 0.858 (pi) _{{[ETIS-Larib]}} Accuracy: 98.2%, F1: 0.707, F2: 0.773 (pi) _{{[CVC-ColonDB]}}	Yes (CVC-ClinicDB, CVC-ColonDB, ETIS-Larib)
Jia X. et al. 2020	92.1% (bb) _{{CVC-ColonDB} [CVC-ClinicDB]} 59.4% (pi) _{{CVC-ColonDB} [CVC-ClinicDB]} 81.7% (bb) _{{CVC-ClinicDB} [ETIS-Larib]}	84.8% (bb) _{{CVC-ColonDB} [CVC-ClinicDB]} 85.9% (pi) _{{CVC-ColonDB} [CVC-ClinicDB]} 63.9% (bb) _{{CVC-ClinicDB} [ETIS-Larib]}	-	F1: 0.883, F2: 0.905 (bb) _{{CVC-ColonDB} [CVC-ClinicDB]} F1: 0.702, F2: 0.633, Jaccard: 74.7±20.5, Dice: 83.9±13.6 (pi) _{{CVC-ColonDB} [CVC-ClinicDB]} F1: 0.717, F2: 0.774 (bb) _{{CVC-ClinicDB} [ETIS-Larib]}	Yes (CVC-ClinicDB, ETIS-Larib)
Ozawa. et al. 2020	92% (bb) _{{P1} [P2]} 90% (bb) _{{P1} [P2: WL]} 97% (bb) _{{P1} [P2: NBI]} 98% (p) _{{P1} [P2]}	86% (bb) _{{P1} [P2]} 83% (bb) _{{P1} [P2: WL]} 97% (bb) _{{P1} [P2: NBI]}	N/A	F1: 0.88, F2: 0.88 (bb) _{{P1} [P2]} F1: 0.86, F2: 0.84 (bb) _{{P1} [P2: WL]} F1: 0.97, F2: 0.97 (bb) _{{P1} [P2: NBI]}	Yes
Ma Y. et al. 2020	92% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	87.50% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	N/A	F1: 0.897, F2: 0.911 (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	No
Young Lee J. et al. 2020	96.7% (bb) _{[P]} 90.2% (bb) _{{P} [CVC-ClinicDB]}	97.4% (bb) _{[P]} 98.2% (bb) _{{P} [CVC-ClinicDB]}	N/A	F1: 0.97, F2: 0.97 (bb) _{[P]} F1: 0.94, F2: 0.96 (bb) _{{P} [CVC-ClinicDB]}	Yes (CVC-ClinicDB, private)
Wang W. et al. 2020	97.5% (i) _{{[CP-CHILD-A]}} 98% (i) _{{CP-CHILD-A} [CP-CHILD-B]}	N/A	99.85% (i) _{{[CP-CHILD-A]}} 99.83% (i) _{{CP-CHILD-A} [CP-CHILD-B]}	Accuracy: 99.25% (i) _{{[CP-CHILD-A]}} Accuracy: 99.34% (i) _{{CP-CHILD-A} [CP-CHILD-B]}	Yes
Li T. et al. 2020	73% (i) _{[P]}	93% (i) _{[P]}	96% (i) _{[P]}	NPV: 83%, Acc: 86%, AUC: 0.94 (i) _{[P]}	Yes
Sánchez-Peralta et al. 2020	74.73% (pi) _{{PICCOLO} [Kvasir-SEG]} 71.88% (pi) _{{PICCOLO} [CVC-EndoSceneStill]} 72.89% (pi) _{[PICCOLO]} 69.77% (pi) _{{PICCOLO} [PICCOLO-WL]} 63.31% (pi) _{{CVC-EndoSceneStill} [Kvasir-SEG]} 79.22% (pi) _{{[CVC-EndoSceneStill]}} 45.09% (pi) _{{CVC-EndoSceneStill} [PICCOLO]} 57.06% (pi) _{{CVC-EndoSceneStill} [PICCOLO-WL]} 88.98% (pi) _{{[Kvasir-SEG]}} 83.46% (pi) _{{Kvasir-SEG} [CVC-EndoSceneStill]} 58.11% (pi) _{{Kvasir-SEG} [PICCOLO]} 54.63% (pi) _{{Kvasir-SEG} [PICCOLO-WL]}	81.31% (pi) _{{PICCOLO} [Kvasir-SEG]} 84.35% (pi) _{{PICCOLO} [CVC-EndoSceneStill]} 77.58% (pi) _{[PICCOLO]} 71.33% (pi) _{{PICCOLO} [PICCOLO-WL]} 77.80% (pi) _{{CVC-EndoSceneStill} [Kvasir-SEG]} 87.88% (pi) _{{[CVC-EndoSceneStill]}} 52.84% (pi) _{{CVC-EndoSceneStill} [PICCOLO]} 60.93% (pi) _{{CVC-EndoSceneStill} [PICCOLO-WL]} 81.68% (pi) _{{[Kvasir-SEG]}} 83.54% (pi) _{{Kvasir-SEG} [CVC-EndoSceneStill]} 59.54% (pi) _{{Kvasir-SEG} [PICCOLO]} 63.61% (pi) _{{Kvasir-SEG} [PICCOLO-WL]}	97.41% (pi) _{{PICCOLO} [Kvasir-SEG]} 98.85% (pi) _{{PICCOLO} [CVC-EndoSceneStill]} 97.96% (pi) _{[PICCOLO]} 97.37% (pi) _{{PICCOLO} [PICCOLO-WL]} 98.15% (pi) _{{CVC-EndoSceneStill} [Kvasir-SEG]} 99.00% (pi) _{{[CVC-EndoSceneStill]}} 97.30% (pi) _{{CVC-EndoSceneStill} [PICCOLO]} 91.12% (pi) _{{CVC-EndoSceneStill} [PICCOLO-WL]} 96.49% (pi) _{{[Kvasir-SEG]}} 97.65% (pi) _{{Kvasir-SEG} [CVC-EndoSceneStill]} 93.29% (pi) _{{Kvasir-SEG} [PICCOLO]} 98.06% (pi) _{{Kvasir-SEG} [PICCOLO-WL]}	F1: 0.779, F2: 0.760, Jaccard: 65.33±30.66, Dice: 73.54±30.15 (pi) _{{PICCOLO} [Kvasir-SEG]} F1: 0.776, F2: 0.741, Jaccard: 64.18±33.04, Dice: 71.66±32.98 (pi) _{{PICCOLO} [CVC-EndoSceneStill]} F1: 0.752, F2: 0.738, Jaccard: 64.01±36.23, Dice: 70.10±36.45 (pi) _{[PICCOLO]} F1: 0.705, F2: 0.701, Jaccard: 58.70±38.90, Dice: 64.51±39.18 (pi) _{{PICCOLO} [PICCOLO-WL]} F1: 0.698, F2: 0.658, Jaccard: 56.12±34.29, Dice: 64.26±35.35 (pi) _{{CVC-EndoSceneStill} [Kvasir-SEG]} F1: 0.833, F2: 0.808, Jaccard: 72.16±30.93, Dice: 78.61±29.48 (pi) _{{[CVC-EndoSceneStill]}} F1: 0.487, F2: 0.465, Jaccard: 39.52±37.9, Dice: 45.5±41.51 (pi) _{{CVC-EndoSceneStill} [PICCOLO]} F1: 0.589, F2: 0.578, Jaccard: 45.00±35.60, Dice: 52.81±38.33 (pi) _{{CVC-EndoSceneStill} [PICCOLO-WL]} F1: 0.852, F2: 0.874, Jaccard: 74.52±22.81, Dice: 82.68±21.28 (pi) _{{[Kvasir-SEG]}} F1: 0.835, F2: 0.835, Jaccard: 71.82±29.87, Dice: 78.78±28.14 (pi) _{{Kvasir-SEG} [CVC-EndoSceneStill]} F1: 0.588, F2: 0.584, Jaccard: 44.92±37.37, Dice: 51.87±39.79 (pi) _{{Kvasir-SEG} [PICCOLO]} F1: 0.588, F2: 0.562, Jaccard: 47.74±39.55, Dice: 53.62±41.68 (pi) _{{Kvasir-SEG} [PICCOLO-WL]}	Yes
Podlasek J. et al. 2020	91.2% (bb) _{{P} [CVC-ClinicDB]} 88.2% (bb) _{{P} [Hyper-Kvasir]} 74.1% (bb) _{{P} [CVC-ColonDB]} 67.3% (bb) _{{P} [ETIS-Larib]}	97.4% (bb) _{{P} [CVC-ClinicDB]} 97.5% (bb) _{{P} [Hyper-Kvasir]} 92.4% (bb) _{{P} [CVC-ColonDB]} 79% (bb) _{{P} [ETIS-Larib]}	N/A	F1: 0.942, F2: 0.924 (bb) _{{P} [CVC-ClinicDB]} F1: 0.926, F2: 0.899 (bb) _{{P} [Hyper-Kvasir]} F1: 0.823, F2: 0.771 (bb) _{{P} [CVC-ColonDB]} F1: 0.727, F2: 0.693 (bb) _{{P} [ETIS-Larib]}	Yes
Qadir et al. 2021	86.54% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 91% (bb) _{{CVC-ClinicDB} [CVC-ColonDB]}	86.12% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 88.35% (bb) _{{CVC-ClinicDB} [CVC-ColonDB]}	N/A	F1: 0.863, F2: 0.864 (bb) _{{CVC-ClinicDB} [ETIS-Larib]} F1: 0.896, F2: 0.904 (bb) _{{CVC-ClinicDB} [CVC-ColonDB]}	Yes
Xu J. et al. 2021	75.70% (bb) _{{CVC-ClinicDB + CVC-ColonDB + ETIS-Larib + CVC-ClinicVideoDB} [P]} 71.63% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 66.36% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	85.54% (bb) _{{CVC-ClinicDB + CVC-ColonDB + ETIS-Larib + CVC-ClinicVideoDB} [P]} 83.24% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 88.5% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	N/A	F1: 0.799, F2: 0.773 (bb) _{{CVC-ClinicDB + CVC-ColonDB + ETIS-Larib + CVC-ClinicVideoDB} [P]} F1: 0.77, F2: 0.737 (bb) _{{CVC-ClinicDB} [ETIS-Larib]} F1: 0.7586, F2: 0.698 (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]}	Yes (ETIS-Larib, Private) No (CVC-ClinicVideoDB)
Misawa et al. 2021	98% (p) _{{P} [SUN]} 90.5% (i) _{{P} [SUN]}	88.2% (i) _{{P} [SUN]}	93.7% (i) _{{P} [SUN]}	F1: 0.893, F2: 0.900, NPV: 94.96% (i) _{{P} [SUN]}	No
Livovsky et al. 2021	97.1% (p) _{{P1} [P2]}	N/A	N/A	N/A	No
Pacal et al. 2021	82.55% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 96.68% (bb) _{{CVC-ClinicDB} [CVC-ColonDB]}	91.62% (bb) _{{CVC-ClinicDB} [ETIS-Larib]} 96.04% (bb) _{{CVC-ClinicDB} [CVC-ColonDB]}	N/A	F1: 0.868, F2: 0.842 (bb) _{{CVC-ClinicDB} [ETIS-Larib]} F1: 0.964, F2: 0.965 (bb) _{{CVC-ClinicDB} [CVC-ColonDB]}	Yes
Liu et al. 2021	87.5% (bb) _{{CVC-ClinicDB} [ETIS-Larib]}	77.8% (bb) _{{CVC-ClinicDB} [ETIS-Larib]}	-	F1: 0.824, F2: 0.854 (bb) _{{CVC-ClinicDB} [ETIS-Larib]}	Yes (ETIS-Larib)
Li K. et al. 2021	86.2% (bb) _{[KUMC]}	91.2% (bb) _{[KUMC]}	N/A	F1: 0.886, F2: 0.8715, AP: 88.5% (bb) _{[KUMC]}	Yes
Nogueira-Rodríguez et al. 2021	87% (bb) _{[P]} 89.91% (p) _{[P]}	89% (bb) _{[P]}	54.97% (p) _{[P]}	F1: 0.881, F2: 0.876 (bb) _{[P]}	Yes
Yoshida et al. 2021	83% (p) _{{CAD EYE} [P-LED WLI]} 87.2% (p) _{{CAD EYE} [P-LASER WLI]} 88.7% (p) _{{CAD EYE} [P-LED LCI]} 89.4% (p) _{{CAD EYE} [P-LASER LCI]}	N/A	N/A	N/A	-
Ma Y. et al. 2021	64% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]} 47% (bb) _{{CVC-ClinicDB} [LDPolypVideo]}	85% (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]} 65% (bb) _{{CVC-ClinicDB} [LDPolypVideo]}	N/A	F1: 0.73, F2: 0.67 (bb) _{{CVC-ClinicDB} [CVC-ClinicVideoDB]} F1: 0.55, F2: 0.50 (bb) _{{CVC-ClinicDB} [LDPolypVideo]}	Yes (CVC-ClinicDB) No (LDPolypVideo, CVC-ClinicVideoDB)
Pacal et al. 2022	91.04% (bb) _{{SUN + PICCOLO + CVC-ClinicDB} [ETIS-Larib]} 90.57% (bb) _{{SUN + CVC-ClinicDB} [ETIS-Larib]} 88.24% (bb) _{{SUN} [ETIS-Larib]} 75.53% (bb) _{{PICCOLO} [ETIS-Larib]} 79.85% (bb) _{[PICCOLO]} 86.48% (bb) _{[SUN]}	90.61% (bb) _{{SUN + PICCOLO + CVC-ClinicDB} [ETIS-Larib]} 90.14% (bb) _{{SUN + CVC-ClinicDB} [ETIS-Larib]} 88.24% (bb) _{{SUN} [ETIS-Larib]} 87.29% (bb) _{{PICCOLO} [ETIS-Larib]} 92.60% (bb) _{[PICCOLO]} 96.49% (bb) _{[SUN]}	N/A	F1: 0.908, F2: 0.909 (bb) _{{SUN + PICCOLO + CVC-ClinicDB} [ETIS-Larib]} F1: 0.903, F2: 0.902 (bb) _{{SUN + CVC-ClinicDB} [ETIS-Larib]} F1: 0.882, F2: 0.882 (bb) _{{SUN} [ETIS-Larib]} F1: 0.809, F2: 0.846 (bb) _{{PICCOLO} [ETIS-Larib]} F1: 0.857, F2: 0.821 (bb) _{[PICCOLO]} F1: 0.912, F2: 0.883 (bb) _{[SUN]}	Yes (ETIS-Larib, PICCOLO) No (SUN)
Nogueira-Rodríguez et al. 2022	82% (bb) _{{P} [CVC-ClinicDB]} 84% (bb) _{{P} [CVC-ColonDB]} 75% (bb) _{{P} [CVC-PolypHD]} 72% (bb) _{{P} [ETIS-Larib]} 78% (bb) _{{P} [Kvasir-SEG]} 60% (bb) _{{P} [PICCOLO]} 80% (bb) _{{P} [CVC-ClinicVideoDB]} 81% (bb) _{{P} [KUMC dataset]} 76% (bb) _{{P} [KUMC dataset–Test]} 78% (bb) _{{P} [SUN]} 49% (bb) _{{P} [LDPolypVideo]}	87% (bb) _{{P} [CVC-ClinicDB]} 81% (bb) _{{P} [CVC-ColonDB]} 86% (bb) _{{P} [CVC-PolypHD]} 71% (bb) _{{P} [ETIS-Larib]} 84% (bb) _{{P} [Kvasir-SEG]} 76% (bb) _{{P} [PICCOLO]} 75% (bb) _{{P} [CVC-ClinicVideoDB]} 83% (bb) _{{P} [KUMC dataset]} 81% (bb) _{{P} [KUMC dataset–Test]} 83% (bb) _{{P} [SUN]} 56% (bb) _{{P} [LDPolypVideo]}	-	F1: , F2: 0.83, AP: 0.82 (bb) _{{P} [CVC-ClinicDB]} F1: 0.83, F2: 0.83, AP: 0.85 (bb) _{{P} [CVC-ColonDB]} F1: 0.80, F2: 0.77, AP: 0.79 (bb) _{{P} [CVC-PolypHD]} F1: 0.72, F2: 0.72, AP: 0.69 (bb) _{{P} [ETIS-Larib]} F1: 0.81, F2: 0.82, AP: 0.79 (bb) _{{P} [Kvasir-SEG]} F1: 0.67, F2: 0.62, AP: 0.63 (bb) _{{P} [PICCOLO]} F1: 0.77, F2: 0.79, AP: 0.77 (bb) _{{P} [CVC-ClinicVideoDB]} F1: 0.82, F2: 0.81, AP: 0.83 (bb) _{{P} [KUMC dataset]} F1: 0.78, F2: 0.77, AP: 0.79 (bb) _{{P} [KUMC dataset–Test]} F1: 0.81, F2: 0.79, AP: 0.81 (bb) _{{P} [SUN]} F1: 0.52, F2: 0.50, AP: 0.44 (bb) _{{P} [LDPolypVideo]}	Yes (CVC-ClinicDB, CVC-ColonDB, CVC-PolypHD, ETIS-Larib, Kvasir-SEG, PICCOLO, KUMC) No (CVC-ClinicVideoDB, SUN, LDPolypVideo)
Nogueira-Rodríguez et al. 2023	87.2% (bb) _{{P} [P]} 86.7% (bb) _{{P2} [P]} 87.5% (bb) _{{P5} [P]} 85% (bb) _{{P10} [P]} 88% (bb) _{{P15} [P]}	Intra-dataset Evaluation Nogueira et al. 2021 89% (bb) _{{P} [P]} 88.2% (bb) _{{P} [P2]} 87.1% (bb) _{{P} [P5]} 85.2% (bb) _{{P} [P10]} 83.6% (bb) _{{P} [P15]} Not-polyp images increment 2% 89.4% (bb) _{{P2} [P]} 89% (bb) _{{P2} [P2]} 88.6% (bb) _{{P2} [P5]} 87.9% (bb) _{{P2} [P10]} 87.1% (bb) _{{P2} [P15]} Not-polyp images increment 5% 90.2% (bb) _{{P5} [P]} 89.9% (bb) _{{P5} [P2]} 89.5% (bb) _{{P5} [P5]} 88.8% (bb) _{{P5} [P10]} 88.1% (bb) _{{P5} [P15]} Not-polyp images increment 10% 90.4% (bb) _{{P10} [P]} 90.2% (bb) _{{P10} [P2]} 90.1% (bb) _{{P10} [P5]} 89.7% (bb) _{{P10} [P10]} 89.5% (bb) _{{P10} [P15]} Not-polyp images increment 15% 91% (bb) _{{P15} [P]} 90.9% (bb) _{{P15} [P2]} 90.7% (bb) _{{P15} [P5]} 90.4% (bb) _{{P15} [P10]} 90.1% (bb) _{{P15} [P15]}	-	Intra-dataset Evaluation Nogueira et al. 2021 F1:0.881 (bb) _{{P} [P]} F1:0.882 (bb) _{{P} [P2]} F1:0.871 (bb) _{{P} [P5]} F1:0.852 (bb) _{{P} [P10]} F1:0.836 (bb) _{{P} [P15]} Not-polyp images increment 2% F1:0.880 (bb) _{{P2} [P]} F1:0.890 (bb) _{{P2} [P2]} F1:0.886 (bb) _{{P2} [P5]} F1:0.879 (bb) _{{P2} [P10]} F1:0.871 (bb) _{{P2} [P15]} Not-polyp images increment 5% F1:0.888 (bb) _{{P5} [P]} F1:0.899 (bb) _{{P5} [P2]} F1:0.895 (bb) _{{P5} [P5]} F1:0.888 (bb) _{{P5} [P10]} F1:0.881 (bb) _{{P5} [P15]} Not-polyp images increment 10% F1:0.876 (bb) _{{P10} [P]} F1:0.902 (bb) _{{P10} [P2]} F1:0.901 (bb) _{{P10} [P5]} F1:0.897 (bb) _{{P10} [P10]} F1:0.895 (bb) _{{P10} [P15]} Not-polyp images increment 15% F1:0.895 (bb) _{{P15} [P]} F1:0.909 (bb) _{{P15} [P2]} F1:0.907 (bb) _{{P15} [P5]} F1:0.904 (bb) _{{P15} [P10]} F1:0.901 (bb) _{{P15} [P15]} Inter-dataset Evaluation LDPolypVideo F1:0.522 (bb) _{{P} [LDPolypVideo]} F1:0.563 (bb) _{{P2} [LDPolypVideo]} F1:0.516 (bb) _{{P5} [LDPolypVideo]} F1:0.491 (bb) _{{P10} [LDPolypVideo]} F1:0.564 (bb) _{{P10} [LDPolypVideo]} CVC-ClinicVideoDB F1:0.774 (bb) _{{P} [CVC-ClinicVideoDB]} F1:0.803 (bb) _{{P2} [CVC-ClinicVideoDB]} F1:0.813 (bb) _{{P5} [CVC-ClinicVideoDB]} F1:0.809 (bb) _{{P10} [CVC-ClinicVideoDB]} F1:0.800 (bb) _{{P15} [CVC-ClinicVideoDB]} KUMC dataset F1:0.818 (bb) _{{P} [KUMC dataset]} F1:0.811 (bb) _{{P2} [KUMC dataset]} F1:0.819 (bb) _{{P5} [KUMC dataset]} F1:0.762 (bb) _{{P10} [KUMC dataset]} F1:0.831 (bb) _{{P15} [KUMC dataset]} PICCOLO F1:0.667 (bb) _{{P} [PICCOLO]} F1:0.601 (bb) _{{P2} [PICCOLO]} F1:0.691 (bb) _{{P5} [PICCOLO]} F1:0.759 (bb) _{{P10} [PICCOLO]} F1:0.691 (bb) _{{P15} [PICCOLO]} CVC-ClinicDB F1:0.845 (bb) _{{P} [CVC-ClinicDB]} F1:0.843 (bb) _{{P2} [CVC-ClinicDB]} F1:0.867 (bb) _{{P5} [CVC-ClinicDB]} F1:0.786 (bb) _{{P10} [CVC-ClinicDB]} F1:0.824 (bb) _{{P15} [CVC-ClinicDB]} CVC-ColonDB F1:0.826 (bb) _{{P} [CVC-ColonDB]} F1:0.848 (bb) _{{P2} [CVC-ColonDB]} F1:0.883 (bb) _{{P5} [CVC-ColonDB]} F1:0.689 (bb) _{{P10} [CVC-ColonDB]} F1:0.797 (bb) _{{P15} [CVC-ColonDB]} SUN F1:0.805 (bb) _{{P} [SUN]} F1:0.764 (bb) _{{P2} [SUN]} F1:0.738 (bb) _{{P5} [SUN]} F1:0.765 (bb) _{{P10} [SUN]} F1:0.746 (bb) _{{P15} [SUN]} Kvasir-SEG F1:0.807 (bb) _{{P} [Kvasir-SEG]} F1:0.800 (bb) _{{P2} [Kvasir-SEG]} F1:0.797 (bb) _{{P5} [Kvasir-SEG]} F1:0.840 (bb) _{{P10} [Kvasir-SEG]} F1:0.830 (bb) _{{P15} [Kvasir-SEG]} ETIS-Larib F1:0.718 (bb) _{{P} [ETIS-Larib]} F1:0.732 (bb) _{{P2} [ETIS-Larib]} F1:0.679 (bb) _{{P5} [ETIS-Larib]} F1:0.594 (bb) _{{P10} [ETIS-Larib]} F1:0.685 (bb) _{{P15} [ETIS-Larib]} CVC-PolypHD F1:0.800 (bb) _{{P} [CVC-PolypHD]} F1:0.729 (bb) _{{P2} [CVC-PolypHD]} F1:0.826 (bb) _{{P5} [CVC-PolypHD]} F1:0.820 (bb) _{{P10} [CVC-PolypHD]} F1:0.820 (bb) _{{P15} [CVC-PolypHD]}	Yes (PIBAdb, CVC-ClinicDB, CVC-ColonDB, CVC-PolypHD, ETIS-Larib, Kvasir-SEG, PICCOLO, KUMC) No (CVC-ClinicVideoDB, SUN, LDPolypVideo)

Polyp Classification

Performance metrics on public and private datasets of all polyp classification studies.

The training dataset is given in curly brackets, where "P" stands for private.
The test dataset used to compute the performance metric is given in square brackets, where "P" stands for private.
For instance, [{P}] means that the development and test splits of the same private dataset were used for training and testing, respectively.
Performances marked with an * are reported on training datasets (e.g. k-fold cross-validation).
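
The columns of the classification table are the usual ratios of a binary confusion matrix. The following sketch shows how they relate to the raw counts; the function name is hypothetical and the counts in the usage example are made up for illustration, not taken from any study. A ratio with a zero denominator is returned as `None`, which is one way a metric can end up unreported: for example, a classifier with 100% sensitivity and 0% specificity produces no negative predictions, so its NPV is undefined.

```python
def binary_metrics(tp, fp, tn, fn):
    """Recall, specificity, PPV, NPV and accuracy from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as None.
        return numerator / denominator if denominator else None

    return {
        "recall": ratio(tp, tp + fn),        # sensitivity
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),           # precision
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Illustrative counts only.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
# No negative predictions at all: NPV cannot be computed.
degenerate = binary_metrics(tp=10, fp=3, tn=0, fn=0)
```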

Study	Classes	Recall (sensitivity)	Specificity	PPV	NPV	Others	Polyp-level vs. frame-level	Dataset type
Zhang R. et al. 2017	Adenoma vs. hyperplastic Resectable vs. non-resectable Adenoma vs. hyperplastic vs. serrated	92% (resectable vs. non-resectable) _{{[Colonoscopic Dataset]}} 87.6% (adenoma vs. hyperplastic) _{[P]}	89.9% (resectable vs. non-resectable) _{{[Colonoscopic Dataset]}} 84.2% (adenoma vs. hyperplastic) _{[P]}	95.4% (resectable vs. non-resectable) _{{[Colonoscopic Dataset]}} 87.30% (adenoma vs. hyperplastic) _{[P]}	84.9% (resectable vs. non-resectable) _{{[Colonoscopic Dataset]}} 87.2% (adenoma vs. hyperplastic) _{[P]}	Acc: 91.3% (resectable vs. non-resectable) _{{[Colonoscopic Dataset]}} Acc: 86.7% (adenoma vs. serrated adenoma vs. hyperplastic) _{{[Colonoscopic Dataset]}} Acc: 85.9% (adenoma vs. hyperplastic) _{[P]}	frame	video (manually selected images)
Byrne et al. 2017	Adenoma vs. hyperplastic	98% _{{P1} [P2]}	83% _{{P1} [P2]}	90% _{{P1} [P2]}	97% _{{P1} [P2]}	-	polyp	unaltered video
Chen et al. 2018	Neoplastic vs. hyperplastic	96.3% _{{P1} [P2]}	78.1% _{{P1} [P2]}	89.6% _{{P1} [P2]}	91.5% _{{P1} [P2]}	N/A	frame	image dataset
Lui et al. 2019	Endoscopically curable lesions vs. endoscopically incurable lesions	88.2% _{{P1} [P2]}	77.9% _{{P1} [P2]}	92.1% _{{P1} [P2]}	69.3% _{{P1} [P2]}	Acc: 85.5% _{{P1} [P2]}	frame	image dataset
Kandel et al. 2019	Hyperplastic vs. serrated adenoma (near focus) Hyperplastic vs. adenoma (far focus)	57.14% (hyperplastic vs. serrated) _{P} * 75.63% (hyperplastic vs. adenoma) _{P} *	68.52% (hyperplastic vs. serrated) _{P} * 63.79% (hyperplastic vs. adenoma) _{P} *	N/A	N/A	Acc: 67.21% (hyperplastic vs. serrated) _{P} * Acc: 72.48% (hyperplastic vs. adenoma) _{P} *	frame	image dataset
Zachariah et al. 2019	Adenoma vs. serrated	95.7% _{P} *	89.9% _{P} *	94.1% _{P} *	92.6% _{P} *	Acc: 93.6%, F1: 0.948, F2: 0.953 _{P} *	polyp	image dataset
Bour et al. 2019	Not dangerous vs. dangerous vs. cancer	88% (Cancer vs. others) [P] 84% (Not dangerous vs. others) [P] 90% (Dangerous vs. others) [P]	94% (Cancer vs. others) [P] 93% (Not dangerous vs. others) [P] 93% (Dangerous vs. others)	88% (Cancer vs. others) [P] 87% (Not dangerous vs. others) [P] 86% (Dangerous vs. others)	N/A	Acc: 87.1% [P] F1: 0.88 (Cancer vs. others) [P] F1: 0.86 (Not dangerous vs. others) [P] F1: 0.88 (Dangerous vs. others)	frame	image dataset
Patino-Barrientos et al. 2020	Malignant vs. non-malignant	86% _{[P]}	N/A	81% _{[P]}	N/A	Acc: 83% _{[P]} F1: 0.83 _{[P]}	frame	image dataset
Cheng Tao Pu et al. 2020	5-class (I, II, IIo, IIIa, IIIb) Adenoma (classes II + IIo + IIIa) vs. hyperplastic (class I)	97% (adenoma vs. hyperplastic) _{{P: AU} *} 100% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-NBI]} 100% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-BLI]}	51% (adenoma vs. hyperplastic) _{{P: AU} *} 0% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-NBI]} 0% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-BLI]}	95% (adenoma vs. hyperplastic) _{{P: AU} *} 82.4% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-NBI]} 77.5% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-BLI]}	63.5% (adenoma vs. hyperplastic) _{{P: AU} *} - (adenoma vs. hyperplastic) _{{P: AU} [P: JP-NBI]} - (adenoma vs. hyperplastic) _{{P: AU} [P: JP-BLI]}	AUC (5-class): 94.3% _{{P: AU} } AUC (5-class): 84.5% _{{P: AU} [P: JP-NBI]} AUC (5-class): 90.3% _{{P: AU} [P: JP-BLI]} Acc: 72.3% (5-class) _{{P: AU} } Acc: 59.8% (5-class) _{{P: AU} [P: JP-NBI]} Acc: 53.1% (5-class) _{{P: AU} [P: JP-BLI]} Acc: 92.7% (adenoma vs. hyperplastic) _{{P: AU} *} Acc: 82.4% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-NBI]} Acc: 77.5% (adenoma vs. hyperplastic) _{{P: AU} [P: JP-BLI]}	frame	image dataset
Ozawa et al. 2020	Adenoma vs. hyperplastic vs. SSAP vs. cancer vs. other types	97% (adenoma vs. other classes) _{{P1} [P2: WL]} 90% (adenoma vs. hyperplastic) _{{P1} [P2: WL]} 97% (adenoma vs. other classes) _{{P1} [P2: NBI]} 86% (adenoma vs. hyperplastic) _{{P1} [P2: NBI]}	81% (adenoma vs. hyperplastic) _{{P1} [P2: WL]} 88% (adenoma vs. hyperplastic) _{{P1} [P2: NBI]}	86% (adenoma vs. other classes) _{{P1} [P2: WL]} 98% (adenoma vs. hyperplastic) _{{P1} [P2: WL]} 83% (adenoma vs. other classes) _{{P1} [P2: NBI]} 98% (adenoma vs. hyperplastic) _{{P1} [P2: NBI]}	85% (adenoma vs. other classes) _{{P1} [P2: WL]} 48% (adenoma vs. hyperplastic) _{{P1} [P2: WL]} 91% (adenoma vs. other classes) _{{P1} [P2: NBI]} 54% (adenoma vs. hyperplastic) _{{P1} [P2: NBI]}	Acc: 83% (5-class) _{{P1} [P2: WL]} F1: 0.91, F2: 0.88 (adenoma vs. other classes) _{{P1} [P2: WL]} F1: 0.94, F2: 0.96 (adenoma vs. hyperplastic) _{{P1} [P2: WL]} Acc: 81% (5-class) _{{P1} [P2: NBI]} F1: 0.89, F2: 0.85 (adenoma vs. other classes) _{{P1} [P2: NBI]} F1: 0.92, F2: 0.95 (adenoma vs. hyperplastic) _{{P1} [P2: NBI]}	frame	image dataset
Young Joo Yang et al. 2020	7-class (CRC T1 vs. CRC T2 vs. CRC T3 vs. CRC T4 vs. high-grade dysplasia (HGD) vs. tubular adenoma with or without low grade dysplasia (TA) vs. non-neoplastic lesions) 4-class (advanced CRC (T2, T3, and T4) vs. early CRC/HGD (CRC T1 and HGD) vs. TA vs. non-neoplastic lesions) Advanced colorectal lesions vs. non-advanced colorectal lesions Neoplastic lesions vs. non-neoplastic lesions	94.1% (Neoplastic vs. non-neoplastic) _{[P1]} 83.2% (Advanced vs. non-advanced) _{[P1]}	34.1% (Neoplastic vs. non-neoplastic) _{[P1]} 89.7% (Advanced vs. non-advanced) _{[P1]}	86.1% (Neoplastic vs. non-neoplastic) _{[P1]} 84.5% (Advanced vs. non-advanced) _{[P1]}	65% (Neoplastic vs. non-neoplastic) _{[P1]} 88.7% (Advanced vs. non-advanced) _{[P1]}	Acc: 79.5%, F1: 0.899, F2: 0.923, AUC: 0.832 (Neoplastic vs. non-neoplastic) _{[P1]} Acc: 93.5%, F1: 0.838, F2: 0.934, AUC: 0.935 (Advanced vs. non-advanced) _{[P1]} Acc: 71.5%, AUC: 0.760 (Neoplastic vs. non-neoplastic) _{{P1} [P2]} Acc: 87.1%, AUC: 0.935 (Advanced vs. non-advanced) _{{P1} [P2]} Acc (7-class): 60.2% _{[P1]} 74.7% _{{P1} [P2]} Acc (4-class): 67.7% _{[P1]} 76% _{{P1} [P2]}	frame	image dataset
Li K. et al. 2021	Adenoma vs. hyperplastic	86.8% _{[KUMC]}	N/A	85.8% _{[KUMC]}	N/A	F1: 0.863 _{[KUMC]}	polyp	image dataset
Yoshida et al. 2021	Neoplastic vs. hyperplastic	91.7% _{{CAD EYE} [P non-magnified BLI]} 90.9% _{{CAD EYE} [P-magnified BLI]}	86.8% _{{CAD EYE} [P non-magnified BLI]} 85.2% _{{CAD EYE} [P-magnified BLI]}	82.5% _{{CAD EYE} [P non-magnified BLI]} 83.3% _{{CAD EYE} [P-magnified BLI]}	93.9% _{{CAD EYE} [P non-magnified BLI]} 92.0% _{{CAD EYE} [P-magnified BLI]}	Acc: 88.8% _{{CAD EYE} [P non-magnified BLI]} Acc: 87.8% _{{CAD EYE} [P-magnified BLI]}	polyp	live video
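
The F1 values in the "Others" column can be related to the recall and PPV columns. The snippet below is a minimal sketch of the standard F-beta formula, assuming the first and third metric columns of the table above are recall (sensitivity) and PPV (precision), as in the other performance tables. The function name `f_beta` is illustrative and does not come from any of the reviewed studies. It reproduces the F1 reported for Li K. et al. 2021; other rows may differ because of rounding or because the studies used a different convention.

```python
def f_beta(recall: float, precision: float, beta: float = 1.0) -> float:
    """Standard F-beta score from recall (sensitivity) and precision (PPV).

    Both inputs are fractions in [0, 1]. beta = 1 gives F1; beta = 2 gives
    F2, which weights recall more heavily than precision.
    """
    if recall == 0.0 and precision == 0.0:
        return 0.0  # avoid a 0/0 division when both metrics are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Li K. et al. 2021 row: recall 86.8%, PPV 85.8% on the KUMC test set.
f1_li = f_beta(recall=0.868, precision=0.858)
print(round(f1_li, 3))  # 0.863, matching the F1 reported in the table
```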

Simultaneous Polyp Detection and Classification

Performance metrics on public and private datasets of all simultaneous polyp detection and classification studies.

The training dataset is specified between curly brackets, where "P" stands for private.
The test dataset used for computing the performance metric is specified between square brackets, where "P" stands for private.
For instance, [{P}] means that the development and test splits of the same private dataset were used for training and testing, respectively.
AP_IoU stands for Average Precision and mAP_IoU for Mean Average Precision (i.e. the mean of each class AP), calculated at the specified IoU (Intersection over Union) level.

Study	Classes	AP	mAP	Recall (sensitivity)	Specificity	PPV	NPV	Others	Manually selected images?
Liu X. et al. 2019	Polyp vs. adenoma	Polyp: AP_0.5 = 83.39% _{[P]} Adenoma: AP_0.5 = 97.90% _{[P]}	mAP_0.5 = 90.645% _{[P]}	N/A	N/A	N/A	N/A	N/A	Yes
Li K. et al. 2021	Adenoma vs. Hyperplastic	Adenoma: AP = 81.1% _{[KUMC]} Hyperplastic: AP = 65.9% _{[KUMC]}	mAP = 73.5% _{[KUMC]}	61.3% _{[KUMC]}	86.3% _{[KUMC]}	92.2% _{[KUMC]}	49.1% _{[KUMC]}	F1: 0.736 _{[KUMC]}	Yes
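
The relationship between IoU, per-class AP and mAP described in the notes above can be sketched as follows. This is an illustrative example only: the helper names `iou` and `mean_average_precision` and the two bounding boxes are hypothetical, while the AP values are taken from the table. It shows how a predicted box would be compared with a ground-truth box at an IoU level of 0.5, and that the reported mAP values are the plain mean of the per-class APs.

```python
from statistics import mean


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """mAP as the mean of the per-class AP values."""
    return mean(ap_per_class.values())


# Hypothetical predicted and ground-truth boxes (pixel coordinates).
pred_box = (20, 20, 60, 60)
gt_box = (30, 20, 70, 60)
print(round(iou(pred_box, gt_box), 2))  # 0.6, a match at the 0.5 IoU level

# Per-class AP values (in %) as reported in the table above.
liu_ap = {"polyp": 83.39, "adenoma": 97.90}
li_ap = {"adenoma": 81.1, "hyperplastic": 65.9}
print(round(mean_average_precision(liu_ap), 3))  # 90.645
print(round(mean_average_precision(li_ap), 1))   # 73.5
```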

List of Acronyms and Abbreviations

AP: Average Precision.
BLI: Blue Light Imaging.
LCI: Linked-Color Imaging.
mAP: Mean Average Precision.
NBI: Narrow Band Imaging.
SSAP: Sessile Serrated Adenoma/Polyp.
WCE: Wireless Capsule Endoscopy.
WL: White Light.

References and Further Reading

Reviews

Jun Ki Min, Min Seob Kwak, and Jae Myung Cha. Overview of Deep Learning in Gastrointestinal Endoscopy. Gut Liver. 2019 Jul; 13(4): 388–393.
Samy A Azer. Challenges Facing the Detection of Colonic Polyps: What Can Deep Learning Do?. Medicina (Kaunas). 2019 Aug; 55(8): 473.
Wei-Lun Chao, Hanisha Manickavasagan, and Somashekar G. Krishna. Application of Artificial Intelligence in the Detection and Differentiation of Colon Polyps: A Technical Review for Physicians. Diagnostics (Basel). 2019 Sep; 9(3): 99.
Thomas KL. Lui, Chuan-Guo Guo, and Wai K. Leung. Accuracy of Artificial Intelligence on Histology Prediction and Detection of Colorectal Polyps: a Systematic Review and Meta-Analysis. Gastrointest Endosc. 2020 Feb 28.
Cristina Sánchez-Montes, Jorge Bernal, Ana García-Rodríguez, Henry Córdova, and Gloria Fernández-Esparrach. Review of computational methods for the detection and classification of polyps in colonoscopy imaging. Gastroenterol Hepatol. 2020 Apr; 43(4): 222-232.
Luisa F. Sánchez-Peralta, Luis Bote-Curiel, Artzai Picón, Francisco M. Sánchez-Margallo, J. Blas Pagador. Deep learning to find colorectal polyps in colonoscopy: A systematic literature review. Artificial Intelligence in Medicine. 2020 Aug; 108: 101923.
Munish Ashat, Jagpal Singh Klair, Dhruv Singh, Arvind Rangarajan Murali, Rajesh Krishnamoorthi. Impact of real-time use of artificial intelligence in improving adenoma detection during colonoscopy: A systematic review and meta-analysis. Endoscopy International Open. 2021 March; 09(04): E513-E521.
Alexander Hann, Joel Troya, and Daniel Fitting. Current status and limitations of artificial intelligence in colonoscopy. United European Gastroenterology Journal. 2021 Jun; 1-7.
Michelle Viscaino, Javier Torres Bustos, Pablo Muñoz, Cecilia Auat Cheein, and Fernando Auat Cheein. Artificial intelligence for the early detection of colorectal cancer: comprehensive review of its advantages and misconceptions. World Journal of Gastroenterology. 2021 Oct; 27(38): 6399-6414.
Britt B S L Houwen, Karlijn J Nass, Jasper L A Vleugels, Paul Fockens, Yark Hazewinkel, Evelien Dekker. Comprehensive review of publicly available colonoscopic imaging databases for artificial intelligence research: availability, accessibility, and usability. Gastrointestinal Endoscopy. 2022 Sep.

Randomized Clinical Trials

Study	Title	Date	Number of patients
Wang et al. 2019	Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study	Sep. 2019	1058
Gong et al. 2020	Detection of colorectal adenomas with a real-time computer-aided system (ENDOANGEL): a randomised controlled study	Jan. 2020	704
Wang et al. 2020	Effect of a deep-learning computer-aided detection system on adenoma detection during colonoscopy (CADe-DB trial): a double-blind randomised study	Jan. 2020	1010
Liu et al. 2020	Study on detection rate of polyps and adenomas in artificial-intelligence-aided colonoscopy	Feb. 2020	1026
Su et al. 2019	Impact of a real-time automatic quality control system on colorectal polyp and adenoma detection: a prospective randomized controlled study (with videos)	Feb. 2020	659
Repici et al. 2020	Efficacy of Real-Time Computer-Aided Detection of Colorectal Neoplasia in a Randomized Trial	Aug. 2020	685
