• Stars: 2,698
  • Rank 16,922 (Top 0.4 %)
  • Language: Python
  • License: MIT License
  • Created about 4 years ago
  • Updated 10 months ago

📈 The largest industrial defect detection dataset and paper collection to date. Continuously summarizing open-source datasets and key papers in the field of surface defect research.

Surface Defect Detection: Dataset & Papers 📌


📈 Continuously summarizing open-source datasets and key papers in the field of surface defect research. Key papers from 2017 onward have been collected and compiled, and can be viewed in the 📂 [Papers] folder. 🐋


Dataset download: Google Drive | Baidu Cloud (extraction code: o7p5)

Introduction

At present, surface defect inspection equipment based on machine vision has widely replaced manual visual inspection in many industrial fields, including 3C, automobiles, home appliances, machinery manufacturing, semiconductors and electronics, chemical, pharmaceutical, aerospace, light industry and others. Traditional machine-vision methods for surface defect detection usually rely on conventional image processing algorithms or on hand-crafted features plus classifiers. Imaging schemes are typically designed around the differing properties of the inspected surface and its defects: a well-chosen imaging scheme yields uniformly illuminated images in which surface defects show up clearly. In recent years, many defect detection methods based on deep learning have also been widely adopted in industrial scenarios.

Compared with the clearly separated classification, detection and segmentation tasks in computer vision, the requirements for defect detection are quite general. In practice they can be divided into three levels: "what is the defect" (classification), "where is the defect" (localization) and "how much of the surface is defective" (segmentation).
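To make the relationship between the three levels concrete, here is a minimal sketch (not taken from this repository; the function name and the toy mask are hypothetical). It assumes the finest answer, a pixel-level binary mask, is available and reduces it to the coarser ones: a yes/no label, a bounding box, and a defect area.

```python
def summarize_mask(mask):
    """Reduce a binary defect mask (list of rows of 0/1) to coarser answers."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return {"has_defect": False, "bbox": None, "area": 0}
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    # Bounding box as (top, left, bottom, right), inclusive pixel indices.
    bbox = (min(rows), min(cols), max(rows), max(cols))
    return {"has_defect": True, "bbox": bbox, "area": len(coords)}


# Toy 4x5 mask with a small diagonal defect (illustrative data only).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
summary = summarize_mask(mask)
print(summary)
```

The reverse direction does not work: a class label alone cannot be turned back into a box or a mask, which is why the annotation effort grows with each level.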

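*** This project is updated continuously; star it (top right) so you don't lose track of it ⭐ ***

Like this project? Please consider ❤️ sponsoring it to help with long-term maintenance!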

Table of Contents

1. Key Issues in Surface Defect Detection

1)Small Sample Problem

Deep learning methods are now widely used across computer vision tasks, and surface defect detection is generally regarded as one of their applications in industry. The traditional view is that deep learning cannot be applied directly to surface defect detection because a real industrial environment provides too few defect samples.

Compared with the more than 14 million samples in the ImageNet dataset, the most critical problem in surface defect detection is the small-sample problem: many real industrial scenarios offer only a handful to a few dozen defective images. There are currently four main approaches to this problem:

- Data Amplification and Generation

The most common way to expand a set of defect images is to apply image processing operations such as mirroring, rotation, translation, distortion, filtering and contrast adjustment to the original defect samples. Another common method is data synthesis, in which individual defects are fused with or superimposed on normal (non-defective) samples to form defective samples.
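A minimal sketch of both ideas, using plain nested lists as grayscale images so that it runs without extra libraries. All function names and the toy data are hypothetical; a real pipeline would more likely use an image library, and the paste step below is the crudest possible form of synthesis (no blending).

```python
def mirror(img):
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def translate(img, dx, fill=0):
    """Shift the image right by dx pixels (dx >= 0), padding with `fill`."""
    width = len(img[0])
    return [([fill] * dx + row)[:width] for row in img]


def paste_defect(normal, patch, top, left):
    """Data synthesis: superimpose a defect patch on a copy of a normal image."""
    out = [row[:] for row in normal]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out


defect = [[1, 2], [3, 4]]                      # toy defect sample
augmented = [mirror(defect), rotate90(defect), translate(defect, 1)]
normal = [[0] * 4 for _ in range(4)]           # toy defect-free image
synthetic = paste_defect(normal, defect, top=1, left=1)
```

Each original sample yields several variants here, and each defect patch can be pasted at many positions on many normal images.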

- Network Pre-training and Transfer Learning

Training a deep network on a small sample set easily leads to overfitting, so methods based on pre-trained networks or transfer learning are currently among the most commonly used approaches for small samples.

- Reasonable Network Structure Design

The need for samples can also be greatly reduced by designing a suitable network structure. For example, small-sample data can be compressed and expanded based on the compressed sampling theorem, and a CNN can then classify the compressed-sampling features directly. Compared with feeding in the original image, compressing the input can greatly reduce the network's demand for samples. In addition, surface defect detection based on a twin (Siamese) network can be regarded as a special network design that also greatly reduces the sample requirement.
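One intuition for why a twin network copes with few samples is that it is trained on pairs rather than on single images, and n samples yield n(n-1)/2 pairs. The sketch below only builds such pairs; the sample ids and labels are hypothetical, and the network itself is out of scope here.

```python
from itertools import combinations


def make_pairs(samples):
    """samples: list of (sample_id, label). Returns (id_a, id_b, same_label) tuples."""
    return [(a, b, la == lb) for (a, la), (b, lb) in combinations(samples, 2)]


# Five labeled images (illustrative ids and labels).
samples = [
    ("img0", "ok"),
    ("img1", "ok"),
    ("img2", "scratch"),
    ("img3", "scratch"),
    ("img4", "crack"),
]
pairs = make_pairs(samples)
n_same = sum(1 for _, _, same in pairs if same)
print(len(samples), "samples ->", len(pairs), "pairs,", n_same, "with matching labels")
```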

- Unsupervised or Semi-supervised Method

In unsupervised methods, only normal samples are used for training, so no defective samples are needed. Semi-supervised methods use unlabeled samples to address network training when few labeled samples are available.
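The "normal samples only" idea can be sketched with a very simple stand-in model. This is an assumption for illustration, not a method from the collected papers: each image is reduced to one anomaly score (for example a reconstruction error), a threshold is fitted on defect-free scores alone, and anything above it is flagged.

```python
from statistics import mean, stdev


def fit_threshold(normal_scores, k=3.0):
    """Fit a decision threshold from scores of defect-free samples only."""
    return mean(normal_scores) + k * stdev(normal_scores)


def is_defective(score, threshold):
    """Flag a sample whose score lies above the normal range."""
    return score > threshold


# Hypothetical anomaly scores measured on defect-free images.
normal_scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11]
threshold = fit_threshold(normal_scores)

print(is_defective(0.50, threshold))  # far outside the normal range
print(is_defective(0.11, threshold))  # typical normal score
```

The factor `k` trades missed defects against false alarms and would need tuning on real data.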

👆 BACK to Table of Contents

2)Real-time Problem

Defect detection methods based on deep learning involve three main stages in industrial applications: data annotation, model training and model inference. Real-time requirements in industrial applications mostly concern model inference. At present, most defect detection methods focus on classification or recognition accuracy, and little attention is paid to inference efficiency. There are many methods for accelerating a model, such as model weight quantization and model pruning. In addition, although existing deep learning models use the GPU as a general-purpose computing unit (GPGPU), it is believed that FPGAs will become an attractive alternative as the technology develops.
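As a small illustration of pruning, the sketch below shows magnitude-based pruning, which is one common variant and not necessarily the one used in any of the collected papers. The flat weight list and the function name are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]   # toy layer weights
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)
```

Zeroed weights only speed up inference when the runtime or hardware can exploit the sparsity, so pruning is usually evaluated together with the deployment target.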

👆 BACK to Table of Contents

2. Common Datasets for Industrial Surface Defect Detection

1)Steel Surface: NEU-CLS

NEU-CLS can be used for classification and localization tasks.

latest access 🔗 - (#16)

The surface defect dataset released by Northeastern University (NEU) collects six typical surface defects of hot-rolled steel strip: rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In) and scratches (Sc). The dataset contains 1,800 grayscale images, 300 samples for each of the six defect types. For defect detection tasks, the dataset provides annotations that indicate the category and location of the defect in each image. In the annotated examples, the yellow box marks the location of each defect and the green label gives its category score.

Kaggle - Severstal: Steel Defect Detection


Severstal is leading the charge in efficient steel mining and production. They believe the future of metallurgy requires development across the economic, ecological, and social aspects of the industry—and they take corporate responsibility seriously. The company recently created the country’s largest industrial data lake, with petabytes of data that were previously discarded. Severstal is now looking to machine learning to improve automation, increase efficiency, and maintain high quality in their production.

https://www.kaggle.com/c/severstal-steel-defect-detection


👆 BACK to Table of Contents

2)Solar Panels: elpv-dataset

A dataset of functional and defective solar cells extracted from EL images of solar modules.


The dataset contains 2,624 samples of 300x300 pixel, 8-bit grayscale images of functional and defective solar cells with varying degrees of degradation, extracted from 44 different solar modules. The defects in the annotated images are of either intrinsic or extrinsic type and are known to reduce the power efficiency of solar modules.

All images are normalized with respect to size and perspective. Additionally, any distortion induced by the camera lens used to capture the EL images was eliminated prior to solar cell extraction.


👆 BACK to Table of Contents

3)Metal Surface: KolektorSDD

The dataset is constructed from images of defective electrical commutators that were provided and annotated by Kolektor Group. Specifically, microscopic fractures or cracks were observed on the surface of the plastic embedding in electrical commutators. The surface area of each commutator was captured in eight non-overlapping images. The images were captured in a controlled environment.


The dataset consists of:

  • 50 physical items (defective electrical commutators)
  • 8 surfaces per item
  • Altogether 399 images:
    -- 52 images of visible defect
    -- 347 images without any defect
  • Original images of sizes:
    -- width: 500 px
    -- height: from 1240 to 1270 px
  • For training and evaluation images should be resized to 512 x 1408 px

For each item, the defect is visible in at least one image; two items have defects visible in two images, which gives 52 images with visible defects. The remaining 347 images serve as negative examples with non-defective surfaces.
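The counts above can be checked with a few lines of arithmetic, and they also show how imbalanced the two classes are. The inverse-frequency weighting below is a common heuristic added here as an assumption; it is not something the dataset prescribes.

```python
n_defect, n_ok = 52, 347
total = n_defect + n_ok
assert total == 399

# 50 items: 48 with one defective image, 2 with two defective images.
assert 48 * 1 + 2 * 2 == n_defect


def inverse_frequency_weights(counts):
    """counts: dict label -> sample count. Weight = total / (n_classes * count)."""
    n_total = sum(counts.values())
    return {label: n_total / (len(counts) * n) for label, n in counts.items()}


weights = inverse_frequency_weights({"defect": n_defect, "ok": n_ok})
print(weights)
```

With these weights a misclassified defective image costs several times more than a misclassified defect-free one, which partly compensates for the small number of positives.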


👆 BACK to Table of Contents

4)PCB Inspection: DeepPCB

     
Figure 1. PCB Inspection Dataset: an example of the tested image (left) and the corresponding template image (right).


👆 BACK to Table of Contents

5)Fabric Defects Dataset: AITEX

This dataset consists of 245 images of 4096x256 pixels covering seven different fabric structures. There are 140 defect-free images, 20 for each type of fabric, and 105 images showing 12 types of fabric defects common in the textile industry. The large image size lets users choose different window sizes and thereby increase the number of samples. The online dataset also contains segmentation masks for all defective images, in which white pixels mark defective areas and the remaining pixels are black.
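The effect of the window choice on the sample count can be sketched as follows. The 256x256 window and the two strides are illustrative assumptions; the dataset description only states that different window sizes are possible.

```python
def window_origins(width, height, win, stride):
    """Top-left (x, y) corners of all win x win windows that fit in the image."""
    xs = range(0, width - win + 1, stride)
    ys = range(0, height - win + 1, stride)
    return [(x, y) for y in ys for x in xs]


# One 4096x256 AITEX image, cut into square 256x256 windows.
non_overlapping = window_origins(4096, 256, win=256, stride=256)
half_overlap = window_origins(4096, 256, win=256, stride=128)
print(len(non_overlapping), len(half_overlap))
```

The same origins can be applied to the segmentation mask, so that each cropped window keeps its pixel-level label.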


👆 BACK to Table of Contents

6)Fabric Defect Dataset (Tianchi)

In the actual production of cloth, defects such as stains, holes and lint occur due to a variety of factors. To ensure product quality, the cloth needs to be inspected for defects.

Fabric defect inspection is an important part of production and quality management in the textile industry. At present, manual inspection is susceptible to subjective factors and lacks consistency, and working for long periods under strong light harms inspectors' eyesight. Because fabric defects come in many varieties, change shape in many ways and are hard to observe and recognize, intelligent detection of fabric defects has been a technical bottleneck for the industry for many years.

This dataset covers the important kinds of fabric defects in the textile industry, and each image contains one or more defects. The data includes two types of cloth, plain and patterned: about 8,000 plain-cloth images are used for the preliminary round, and about 12,000 patterned-cloth images for the semi-final.


👆 BACK to Table of Contents

7)Aluminium Profile Surface Defect Dataset(Tianchi)

Due to various factors in the actual production of aluminum profiles, their surface can show cracks, peeling, scratches and other defects that seriously affect product quality. To ensure product quality, manual visual inspection is required. However, the surface of an aluminum profile itself contains textures that are hard to distinguish from defects.

Traditional manual visual inspection has many shortcomings: it is laborious, it cannot judge surface defects accurately and in time, and the efficiency of quality inspection is hard to control. In recent years, deep learning has made rapid progress in image recognition and other fields. Aluminum profile manufacturers are eager to use the latest AI technology to renew the existing quality inspection process, complete inspection tasks automatically, reduce the rate of missed defects and improve product quality. AI technology, especially deep learning, could give production managers the full view of surface quality that they currently lack.

The competition dataset contains 10,000 monitoring images of defective aluminum profiles from actual production, and each image contains one or more defects. The sample images for machine learning are clearly labeled with the type of defect they contain.


👆 BACK to Table of Contents

8)Weakly Supervised Learning for Industrial Optical Inspection(DAGM 2007)


Dataset introduction:

  • Mainly aimed at miscellaneous defects on textured backgrounds.

  • Training data with weaker supervision.

  • Contains ten datasets: the first six are training datasets, and the last four are test datasets.

  • Each dataset contains 1000 "non-defective" images and 150 "defective" images saved in grayscale 8-bit PNG format. Each dataset is generated by a different texture model and defect model.

  • The background texture of a "non-defective" image shows no defect, and the background texture of a "defective" image has exactly one marked defect.

  • All datasets have been randomly divided into training and testing subsets of equal size.

  • Weak labels are represented by ellipses, which roughly indicate the defect area.
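To use an elliptical weak label as a coarse segmentation target, it can be rasterized into a binary mask. The sketch below assumes a parameterization by center, semi-axes and rotation angle in radians; the actual layout of the DAGM label files is not described here, so treat the function signature as hypothetical.

```python
import math


def ellipse_mask(width, height, cx, cy, a, b, theta=0.0):
    """Binary mask with 1 inside the ellipse (center cx, cy; semi-axes a, b)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Rotate the pixel offset into the ellipse's own axes.
            u = dx * cos_t + dy * sin_t
            v = -dx * sin_t + dy * cos_t
            row.append(1 if (u / a) ** 2 + (v / b) ** 2 <= 1.0 else 0)
        mask.append(row)
    return mask


# Toy 9x7 image with an axis-aligned ellipse in the middle.
mask = ellipse_mask(width=9, height=7, cx=4, cy=3, a=3, b=2)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

Because the ellipse only roughly indicates the defect area, such a mask will include some background pixels; that is what makes the supervision weak.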


👆 BACK to Table of Contents

9)Cracks on the Surface of the Construction

CrackForest Dataset is an annotated road crack image database which can reflect urban road surface condition in general.


Figure 2. Cracks on the Bridge (left) and Cracks on the Road Surface (right).

👆 BACK to Table of Contents

10)Magnetic Tile Dataset

Magnetic tile dataset by GitHub user abin24, which can be downloaded from https://github.com/Charmve/Surface-Defect-Detection/tree/master/Magnetic-Tile-Defect. It was used in their paper "Surface defect saliency of magnetic tile".

Figure 3. An overview of our dataset.

This is also the dataset of the paper "Saliency of magnetic tile surface defects". Images of 6 common magnetic tile defects were collected, and their pixel-level ground truth was labeled.

👆 BACK to Table of Contents

11)RSDDs: Rail Surface Defect Datasets

The RSDDs dataset contains two sub-datasets. The first is the Type-I RSDDs dataset, captured from express rails, which contains 67 challenging images. The second is the Type-II RSDDs dataset, captured from common/heavy-haul rails, which contains 128 challenging images.

Each image of the two data sets contains at least one defect, and the background is complex and noisy.

These defects in the RSDDs dataset have been marked by professional human observers in the field of track surface inspection.



👆 BACK to Table of Contents

12)Kylberg Texture Dataset v.1.0

Figure 4. Example patches from each one of the 28 texture classes.

Short description

  • 28 texture classes, see Figure 4.
  • 160 unique texture patches per class. (Alternative dataset with 12 rotations per each original patch, 160*12=1920 texture patches per class).
  • Texture patch size: 576x576 pixels.
  • File format: Lossless compressed 8 bit PNG.
  • All patches are normalized with a mean value of 127 and a standard deviation of 40.
  • One directory per texture class.
  • Files are named as follows: blanket1-d-p011-r180.png, where blanket1 is the class name, d is the original image sample number (possible values are a, b, c, or d), p011 is patch number 11, and r180 means the patch is rotated 180 degrees.
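The naming scheme above can be parsed with a short regular expression. This is a sketch: the three-digit, zero-padded patch and rotation fields are inferred from the single example name, so the pattern may need adjusting against the real files.

```python
import re

# Assumed layout: <class>-<sample a..d>-p<3 digits>-r<3 digits>.png
PATTERN = re.compile(
    r"^(?P<cls>.+)-(?P<sample>[a-d])-p(?P<patch>\d{3})-r(?P<rot>\d{3})\.png$"
)


def parse_name(filename):
    """Split a Kylberg patch file name into its fields."""
    m = PATTERN.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "class_name": m["cls"],
        "sample": m["sample"],
        "patch": int(m["patch"]),
        "rotation_deg": int(m["rot"]),
    }


info = parse_name("blanket1-d-p011-r180.png")
print(info)
```

Since there is also one directory per texture class, the parsed class name can be cross-checked against the directory name when loading.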

🔗 Official Link: http://www.cb.uu.se/~gustaf/texture/

👆 BACK to Table of Contents

13)KTH-TIPS database

A dataset of repeating background textures.


👆 BACK to Table of Contents

14)Escalator Step Defect Dataset

🔗 Official Link: https://aistudio.baidu.com/aistudio/datasetdetail/44820


👆 BACK to Table of Contents

15)Transmission Line Insulator Dataset

In this dataset, Normal_Insulators contains 600 insulator images captured by drones, and Defective_Insulators contains 248 images of defective insulators. The dataset includes both images and labels.

🔗 Official Link: https://github.com/InsulatorData/InsulatorDataSet

👆 BACK to Table of Contents

16)MVTEC ITODD

The MVTec Industrial 3D Object Detection Dataset (MVTec ITODD) is a public dataset for 3D object detection and pose estimation with a strong focus on industrial settings and applications.

The dataset consists of

  • 28 objects and 3500 labeled scenes containing instances of these objects
  • Five sensors (two 3D sensors and three grayscale cameras) observing each scene

More information can be found in this PDF file 🔍.

🔗 Download link https://www.mvtec.com/company/research/datasets/mvtec-itodd

👆 BACK to Table of Contents

17)BSData - dataset for Instance Segmentation and industrial Wear Forecasting

The dataset contains 1,104 three-channel images with 394 image annotations for the surface damage type "pitting". The annotations, made with the annotation tool labelme, are available in JSON format and can therefore be converted to VOC and COCO format. All images come from two BSD types.

The other BSD type is shown in 325 images with two image sizes. Since all images of this type were taken continuously over time, the degree of soiling evolves across them.

The dataset also contains 27 pitting development sequences of 69 images each.
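The labelme-to-COCO conversion mentioned above can be sketched as follows. The input mimics the usual labelme JSON layout (a `shapes` list with `label` and `points`); the file name, image size, polygon and category id are made-up illustration values, and COCO fields such as `area` and `iscrowd` are left out for brevity.

```python
import json


def labelme_to_coco_annotations(labelme_doc, image_id, category_ids):
    """Convert labelme polygon shapes into COCO-style annotation dicts."""
    annotations = []
    for ann_id, shape in enumerate(labelme_doc["shapes"], start=1):
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x_min, y_min = min(xs), min(ys)
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_ids[shape["label"]],
            # COCO polygons are flat [x1, y1, x2, y2, ...] lists.
            "segmentation": [[coord for p in shape["points"] for coord in p]],
            # COCO boxes are [x, y, width, height].
            "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        })
    return annotations


# Hypothetical labelme document with a single "pitting" polygon.
example = json.loads("""
{"imagePath": "example.jpg", "imageHeight": 100, "imageWidth": 200,
 "shapes": [{"label": "pitting", "shape_type": "polygon",
             "points": [[10, 20], [30, 20], [30, 50], [10, 50]]}]}
""")
coco = labelme_to_coco_annotations(example, image_id=1, category_ids={"pitting": 1})
print(coco)
```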

Figure 5. On the left image-examples, on the right associated PNG-Annotations.

🔗 Official link: https://github.com/2Obe/BSData

Sincere thanks to @Beñat Gartzia for the recommendation, and to everyone for your attention!

👆 BACK to Table of Contents

18)The Gear Inspection Dataset

The Gear Inspection Dataset (GID) is a dataset for a competition held by Baidu (China) Co., called the "National Artificial Intelligence Innovation Application Competition." It has two thousand grayscale images with 28575 annotations for three types of defects from a real-world source. Each picture includes defects described in a separate JSON file with the image name, label categories, bounding boxes, and polygons for segmentation. Nevertheless, the tags for labeling categories do not include specific information about their type but only numbers, so spotting their similarities with other related datasets is challenging.

Figure 6. Examples of validation test images and their labels.

🔗 Official link: http://www.aiinnovation.com.cn/#/dataDetail?id=34

Note: The contest dataset is not for commercial use.

👆 BACK to Table of Contents


3. More Inventory of the Best Data Set Sources

I have been collecting surface defect detection datasets, but many have still not been included. For datasets not collected in this repo, you can look at the following sites. Everyone is also very welcome to share new datasets and become a contributor to this repo.

Source | URL | Recommended
Kaggle | https://www.kaggle.com/datasets |
Papers With Code | https://paperwithcode.com/sota |
Registry of Open Data on AWS | https://registry.opendata.aws | ⭐⭐⭐
Microsoft Research Open Data | https://msropendata.com |
Awesome-public-datasets | https://github.com/awesomedata/awesome-public-datasets |

👆 BACK to Table of Contents


4. Surface Defect Detection Papers

I have collected some articles on surface defect detection. The main inspection targets are defects or abnormal objects on metal surfaces, LCD screens, buildings and power lines. The methods fall mainly into classification, detection, reconstruction and generation approaches. The electronic version (PDF) of each paper is placed in the 'Paper' folder, under a file name corresponding to its date.

Go to 📂 [Papers].


👆 BACK to Table of Contents

Acknowledgements

That this repo exists at all is thanks to the people who originally open-sourced the datasets above; they have been a great help to our study and research. The idea of collecting these datasets came from reading an article on surface defect detection by SFXiang of "AI算法修炼营(AI_SuanFa)", which prompted me to organize a more comprehensive collection. The collection of papers comes from a CSDN user named "庆志的小徒弟". It currently runs only up to November 19, and I will keep improving it after that. Feel free to CONTRIBUTE.

Finally, I want to thank the open source contributors of the above data set again.

👆 BACK to Table of Contents

Download

👆 BACK to Table of Contents

Notification

This work was originally contributed by many great people through their papers and industrial applications. You may use this dataset for research purposes only.

If you have any questions or ideas, please let me know by email 📧 (address under Community below).

🍮 Community

  • Github discussions 💬 or issues 💭

  • QQ Group: 734758251 (password:哈哈哈)

  • WeChat Group ID: Yida_Zhang2

  • Email: yidazhang1#gmail.com


Go for it!

Supporting

Support this project by becoming a sponsor. Your name and/or logo will show up on our homepage with a link to your website. 🙏

Sponsor this project

Citation

Use this bibtex to cite this repository:

@misc{Surface-Defect-Detection,
  title={Surface Defect Detection: Dataset and Papers},
  author={Charmve},
  year={2020},
  month={9},
  publisher={Github},
  journal={GitHub repository},
  howpublished={\url{https://github.com/Charmve/Surface-Defect-Detection}},
}

Stargazers over time



Feel free to ask any questions, open a PR if you feel something can be done differently!

🌟Star this repository🌟

Created by Charmve & maiwei.ai Community | Deployed on Kaggle


* Update on Sep 17, 2021 @Charmve, Star and Fork
