Repository Details

A survey of common semantic segmentation architectures, with code reproductions.

semseg

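Semantic image segmentation, the task of assigning a semantic label (e.g. "road", "sky", "person", "dog") to every pixel in an image, enables numerous new applications, such as the synthetic shallow depth-of-field effect in the portrait mode of the Pixel 2 and Pixel 2 XL smartphones, and mobile real-time video segmentation.

Quoted from Semantic Image Segmentation with DeepLab in TensorFlow

The development plan for this repository is described in the project's next-step development plan.

The main recent papers are collected in a table below for later summarization.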


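Network implementations

  • FCN (VGG and ResNet backbones): implemented; see fcn_understanding
  • RefineNet: implemented; see refinenet_understanging
  • DUC: see duc_understanding
  • DRN: implemented
  • PSPNet: see pspnet_understanding
  • ENet: implemented
  • ERFNet: implemented
  • EDANet: implemented
  • LinkNet: implemented; see pytorch-linknet
  • FC-DenseNet: implemented; see fcdensenet_understanding
  • LRN: implemented, but not yet trained with the multi-resolution loss; this will be added later.
  • BiSeNet: implemented, mainly with ResNet-18 and ResNet-101 backbones; other backbones work similarly.
  • FRRN: implemented, both FRRN A and FRRN B.
  • YOLO-V1 multi-task learning added; not yet fully tested.
  • GCN
  • ...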

semantic segmentation algorithms

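This repository aims to implement common semantic segmentation algorithms. The main references are as follows:


Related papers


Weakly supervised semantic segmentation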

  • Generating Self-Guided Dense Annotations for Weakly Supervised Semantic Segmentation

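Instance segmentation

For now, instance segmentation work is collected under the semantic segmentation directory; it will be split out once the survey is complete.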

  • Semantic Instance Segmentation with a Discriminative Loss Function

Dataset implementations


Data augmentation

The semantic segmentation datasets are augmented by applying affine transformations.
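
The following is a minimal sketch of affine augmentation for segmentation data, not the repository's own code. The same transform is applied to the image and to its label mask, and the mask is resampled with nearest-neighbour lookup so that interpolation never invents new class ids. The names `affine_warp` and `augment_pair`, and the `ignore_index` value of 255 used to fill uncovered label pixels, are assumptions made for illustration.

```python
def affine_warp(grid, matrix, fill):
    """Warp a 2-D list with an inverse-mapped affine transform.

    `matrix` is (a, b, tx, c, d, ty) and maps each OUTPUT pixel (x, y)
    to its source location: xs = a*x + b*y + tx, ys = c*x + d*y + ty.
    Nearest-neighbour sampling is used, so label ids stay intact.
    """
    h, w = len(grid), len(grid[0])
    a, b, tx, c, d, ty = matrix
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xs = int(round(a * x + b * y + tx))
            ys = int(round(c * x + d * y + ty))
            if 0 <= xs < w and 0 <= ys < h:
                out[y][x] = grid[ys][xs]
    return out


def augment_pair(image, label, matrix, ignore_index=255):
    """Apply one transform to both image and label (ignore_index is assumed)."""
    warped_image = affine_warp(image, matrix, fill=0)
    warped_label = affine_warp(label, matrix, fill=ignore_index)
    return warped_image, warped_label


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
label = [[0, 0, 1], [0, 1, 1], [2, 2, 1]]
shift_right = (1, 0, -1, 0, 1, 0)  # each output x samples source x - 1
aug_image, aug_label = augment_pair(image, label, shift_right)
```

Pixels that fall outside the source image receive the ignore value in the label mask, so a loss that skips that value is not penalised for the padded border.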


Dependencies

  • pytorch
  • ...

Data


Usage

# Start the visdom visualization server in tmux or in another terminal
python -m visdom.server
# Then open 127.0.0.1:9097 in a browser
  • Training
# Train the model
python train.py
  • Validation
# Validate the model
python validate.py

ENet visualization results

Below is a rough collection of papers related to semantic segmentation.


ShuffleSeg: Real-time Semantic Segmentation Network

Abstract
Real-time semantic segmentation is of significant importance for mobile and robotics related applications. We propose a computationally efficient segmentation network which we term as ShuffleSeg. The proposed architecture is based on grouped convolution and channel shuffling in its encoder for improving the performance. An ablation study of different decoding methods is compared including Skip architecture, UNet, and Dilation Frontend. Interesting insights on the speed and accuracy tradeoff is discussed. It is shown that skip architecture in the decoding method provides the best compromise for the goal of real-time performance, while it provides adequate accuracy by utilizing higher resolution feature maps for a more accurate segmentation. ShuffleSeg is evaluated on CityScapes and compared against the state of the art real-time segmentation networks. It achieves 2x GFLOPs reduction, while it provides on par mean intersection over union of 58.3% on CityScapes test set. ShuffleSeg runs at 15.7 frames per second on NVIDIA Jetson TX2, which makes it of great potential for real-time applications.
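Venue/Journal | Authors | Paper | Code
arXiv: 1803.03816 | Mostafa Gamal, Mennatullah Siam, Moemen Abdel-Razek | ShuffleSeg: Real-time Semantic Segmentation Network | TFSegmentation

This paper proposes a real-time semantic segmentation network based on ShuffleNet. The encoder uses grouped convolution and channel shuffling (the basic ShuffleNet building blocks), and the paper explores the trade-off between accuracy and speed across several decoding methods: Skip architecture, UNet, and Dilation Frontend.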
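
Channel shuffling can be sketched as follows. This is a minimal illustration on a plain list of channel names rather than the paper's TensorFlow code, and `channel_shuffle` is a name chosen here. The channel axis is viewed as a (groups, n) matrix, transposed, and flattened, so that the next grouped convolution sees channels from every group.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    A channel axis of length groups * n is viewed as a (groups, n)
    matrix, transposed to (n, groups), and flattened again.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


# Three groups (a, b, c) with two channels each.
feature_channels = ["a0", "a1", "b0", "b1", "c0", "c1"]
shuffled = channel_shuffle(feature_channels, groups=3)
```

After shuffling, every consecutive run of three channels contains one channel from each original group.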

The main motivation is:

It was shown in [4][2][3] that depthwise separable convolution or grouped convolution reduce the computational cost, while maintaining good representation capability.

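Training tricks: make full use of the CityScapes dataset by pretraining the network on the coarsely annotated images and then fine-tuning it on the finely annotated images.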
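
The abstract above reports accuracy as mean intersection over union (mIoU). The sketch below shows one common way to compute it for a single prediction/label pair. It is not the evaluation code of the paper or of this repository; the function name and the `ignore_index` value of 255 are assumptions, and benchmark scores are normally accumulated over the whole test set rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in `pred` or `target`."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t == ignore_index:
                continue  # unlabeled pixels do not count
            if p == t:
                inter[t] += 1
                union[t] += 1
            else:
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
miou = mean_iou(pred, target, num_classes=2)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.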


RTSeg: Real-time Semantic Segmentation Comparative Study

Abstract
Semantic segmentation benefits robotics related applications especially autonomous driving. Most of the research on semantic segmentation is only on increasing the accuracy of segmentation models with little attention to computationally efficient solutions. The few work conducted in this direction does not provide principled methods to evaluate the different design choices for segmentation. In this paper, we address this gap by presenting a real-time semantic segmentation benchmarking framework with a decoupled design for feature extraction and decoding methods. The framework is comprised of different network architectures for feature extraction such as VGG16, Resnet18, MobileNet, and ShuffleNet. It is also comprised of multiple meta-architectures for segmentation that define the decoding methodology. These include SkipNet, UNet, and Dilation Frontend. Experimental results are presented on the Cityscapes dataset for urban scenes. The modular design allows novel architectures to emerge, that lead to 143x GFLOPs reduction in comparison to SegNet.
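Venue/Journal | Authors | Paper | Code
arXiv: 1803.02758 | Mennatullah Siam, Mostafa Gamal, Moemen Abdel-Razek, Senthil Yogamani, Martin Jagersand | RTSeg: Real-time Semantic Segmentation Comparative Study | TFSegmentation

By the same authors as ShuffleSeg: Real-time Semantic Segmentation Network.

The overall approach is similar to ShuffleSeg, but the encoder and decoder are abstracted further. The encoder is no longer limited to ShuffleNet; VGG16, Resnet18, and MobileNet are added as well, which makes it easier to compare the performance of different backbone networks.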


SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling

Abstract
We propose a novel deep architecture, SegNet, for semantic pixel wise image labelling. SegNet has several attractive properties; (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions, (ii) with increasing depth, a larger context is considered for pixel labelling which improves accuracy, and (iii) it is easy to visualise the effect of feature activation(s) in the pixel label space at any depth.
SegNet is composed of a stack of encoders followed by a corresponding decoder stack which feeds into a soft-max classification layer. The decoders help map low resolution feature maps at the output of the encoder stack to full input image size feature maps. This addresses an important drawback of recent deep learning approaches which have adopted networks designed for object categorization for pixel wise labelling. These methods lack a mechanism to map deep layer feature maps to input dimensions. They resort to ad hoc methods to upsample features, e.g. by replication. This results in noisy predictions and also restricts the number of pooling layers in order to avoid too much upsampling and thus reduces spatial context. SegNet overcomes these problems by learning to map encoder outputs to image pixel labels. We test the performance of SegNet on outdoor RGB scenes from CamVid, KITTI and indoor scenes from the NYU dataset. Our results show that SegNet achieves state-of-the-art performance even without use of additional cues such as depth, video frames or post-processing with CRF models.
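Venue/Journal | Authors | Paper | Code
arXiv: 1505.07293 | Vijay Badrinarayanan, Ankur Handa, Roberto Cipolla | SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling | caffe-segnet

This paper describes SegNet-Basic. The basic idea is an encoder-decoder architecture. The paper points out that the semantic segmentation methods of the time lack a mechanism for mapping deep feature maps back to the input dimensions, and instead mostly rely on ad hoc feature upsampling such as replication.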
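
For reference, the replication-style upsampling that the paper criticizes can be sketched in a few lines. This is an illustration only; `upsample_by_replication` is a name chosen here and is not taken from the paper or the repository.

```python
def upsample_by_replication(fmap, factor):
    """Upsample a 2-D feature map by repeating each value factor x factor times."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


coarse = [[1, 2], [3, 4]]
upsampled = upsample_by_replication(coarse, factor=2)
```

Each low-resolution value becomes a constant block, which is why predictions produced this way look blocky around object boundaries.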


Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

Abstract
We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet. Semantic segmentation is an important tool for visual scene understanding and a meaningful measure of uncertainty is essential for decision making. Our contribution is a practical system which is able to predict pixelwise class labels with a measure of model uncertainty. We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels. In addition, we show that modelling uncertainty improves segmentation performance by 2-3% across a number of state of the art architectures such as SegNet, FCN and Dilation Network, with no additional parametrisation. We also observe a significant improvement in performance for smaller datasets where modelling uncertainty is more effective. We benchmark Bayesian SegNet on the indoor SUN Scene Understanding and outdoor CamVid driving scenes datasets.
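Venue/Journal | Authors | Paper | Code
arXiv: 1511.02680 | Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla | Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding | caffe-segnet

This paper proposes Bayesian SegNet, a probabilistic framework for pixel-wise semantic segmentation. Modelling the model uncertainty improves performance by 2-3% across a number of networks, such as SegNet, FCN, and Dilation Network.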
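
The Monte Carlo dropout idea from the abstract can be sketched on a toy problem. The code below is not Bayesian SegNet: it assumes a hypothetical linear classifier for a single pixel, keeps dropout active at test time, runs several stochastic forward passes, and then averages the softmax outputs. The per-class variance across samples serves as a simple uncertainty measure. All names, weights, and the sample count are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_prob, rng):
    """One pass of a toy linear classifier with dropout left switched on."""
    keep = 1.0 - drop_prob
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, drop_prob=0.5, num_samples=50, seed=0):
    """Average class probabilities over several dropout samples."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, drop_prob, rng)
               for _ in range(num_samples)]
    classes = range(len(weights))
    mean = [sum(s[c] for s in samples) / num_samples for c in classes]
    var = [sum((s[c] - mean[c]) ** 2 for s in samples) / num_samples
           for c in classes]
    return mean, var


pixel_features = [1.0, -0.5, 2.0]                      # hypothetical features
class_weights = [[0.8, 0.1, -0.3], [-0.2, 0.4, 0.9]]   # hypothetical weights
mean_probs, variances = mc_dropout_predict(pixel_features, class_weights)
predicted_class = max(range(len(mean_probs)), key=mean_probs.__getitem__)
```

In a real network the same loop would run over the full segmentation model, producing a mean probability map and a variance map of the same spatial size as the input.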


SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Abstract
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN and also with the well known DeepLab-LargeFOV, DeconvNet architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance.
SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/.
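Venue/Journal | Authors | Paper | Code
arXiv: 1511.00561 | Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla | SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation | caffe-segnet

The SegNet proposed in this paper is the most widely used SegNet variant. SegNet-VGG16 achieves sizable gains in both efficiency and accuracy. The paper's key point is the unpooling operation used in the decoder, which reuses the max-pooling indices computed in the encoder.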
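
The unpooling step can be sketched as follows. This is a minimal pure-Python illustration, not the Caffe implementation: the encoder's 2x2 max pooling records where each maximum came from, and the decoder writes each pooled value back to that position, leaving the rest of the map at zero. The function names are chosen here, and the sketch assumes even height and width.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2; also returns the (row, col) of each maximum."""
    pooled, indices = [], []
    for y in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for x in range(0, len(fmap[0]), 2):
            window = [(fmap[y + dy][x + dx], (y + dy, x + dx))
                      for dy in (0, 1) for dx in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, height, width):
    """Place each pooled value at its recorded position; all other entries are 0."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (y, x) in zip(pooled_row, index_row):
            out[y][x] = value
    return out


encoder_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled, indices = max_pool_2x2(encoder_map)
sparse = max_unpool_2x2(pooled, indices, height=4, width=4)
```

The resulting map is sparse, which matches the abstract's description: the decoder then convolves it with trainable filters to produce dense feature maps.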


U-Net: Convolutional Networks for Biomedical Image Segmentation

Abstract
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.
Venue/Journal | Authors | Paper | Code
arXiv: 1505.04597 | Olaf Ronneberger, Philipp Fischer, Thomas Brox | U-Net: Convolutional Networks for Biomedical Image Segmentation | unet (third-party)

The U-Net proposed in this paper makes efficient use of the available annotated samples and improves segmentation accuracy through a symmetric expanding path.
