• Stars
    star
    105
  • Rank 328,196 (Top 7 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created almost 7 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Semantically segment the road in the given image.

Road Segmentation

Objective

In the case of the autonomous driving, given an front camera view, the car needs to know where is the road. In this project, we trained a neural network to label the pixels of a road in images, by using a method named Fully Convolutional Network (FCN). In this project, FCN-VGG16 is implemented and trained with KITTI dataset for road segmentation.

Demo

demo_gif

(click to see the full video)


1 Code & Files

1.1 My project includes the following files and folders

  • main.py is the main code for demos
  • project_tests.py includes the unittest
  • helper.py includes some helper functions
  • env-gpu-py35.yml is environmental file with GPU and Python3.5
  • data folder contains the KITTI road data, the VGG model and source images.
  • model folder is used to save the trained model
  • runs folder contains the segmentation examples of the testing data

1.2 Dependencies & my environment

Miniconda is used for managing my dependencies.

  • Python3.5, tensorflow-gpu, CUDA8, Numpy, SciPy
  • OS: Ubuntu 16.04
  • CPU: Intelยฎ Coreโ„ข i7-6820HK CPU @ 2.70GHz ร— 8
  • GPU: GeForce GTX 980M (8G memory)
  • Memory: 32G

1.3 How to run the code

(1) Download KITTI data (training and testing)

Download the Kitti Road dataset from here. Extract the dataset in the data folder. This will create the folder data_road with all the training a test images.

(2) Load pre-trained VGG

Function maybe_download_pretrained_vgg() in helper.py will do it automatically for you.

(3) Run the code:

python main.py

(4) Use my trained model to predict new images

You can download my trained model here and save it to the folder model. Also, you need to set the training flag to False in the main.py:

training_flag = False

Then run the code by:

python main.py

1.4. Release History

  • 0.1.1

    • Updated documents
    • Date 7 December 2017
  • 0.1.0

    • The first proper release
    • Date 6 December 2017

2 Network Architecture

2.1 Fully Convolutional Networks (FCN) in the Wild

FCNs can be described as the above example: a pre-trained model, follow by 1-by-1 convolutions, then followed by transposed convolutions. Also, we can describe it as encoder (a pre-trained model + 1-by-1 convolutions) and decoder (transposed convolutions).

2.2 Fully Convolutional Networks for Semantic Segmentation

The Semantic Segmentation network provided by this paper learns to combine coarse, high layer informaiton with fine, low layer information. The pooling and prediction layers are shown as grid that reveal relative spatial coarseness, while intermediate layers are shown as vertical lines

  • The encoder

    • VGG16 model pretrained on ImageNet for classification (see VGG16 architecutre below) is used in encoder.
    • And the fully-connected layers are replaced by 1-by-1 convolutions.
  • The decoder

    • Transposed convolution is used to upsample the input to the original image size.
    • Two skip connections are used in the model.

VGG-16 architecture

vgg16

2.3 Classification & Loss

we can approach training a FCN just like we would approach training a normal classification CNN.

In the case of a FCN, the goal is to assign each pixel to the appropriate class, and cross entropy loss is used as the loss function. We can define the loss function in tensorflow as following commands.

logits = tf.reshape(input, (-1, num_classes))
cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, labels))

Then, we have an end-to-end model for semantic segmentation.

3 Dataset

3.1 Training data examples from KITTI

Origin Image

Mask image

In this project, 384 labeled images are used as training data. Download the Kitti Road dataset from here.

3.2 Testing data

There are 4833 testing images are processed with the trained models. 4543 frames from are a video and other 290 images from random places in Karlsruhe.

4 Experiments

Some key parameters in training stage, and the traning loss and training time for each epochs are shown in the following table.

epochs = 37
batch_size = 8
learning_rate = 0.0001
epochs learning_rate exec_time (s) training_loss
1 0.0001 43.16 0.7978
2 0.0001 38.52 0.5058
3 0.0001 38.55 0.2141
4 0.0001 38.56 0.1696
5 0.0001 38.39 0.1339
6 0.0001 38.44 0.1215
7 0.0001 38.68 0.1089
8 0.0001 38.3 0.0926
9 0.0001 38.14 0.0913
10 0.0001 38.08 0.0837
11 0.0001 38.34 0.0703
12 0.0001 38.02 0.0663
13 0.0001 38.21 0.0585
14 0.0001 38.33 0.0549
15 0.0001 38.12 0.0525
16 0.0001 38.31 0.0483
17 0.0001 38.4 0.0465
18 0.0001 38.42 0.0454
19 0.0001 38.27 0.0421
20 0.0001 38.73 0.0404
21 0.0001 38.03 0.039
22 0.0001 38.22 0.0387
23 0.0001 37.95 0.0368
24 0.0001 38.22 0.0352
25 0.0001 38.91 0.0335
26 0.0001 38.67 0.0324
27 0.0001 38.21 0.0316
28 0.0001 38.2 0.0302
29 0.0001 38.13 0.0291
30 0.0001 38.19 0.0313
31 0.0001 38.15 0.0303
32 0.0001 38.16 0.0299
33 0.0001 38.11 0.0273
34 0.0001 38.21 0.0265
35 0.0001 38.16 0.0254
36 0.0001 38.62 0.0244
37 0.0001 37.99 0.0234

5 Discussion

5.1 Good Performance

With only 384 labeled training images, the FCN-VGG16 performs well to find where is the road in the testing data, and the testing speed is about 6 fps in my laptop. The model performs very well on either highway or urban driving. Some testing examples are shown as follows:

5.2 Limitations

Based on my test on 4833 testing images. There are two scenarios where th current model does NOT perform well: (1) turning spot, (2) over-exposed area.

The bad performance at the turning spots might be caused by the fact of lacking training examples that from turning spots, because almost all the training images are taken when the car was driving straight or turning slightly. We might be able to improve the performance by adding more training data that are taken at the turning spots. As for the over-exposed area, it is more challenging. One possible approach is to use white-balance techniques or image restoration methods to get the correct image. The other possible approach is to add more training data with over-exposed scenarios, and let the network to learn how to segment the road even under the over-expose scenarios.

Turning spot

Over-exposed area

More Repositories

1

tracking-with-Extended-Kalman-Filter

Object (e.g Pedestrian, vehicles) tracking by Extended Kalman Filter (EKF), with fused data from both lidar and radar sensors.
C++
709
star
2

vehicle-detection

Created vehicle detection pipeline with two approaches: (1) deep neural networks (YOLO framework) and (2) support vector machines ( OpenCV + HOG).
Python
622
star
3

driving-lane-departure-warning

Built a real-time lane departure warning system with a monocular camera, using OpenCV.
Python
211
star
4

tracking-with-Unscented-Kalman-Filter

Object (e.g Pedestrian, biker, vehicles) tracking by Unscented Kalman Filter (UKF), with fused data from both lidar and radar sensors.
C++
155
star
5

Model-Predictive-Control

This project is to use Model Predictive Control (MPC) to drive a car in a game simulator. The server provides reference waypoints (yellow line in the demo video) via websocket, and we use MPC to compute steering and throttle commands to drive the car. The solution must be robust to 100ms latency, since it might encounter in real-world application.
C++
152
star
6

traffic-light-detector

Detect traffic lights and classify the state of them, then give the commands "go" or "stop".
Python
62
star
7

PID-controller

Use a PID controller to control the steering angle and throttle for driving a car in a car game simulator.
C++
56
star
8

camera-pose-estimation

Given a map data (image + lidar), estimate the 6 DoF camera pose of the query image.
MATLAB
28
star
9

path_planning

Implement a simple real-time path planner in C++ to navigate a car around a simulated highway scenario
C++
21
star
10

kidnapped-vehicle

Implemented a C++ particle filter for real-time vehicle localization with only current visual observations and a map.
C++
19
star
11

traffic-light-classifier

A simple network to classifier the states of the traffic lights.
Python
16
star
12

DrivingLaneDetection

Detected highway lane lines on a video stream. Used OpencV image analysis techniques to identify lines, including Hough Transforms and Canny edge detection.
Jupyter Notebook
13
star
13

driving-behavioral-cloning

Built and trained a convolutional neural network for end-to-end driving in a simulator, using TensorFlow and Keras.
Python
11
star
14

traffic-sign-recognition

Built and trained a deep neural network to classify traffic signs, using TensorFlow.
Jupyter Notebook
9
star
15

ResNet_MobileNet_InceptionV3

An example to run ResNet, MobileNet and InceptionV3 with Keras and Imagenet weights
Python
1
star
16

rental-bike

Predict the number of rental bikes should be placed based on the environmental conditions.
Jupyter Notebook
1
star
17

calculator

C++ based Calculation Engines which can do multiplication and division with numbers, and it is implemented with template and factory method.
HTML
1
star