The tf.crop_and_resize function is used for the Feature Pyramid Network; PyTorch has no native equivalent. Many thanks to longwc for porting it from TensorFlow!
Notice: We have no time to continue this project. The model has been converted and performs well, and the data pipeline is about 95% complete; for training, you will need to study the loss function. :)
Download the tf+keras model and run python convert_weights/convert_weights.py, or download the converted model from Dropbox.
Installation
CUDA code:
Build NMS and ROIAlign/CropAndResize:
- Change -arch in lib/make.sh for your GPU.
# Which CUDA capabilities do we want to pre-build for?
# https://developer.nvidia.com/cuda-gpus
# Compute/shader model | Cards
# 6.1 | P4, P40, Titan Xp, GTX 1080 Ti, GTX 1080
# 6.0 | P100
# 5.2 | M40, Titan X, GTX 980
# 3.7 | K80
# 3.5 | K40, K20
# 3.0 | K10, Grid K520 (AWS G2)
cd lib
./make.sh
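For reference, the `-arch` edit usually amounts to changing one variable near the top of lib/make.sh. The exact filenames and flags below are illustrative, not the project's actual script; only the mapping from the capability table above to an `sm_XX` value is the point.

```shell
# In lib/make.sh, set the architecture flag to match your GPU.
# Example: sm_61 for a GTX 1080 / 1080 Ti / Titan Xp (capability 6.1);
# use sm_52 for a Titan X / GTX 980, sm_37 for a K80, and so on.
CUDA_ARCH="-arch=sm_61"

# The script then passes this flag to nvcc when compiling the kernels,
# e.g. (file names are hypothetical):
# nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC $CUDA_ARCH
```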
MS COCO Requirements:
Install pycocotools from one of these forks of the original pycocotools, which include fixes for Python 3 and Windows (the official repo doesn't seem to be active anymore):
- Linux: https://github.com/waleedka/coco
- Windows: https://github.com/philferriere/cocoapi.
To train or test on MS COCO, you'll also need:
- pycocotools (installation instructions above)
- MS COCO Dataset
- Download the 5K minival and the 35K validation-minus-minival subsets. More details in the original Faster R-CNN implementation.
Structure
- ./convert_weights: Scripts to convert the weights from the tf+keras version of Mask R-CNN.
- ./network: The model definitions for Mask R-CNN.
- ./preprocess: All the scripts for the data pipeline: transforming raw images and labels.
- ./postprocess: Scripts for processing the model's output.
- ./README: Images shown on this GitHub page.
Demo
Picture demo
Change the model path in demo.py, and then run:
python demo.py
Realtime webcam demo
Change the model path in realtime_demo.py, and then run:
python realtime_demo.py
Evaluation
Change the model path in eval.py, and then run:
python eval.py
mAP of Bbox: the tf+keras model has 0.347; the difference may come from an upsample function or other issues. It would be great if you want to dive deep.
mAP of Segmentation: the tf+keras model has 0.296; the difference may come from an upsample function or other issues. It would be great if you want to dive deep.
Training (not working for now)
Data loader
python preprocess/test_data_loader
Loss function
The loss function is in network/mask_rcnn.py. You may need to study the loss function in the Keras code and modify it in network/mask_rcnn.py accordingly.
Use the data loader and backpropagate the loss to train the model. The fit_loader function (ncullen93/torchsample#24) in torchsample may be a good replacement for the fit_generator used by Keras.
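The fit_loader-style loop described above can be sketched framework-agnostically. The toy model, data, and step function below are placeholders (in the real project they would be the PyTorch model, loss, and optimizer); only the loop shape is the point.

```python
# Sketch of a fit_loader-style training loop: iterate a data loader
# for a number of epochs and call a per-batch training step.

def fit_loader(step_fn, loader, epochs):
    """Run step_fn over every batch of `loader` for `epochs` passes."""
    history = []
    for _ in range(epochs):
        for batch in loader:
            history.append(step_fn(batch))
    return history

# Toy stand-in for model + optimizer: learn w in y = 3 * x by plain SGD.
w = 0.0
lr = 0.1

def sgd_step(batch):
    global w
    x, y = batch
    pred = w * x
    grad = 2 * (pred - y) * x      # d/dw of the squared error
    w -= lr * grad
    return (pred - y) ** 2         # per-batch loss

data = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]   # acts as the "loader"
losses = fit_loader(sgd_step, data, epochs=50)
# w converges to 3.0 and the loss shrinks toward zero.
```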
Pipeline Description
Overview:
Stage 0, ResNet-101 and a Feature Pyramid Network to extract features from the image:
Stage 1, Region Proposal Network:
The Region Proposal Network (RPN) runs a lightweight binary classifier on a lot of boxes (anchors) over the image and returns object/no-object scores. Anchors with high objectness score (positive anchors) are passed to the stage two to be classified.
Often, even positive anchors don't cover objects fully. So the RPN also regresses a refinement (a delta in location and size) to be applied to each anchor, shifting and resizing it slightly to the correct boundaries of the object.
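The refinement just described is the standard Faster R-CNN box encoding: a (dy, dx, log dh, log dw) delta applied to the anchor. A minimal sketch (function and tuple layout are illustrative, not the project's API):

```python
import math

def apply_box_delta(box, delta):
    """Shift and scale an anchor by an RPN refinement.

    box:   (y1, x1, y2, x2)
    delta: (dy, dx, log_dh, log_dw), offsets relative to the anchor's
           height/width -- the standard Faster R-CNN parameterization.
    """
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1
    cy, cx = y1 + 0.5 * h, x1 + 0.5 * w   # anchor center
    dy, dx, ldh, ldw = delta
    cy += dy * h                           # shift the center
    cx += dx * w
    h *= math.exp(ldh)                     # scale height/width
    w *= math.exp(ldw)
    return (cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w)

# A zero delta leaves the anchor unchanged:
# apply_box_delta((0, 0, 10, 10), (0, 0, 0, 0)) -> (0.0, 0.0, 10.0, 10.0)
```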
1.1 RPN Targets:
The RPN targets are the training values for the RPN. To generate the targets, we start with a grid of anchors that cover the full image at different scales, and then we compute the IoU of the anchors with the ground truth objects. Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. those that cover an object by IoU >= 0.3 but < 0.7) are considered neutral and excluded from training.
To train the RPN regressor, we also compute the shift and resizing needed to make the anchor cover the ground truth object completely.
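The IoU computation and the 0.7/0.3 labeling rule above can be sketched as follows (helper names are illustrative, not the project's API):

```python
def iou(a, b):
    """Intersection-over-union of two (y1, x1, y2, x2) boxes."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def rpn_label(anchor, gt_boxes):
    """+1 positive, -1 negative, 0 neutral, per the 0.7/0.3 thresholds."""
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best >= 0.7:
        return 1        # positive: strongly overlaps some object
    if best < 0.3:
        return -1       # negative: background
    return 0            # neutral: excluded from training
```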
1.2 RPN Predictions:
1.3 RoIAlign
Stage 2, Proposal Classification:
This stage takes the region proposals from the RPN and feeds them to the detection and mask branches respectively.
2.1 Detection
Per-Class Non-Max Suppression
Detections after NMS
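Per-class NMS means running greedy non-max suppression separately for each predicted class, so boxes of different classes never suppress one another. A minimal sketch (names and tuple layout are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two (y1, x1, y2, x2) boxes."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.3):
    """Greedy NMS; returns indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        # Drop everything that overlaps the kept box too much.
        order = [j for j in order if iou(boxes[i], boxes[j]) < threshold]
    return keep

def per_class_nms(boxes, scores, class_ids, threshold=0.3):
    """Run NMS independently for each predicted class."""
    keep = []
    for c in set(class_ids):
        idx = [i for i in range(len(boxes)) if class_ids[i] == c]
        kept = nms([boxes[i] for i in idx], [scores[i] for i in idx], threshold)
        keep.extend(idx[k] for k in kept)
    return sorted(keep)
```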
2.2 Bounding Box Refinement
This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.
Stage 3: Generating Masks
This stage takes the detections (refined bounding boxes and class IDs) from the previous layer and runs the mask head to generate segmentation masks for every instance.
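The mask head predicts a small fixed-size float mask per instance; to get a full-image binary mask it is resized into the detection box and thresholded (commonly at 0.5). A minimal nearest-neighbor sketch, with illustrative names and list-of-lists masks standing in for tensors:

```python
def unmold_mask(mask, box, image_shape, threshold=0.5):
    """Place a small float mask into its box on a full-image binary canvas.

    mask:        list of rows of floats in [0, 1] (the mask head output)
    box:         (y1, x1, y2, x2) in image coordinates
    image_shape: (height, width) of the full image
    """
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1
    mh, mw = len(mask), len(mask[0])
    canvas = [[0] * image_shape[1] for _ in range(image_shape[0])]
    for y in range(h):
        for x in range(w):
            # Nearest-neighbor resample from the small mask into the box.
            sy = min(y * mh // h, mh - 1)
            sx = min(x * mw // w, mw - 1)
            canvas[y1 + y][x1 + x] = 1 if mask[sy][sx] >= threshold else 0
    return canvas
```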