  • Stars: 205
  • Rank: 190,531 (Top 4 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created almost 3 years ago
  • Updated about 1 month ago

PyTorch Faster R-CNN Object Detection on Custom Dataset

A Simple Pipeline to Train PyTorch FasterRCNN Model

Train PyTorch FasterRCNN models easily on any custom dataset. Choose from the official PyTorch models pretrained on the COCO dataset, use any backbone from the Torchvision classification models, or write your own custom backbone.

You can run a Faster RCNN model with Mini Darknet backbone and Mini Detection Head at more than 150 FPS on an RTX 3080.
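
Throughput figures like this depend on the hardware, the input size, and how the timing is done. A minimal sketch of one way to measure frames per second is shown below. It assumes a hypothetical `run_inference` callable standing in for a model forward pass; `dummy_inference` is only a placeholder workload. When timing on a GPU, the device would also need to be synchronized before reading the clock.

```python
import time


def measure_fps(run_inference, num_iters=50, warmup=5):
    """Return the average number of `run_inference` calls per second."""
    # Warm-up calls are excluded from the timing.
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(num_iters):
        run_inference()
    elapsed = time.perf_counter() - start
    return num_iters / elapsed


def dummy_inference():
    # Placeholder workload; replace with a real model forward pass.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    print(f"{measure_fps(dummy_inference):.1f} FPS")
```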

Latest Update

  • Filter the classes to visualize during inference with the --classes command line argument, passing space-separated class indices from the dataset YAML file.

    For example, to visualize only persons in the COCO dataset, use python inference.py --classes 1 <rest of the command>

    To visualize persons and cars, use python inference.py --classes 1 3 <rest of the command>

  • Added Deep SORT Real-Time tracking to inference_video.py and onnx_video_inference.py. Add the --track flag to the usual inference command. Only MobileNet Re-ID is supported for now.
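
The class filter described in the first item can be pictured as a simple selection over the detections. The sketch below is an illustration only, not the repository's implementation; the function name and the plain-list inputs are assumptions.

```python
def filter_by_class(boxes, labels, scores, keep_classes=None):
    """Keep only detections whose label index is in `keep_classes`.

    If `keep_classes` is None, all detections are returned unchanged.
    """
    if keep_classes is None:
        return list(boxes), list(labels), list(scores)
    keep = set(keep_classes)
    kept = [
        (box, label, score)
        for box, label, score in zip(boxes, labels, scores)
        if label in keep
    ]
    if not kept:
        return [], [], []
    kept_boxes, kept_labels, kept_scores = zip(*kept)
    return list(kept_boxes), list(kept_labels), list(kept_scores)


# Three detections; indices 1 and 3 correspond to `--classes 1 3`.
boxes = [[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]]
labels = [1, 3, 18]
scores = [0.9, 0.8, 0.7]
print(filter_by_class(boxes, labels, scores, keep_classes=[1, 3]))
```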

Custom Model Naming Conventions

For this repository:

  • Small head refers to 512 representation size in the Faster RCNN head and predictor.
  • Tiny head refers to 256 representation size in the Faster RCNN head and predictor.
  • Nano head refers to 128 representation size in the Faster RCNN head and predictor.
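
As a quick reference, the naming convention above can be written as a lookup table. The helper below is a hypothetical sketch that reads the head size from a model flag's suffix; flags without one of these suffixes use a head size that is not stated here, so the helper returns None for them.

```python
# Representation sizes from the naming convention above.
HEAD_REPRESENTATION_SIZE = {
    "small_head": 512,
    "tiny_head": 256,
    "nano_head": 128,
}


def head_size_from_name(model_name):
    """Return the representation size implied by the flag's suffix, or None."""
    for suffix, size in HEAD_REPRESENTATION_SIZE.items():
        if model_name.endswith(suffix):
            return size
    return None


print(head_size_from_name("fasterrcnn_mini_darknet_nano_head"))
print(head_size_from_name("fasterrcnn_squeezenet1_1_small_head"))
```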

For all available model flags, see the section "A List of All Model Flags to Use With the Training Script".

Setup on Ubuntu

  1. Clone the repository.

    git clone https://github.com/sovit-123/fastercnn-pytorch-training-pipeline.git
  2. Install requirements.

    1. Method 1: If you have CUDA and cuDNN set up already, do this in your environment of choice.

      pip install -r requirements.txt
    2. Method 2: If you want to install PyTorch with CUDA Toolkit in your environment of choice.

      conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

      OR

      conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

      OR install the version with CUDA support of your choice from the official PyTorch website.

      Then install the remaining requirements.

Setup on Windows

  1. First, install Microsoft Visual Studio. Sign in or sign up on the Visual Studio download page and download the Visual Studio Community 2017 edition.

    Install with all the default chosen settings. It should be around 6 GB. Mainly, we need the C++ Build Tools.

  2. Then install the proper pycocotools for Windows.

    pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI
  3. Clone the repository.

    git clone https://github.com/sovit-123/fastercnn-pytorch-training-pipeline.git
  4. Install PyTorch with CUDA support.

    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

    OR

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

    OR install the version with CUDA support of your choice from the official PyTorch website.

    Then install the remaining requirements except for pycocotools.
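
After either setup, it can help to confirm that PyTorch is importable and can see the GPU. The following is a minimal sketch; the function name is illustrative, and it only reports what it finds rather than fixing anything.

```python
import importlib.util


def describe_torch_install():
    """Return a one-line summary of the local PyTorch/CUDA setup."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch

    if torch.cuda.is_available():
        return (
            f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"GPU: {torch.cuda.get_device_name(0)}"
        )
    return f"PyTorch {torch.__version__} is installed, but CUDA is not available."


if __name__ == "__main__":
    print(describe_torch_install())
```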

Train on Custom Dataset

Take the smoke dataset from Kaggle as an example. Suppose the dataset is stored in the data/smoke_pascal_voc directory and smoke.yaml is in the data_configs directory, giving the following layout:

├── data
│   ├── smoke_pascal_voc
│   │   ├── archive
│   │   │   ├── train
│   │   │   └── valid
│   └── README.md
├── data_configs
│   └── smoke.yaml
├── models
│   ├── create_fasterrcnn_model.py
│   ...
│   └── __init__.py
├── outputs
│   ├── inference
│   └── training
│       ...
├── readme_images
│   ...
├── torch_utils
│   ├── coco_eval.py
│   ...
├── utils
│   ├── annotations.py
│   ...
├── datasets.py
├── inference.py
├── inference_video.py
├── __init__.py
├── README.md
├── requirements.txt
└── train.py
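
The dataset name suggests Pascal VOC style XML annotations, where each image has an XML file listing its objects and bounding boxes. The sketch below shows how one such file could be read with the standard library. The sample XML and the function name are made up for illustration; the repository's own loading code lives in datasets.py and may differ.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in Pascal VOC layout, for illustration only.
SAMPLE_XML = """<annotation>
  <filename>image_1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>smoke</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>210</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text, classes):
    """Return (boxes, labels) with boxes as [xmin, ymin, xmax, ymax]."""
    root = ET.fromstring(xml_text)
    boxes, labels = [], []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        box = [
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ]
        boxes.append(box)
        # The label is the index of the class name in the class list.
        labels.append(classes.index(obj.findtext("name")))
    return boxes, labels


print(parse_voc_annotation(SAMPLE_XML, ["__background__", "smoke"]))
```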

The content of smoke.yaml should be the following (adjust the directory paths to match where your dataset is stored):

# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: ../../xml_od_data/smoke_pascal_voc/archive/train/images
TRAIN_DIR_LABELS: ../../xml_od_data/smoke_pascal_voc/archive/train/annotations
# VALID_DIR should be relative to train.py
VALID_DIR_IMAGES: ../../xml_od_data/smoke_pascal_voc/archive/valid/images
VALID_DIR_LABELS: ../../xml_od_data/smoke_pascal_voc/archive/valid/annotations

# Class names.
CLASSES: [
    '__background__',
    'smoke'
]

# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 2

# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True

Note that the images and annotations can be in the same directory as well. In that case, TRAIN_DIR_IMAGES and TRAIN_DIR_LABELS will have the same path, and likewise for the VALID images and labels. datasets.py will take care of that.
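
A common mistake in such a config is an NC value that does not match the class list. The sketch below checks a few of these points on a dictionary that mirrors smoke.yaml as it would look after loading it with a YAML parser. The checks and the function name are assumptions for illustration, not part of the repository; in particular, requiring '__background__' as the first entry is inferred from the example above.

```python
REQUIRED_PATH_KEYS = (
    "TRAIN_DIR_IMAGES",
    "TRAIN_DIR_LABELS",
    "VALID_DIR_IMAGES",
    "VALID_DIR_LABELS",
)


def validate_data_config(cfg):
    """Return a list of problems found in a dataset config dictionary."""
    problems = []
    classes = cfg.get("CLASSES") or []
    if not classes or classes[0] != "__background__":
        problems.append("CLASSES should start with '__background__'.")
    if cfg.get("NC") != len(classes):
        problems.append(
            f"NC is {cfg.get('NC')} but CLASSES has {len(classes)} entries."
        )
    for key in REQUIRED_PATH_KEYS:
        if not cfg.get(key):
            problems.append(f"Missing {key}.")
    return problems


smoke_cfg = {
    "TRAIN_DIR_IMAGES": "../../xml_od_data/smoke_pascal_voc/archive/train/images",
    "TRAIN_DIR_LABELS": "../../xml_od_data/smoke_pascal_voc/archive/train/annotations",
    "VALID_DIR_IMAGES": "../../xml_od_data/smoke_pascal_voc/archive/valid/images",
    "VALID_DIR_LABELS": "../../xml_od_data/smoke_pascal_voc/archive/valid/annotations",
    "CLASSES": ["__background__", "smoke"],
    "NC": 2,
    "SAVE_VALID_PREDICTION_IMAGES": True,
}
print(validate_data_config(smoke_cfg))
```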

Next, to start the training, you can use the following command.

Command format:

python train.py --data <path to the data config YAML file> --epochs 100 --model <model name (defaults to fasterrcnn_resnet50)> --name <folder name inside outputs/training/> --batch 16

In this case, the exact command would be:

python train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16

The terminal output should be similar to the following:

Number of training samples: 665
Number of validation samples: 72

3,191,405 total parameters.
3,191,405 training parameters.
Epoch     0: adjusting learning rate of group 0 to 1.0000e-03.
Epoch: [0]  [ 0/84]  eta: 0:02:17  lr: 0.000013  loss: 1.6518 (1.6518)  time: 1.6422  data: 0.2176  max mem: 1525
Epoch: [0]  [83/84]  eta: 0:00:00  lr: 0.001000  loss: 1.6540 (1.8020)  time: 0.0769  data: 0.0077  max mem: 1548
Epoch: [0] Total time: 0:00:08 (0.0984 s / it)
creating index...
index created!
Test:  [0/9]  eta: 0:00:02  model_time: 0.0928 (0.0928)  evaluator_time: 0.0245 (0.0245)  time: 0.2972  data: 0.1534  max mem: 1548
Test:  [8/9]  eta: 0:00:00  model_time: 0.0318 (0.0933)  evaluator_time: 0.0237 (0.0238)  time: 0.1652  data: 0.0239  max mem: 1548
Test: Total time: 0:00:01 (0.1691 s / it)
Averaged stats: model_time: 0.0318 (0.0933)  evaluator_time: 0.0237 (0.0238)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.009
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.007
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.029
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.074
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.028
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.088
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.167
SAVING PLOTS COMPLETE...
...
Epoch: [4]  [ 0/84]  eta: 0:00:20  lr: 0.001000  loss: 0.9575 (0.9575)  time: 0.2461  data: 0.1662  max mem: 1548
Epoch: [4]  [83/84]  eta: 0:00:00  lr: 0.001000  loss: 1.1325 (1.1624)  time: 0.0762  data: 0.0078  max mem: 1548
Epoch: [4] Total time: 0:00:06 (0.0801 s / it)
creating index...
index created!
Test:  [0/9]  eta: 0:00:02  model_time: 0.0369 (0.0369)  evaluator_time: 0.0237 (0.0237)  time: 0.2494  data: 0.1581  max mem: 1548
Test:  [8/9]  eta: 0:00:00  model_time: 0.0323 (0.0330)  evaluator_time: 0.0226 (0.0227)  time: 0.1076  data: 0.0271  max mem: 1548
Test: Total time: 0:00:01 (0.1116 s / it)
Averaged stats: model_time: 0.0323 (0.0330)  evaluator_time: 0.0226 (0.0227)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.137
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.118
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.029
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.175
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.428
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.204
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.306
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.347
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.140
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.683
SAVING PLOTS COMPLETE...
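
The bracketed counters in the log, such as [ 0/84] and [0/9], count iterations (batches) per epoch. As a quick arithmetic check, assuming the last partial batch is kept, 665 training samples and 72 validation samples give 84 and 9 iterations at a batch size of 8. This sample log therefore appears to come from a run with an effective batch size of 8 rather than 16; the source does not say why.

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to cover `num_samples` once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(iterations_per_epoch(665, 8))   # training iterations at batch size 8
print(iterations_per_epoch(72, 8))    # validation iterations at batch size 8
print(iterations_per_epoch(665, 16))  # training iterations at batch size 16
```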

Distributed Training

Training on 2 GPUs:

export CUDA_VISIBLE_DEVICES=0,1
python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16
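
With --use_env, the launcher passes each process its rank through environment variables (RANK, LOCAL_RANK, WORLD_SIZE) instead of a --local_rank argument. Newer PyTorch releases recommend torchrun over torch.distributed.launch, and it sets the same variables. A minimal sketch of reading them, with defaults for a single-process run, is shown below; the function name is illustrative and not taken from the repository.

```python
import os


def read_distributed_env(environ=None):
    """Read rank information set by the PyTorch launcher, if any."""
    env = os.environ if environ is None else environ
    world_size = int(env.get("WORLD_SIZE", 1))
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": world_size,
        "distributed": world_size > 1,
    }


# Simulated environment of the second process in a 2-GPU run.
print(read_distributed_env({"RANK": "1", "LOCAL_RANK": "1", "WORLD_SIZE": "2"}))
```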

Inference

Image Inference on COCO Pretrained Model

By default, the Faster RCNN ResNet50 FPN V2 model is used.

python inference.py

Use a model of your choice with an image input.

python inference.py --model fasterrcnn_mobilenetv3_large_fpn --input example_test_data/image_1.jpg

Image Inference in Custom Trained Model

In this case, you only need to give the weights file path and the input file path. The config file and the model name are optional. If not provided, they will be inferred automatically from the weights file.

python inference.py --input data/inference_data/image_1.jpg --weights outputs/training/smoke_training/last_model_state.pth
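
One common way to make the config and model name optional is to store them in the checkpoint next to the weights. The sketch below illustrates the idea with a plain dictionary and pickle. The key names and both functions are hypothetical; the repository's actual checkpoint layout, saved with PyTorch, may differ.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, weights, model_name, data_config):
    """Store the weights together with the metadata needed to rebuild the model."""
    payload = {
        "weights": weights,
        "model_name": model_name,
        "data_config": data_config,
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_checkpoint(path, model_name=None, data_config=None):
    """Explicit arguments win; otherwise fall back to the stored metadata."""
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return (
        ckpt["weights"],
        model_name or ckpt["model_name"],
        data_config or ckpt["data_config"],
    )


with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "last_model_state.pkl")
    save_checkpoint(
        ckpt_path,
        weights={"layer.weight": [0.1, 0.2]},
        model_name="fasterrcnn_resnet50_fpn",
        data_config={"NC": 2, "CLASSES": ["__background__", "smoke"]},
    )
    _, name, config = load_checkpoint(ckpt_path)
    print(name, config["NC"])
```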

Video Inference on COCO Pretrained Model

python inference_video.py

Video Inference in Custom Trained Model

python inference_video.py --input data/inference_data/video_1.mp4 --weights outputs/training/smoke_training/last_model_state.pth 

Tracking using COCO Pretrained Models

# Track all COCO classes (Faster RCNN ResNet50 FPN V2).
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show

# Track all COCO classes (Faster RCNN ResNet50 FPN V2) using own video.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_1.mp4

# Tracking only person class (index 1 in COCO pretrained). Check `COCO_91_CLASSES` attribute in `data_configs/coco.yaml` for more information.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_4.mp4 --classes 1

# Tracking only person and car classes (indices 1 and 3 in COCO pretrained). Check `COCO_91_CLASSES` attribute in `data_configs/coco.yaml` for more information.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_4.mp4 --classes 1 3

# Tracking using custom trained weights. Just provide the path to the weights instead of model name.
python inference_video.py --track --weights outputs/training/fish_det/best_model.pth --show --input ../inference_data/video_6.mp4
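
Faster RCNN outputs boxes as (x1, y1, x2, y2), while Deep SORT style trackers commonly take (left, top, width, height) together with a confidence and a class. A minimal conversion sketch follows. The tuple layout is an assumption based on the input format of the deep-sort-realtime package, and the function name and threshold are illustrative, so check them against the tracker version you use.

```python
def to_tracker_detections(boxes, scores, labels, score_threshold=0.5):
    """Convert xyxy detections to ([left, top, width, height], score, label)."""
    detections = []
    for (x1, y1, x2, y2), score, label in zip(boxes, scores, labels):
        if score < score_threshold:
            continue  # Drop low-confidence detections before tracking.
        detections.append(([x1, y1, x2 - x1, y2 - y1], score, label))
    return detections


boxes = [[10, 20, 50, 80], [0, 0, 5, 5]]
scores = [0.9, 0.2]
labels = [1, 3]
print(to_tracker_detections(boxes, scores, labels))
```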

Evaluation

Replace the required arguments according to your need.

python eval.py --model fasterrcnn_resnet50_fpn_v2 --weights outputs/training/trial/best_model.pth --data data_configs/aquarium.yaml --batch 4

You can use the following command to show a table of class-wise Average Precision (this additionally requires the --verbose flag).

python eval.py --model fasterrcnn_resnet50_fpn_v2 --weights outputs/training/trial/best_model.pth --data data_configs/aquarium.yaml --batch 4 --verbose
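
The Average Precision values reported by evaluation are computed at IoU (intersection over union) thresholds, for example IoU=0.50 or the average over 0.50:0.95 seen in the training log. A minimal sketch of IoU for two boxes in (x1, y1, x2, y2) format is shown below; it illustrates the metric and is not the evaluation code used by the repository.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0, inter_x2 - inter_x1) * max(0, inter_y2 - inter_y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Overlap of 25 over a union of 175, so this pair would not count as a
# match at the IoU=0.50 threshold.
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```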

A List of All Model Flags to Use With the Training Script

The following command expects the COCO dataset to be present one directory up, inside the input folder, in XML format. You can find the dataset on Kaggle. Check data_configs/coco.yaml for more details. You can change the relative dataset path in the YAML file according to your directory structure.

# Usage 
python train.py --model fasterrcnn_resnet50_fpn_v2 --data data_configs/coco.yaml

OR USE ANY ONE OF THE FOLLOWING

[
    'fasterrcnn_convnext_small',
    'fasterrcnn_convnext_tiny',
    'fasterrcnn_custom_resnet', 
    'fasterrcnn_darknet',
    'fasterrcnn_efficientnet_b0',
    'fasterrcnn_efficientnet_b4',
    'fasterrcnn_mbv3_small_nano_head',
    'fasterrcnn_mbv3_large',
    'fasterrcnn_mini_darknet_nano_head',
    'fasterrcnn_mini_darknet',
    'fasterrcnn_mini_squeezenet1_1_small_head',
    'fasterrcnn_mini_squeezenet1_1_tiny_head',
    'fasterrcnn_mobilenetv3_large_320_fpn', # Torchvision COCO pretrained
    'fasterrcnn_mobilenetv3_large_fpn', # Torchvision COCO pretrained
    'fasterrcnn_nano',
    'fasterrcnn_resnet18',
    'fasterrcnn_resnet50_fpn_v2', # Torchvision COCO pretrained
    'fasterrcnn_resnet50_fpn',  # Torchvision COCO pretrained
    'fasterrcnn_resnet101',
    'fasterrcnn_resnet152',
    'fasterrcnn_squeezenet1_0',
    'fasterrcnn_squeezenet1_1_small_head',
    'fasterrcnn_squeezenet1_1',
    'fasterrcnn_vitdet',
    'fasterrcnn_vitdet_tiny',
    'fasterrcnn_mobilevit_xxs',
    'fasterrcnn_regnet_y_400mf'
]
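
Because the flags are long and similar, a typo in --model is easy to make. The sketch below checks a name against a list and suggests close matches. It is a hypothetical helper, not part of the repository, and the list is abbreviated to a few flags from above.

```python
import difflib

# Abbreviated subset of the flags listed above.
AVAILABLE_MODELS = [
    "fasterrcnn_darknet",
    "fasterrcnn_mini_darknet",
    "fasterrcnn_mobilenetv3_large_fpn",
    "fasterrcnn_resnet18",
    "fasterrcnn_resnet50_fpn",
    "fasterrcnn_resnet50_fpn_v2",
    "fasterrcnn_squeezenet1_1",
]


def check_model_name(name, available=AVAILABLE_MODELS):
    """Return `name` if it is a known flag, else raise with suggestions."""
    if name in available:
        return name
    suggestions = difflib.get_close_matches(name, available, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown model flag '{name}'.{hint}")


print(check_model_name("fasterrcnn_resnet50_fpn_v2"))
try:
    check_model_name("fasterrcnn_resnet_18")
except ValueError as err:
    print(err)
```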
