  • Stars: 528
  • Rank 83,941 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created about 4 years ago
  • Updated about 1 month ago

Repository Details

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.

Dataset Management Framework (Datumaro)

A framework and CLI tool to build, transform, and analyze datasets.

VOC dataset                                  ---> Annotation tool
     +                                     /
COCO dataset -----> Datumaro ---> dataset ------> Model training
     +                                     \
CVAT annotations                             ---> Publication, statistics etc.

Features

  • Dataset reading, writing, conversion in any direction.

    Other formats and documentation for them can be found here.

  • Dataset building

    • Merging multiple datasets into one
    • Dataset filtering by custom criteria:
      • remove polygons of a certain class
      • remove images without annotations of a specific class
      • remove occluded annotations from images
      • keep only vertically-oriented images
      • remove small area bounding boxes from annotations
    • Annotation conversions, for instance:
      • polygons to instance masks and vice-versa
      • apply a custom colormap for mask annotations
      • rename or remove dataset labels
    • Splitting a dataset into multiple subsets like train, val, and test:
      • random split
      • task-specific splits based on annotations, which keep initial label and attribute distributions
        • for classification task, based on labels
        • for detection task, based on bboxes
        • for re-identification task, based on labels, avoiding having same IDs in training and test splits
    • Sampling a dataset
      • Analyzes inference results on the given dataset and selects the ‘best’, smallest possible set of samples for annotation.
      • Selects the samples that best suit model training.
        • Sampling with an entropy-based algorithm
  • Dataset quality checking

    • Simple checking for errors
    • Comparison with model inference
    • Merging and comparison of multiple datasets
    • Annotation validation based on the task type (classification, etc.)
  • Dataset comparison

  • Dataset statistics (image mean and std, annotation statistics)

  • Model integration

    • Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MxNet, etc.)
    • Explainable AI (RISE algorithm)
      • RISE for classification
      • RISE for object detection
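
As a rough illustration of the task-specific splits listed above for classification, the sketch below splits items label by label so that each subset keeps roughly the original label proportions. It is plain Python and not Datumaro's API: the `stratified_split` function, the `(item_id, label)` tuples, and the subset ratios are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(items, ratios, seed=0):
    """Split (item_id, label) pairs into named subsets, keeping label proportions.

    `ratios` maps a subset name to its fraction; the fractions must sum to 1.
    This is a hypothetical helper for illustration, not a Datumaro function.
    """
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("subset ratios must sum to 1")

    # Group item ids by label so that each label is split independently.
    by_label = defaultdict(list)
    for item_id, label in items:
        by_label[label].append(item_id)

    rng = random.Random(seed)
    names = list(ratios)
    subsets = {name: [] for name in names}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        start = 0
        cumulative = 0.0
        for i, name in enumerate(names):
            cumulative += ratios[name]
            # The last subset takes the remainder so that no item is dropped.
            end = len(ids) if i == len(names) - 1 else round(cumulative * len(ids))
            subsets[name].extend(ids[start:end])
            start = end
    return subsets


# Toy dataset: 75 "cat" items and 25 "dog" items.
items = [(f"img_{i:03d}", "cat" if i % 4 else "dog") for i in range(100)]
splits = stratified_split(items, {"train": 0.6, "val": 0.2, "test": 0.2})
print({name: len(ids) for name, ids in splits.items()})
```

Splitting each label separately is what keeps the label distribution stable across subsets. A re-identification split would need the extra constraint that all items sharing an ID land in the same subset, which this sketch does not attempt.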

Check the design document for a full list of features. Check the user manual for usage instructions.
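
The feature list mentions sampling with an entropy-based algorithm without further detail. The sketch below shows one common reading of that idea, assumed here for illustration: rank items by the Shannon entropy of the model's predicted class probabilities and pick the most uncertain ones for annotation. The function names and the `predictions` mapping are hypothetical and do not come from Datumaro.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k items whose predictions have the highest entropy.

    `predictions` maps an item id to a list of class probabilities.
    Hypothetical helper for illustration, not a Datumaro function.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


predictions = {
    "img_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_b": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    "img_c": [0.60, 0.30, 0.10],
    "img_d": [0.50, 0.50, 0.00],
}
to_annotate = select_most_uncertain(predictions, k=2)
print(to_annotate)
```

High entropy means the model spreads probability across several classes, so annotating those items first is likely to add the most information per label. Other selection rules (for example, mixing in random samples) are equally plausible readings of the source.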

Contributing

Feel free to open an issue if you think something needs to be changed. You are welcome to participate in development; instructions are available in our contribution guide.

Telemetry data collection note

The OpenVINO™ telemetry library is used to collect basic information about Datumaro usage.

To enable or disable telemetry data collection, please see the guide.

More Repositories

  1. openvino (C++, 7,074 stars): OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
  2. open_model_zoo (Python, 4,086 stars): Pre-trained Deep Learning models and demos (high quality and extremely fast)
  3. anomalib (Python, 3,761 stars): An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
  4. openvino_notebooks (Jupyter Notebook, 2,372 stars): 📚 Jupyter notebook tutorials for OpenVINO™
  5. training_extensions (Python, 1,139 stars): Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
  6. nncf (Python, 925 stars): Neural Network Compression Framework for enhanced OpenVINO™ inference
  7. model_server (C++, 660 stars): A scalable inference server for models optimized with OpenVINO™
  8. openvino_tensorflow (C++, 178 stars): OpenVINO™ integration with TensorFlow
  9. openvino.genai (C++, 120 stars): Run Generative AI models using native OpenVINO C++ API
  10. openvino_contrib (C++, 105 stars): Repository for OpenVINO's extra modules
  11. awesome-openvino (99 stars): A curated list of OpenVINO based AI projects
  12. geti-sdk (Python, 73 stars): Software Development Kit (SDK) for the Intel® Geti™ platform for Computer Vision AI model training.
  13. docker_ci (Dockerfile, 58 stars): The framework to generate a Dockerfile, build, test, and deploy a docker image with OpenVINO™ toolkit.
  14. training_toolbox_caffe (Jupyter Notebook, 49 stars): Training Toolbox for Caffe
  15. openvino_build_deploy (Jupyter Notebook, 42 stars): Pre-built components and code samples to help you build and deploy production-grade AI applications with the OpenVINO™ Toolkit from Intel
  16. npu_plugin (MLIR, 33 stars): OpenVINO NPU Plugin
  17. workbench (TypeScript, 28 stars)
  18. model_api (C++, 25 stars)
  19. openvino_xai (Python, 24 stars): OpenVINO™ Explainable AI (XAI) Toolkit: Visual Explanation for OpenVINO Models
  20. openvino_tokenizers (C++, 22 stars): OpenVINO Tokenizers extension
  21. model_preparation_algorithm (Python, 21 stars): Model Preparation Algorithm: a Transfer Learning Framework
  22. security_addon (C, 16 stars): OpenVINO™ Security Add-on to control access to inferencing models.
  23. operator (Go, 13 stars): OpenVINO operator for OpenShift and Kubernetes
  24. model_analyzer (Python, 11 stars): Model Analyzer is the Network Statistic Information tool
  25. workbench_aux (Python, 10 stars): OpenVINO™ Toolkit - Deep Learning Workbench repository Auxiliary Assets
  26. mlas (Assembly, 8 stars)
  27. hyper_parameter_optimization (Python, 6 stars): Python library of automatic hyper-parameter optimization
  28. openvino_docs (Python, 3 stars): OpenVINO™ Toolkit documentation repository
  29. MLPerf (C++, 2 stars)
  30. telemetry (Python, 1 star)
  31. npu_plugin_btc (C++, 1 star)
  32. cpu_extensions (1 star)
  33. npu_plugin_elf (C++, 1 star)