  • Stars: 153
  • Rank: 243,368 (Top 5%)
  • Language: Python
  • License: MIT License
  • Created: about 7 years ago
  • Updated: almost 2 years ago

Description

Easily convert RGB video data (e.g. tested with .avi and .mp4) to the TensorFlow tfrecords file format for training, e.g., a NN in TensorFlow. Because of common hardware/GPU RAM limitations in deep learning, this implementation lets you either limit the number of frames stored per video or simply store all video frames. The code automatically chooses the frame step size so that the selected frames are evenly spaced across the video.
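
The even-spacing idea can be sketched as follows (a minimal illustration, not the package's actual code; frame_indices is a hypothetical helper):

```python
def frame_indices(total_frames, n_frames):
    """Pick n_frames indices evenly spaced across a video.

    Hypothetical helper sketching the even-separation behaviour
    described above; the package's internal implementation may differ.
    """
    if n_frames == "all" or n_frames >= total_frames:
        return list(range(total_frames))
    step = total_frames / n_frames  # fractional step keeps spacing even
    return [int(i * step) for i in range(n_frames)]

# e.g. picking 5 frames from a 10-frame video yields every 2nd frame
```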

The implementation offers the option to include Optical Flow (currently OpenCV's calcOpticalFlowFarneback) as an additional channel in the tfrecords data (it can easily be extended in this regard, for example by exchanging the currently used Optical Flow algorithm for a different one). Accompanying the code, we've also added a small example with two .mp4 files from which two tfrecords batches are created (1 video per tfrecords file). To access the examples, make sure to use the GitHub repo instead of the pip package.

This implementation was created during a research project and grew organically. We therefore invite users who encounter bugs to submit pull requests with fixes.

Installation

Run the following command:

pip install video2tfrecord 

Writing (video) to tfrecord

After installing the package, you can execute the following example commands to start the video-to-tfrecord conversion:

from video2tfrecord import convert_videos_to_tfrecord

convert_videos_to_tfrecord(source_path, destination_path, n_videos_in_record, n_frames_per_video, "*.avi") 

where n_videos_in_record is the number of videos stored in a single tfrecord file, n_frames_per_video is the number of frames to be stored per video, and source_path is the directory containing your .avi video files. Set n_frames_per_video="all" if you want all video frames to be stored in the tfrecord file (keep in mind that the tfrecord files can become very large).

Reading from tfrecord

See test.py for an example.

Manual installation

If you want to set up your installation manually, use the install scripts provided in the repository.

The package has been successfully tested with:

  • Python 3.4, 3.5 and 3.6
  • tensorflow 1.5.0
  • opencv-python 3.4.0.12
  • numpy 1.14.0

OpenCV troubleshooting

If you encounter issues with OpenCV (e.g. because you use a different version), you can build OpenCV locally from the repository [1] (e.g. refer to the StackOverflow thread under [2]). Make sure to use the specified version, since functions within the OpenCV framework may change between versions.

Parameters and storage details

By adjusting the parameters at the top of the code you can control:

  • input dir (containing all the video files)
  • output dir (to which the tfrecords should be saved)
  • resolution of the images
  • video file suffix (e.g. *.avi) as RegEx (include the asterisk!)
  • number of frames per video that are actually stored in the tfrecord entries (can be smaller than the real number of frames)
  • image color depth
  • if optical flow should be added as a 4th channel
  • number of videos a tfrecords file should contain
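
As an illustration of why the asterisk in the suffix pattern matters, here is a sketch of suffix filtering assuming glob-style matching (an assumption about how the pattern is interpreted; the list above documents it as RegEx, so the package may match differently):

```python
import fnmatch


def matching_videos(filenames, suffix_pattern):
    # Filter filenames by a glob-style suffix pattern such as "*.avi".
    # Without the leading asterisk, "*.avi" would degenerate to the
    # literal name ".avi" and match nothing.
    return fnmatch.filter(filenames, suffix_pattern)
```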

The videos are stored as features in the tfrecords. Every video instance contains the following data/information:

  • feature[path] (a byte string, where path is "blobs/i" with 0 <= i <= number of images per video)
  • feature['height'] (the image height, e.g. 128)
  • feature['width'] (the image width, e.g. 128)
  • feature['depth'] (the image depth, e.g. 4 if optical flow is used)
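
The per-video layout described above can be sketched as a plain dictionary (a hypothetical illustration; in the actual tfrecords these values are wrapped in tf.train.Feature protos rather than stored as raw Python values):

```python
def video_feature_dict(frame_blobs, height, width, depth):
    """Sketch of the per-video feature layout described above.

    frame_blobs: list of serialized image byte strings.
    Hypothetical helper; the real code wraps these values in
    tf.train.Feature protos before writing the tfrecord.
    """
    features = {"height": height, "width": width, "depth": depth}
    for i, blob in enumerate(frame_blobs):
        features["blobs/%d" % i] = blob  # one entry per frame
    return features
```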

Future work:

  • supervised learning: allow including a label file (e.g. .csv) that specifies the <videoid>-to-<label> relationship in each row, and store the label information in the records
  • use compression mode in TFRecordWriter (options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP))
  • improve documentation
  • add the option to use all video frames instead of just a subset (use n_frames_per_video="all")
  • write small exemplary script for loading the tfrecords + meta-data into a TF QueueRunner (see test.py)
  • replace Farneback optical flow with a more sophisticated method, say dense trajectories
  • Question to users: would it make sense to offer video2tfrecord as a web service (i.e. upload videos, get tfrecords back)?
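
The planned label file from the first item above could be parsed along these lines (a sketch assuming one <videoid>,<label> pair per row; load_labels is a hypothetical helper, not existing package code):

```python
import csv
import io


def load_labels(csv_text):
    # Parse rows of "<videoid>,<label>" into a lookup dict
    # (hypothetical sketch of the planned label-file feature).
    reader = csv.reader(io.StringIO(csv_text))
    return {video_id: label for video_id, label in reader}
```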

Additional contributors: Jonas Rothfuss (https://github.com/jonasrothfuss/)
