Efficient feature extraction, aggregation and classification for action recognition (CVPR 2014)

Information & Contact

An earlier version of this code was used to compute the results of the following paper:

"Efficient feature extraction, encoding and classification for action recognition",
Vadim Kantorov, Ivan Laptev,
In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014

If you use this code, please cite our work:

@inproceedings{kantorov2014,
      author = {Kantorov, V. and Laptev, I.},
      title = {Efficient feature extraction, encoding and classification for action recognition},
      booktitle = {Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014},
      year = {2014}
}

The paper and the poster are available on the project webpage or in this repository. The binaries are published on the repository releases page, and the Hollywood-2 and HMDB-51 repro scripts are in the [repro directory](http://github.com/vadimkantorov/cvpr2014/tree/master/repro/).

Please submit bugs on GitHub directly.

For any other question, please contact Vadim Kantorov at [email protected] or [email protected].

Description and usage

We release two tools in this repository. The first tool, fastvideofeat, is a motion feature extractor based on motion vectors from video compression information. The second, fastfv, is a fast Fisher vector computation tool that uses vector SSE2 CPU instructions.

We also release scripts (in the repro directory) for reproducing our results on the Hollywood-2 and HMDB-51 datasets.

All code is released under the MIT license.

fastvideofeat

The tool accepts a video file path as input and writes descriptors to standard output.

Command-line options:

Option          Description
--disableHOG    disables HOG descriptor computation
--disableHOF    disables HOF descriptor computation
--disableMBH    disables MBH descriptor computation
-f 1-10         restricts descriptor computation to the given frame range

IMPORTANT: The frame range is specified in terms of PTS (presentation timestamps), which usually coincide with frame indices, but not always. You can inspect the PTS values of a video's frames using FFmpeg's ffprobe (fourth column of the output):

$ ffprobe -print_format csv -show_packets -select_streams 0 video.mp4
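
To check whether the PTS values of a given video coincide with frame indices, the ffprobe output can be post-processed. The following is a minimal Python sketch, assuming the CSV layout in which the PTS is the fourth column, as noted above. The function name `pts_from_ffprobe_csv` and the sample text are illustrative and not part of this repository.

```python
import csv
import io

def pts_from_ffprobe_csv(text):
    """Return the PTS values (fourth CSV column) of all packet rows, in file order."""
    pts = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, non-packet rows, and packets without a PTS ("N/A").
        if len(row) < 4 or row[0] != "packet" or row[3] == "N/A":
            continue
        pts.append(int(row[3]))
    return pts

# Made-up sample resembling `ffprobe -print_format csv -show_packets` output.
sample = """packet,video,0,0,0.000000,0,0.000000
packet,video,0,1,0.040000,1,0.040000
packet,video,0,3,0.120000,2,0.080000
"""

values = sorted(pts_from_ffprobe_csv(sample))
# True only if the sorted PTS values are exactly 0, 1, 2, ...
looks_like_frame_indices = values == list(range(len(values)))
```

In practice the text would come from running the ffprobe command above, for example through `subprocess.run(..., capture_output=True, text=True).stdout`. If `looks_like_frame_indices` is false, the range passed to `-f` should be chosen from the actual PTS values.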

The output format (also printed to standard error):

#Descriptor format: xnorm ynorm tnorm pts StartPTS EndPTS Xoffset Yoffset PatchWidth PatchHeight hog (dim. 96) hof (dim. 108) mbhx (dim. 96) mbhy (dim. 96)

  • xnorm and ynorm are the normalized frame coordinates of the spatio-temporal (s-t) patch
  • tnorm and pts are the normalized and unnormalized frame numbers of the s-t patch center
  • StartPTS and EndPTS are the frame numbers of the first and last frames of the s-t patch
  • Xoffset and Yoffset are the non-normalized frame coordinates of the s-t patch
  • PatchWidth and PatchHeight are the non-normalized width and height of the s-t patch
  • descr is the array of floats holding the concatenated descriptors. Its size depends on the enabled descriptor types. All values lie between zero and one. The first comment line describes the enabled descriptor types, their order in the array, and the dimension of each descriptor.

Every line on standard output corresponds to the extracted descriptor of one patch and consists of tab-separated floats.
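
As an illustration of this layout, the following Python sketch splits one output line into the ten metadata fields and the per-type descriptors. It assumes all descriptor types are enabled, with the order and dimensions given in the format line above. The names `parse_descriptor_line` and `parse_descriptors` are hypothetical helpers, not part of the released tools.

```python
# Field names and descriptor dimensions are taken from the format line above.
META_FIELDS = ["xnorm", "ynorm", "tnorm", "pts", "StartPTS", "EndPTS",
               "Xoffset", "Yoffset", "PatchWidth", "PatchHeight"]
DESCRIPTOR_DIMS = [("hog", 96), ("hof", 108), ("mbhx", 96), ("mbhy", 96)]

def parse_descriptor_line(line, descriptor_dims=DESCRIPTOR_DIMS):
    """Split one line of floats into metadata fields and per-type descriptors."""
    values = [float(v) for v in line.split()]
    expected = len(META_FIELDS) + sum(dim for _, dim in descriptor_dims)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    patch = dict(zip(META_FIELDS, values))
    offset = len(META_FIELDS)
    for name, dim in descriptor_dims:
        patch[name] = values[offset:offset + dim]
        offset += dim
    return patch

def parse_descriptors(lines, descriptor_dims=DESCRIPTOR_DIMS):
    """Yield one parsed patch per data line, skipping blank and '#' comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_descriptor_line(line, descriptor_dims)
```

A file written as in the examples below could then be read with `patches = list(parse_descriptors(open("descriptors.txt")))`. If a descriptor type is disabled on the command line, `descriptor_dims` has to be shortened accordingly.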

Examples:
  • Compute HOG, HOF, MBH and save the descriptors in descriptors.txt:

    $ ./fastvideofeat video.avi > descriptors.txt

  • Compute only HOF and MBH from the first 600 frames and save the descriptors in descriptors.txt:

    $ ./fastvideofeat video.avi --disableHOG -f 1-600 > descriptors.txt

More examples are in examples/compute_mpeg_features.sh.

Video format support

We've tested fastvideofeat only on videos encoded in H.264 and MPEG-4. Whether motion vectors can be extracted and processed depends completely on FFmpeg's ability to put them into the right structures. When last checked, this did not work for VP9, for example. More generally, video reading depends entirely on the FFmpeg libraries.

fastfv

The tool accepts descriptors on standard input and writes the Fisher vector (FV) to standard output. The tool consumes GMM vocabs saved by the Yael library. A sample script to build GMM vocabs with Yael is provided, along with a usage example.

IMPORTANT: The computed Fisher vectors are not normalized. Apply signed square rooting / power normalization, L2-normalization, clipping, etc. before training a classifier.
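
As a rough illustration of the first two steps, here is a minimal Python sketch of signed power normalization followed by L2-normalization. The function name `power_l2_normalize` and the default exponent 0.5 (signed square root) are assumptions made for this example; clipping is not shown, and the right choice of steps depends on the classifier.

```python
import math

def power_l2_normalize(fv, alpha=0.5):
    """Apply sign(v) * |v|**alpha to every element, then scale to unit L2 norm."""
    powered = [math.copysign(abs(v) ** alpha, v) for v in fv]
    norm = math.sqrt(sum(v * v for v in powered))
    if norm == 0.0:
        # An all-zero vector cannot be scaled; return it unchanged.
        return powered
    return [v / norm for v in powered]
```

For instance, `power_l2_normalize([4.0, -9.0, 0.0])` first maps the input to `[2.0, -3.0, 0.0]` and then divides every element by the square root of 13.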

Command-line options:

Option                          Description
--xpos 0                        specifies the column with the x coordinate of the s-t patch in the descriptor array
--ypos 1                        specifies the column with the y coordinate of the s-t patch in the descriptor array
--tpos 2                        specifies the column with the t coordinate of the s-t patch in the descriptor array
--knn 5                         FV parts corresponding to this many closest GMM centroids are updated while processing each input descriptor
--vocab 10-105 10-105.hog.gmm   specifies the column range of a descriptor type and the path to its GMM vocab; this option is mandatory and may be given several times
--enableflann 4 32              uses FLANN instead of knn for descriptor attribution; the first argument is the number of kd-trees, the second is the number of checks performed during attribution
--enablespatiotemporalgrids     enables spatio-temporal grids: 1x1x1, 1x3x1, 1x1x2
--enablesecondorder             enables the second-order part of the Fisher vector

Examples:
  • Compute Fisher vector:

    $ zcat sample_features_mpeg4.txt.gz | ../bin/fastfv --xpos 0 --ypos 1 --tpos 2 --enablespatiotemporalgrids --enableflann 4 32 --vocab 10-105 hollywood2_sample_vocabs/10-105.hog.gmm --vocab 106-213 hollywood2_sample_vocabs/106-213.hog.gmm --vocab 214-309 hollywood2_sample_vocabs/214-309.mbhx.gmm --vocab 310-405 hollywood2_sample_vocabs/310-405.mbhy.gmm > fv.txt

  • Build GMM vocab with Yael:

    $ cat features*.gz | PYTHONPATH=$(pwd)/../bin/dependencies/yael/yael:$PYTHONPATH ../src/gmm_train.py --gmm_ncomponents 256 --vocab 10-105 10-105.hog.gmm

Examples are explained in examples/compute_fisher_vector.sh.
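
The column ranges passed to --vocab in the example above are consistent with 0-based, inclusive column indices into the fastvideofeat output: ten metadata columns come first, followed by the descriptors in order. This reading is inferred from the example ranges and the descriptor dimensions, not stated by the tool itself. Under that assumption, the ranges can be derived as in the following Python sketch, where `vocab_column_ranges` is a hypothetical helper.

```python
def vocab_column_ranges(descriptor_dims, first_column=10):
    """Return (name, "start-end") pairs with 0-based, inclusive column indices."""
    ranges = []
    start = first_column  # columns 0..9 hold the patch metadata
    for name, dim in descriptor_dims:
        end = start + dim - 1
        ranges.append((name, f"{start}-{end}"))
        start = end + 1
    return ranges

all_enabled = vocab_column_ranges(
    [("hog", 96), ("hof", 108), ("mbhx", 96), ("mbhy", 96)])
# [('hog', '10-105'), ('hof', '106-213'), ('mbhx', '214-309'), ('mbhy', '310-405')]
```

If a descriptor type is disabled during extraction, the later ranges shift accordingly. With HOG disabled, for example, the same computation places HOF in columns 10-117.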

Performance

We have not observed that enabling the second-order part improves accuracy, so it is disabled by default. Enabling it doubles the size of the Fisher vector.

Simple knn descriptor attribution (the default) is faster than FLANN by a factor of two, at the cost of less than 1% accuracy. It is the default because of its speed.

Enabling spatio-temporal grids (disabled by default) is important for maximum accuracy (~2% gain).

If you use FLANN, the number of checks determines the speed; try reducing it to run faster.

Building from source

On both Linux and Windows, the binaries will appear in bin after building. By default, the code links statically with the dependencies below; check the Makefiles for details.

Dependencies for fastvideofeat:

Dependencies for fastfv:

The code is known to work with OpenCV 2.4.9, FFmpeg 2.4, Yael 4.38, ATLAS 3.10.2, LAPACK 3.5.0.

Linux

Make sure you have the dependencies installed and visible to g++ (a minimal installation script is in the bin/dependencies directory). Build the tools by running make.

Windows

Only fastvideofeat builds and works on Windows; fastfv does not build because Yael currently does not support Windows.

To build fastvideofeat, set the correct paths to the dependencies, the processor architecture and the Visual C++ version in the Makefile, then run the following from an appropriate Visual Studio Developer Command Prompt (specifically, the VS2013 x64 Native Tools Command Prompt worked for us):

$ nmake -f Makefile.nmake

Notes

For practical use, the software should be modified to save and read features in a binary format, because the overhead of reading and writing text files is huge.
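
One possible direction is sketched below in Python: parsed feature rows are stored as little-endian float32 values behind a small header. The file layout (two uint32 values for row and column counts, then the data) and the names `write_binary` and `read_binary` are invented for this illustration. The released tools do not read or write this format.

```python
import os
import struct
import tempfile

def write_binary(path, rows):
    """Write equal-length float rows as: uint32 n_rows, uint32 n_cols, float32 data."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    with open(path, "wb") as f:
        f.write(struct.pack("<II", n_rows, n_cols))
        for row in rows:
            if len(row) != n_cols:
                raise ValueError("all rows must have the same length")
            f.write(struct.pack(f"<{n_cols}f", *row))

def read_binary(path):
    """Read back the rows written by write_binary."""
    with open(path, "rb") as f:
        n_rows, n_cols = struct.unpack("<II", f.read(8))
        return [list(struct.unpack(f"<{n_cols}f", f.read(4 * n_cols)))
                for _ in range(n_rows)]

# Round trip with values that are exactly representable in float32.
rows = [[0.5, 0.25, 1.0], [0.0, 0.75, 0.125]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.bin")
    write_binary(path, rows)
    restored = read_binary(path)
```

Storing float32 loses precision relative to the text output, which is usually acceptable for descriptors whose values lie between zero and one.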

License and acknowledgements

All code and scripts are licensed under the MIT license.

We greatly thank Heng Wang, whose work was of significant help.
