• Stars
    star
    202
  • Rank 193,691 (Top 4 %)
  • Language
    Lua
  • Created almost 8 years ago
  • Updated almost 6 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Activity Recognition Algorithms for the Charades Dataset

Charades Starter Code for Activity Recognition in Torch and PyTorch

Contributor: Gunnar Atli Sigurdsson

New: extension of this framework to the deep CRF model on Charades for Asynchronous Temporal Fields for Action Recognition: https://github.com/gsig/temporal-fields

  • New: This code implements a Two-Stream network in PyTorch
  • This code implements a Two-Stream network in Torch
  • This code implements a Two-Stream+LSTM network in Torch

See pytorch/, torch/, for the code repositories.

The code replicates the 'Two-Stream Extended' and 'Two-Stream+LSTM' baselines found in:

@inproceedings{sigurdsson2017asynchronous,
author = {Gunnar A. Sigurdsson and Santosh Divvala and Ali Farhadi and Abhinav Gupta},
title = {Asynchronous Temporal Fields for Action Recognition},
booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2017},
pdf = {http://arxiv.org/pdf/1612.06371.pdf},
code = {https://github.com/gsig/temporal-fields},
}

which is in turn based off "Two-stream convolutional networks for action recognition in videos" by Simonyan and Zisserman, and "Beyond Short Snippets: Deep Networks for Video Classification" by Joe Yue-Hei Ng el al.

Combining the predictions (submission files) of those models using combine_rgb_flow.py yields a final classification accuracy of 18.9% mAP (Two-Stream) and 19.8% (LSTM) on Charades (evalated with charades_v1_classify.m)

Technical Overview:

The code is organized such that to train a two-stream network. Two independed network are trained: One RGB network and one Flow network. This code parses the training data into pairs of an image (or flow), and a label for a single activity class. This forms a softmax training setup like a standard CNN. The network is a VGG-16 network. For RGB it is pretrained on Image-Net, and for Flow it is pretrained on UCF101. The pretrained networks can be downloaded with the scripts in this directory. For testing. The network uses a batch size of 25, scores all images, and pools the output to make a classfication prediction or uses all 25 outputs for localization.