  • Stars: 265
  • Rank: 153,770 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created almost 5 years ago
  • Updated almost 5 years ago

Repository Details

PyTorch 3D video classification models pre-trained on 65 million Instagram videos

IG-65M PyTorch

Unofficial PyTorch (and ONNX) 3D video classification models and weights pre-trained on IG-65M (65 million Instagram videos).

IG-65M activations for the Primer movie trailer video; time goes top to bottom

IG-65M video deep dream: maximizing activations; for more see this pull request

Usage 💻

The following describes how to use the model in your own project and how to use our conversion and extraction tools.

PyTorch Models

We provide convenient PyTorch Hub integration:

>>> import torch
>>>
>>> torch.hub.list("moabitcoin/ig65m-pytorch")
['r2plus1d_34_32_ig65m', 'r2plus1d_34_32_kinetics', 'r2plus1d_34_8_ig65m', 'r2plus1d_34_8_kinetics']
>>>
>>> model = torch.hub.load("moabitcoin/ig65m-pytorch", "r2plus1d_34_32_ig65m", num_classes=359, pretrained=True)
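
As a rough sketch of how the loaded model might be called: the hub entry names appear to encode the clip length (8 or 32 frames), and the models take 112x112 frames. The helper below derives an input shape from the entry name, assuming the channels-first (batch, channels, frames, height, width) layout used by torchvision video models. The names `expected_input_shape` and `run_dummy_clip` are illustrative and not part of this repository.

```python
def expected_input_shape(entry_name, batch_size=1):
    """Return (batch, channels, frames, height, width) for a hub entry name.

    Assumes names look like 'r2plus1d_34_<frames>_<dataset>' and that the
    model takes 112x112 RGB frames in channels-first layout.
    """
    parts = entry_name.split("_")
    if len(parts) != 4 or not parts[2].isdigit():
        raise ValueError(f"unexpected entry name: {entry_name!r}")
    return (batch_size, 3, int(parts[2]), 112, 112)


def run_dummy_clip(entry_name="r2plus1d_34_32_ig65m", num_classes=359):
    """Load a model from PyTorch Hub and run one random clip through it.

    Needs torch and network access, so it is only defined here, not called.
    """
    import torch  # imported lazily so the shape helper works without torch

    model = torch.hub.load("moabitcoin/ig65m-pytorch", entry_name,
                           num_classes=num_classes, pretrained=True)
    model.eval()
    # Random values stand in for a real, normalized video clip.
    clip = torch.rand(*expected_input_shape(entry_name))
    with torch.no_grad():
        return model(clip)


print(expected_input_shape("r2plus1d_34_32_ig65m"))
```

Real clips would also need the resizing and normalization the model was trained with, which this sketch does not cover.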

Tools

We build and publish Docker images (see all tags) via Travis CI/CD for master and for all releases.

In these images we provide the following tools:

  • convert - to convert Caffe2 blobs to PyTorch model and weights
  • extract - to compute clip features for a video with a pre-trained model
  • semcode - to visualize clip features for a video over time
  • index-build - to build an approximate nearest neighbor index from clip features
  • index-serve - to load an approximate nearest neighbor index and serve queries
  • index-query - to make approximate nearest neighbor queries against an index server
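
To illustrate what a nearest neighbor query over clip features computes, here is a minimal sketch using exact brute-force cosine similarity in plain Python. It is a stand-in for the concept only: the index tools listed above work with an approximate index, and the toy feature vectors and the names `cosine_similarity` and `nearest_clips` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_clips(query, features, k=1):
    """Return the indices of the k clip features most similar to the query."""
    ranked = sorted(range(len(features)),
                    key=lambda i: cosine_similarity(query, features[i]),
                    reverse=True)
    return ranked[:k]


# Toy 3-dimensional "clip features"; real features are much larger.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(nearest_clips([0.9, 0.1, 0.0], features, k=2))
```

Brute force compares the query against every stored vector, which is why an approximate index becomes attractive once there are many clips.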

Run these pre-built images via:

docker run moabitcoin/ig65m-pytorch:latest-cpu --help

Example for running on CPUs:

docker run --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-cpu \
    extract /data/myvideo.mp4 /data/myfeatures.npy

Example for running on GPUs via nvidia-docker:

docker run --runtime=nvidia --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-gpu \
    extract /data/myvideo.mp4 /data/myfeatures.npy
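
If you want to drive the extract tool from a script, the two commands above can be assembled programmatically. The following is a minimal sketch that only reuses the flags, image tags, and paths shown in the examples; the function name `docker_extract_command` and its parameters are hypothetical.

```python
import os
import shlex


def docker_extract_command(video, features, datadir=None, gpu=False):
    """Build the `docker run ... extract` argument list from the examples above.

    `video` and `features` are file names inside `datadir`, which is mounted
    at /data in the container.
    """
    datadir = os.path.abspath(datadir or os.getcwd())
    cmd = ["docker", "run"]
    if gpu:
        cmd.append("--runtime=nvidia")  # GPU variant goes through nvidia-docker
    tag = "latest-gpu" if gpu else "latest-cpu"
    cmd += ["--ipc=host", "-v", f"{datadir}:/data",
            f"moabitcoin/ig65m-pytorch:{tag}",
            "extract", f"/data/{video}", f"/data/{features}"]
    return cmd


cmd = docker_extract_command("myvideo.mp4", "myfeatures.npy", datadir="/tmp/videos")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.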

Development

We provide CPU and nvidia-docker based GPU Dockerfiles for self-contained and reproducible environments. Use the convenience Makefile to build the Docker image, then enter the container with a host directory mounted at /data:

make
make run datadir=/Path/To/My/Videos

By default we build and run the CPU Docker images; for GPUs run:

make dockerfile=Dockerfile.gpu
make gpu

The WebcamDataset requires exposing /dev/video0 to the container, which works only on Linux:

make
make webcam

PyTorch and ONNX Models 🏆

We provide converted .pth and .pb PyTorch and ONNX weights, respectively.

Model | Pretrain + Finetune | Input Size | pth | onnx | caffe2
R(2+1)D_34 | IG-65M + None | 8x112x112 | r2plus1d_34_clip8_ig65m_from_scratch-9bae36ae.pth | r2plus1d_34_clip8_ig65m_from_scratch-748ab053.pb | r2plus1d_34_clip8_ig65m_from_scratch.pkl
R(2+1)D_34 | IG-65M + Kinetics | 8x112x112 | r2plus1d_34_clip8_ft_kinetics_from_ig65m-0aa0550b.pth | r2plus1d_34_clip8_ft_kinetics_from_ig65m-625d61b3.pb | r2plus1d_34_clip8_ft_kinetics_from_ig65m.pkl
R(2+1)D_34 | IG-65M + None | 32x112x112 | r2plus1d_34_clip32_ig65m_from_scratch-449a7af9.pth | r2plus1d_34_clip32_ig65m_from_scratch-e304d648.pb | r2plus1d_34_clip32_ig65m_from_scratch.pkl
R(2+1)D_34 | IG-65M + Kinetics | 32x112x112 | r2plus1d_34_clip32_ft_kinetics_from_ig65m-ade133f1.pth | r2plus1d_34_clip32_ft_kinetics_from_ig65m-10f4c3bf.pb | r2plus1d_34_clip32_ft_kinetics_from_ig65m.pkl
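
The .pth and .pb file names end in an eight-character hex suffix. Assuming this follows the PyTorch Hub naming convention, where the suffix is the leading digits of the file's SHA-256 digest (an assumption, not something this README states), a downloaded file could be checked with a sketch like the one below. The helper names `hash_prefix_from_name` and `matches_hash_suffix` are illustrative.

```python
import hashlib
import os
import re


def hash_prefix_from_name(filename):
    """Extract the hex suffix from names like 'model-9bae36ae.pth', or None."""
    match = re.search(r"-([0-9a-f]{8,})\.[^.]+$", os.path.basename(filename))
    return match.group(1) if match else None


def matches_hash_suffix(path):
    """Check whether the file's SHA-256 digest starts with the suffix in its name."""
    expected = hash_prefix_from_name(path)
    if expected is None:
        raise ValueError(f"no hash suffix in file name: {path!r}")
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(expected)


print(hash_prefix_from_name("r2plus1d_34_clip8_ig65m_from_scratch-9bae36ae.pth"))
```

The Caffe2 .pkl names carry no suffix, so `hash_prefix_from_name` returns None for them.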

Notes

  • ONNX models provided here have not been optimized for inference.
  • Models fine-tuned on Kinetics have 400 classes; the plain IG-65M models have 359 classes (32-frame clips) and 487 classes (8-frame clips).
  • For models fine-tuned on Kinetics you can use the labels from here.
  • For plain IG65 models there is no label map available.
  • Official Facebook Research Caffe2 models are here.

References 📖

  1. D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
  2. D. Tran, H. Wang, L. Torresani and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
  3. D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang and D. Mahajan. Large-scale weakly-supervised pre-training for video action recognition. CVPR 2019.
  4. VMZ: Model Zoo for Video Modeling
  5. Kinetics & IG-65M

License

Copyright © 2019 MoabitCoin

Distributed under the MIT License (MIT).