Authors' official PyTorch implementation of the "ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning" [ICCV 2019]

ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning

This repository contains the TensorFlow implementation of the paper ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning. It provides code for calculating the similarity between user-provided query and database videos, as well as an evaluation script that reproduces the results of the paper. Video similarity is computed in two stages: a frame-to-frame similarity function that respects the spatial structure within each frame, followed by a learned video-to-video similarity function that also takes the temporal structure of the videos into account.
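To make the two stages more concrete, the sketch below implements Chamfer similarity with NumPy. This is the operation selected by the default value of the `--similarity_function` option of `calculate_similarity.py`, and `symmetric_chamfer` (its other option) averages it over both directions. The sketch assumes that each frame is represented by a set of L2-normalised region vectors; the array shapes are illustrative assumptions, not the repository's internals. The learned video-to-video network that ViSiL applies to the frame-to-frame matrix is deliberately omitted, so the final score here is a plain Chamfer similarity over frames, not ViSiL's output.

```python
import numpy as np


def chamfer(sim: np.ndarray) -> float:
    """Chamfer similarity of a (rows x cols) similarity matrix: take the best
    match in the target (max over columns) for every query element, then
    average over the query elements (rows)."""
    return float(sim.max(axis=1).mean())


def symmetric_chamfer(sim: np.ndarray) -> float:
    """Average of the Chamfer similarity computed in both directions."""
    return 0.5 * (chamfer(sim) + chamfer(sim.T))


def frame_to_frame(query: np.ndarray, target: np.ndarray) -> np.ndarray:
    """query: (N, R, D) region vectors of N query frames; target: (M, S, D)
    region vectors of M target frames. Entry (i, j) of the returned (N, M)
    matrix is the Chamfer similarity between the regions of query frame i
    and the regions of target frame j."""
    # Dot products between every pair of regions: shape (N, M, R, S)
    region_sim = np.einsum("nrd,msd->nmrs", query, target)
    # Chamfer over regions: best target region per query region, then mean
    return region_sim.max(axis=3).mean(axis=2)


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale every vector along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Toy data (assumed shapes): 5 query and 7 target frames, 9 regions of 16 dims
rng = np.random.default_rng(0)
query_regions = l2_normalize(rng.normal(size=(5, 9, 16)))
target_regions = l2_normalize(rng.normal(size=(7, 9, 16)))

f2f = frame_to_frame(query_regions, target_regions)  # shape (5, 7)
# ViSiL would refine f2f with its learned network here; this sketch skips it
print(chamfer(f2f), symmetric_chamfer(f2f))
```

Comparing a video with itself gives a Chamfer score of 1, because every region of every frame finds an identical region in the matching frame.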

The PyTorch implementation of ViSiL can be found here

Prerequisites

  • Python 3
  • TensorFlow 1.x (tested with 1.8 to 1.15)

Getting started

Installation

  • Clone this repo:
git clone https://github.com/MKLab-ITI/visil
cd visil
  • Install all the dependencies with:
pip install -r requirements.txt
  • Download and unzip the pretrained model:
wget http://ndd.iti.gr/visil/ckpt.zip
unzip ckpt.zip
  • If you want to use I3D as the backbone network (used for AVR in the paper), install the following packages:
# For TensorFlow version >= 1.14
pip install tensorflow-probability==0.7 dm-sonnet==1.25

# For TensorFlow version < 1.14
pip install tensorflow-probability==0.6 dm-sonnet==1.23
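The version rule above can also be applied programmatically, for example in a setup script. The helper below is a hypothetical convenience, not part of the repository: it maps a TensorFlow 1.x version string to the matching pins and rejects other major versions, since the Prerequisites only cover TensorFlow 1.x.

```python
from typing import List


def i3d_extra_requirements(tf_version: str) -> List[str]:
    """Return the pip pins needed by the I3D backbone for a TensorFlow 1.x
    version string such as "1.15.0" (hypothetical helper)."""
    parts = tf_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major != 1:
        raise ValueError("only TensorFlow 1.x is supported, got " + tf_version)
    if minor >= 14:
        return ["tensorflow-probability==0.7", "dm-sonnet==1.25"]
    return ["tensorflow-probability==0.6", "dm-sonnet==1.23"]


# With TensorFlow installed, pass tf.__version__ instead of a literal string
print("pip install " + " ".join(i3d_extra_requirements("1.13.1")))
```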

Video similarity calculation

  • Create a file that contains the query videos. Each line of the file must contain a video id and the path to the corresponding video file, separated by a tab character (\t). Example:

      wrC_Uqk3juY	queries/wrC_Uqk3juY.mp4
      k_NT43aJ_Jw	queries/k_NT43aJ_Jw.mp4
      2n30dbPBNKE	queries/2n30dbPBNKE.mp4
      ...

  • Create a file with the same format for the database videos.

  • Run the following command to calculate the similarity between all query and database videos:

python calculate_similarity.py --query_file queries.txt --database_file database.txt --model_dir ckpt/resnet/
  • For faster processing, you can load the query videos into GPU memory by adding the --load_queries flag:
python calculate_similarity.py --query_file queries.txt --database_file database.txt --model_dir ckpt/resnet/ --load_queries
  • The calculated similarities are stored in the file given by --output_file. The file is in JSON format and contains a dictionary keyed by query id; each value is another dictionary that maps database video ids to their similarity with that query. See the example below:
    {
      "wrC_Uqk3juY": {
        "KQh6RCW_nAo": 0.716,
        "0q82oQa3upE": 0.300,
        ...},
      "k_NT43aJ_Jw": {
        "-KuR8y1gjJQ": 1.0,
        "Xb19O5Iur44": 0.417,
        ...},
      ...
    }
  • Add the --help flag to display a detailed description of the arguments of the similarity calculation script:
  -q, --query_file QUERY_FILE                     Path to file that contains the query videos
  -d, --database_file DATABASE_FILE               Path to file that contains the database videos
  -o, --output_file OUTPUT_FILE                   Name of the output file. Default: "results.json"
  --network NETWORK                               Backbone network used for feature extraction.
                                                  Options: "resnet" or "i3d". Default: "resnet"
  --model_dir MODEL_DIR                           Path to the directory of the pretrained model.
                                                  Default: "ckpt/resnet"
  -s, --similarity_function SIMILARITY_FUNCTION   Function that will be used to calculate the
                                                  similarity between query-candidate frames and
                                                  videos. Options: "chamfer" or "symmetric_chamfer".
                                                  Default: "chamfer"
  --batch_sz BATCH_SZ                             Number of frames contained in each batch during
                                                  feature extraction. Default: 128
  --gpu_id GPU_ID                                 Id of the GPU used. Default: 0
  -l, --load_queries                              Flag that indicates that the queries will be loaded to
                                                  the GPU memory.
  --threads THREADS                               Number of threads used for video loading. Default: 8
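The input and output formats described above are simple to produce and consume from Python. The sketch below uses two hypothetical helpers that are not part of the repository: `write_video_list` builds a query or database file from a folder of videos, assuming that the file name without its extension is the video id (as in the query example), and `rank_results` loads the JSON written to `--output_file` and sorts each query's database videos by decreasing similarity.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def write_video_list(video_dir: str, list_path: str, ext: str = ".mp4") -> int:
    """Write one '<video_id>\t<path>' line per video file in video_dir, using
    the file name without extension as the video id (an assumption).
    Returns the number of lines written."""
    videos = sorted(Path(video_dir).glob("*" + ext))
    with open(list_path, "w") as f:
        for video in videos:
            f.write("{}\t{}\n".format(video.stem, video))
    return len(videos)


def rank_results(results_path: str) -> Dict[str, List[Tuple[str, float]]]:
    """Load the similarity JSON and return, for every query id, a list of
    (database video id, similarity) pairs sorted by decreasing similarity."""
    with open(results_path) as f:
        results = json.load(f)
    return {
        query_id: sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
        for query_id, sims in results.items()
    }
```

For example, build queries.txt and database.txt with `write_video_list`, run calculate_similarity.py on them, and then slice the list that `rank_results('results.json')` returns for a given query id to obtain its ten most similar database videos.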

Evaluation

  • We also provide code to reproduce the experiments in the paper.

  • First, download the videos of the dataset you want. The supported options are:
    • FIVR-5K
    • FIVR-200K
    • CC_WEB_VIDEO
    • EVVE
    • ActivityNet

  • Determine the pattern, based on the video id, by which the source videos are stored. For example, if all dataset videos are stored in one folder, each named after its video id with the extension .mp4, then the pattern is {id}.mp4. If each dataset video is stored in its own folder named after its video id, with the filename video.mp4, then the pattern is {id}/video.mp4.

    • The code replaces the {id} string with the id of each video in the dataset.
    • It also supports Unix-style pathname pattern expansion. For example, if the video files have various extensions, the pattern can be {id}/video.*
    • For FIVR-200K, EVVE, and ActivityNet, the YouTube ids are used as the video ids.
    • For CC_WEB_VIDEO, the video ids derive from the number of the query set that the video belongs to and the basename of the file. In particular, the video ids are of the form <number_of_query_set>/<basename>, e.g. 1/1_1_Y
  • Run evaluation.py, providing the name of the evaluation dataset, the path to the video files, and the pattern by which the videos are stored:

python evaluation.py --dataset FIVR-5K --video_dir /path/to/videos/ --pattern '{id}/video.*' --load_queries
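The mapping from a video id to a file can be sketched as follows. This is an illustrative assumption of the behaviour described above (substitute {id}, then apply Unix-style wildcard expansion) written with Python's `glob` module; `resolve_video_path` is a hypothetical name, and the repository's own loader may differ in details such as which file it picks when several match.

```python
import glob
import os
from typing import Optional


def resolve_video_path(video_dir: str, pattern: str, video_id: str) -> Optional[str]:
    """Substitute the video id for '{id}' in the pattern, expand shell-style
    wildcards, and return the first match (or None if nothing matches)."""
    # Escape the id so characters such as '[' in an id are matched literally
    candidate = os.path.join(video_dir, pattern.replace("{id}", glob.escape(video_id)))
    matches = sorted(glob.glob(candidate))
    return matches[0] if matches else None
```

With the CC_WEB_VIDEO id 1/1_1_Y and the pattern {id}.*, the sketch looks for any file named 1_1_Y with an arbitrary extension inside the folder 1 of the video directory.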

Use ViSiL in your Python code

Here is a toy example of running ViSiL on your own videos.

from model.visil import ViSiL
from datasets import load_video

# Load the two videos from the video files
query_video = load_video('/path/to/query/video')
target_video = load_video('/path/to/target/video')

# Initialize ViSiL model and load pre-trained weights
model = ViSiL('ckpt/resnet/')

# Extract features of the two videos
query_features = model.extract_features(query_video, batch_sz=32)
target_features = model.extract_features(target_video, batch_sz=32)

# Calculate similarity between the two videos
similarity = model.calculate_video_similarity(query_features, target_features)
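When one query has to be compared with many videos, its features only need to be extracted once. The function below is a hypothetical sketch, not part of the repository, built only on the two methods used in the example above. `model` and `load_video` are passed in, so any object that provides `extract_features(video, batch_sz=...)` and `calculate_video_similarity(query_features, target_features)` works; the returned similarity is assumed to be convertible to `float`.

```python
from typing import Any, Callable, List, Sequence, Tuple


def rank_targets(model: Any, load_video: Callable[[str], Any], query_path: str,
                 target_paths: Sequence[str], batch_sz: int = 32) -> List[Tuple[str, float]]:
    """Compare one query video against several target videos and return
    (target path, similarity) pairs sorted by decreasing similarity."""
    # Query features are extracted once and reused for every target
    query_features = model.extract_features(load_video(query_path), batch_sz=batch_sz)
    scores = []
    for path in target_paths:
        target_features = model.extract_features(load_video(path), batch_sz=batch_sz)
        sim = model.calculate_video_similarity(query_features, target_features)
        scores.append((path, float(sim)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With the `model` and `load_video` from the example above, a call might look like `rank_targets(model, load_video, '/path/to/query/video', ['/path/to/target/video'])`.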

Docker

Thanks to @theycallmeloki for providing a Dockerfile to set up a Docker container for the repo.

  • First, build a Docker image from the Dockerfile:
docker build -t visil:latest .
  • Start a Docker container from the created image:
docker run -it --gpus all --name ViSiL visil:latest

Visualization

To visualize similarity matrices and the ViSiL outputs, you may use this Colab notebook.

Citation

If you use this code for your research, please consider citing our paper:

@inproceedings{kordopatis2019visil,
  title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
  author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2019}
}

Related Projects

DnS - improved performance and better computational efficiency

FIVR-200K - download our FIVR-200K dataset

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details

Contact for further details about the project

Giorgos Kordopatis-Zilos ([email protected])
