
clip-video-encode

pypi Open In Colab Try it on gitpod

Easily compute CLIP embeddings from video frames.

Install

Using pip:

pip install clip-video-encode

Or build from source:

python setup.py install
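
The source build assumes you have already cloned the repository and are running the command from its root, for example (the repository URL here is an assumption):

git clone https://github.com/iejMac/clip-video-encode  # URL assumed
cd clip-video-encode
python setup.py install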

Usage

NAME
    clip-video-encode - Encode frames using CLIP image encoder

SYNOPSIS
    clip-video-encode SRC <flags>

DESCRIPTION
    Input:
      src:
        str: path to an mp4 file
        str: YouTube link
        str: path to a txt file with multiple mp4 paths or YouTube links
        list: list of multiple mp4 paths or YouTube links
      dest:
        str: directory to save the embeddings to
        None: dest = src + .npy
      output_format:
        str: "files" or "webdataset"
      take_every_nth:
        int: only take every nth frame
      frame_workers:
        int: number of processes to distribute video reading across.
      frame_memory_size:
        int: GB of memory for FrameReader.
      metadata_columns:
        str: a comma-separated list of metadata column names to look for in src
      use_dst_name:
        bool: use the save name suggested by video2numpy
      distribute:
        str: distribution strategy, currently either slurm or none
      oc_model_name:
        str: open_clip model name, used for selecting CLIP architecture
      pretrained:
        str: open_clip pretrained weights name

POSITIONAL ARGUMENTS
    SRC

FLAGS
    --dest=DEST
        Default: ''
    --output_format=OUTPUT_FORMAT
        Default: 'files'
    --take_every_nth=TAKE_EVERY_NTH
        Default: 1
    --frame_workers=FRAME_WORKERS
        Default: 1
    --frame_memory_size=FRAME_MEMORY_SIZE
        Default: 4
    --metadata_columns=METADATA_COLUMNS
        Default: ''
    --use_dst_name=USE_DST_NAME
        Default: False
    --distribute=DISTRIBUTE
        Default: 'none'
    --oc_model_name=OC_MODEL_NAME
        Default: 'ViT-B-32'
    --pretrained=PRETRAINED
        Default: 'laion2b_s34b_b79k'
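
For example, a typical invocation might look like the following (the file and directory names are illustrative):

clip-video-encode my_videos.txt \
    --dest="my_embeddings" \
    --take_every_nth=25 \
    --frame_workers=4 \
    --oc_model_name="ViT-B-32" \
    --pretrained="laion2b_s34b_b79k"

Here my_videos.txt would be a txt file with multiple mp4 paths or YouTube links, and the resulting embeddings are written to the my_embeddings directory.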

API

This module exposes a single function, clip_video_encode, which takes the same arguments as the command-line tool:

import glob
from clip_video_encode import clip_video_encode

VIDS = glob.glob("some/path/my_videos/*.mp4")  # paths of the videos to encode
EMBEDDING_DIR = "some/path/my_embeddings"      # directory to save embeddings to
take_every_5 = 5                               # only embed every 5th frame

clip_video_encode(VIDS, EMBEDDING_DIR, take_every_nth=take_every_5)
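
With the default output_format of "files", each input video produces a .npy file of per-frame CLIP embeddings in the destination directory, which can be loaded with numpy. A minimal sketch (the file name is illustrative):

import numpy as np

# one embedding per sampled frame, e.g. shape (n_frames, 512) for ViT-B-32
embeddings = np.load("some/path/my_embeddings/my_video.npy")  # illustrative path
print(embeddings.shape)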

Who is using clip-video-encode?

  • CLIP-Kinetics700 - The Kinetics700 dataset (700 GB) can be compressed to ~8 GB using clip-video-encode at 1 FPS.
  • CLIP-WebVid - The WebVid dataset (10M videos) encoded as CLIP ViT-B/32 embeddings at 1 FPS.

Examples

Check out some cool clip-video-encode examples:

  • Thing detector - Look for things in videos using clip-video-encode generated embeddings.
  • Large dataset processing - If you want to process a large dataset (like WebVid) into CLIP embeddings, see the example at the bottom of the linked README.md.

For development

Set up a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -e .

To run the tests, first install the test requirements:

pip install -r requirements-test.txt

then run:

make lint
make test

You can use make black to reformat the code.

To run a specific test:

python -m pytest -x -s -v tests -k "dummy"