  • Stars: 134
  • Rank: 265,400 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated over 5 years ago

Repository Details

Speech Recognition without audio input

LipReading

Main repository for LipReading with Deep Neural Networks

Introduction

The goal is to implement LipReading. Similar to how end-to-end speech recognition systems map high-fidelity speech audio to sensible character- and word-level outputs, we will do the same for "speech visuals". In particular, we will take video frames as input and extract the relevant mouth/chin signals, which are then mapped to characters and words.

Overview

TODO

A high-level overview of some TODO items. For more project details, please see the GitHub project.

  • Download Data (926 videos)
  • Build Vision Pipeline (1 week): in review
  • Build NLP Pipeline (1 week): wip
  • Build Loss Fn and Training Pipeline (2 weeks): wip
  • Train 🚋 and Ship 🚢: wip

Architecture

There are two primary interconnected pipelines: a "vision" pipeline that extracts the face and lip features from video frames, and an "NLP-inspired" pipeline that temporally correlates the sequential lip features to produce the final output.

Here's a quick dive into the tensor dimensionalities:

Vision Pipeline

Video -> Frames       -> Face Bounding Box Detection      -> Face Landmarking    
Repr. -> (n, y, x, c) -> (n, (box=1, y_i, x_i, w_i, h_i)) -> (n, (idx=68, y, x))   
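
The shapes above can be made concrete with a small NumPy sketch. This is only an illustration, not the repository's code: the frame size, the box values, the random landmarks, and the helper `mouth_features` are all hypothetical. It also assumes the landmarks follow the common 68-point layout, in which indices 48 to 67 outline the mouth.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n, height, width, channels = 4, 120, 160, 3

# Frames: (n, y, x, c)
frames = np.zeros((n, height, width, channels), dtype=np.uint8)

# One face box per frame, stored as (y_i, x_i, w_i, h_i): (n, 1, 4)
boxes = np.tile(np.array([[20, 40, 80, 80]]), (n, 1, 1))

# 68 landmarks per frame, each a (y, x) pair: (n, 68, 2)
rng = np.random.default_rng(0)
landmarks = rng.uniform(0.0, 1.0, size=(n, 68, 2)) * np.array([height, width])

# Assumption: 68-point layout where indices 48..67 are the mouth points.
MOUTH = slice(48, 68)


def mouth_features(landmarks: np.ndarray) -> np.ndarray:
    """Select the mouth points and center them on their per-frame mean."""
    mouth = landmarks[:, MOUTH, :]                      # (n, 20, 2)
    return mouth - mouth.mean(axis=1, keepdims=True)    # still (n, 20, 2)


features = mouth_features(landmarks)
print(frames.shape, boxes.shape, landmarks.shape, features.shape)
```

Centering on the per-frame mean is one simple way to make the features independent of where the face sits in the frame; the actual pipeline may normalize differently.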

NLP Pipeline

 -> Letters  ->  Words    -> Language Model 
 -> (chars,) ->  (words,) -> (sentences,)
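
The README does not spell out how per-frame predictions become letters. Since the requirements mention PyTorch with CTCLoss, one plausible reading is a CTC-style decode: merge repeated symbols, drop blanks, then split into words. The sketch below assumes that approach; the blank symbol `_` and the function names are hypothetical, and the language-model stage is omitted.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, not taken from the repository


def ctc_collapse(frame_chars):
    """Greedy CTC-style collapse: merge consecutive repeats, then drop blanks."""
    merged = [ch for ch, _ in groupby(frame_chars)]
    return "".join(ch for ch in merged if ch != BLANK)


def to_words(chars):
    """Split the decoded character string into words on whitespace."""
    return chars.split()


# One predicted symbol per video frame (made-up data for illustration).
per_frame = list("hh_e_ll_lloo  _wo_rrld")

chars = ctc_collapse(per_frame)
words = to_words(chars)
print(chars)
print(words)
```

A blank between two identical symbols (as in `l_l`) is what lets the decoder keep a genuine double letter.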

Datasets

  • all: 926 videos (projected, not generated yet)
  • large: 464 videos (failed at 35/464)
  • medium: 104 videos (currently at 37/104)
  • small: 23 videos
  • micro: 6 videos
  • nano: 1 video

Setup

  1. Clone this repository. We will be using python3.

Before running any Python scripts, make sure your PYTHONPATH includes the repository root (./) and that the workspace environment variable is set.

git clone git@github.com:joseph-zhong/LipReading.git
# (optional, setup venv) cd LipReading; python3 -m venv .

  2. Once the repository is cloned, set up the repository's PYTHONPATH and workspace environment variable to take advantage of the standardized directory utilities in ./src/utils/utility.py

Copy the following into your ~/.bashrc

export PYTHONPATH="$PYTHONPATH:/path/to/LipReading/"
export LIP_READING_WS_PATH="/path/to/LipReading/"

  3. Install the requirements, which include PyTorch (with CTCLoss), SpaCy, and others.

On macOS, for CPU-only support:

pip3 install -r requirements.macos.txt

On Ubuntu, for GPU support:

pip3 install -r requirements.ubuntu.txt

SpaCy Setup

We need to install a pre-built English model for some capabilities:

python3 -m spacy download en

Data Directories Structure

The workspace environment variable gives us a simple, standardized directory structure for all our datasets, raw data, model weights, logs, etc.

./data/
  --/datasets (numpy dataset files for dataloaders to load)
  --/raw      (raw caption/video files extracted from online sources)
  --/weights  (model weights, both for training/checkpointing/running)
  --/tb       (Tensorboard logging)
  --/...

See ./src/utils/utility.py for more.
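
As a rough idea of how such directory utilities can work, here is a minimal sketch that resolves paths under ./data/ from the LIP_READING_WS_PATH variable. It is not the contents of ./src/utils/utility.py: the function names `workspace_dir` and `data_dir` are hypothetical, and only the four subdirectories listed above are assumed.

```python
import os
from pathlib import Path

WS_ENV_VAR = "LIP_READING_WS_PATH"
# Subdirectories of ./data/ named in the listing above.
KNOWN_KINDS = ("datasets", "raw", "weights", "tb")


def workspace_dir() -> Path:
    """Return the workspace root, failing loudly if the variable is unset."""
    ws = os.environ.get(WS_ENV_VAR)
    if not ws:
        raise RuntimeError(f"{WS_ENV_VAR} is not set; add it to your ~/.bashrc")
    return Path(ws)


def data_dir(kind: str, *parts: str) -> Path:
    """Build a path such as <workspace>/data/weights/<parts...>."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown data directory: {kind!r}")
    return workspace_dir().joinpath("data", kind, *parts)


if __name__ == "__main__":
    # Example value only; normally the variable comes from the shell.
    os.environ.setdefault(WS_ENV_VAR, "/path/to/LipReading/")
    print(data_dir("weights", "micro", "model.pt"))
```

Routing every path through one helper keeps scripts independent of the directory they are launched from.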

Getting Started

Now that the dependencies are all set up, we can finally do stuff!

Configuration

Each of our "standard" scripts in ./src/scripts (i.e. not ./src/scripts/misc) takes standard argparse-style arguments. You can pass --help to any of the "standard" scripts to see the arguments it expects. To maintain reproducibility, command-line arguments can be written in a raw text file with one argument per line.

For example, the file ./config/gen_dataview/nano contains

--inp=StephenColbert/nano

which represents the arguments to pass to ./src/scripts/generate_dataview.py. They can be passed automatically via

./src/scripts/generate_dataview.py $(cat ./config/gen_dataview/nano)

Arguments are processed in left-to-right order, so if an argument is repeated, the later value overrides the earlier one. This allows for modularity in configuring hyperparameters.

(For demonstration purposes, not a working example)

./src/scripts/train.py \
    $(cat ./config/dataset/large) \
    $(cat ./config/train/model/small-model) \
    $(cat ./config/train/model/rnn/lstm) \
    ...
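
The left-to-right override follows from how argparse handles a repeated option with the default "store" action: the last occurrence wins. The sketch below demonstrates this with in-memory strings standing in for the config files. The flags `--rnn` and `--hidden-size` and the config contents are hypothetical, and `shlex.split` only approximates the shell's word splitting of `$(cat file)`.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """A stand-in parser; only --inp appears in the README, the rest is made up."""
    parser = argparse.ArgumentParser(description="Hypothetical training script")
    parser.add_argument("--inp", default=None)
    parser.add_argument("--rnn", default="gru")
    parser.add_argument("--hidden-size", type=int, default=128)
    return parser


def args_from_text(*config_texts: str) -> list:
    """Concatenate config texts in order, like chaining several $(cat file)."""
    argv = []
    for text in config_texts:
        argv.extend(shlex.split(text))
    return argv


# Stand-ins for three config files, one argument per line.
dataset_cfg = "--inp=StephenColbert/nano\n"
model_cfg = "--hidden-size=64\n--rnn=gru\n"
rnn_cfg = "--rnn=lstm\n"

args = build_parser().parse_args(args_from_text(dataset_cfg, model_cfg, rnn_cfg))
print(args.inp, args.hidden_size, args.rnn)
```

Here `--rnn` appears twice, and the value from the last config (`lstm`) is the one that ends up in `args.rnn`.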

Train Model

  1. Train Model
./src/scripts/train.py

Examples

Training on Micro

./src/scripts/train_model.py $(cat ./config/train/micro)

Tensorboard Visualization

See README_TENSORBOARD.md

Other Resources

This is a collection of external links, papers, projects, and other potentially helpful starting points for the project.

Other Projects

Other Academic Papers

Academic Datasets
