• Stars: 420
• Rank: 103,180 (Top 3%)
• Language: Python
• License: Other
• Created: over 5 years ago
• Updated: over 2 years ago


Repository Details

A 3D vision library from 2D keypoints: monocular and stereo 3D detection for humans, social distancing, and body orientation.

Monoloco library

Continuously tested on Linux, macOS and Windows.




This library is based on three research projects for monocular/stereo 3D human localization (detection), body orientation, and social distancing. Check the video teaser of the library on YouTube.


MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization
L. Bertoni, S. Kreiss, T. Mordan, A. Alahi, ICRA 2021
Article                 Citation                 Video


Perceiving Humans: from Monocular 3D Localization to Social Distancing
L. Bertoni, S. Kreiss, A. Alahi, T-ITS 2021
Article                 Citation                 Video


MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
L. Bertoni, S. Kreiss, A. Alahi, ICCV 2019
Article                 Citation                 Video

Library Overview

Visual illustration of the library components:


License

All projects are built on top of OpenPifPaf for the 2D keypoints and share the AGPL license.

This software is also available for commercial licensing via the EPFL Technology Transfer Office (https://tto.epfl.ch/, [email protected]).

Quick setup

A GPU is not required, but it is highly recommended for real-time performance.

The installation has been tested on macOS and Linux with Python 3.6, 3.7 and 3.8. Packages were installed with pip inside virtual environments.

For quick installation, do not clone this repository, make sure there is no folder named monoloco in your current directory, and run:

pip3 install monoloco

For development of the source code itself, you need to clone this repository and then:

pip3 install setuptools wheel
cd monoloco
python3 setup.py sdist bdist_wheel
pip3 install -e .
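
Either way, you can check that the installation works by printing the help message of the main entry point (the same command appears in the Interfaces section below):

python3 -m monoloco.run --help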

Interfaces

All the commands are run through a main file called run.py using subparsers. To check all the options:

  • python3 -m monoloco.run --help
  • python3 -m monoloco.run predict --help
  • python3 -m monoloco.run train --help
  • python3 -m monoloco.run eval --help
  • python3 -m monoloco.run prep --help

or check the file monoloco/run.py

Predictions

The software receives an image (or an entire folder, using glob expressions), calls PifPaf for 2D human pose detection over the image, and runs MonoLoco++ or MonStereo for 3D localization, social distancing and/or body orientation.
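
For example, a minimal sketch that runs the default monocular pipeline over every image in the docs folder (the glob pattern and output directory are illustrative; --glob and -o are the same arguments used in the examples below):

python3 -m monoloco.run predict --glob "docs/*.png" -o data/output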

Which Modality
The argument --mode defines which network to run.

  • select --mode mono (default) to predict the 3D localization of all the humans from monocular image(s)
  • select --mode stereo for stereo images
  • select --mode keypoints if you are only interested in the 2D keypoints from OpenPifPaf

Models are downloaded automatically. To use a specific model, use the argument --model. Additional models can be downloaded from here.
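
For instance, a sketch combining these arguments (the image is the example from the docs folder and the model file name is the one used later in the evaluation section; adjust the path to wherever your model is saved):

python3 -m monoloco.run predict docs/002282.png \
--mode mono \
--model data/outputs/monoloco_pp-210422-1601.pkl \
-o data/output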

Which Visualization

  • select --output_types multi if you want to visualize both the frontal view and the bird's-eye view in the same picture
  • select --output_types bird front if you want separate pictures for the two views (or just one of them)
  • select --output_types json if you'd like the output json file
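
For example, to draw the frontal and bird's-eye views in a single picture (the image and output directory are the ones used in the examples below):

python3 -m monoloco.run predict docs/002282.png \
--output_types multi \
-o data/output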

If you select --mode keypoints, use the standard OpenPifPaf arguments.

Focal Length and Camera Parameters
Absolute distances are affected by the camera intrinsic parameters. When processing KITTI images, the network uses the intrinsic matrix provided with the dataset. In all other cases, we use the parameters of the nuScenes cameras, which have 1/1.8" CMOS sensors of size 7.2 x 5.4 mm. The default focal length is 5.7 mm; this parameter can be modified with the argument --focal.
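
For example, to override the default focal length (the 6.0 mm value below is purely illustrative):

python3 -m monoloco.run predict docs/002282.png \
--focal 6.0 \
-o data/output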

A) 3D Localization

Ground-truth comparison
If you provide a ground-truth json file, the script will match every detection to the ground truth using the Intersection over Union metric and compare the predictions of the network against it. The ground-truth file can be generated with the prep subparser, or downloaded directly from Google Drive, and is passed with the argument --path_gt.

Monocular examples

For an example image, run the following command:

python3 -m monoloco.run predict docs/002282.png \
--path_gt names-kitti-200615-1022.json \
-o <output directory> \
--long-edge <rescale the image by providing dimension of long side> \
--n_dropout <50 to include epistemic uncertainty, 0 otherwise>


To show all the instances estimated by MonoLoco, add the argument --show_all to the above command.


It is also possible to run only OpenPifPaf by using --mode keypoints. All the other PifPaf arguments are supported and can be checked with python3 -m monoloco.run predict --help.


Stereo Examples
To run MonStereo on stereo images, make sure the stereo pairs have the following name structure:

  • Left image: <name>.<extension>
  • Right image: <name>_r.<extension>

(The exact suffix does not matter, as long as left and right images pair up consistently.)

You can load one or more image pairs using glob expressions. For example:

python3 -m monoloco.run predict --mode stereo \
--glob docs/000840*.png \
 --path_gt <to match results with ground-truths> \
 -o data/output  --long-edge 2500

Crowded scene

python3 -m monoloco.run predict --glob docs/005523*.png \
--mode stereo \
--path_gt <to match results with ground-truths> \
-o data/output  --long-edge 2500 \
--instance-threshold 0.05 --seed-threshold 0.05

Occluded hard example

B) Social Distancing (and Talking activity)

To visualize social distancing compliance, simply add the argument social_distance to --activities. This visualization is not supported with a stereo camera. Threshold distance and radii (for F-formations) can be set using --threshold-dist and --radii, respectively.

For more info, run: python3 -m monoloco.run predict --help

Examples
An example from the Collective Activity Dataset is provided below.

To visualize social distancing, run the command below:

pip3 install scipy
python3 -m monoloco.run predict docs/frame0032.jpg \
--activities social_distance --output_types front bird 
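
To also tighten or relax the compliance threshold mentioned above, the same command can be extended with --threshold-dist (the 2.5 m value below is illustrative, not the default; --radii can be adjusted in the same way, see predict --help):

python3 -m monoloco.run predict docs/frame0032.jpg \
--activities social_distance --output_types front bird \
--threshold-dist 2.5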

C) Hand-raising detection

To detect raised hands, add the argument --activities raise_hand to the prediction command.

For example, the image below is obtained with:

python3 -m monoloco.run predict docs/raising_hand.jpg \
--activities raise_hand social_distance --output_types front

For more info, run: python3 -m monoloco.run predict --help

D) Orientation and Bounding Box dimensions

The network also estimates orientation and bounding box dimensions. Results are saved in a json file when using the argument --output_types json. At the moment, the only visualization that includes orientation is the social distancing one.
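
For example, a minimal sketch that saves orientation and box dimensions to a json file in the output directory (image and paths are the ones used in the earlier examples):

python3 -m monoloco.run predict docs/002282.png \
--output_types json \
-o data/output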

E) Webcam

You can use the webcam as input with the --webcam argument. By default, --z_max is set to 10 and --long-edge to 144 when using the webcam. If multiple webcams are plugged in, you can choose between them with --camera; for instance, add --camera 1 to use the second camera. You also need to install opencv-python to use this feature:

pip3 install opencv-python

Example command:

python3 -m monoloco.run predict --webcam \
--activities raise_hand social_distance

Training

We train on the KITTI dataset (MonoLoco/MonoLoco++/MonStereo) or the nuScenes dataset (MonoLoco), specifying the path of the json file containing the input joints. Please download the files here or follow the preprocessing instructions.

Results for MonoLoco++ are obtained with:

python3 -m monoloco.run train --joints data/arrays/joints-kitti-mono-210422-1600.json

For the MonStereo results, run:

python3 -m monoloco.run train --joints data/arrays/joints-kitti-stereo-210422-1601.json \
--lr 0.003 --mode stereo 

If you are interested in the original results of the MonoLoco ICCV article (now improved with MonoLoco++), please refer to the tag v0.4.9 in this repository.

Finally, for a more extensive list of available parameters, run:

python3 -m monoloco.run train --help


Preprocessing

Preprocessing and training steps are fully supported by the code provided, but they first require running a pose detector over all the training images and collecting the annotations. The code supports this option (run the predict subparser with --mode keypoints).

Data structure

data         
├── outputs                 
├── arrays
├── kitti

Run the following inside the monoloco repository:

mkdir data
cd data
mkdir outputs arrays kitti

Kitti Dataset

Download the KITTI images (from the left and right cameras), the ground-truth files (labels), and the calibration files from the KITTI website and save them inside the data folder as shown below.

data         
├── kitti
        ├── gt
        ├── calib
        ├── images
        ├── images_right

The network takes 2D keypoint annotations as input. To create them, run PifPaf over the saved images:

python3 -m openpifpaf.predict \
--glob "data/kitti/images/*.png" \
--json-output <directory to contain predictions> \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose 

Horizontal flipping

To augment the dataset, we apply horizontal flipping to the detected poses. To include small variations in the poses, we use the poses from the right camera (the dataset was recorded with a stereo camera). As there are no labels for the right camera, the code automatically corrects the ground-truth depth by taking the camera baseline into account. To obtain these poses, run PifPaf on the folder of right images as well. Make sure to save the annotations into a different folder named <NameOfTheLeftFolder>_right.
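
For instance, mirroring the PifPaf command above on the right images (a sketch; the output folder placeholder follows the naming convention just described):

python3 -m openpifpaf.predict \
--glob "data/kitti/images_right/*.png" \
--json-output <NameOfTheLeftFolder>_right \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose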

Recall

To maximize recall (at the cost of computation time), it is possible to upscale the images with the argument --long-edge 2500 (roughly scale 2).

Once this step is complete, the commands below transform all the annotations into a single json file that will be used for training.

For MonoLoco++:

python3 -m monoloco.run prep --dir_ann <directory that contains annotations>

For MonStereo:

python3 -m monoloco.run prep --mode stereo --dir_ann <directory that contains left annotations> 

Collective Activity Dataset

To evaluate on the Collective Activity Dataset (without any training), we selected 6 scenes that contain people talking to each other. This gives a balanced dataset, but any other configuration will work.

The expected structure for the dataset is the following:

collective_activity         
├── images                 
├── annotations

where the images and annotations inside follow this naming convention:

IMAGES: seq<sequence_name>_frame<frame_name>.jpg
ANNOTATIONS: seq<sequence_name>_annotations.txt

With respect to the original dataset, the images and annotations are moved into a single folder and the sequence is added to their names. One command to do this is:

rename -v -n 's/frame/seq14_frame/' f*.jpg

which, for example, renames all the jpg images in that folder by adding the sequence number (remove -n after checking that it works).

PifPaf annotations should also be saved in a single folder and can be created with:

python3 -m openpifpaf.predict \
--glob "data/collective_activity/images/*.jpg"  \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose \
--json-output <output folder>

Evaluation

3D Localization

We provide evaluation on KITTI for models trained on nuScenes or KITTI. Download the KITTI ground-truth files and the calibration files from their website. Save the training labels (one .txt file per image) into the folder data/kitti/gt and the camera calibration matrices (one .txt file per image) into data/kitti/calib.
To evaluate a pre-trained model, download the latest models from here and save them into data/outputs.

Baselines

We compare our results with other monocular and stereo baselines, depending on whether you are evaluating the monocular or the stereo setting. For some of the baselines, we obtained the annotations directly from the authors and do not yet have permission to publish them.

Mono3D, 3DOP, MonoDepth, MonoPSR, MonoDIS, and our Geometrical Baseline.

  • Mono3D: download validation files from here and save them into data/kitti/m3d
  • 3DOP: download validation files from here and save them into data/kitti/3dop
  • MonoDepth: compute an average depth for every instance using the script available here and save the results into data/kitti/monodepth
  • Geometrical Baseline and MonoLoco: to also include the geometric baselines and MonoLoco, download a MonoLoco model, save it into data/models, and add the flag --baselines to the evaluation command

The evaluation script runs the model over all the annotations and compares the results with the KITTI ground truth and the downloaded baselines. To do so, run:

python3 -m monoloco.run eval \
--dir_ann <annotation directory> \
--model data/outputs/monoloco_pp-210422-1601.pkl \
--generate \
--save \

For stereo results, add --mode stereo and select --model=monstereo-210422-1620.pkl, as in the sketch below. The resulting table of results and an example of the saved figures are also shown below.
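
Putting this together, the stereo evaluation command would look like the following sketch (the flags are the same as above; the model path assumes the file was saved into data/outputs):

python3 -m monoloco.run eval \
--dir_ann <annotation directory> \
--mode stereo \
--model data/outputs/monstereo-210422-1620.pkl \
--generate \
--save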

Tables

Relative Average Precision Localization: RALP-5% (MonStereo)

We modified the original C++ evaluation code of KITTI to make it relative to distance; the build uses CMake. To run the evaluation, first generate the txt files with the standard evaluation command above. Then follow the instructions of this repository to prepare the folders accordingly (or follow the KITTI guidelines) and run the evaluation. The modified file is called evaluate_object.cpp and runs exactly like the original KITTI evaluation.
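
As a rough sketch, a typical CMake build of the modified evaluation tool looks like the following (the folder layout is an assumption; follow the linked repository's instructions for the exact steps):

mkdir build && cd build
cmake ..
make
# then run the binary built from evaluate_object.cpp on the generated txt results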

Activity Estimation (Talking)

Please follow the preprocessing steps for the Collective Activity Dataset and run PifPaf over the dataset images. Evaluation on this dataset is done with models trained on either KITTI or nuScenes. For optimal performance, we suggest the model trained on the nuScenes teaser.

python3 -m monoloco.run eval \
--activity \
--dataset collective \
--model <path to the model> \
--dir_ann <annotation directory>

Citation

If you use this library in your research, we will be happy if you cite us!

@InProceedings{bertoni_2021_icra,
    author = {Bertoni, Lorenzo and Kreiss, Sven and Mordan, Taylor and Alahi, Alexandre},
    title = {MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization},
    booktitle = {the International Conference on Robotics and Automation (ICRA)},
    year = {2021}
}
@ARTICLE{bertoni_2021_its,
    author = {Bertoni, Lorenzo and Kreiss, Sven and Alahi, Alexandre},
    journal = {IEEE Transactions on Intelligent Transportation Systems},
    title = {Perceiving Humans: from Monocular 3D Localization to Social Distancing},
    year = {2021}
}
@InProceedings{bertoni_2019_iccv,
    author = {Bertoni, Lorenzo and Kreiss, Sven and Alahi, Alexandre},
    title = {MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation},
    booktitle = {the IEEE International Conference on Computer Vision (ICCV)},
    month = {October},
    year = {2019}
}

More Repositories

1. CrowdNav: [ICRA19] Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning (Python, 583 stars)
2. trajnetplusplusbaselines: [ITS'21] Human Trajectory Forecasting in Crowds: A Deep Learning Perspective (Python, 244 stars)
3. UniTraj: A Unified Framework for scalable Vehicle Trajectory Prediction, ECCV 2024 (Python, 166 stars)
4. social-nce: [ICCV21] Official implementation of "Social NCE: Contrastive Learning of Socially-aware Motion Representations" in PyTorch (Python, 157 stars)
5. s-attack: [CVPR 2022] S-attack library. Official implementation of two papers: "Vehicle trajectory prediction works, but not everywhere" and "Are socially-aware trajectory prediction models really socially-aware?" (Python, 105 stars)
6. causalmotion: [CVPR22] Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective (Python, 73 stars)
7. RRB: Official implementation of "Injecting Knowledge in Data-driven Vehicle Trajectory Predictors", published in Transportation Research Part C (Python, 67 stars)
8. ttt-plus-plus: [NeurIPS21] TTT++: When Does Self-supervised Test-time Training Fail or Thrive? (Python, 56 stars)
9. DePOSit: [ICRA 2023] Official implementation of "A generic diffusion-based approach for 3D human pose prediction in the wild" (Python, 49 stars)
10. bounding-box-prediction: Bounding box prediction library. Official implementation of two papers on human 2D/3D bounding box prediction (Python, 46 stars)
11. trajnetplusplustools: Tools for TrajNet++ (Python, 42 stars)
12. monstereo: MonoLoco++ and MonStereo for 3D localization, orientation, bounding box dimensions and social distancing from monocular and/or stereo images. PyTorch official implementation (Python, 39 stars)
13. social-transmotion: [ICLR 2024] Official implementation of "Social-Transmotion: Promptable Human Trajectory Prediction" in PyTorch (Python, 34 stars)
14. trajnetplusplusdata: Data for TrajNet++ Challenge (31 stars)
15. detection-attributes-fields: PyTorch implementation of "Detecting 32 Pedestrian Attributes for Autonomous Vehicles" (Python, 31 stars)
16. looking (Python, 30 stars)
17. trajnetplusplusdataset: Dataset Preparation for TrajNet++ (Python, 29 stars)
18. collaborative-gan-sampling: [AAAI20] TensorFlow implementation of Collaborative Sampling in Generative Adversarial Networks (Python, 24 stars)
19. decoupled-pose-prediction: Official implementation of "Learning Decoupled Representations for Human Pose Forecasting" in PyTorch (Python, 19 stars)
20. Person-Re-Identification-with-Confidence (Python, 18 stars)
21. motion-style-transfer: [CoRL22] Motion Style Transfer: Modular Low-Rank Adaptation for Deep Motion Forecasting (Python, 17 stars)
22. SVGNet: The official implementation of "SVG-Net: An SVG-based Trajectory Prediction Model" (17 stars)
23. CIM: Causal Imitative Model official code (Python, 14 stars)
24. openpifpaf_wholebody: PifPaf extension to detect body, foot, face and hand keypoints (Python, 13 stars)
25. unposed: [RA-L 2024] Official implementation of "Toward Reliable Human Pose Forecasting with Uncertainty" (Python, 13 stars)
26. rock-pytorch: A PyTorch implementation of "Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection" (Python, 12 stars)
27. hybrid-feature-fusion (Python, 11 stars)
28. butterflydetector (Python, 9 stars)
29. pedestrian-transition-dataset (Jupyter Notebook, 9 stars)
30. DLAV-2022: EPFL Deep Learning for Autonomous Vehicles, Spring 2022 (Jupyter Notebook, 9 stars)
31. SemDisc: Official implementation of "A Shared Representation for Photorealistic Driving Simulators" in PyTorch (Python, 9 stars)
32. JRDB-Traj: JRDB dataset: trajectory prediction baselines and data preprocessing (Python, 8 stars)
33. introML-2021: EPFL Introduction to Machine Learning for Engineers, Spring 2021 (Jupyter Notebook, 7 stars)
34. SGG-CoRF (Python, 7 stars)
35. CODE: Implementation of CODE: Confident Ordinary Differential Editing (Jupyter Notebook, 6 stars)
36. IntroML-2024 (Jupyter Notebook, 6 stars)
37. DLAV-2023 (Jupyter Notebook, 5 stars)
38. introML-2023 (Jupyter Notebook, 5 stars)
39. Deep-Visual-Re-Identification-with-Confidence (Python, 5 stars)
40. pose-action-recognition (Python, 5 stars)
41. TIC-TAC: [ICML 2024] Code repository for "TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression". We address the problem of sub-optimal covariance estimation in deep heteroscedastic regression by proposing a new model and metric. (Python, 5 stars)
42. openpifpaf-torchhub: Pretrained models for OpenPifPaf via torchhub (4 stars)
43. openpifpaf_animalpose (Python, 3 stars)
44. trajnetplusplus-model-zoo: TrajNet++ Model Zoo is a collection of pre-trained models on the TrajNet++ Benchmark (2 stars)
45. unitraj-DLAV: The UniTraj framework, developed by VITA, adjusted for the project of the DLAV course (Python, 2 stars)
46. DLAV-2024 (Jupyter Notebook, 2 stars)
47. introML-2022: EPFL Introduction to Machine Learning for Engineers, Spring 2022 (Jupyter Notebook, 1 star)
48. IncrementalHumanPose (Python, 1 star)
49. code_template: A repository displaying a possible code structure suitable for Slurm (Python, 1 star)