
Foreground-aware Semantic Representations for Image Harmonization


This repository contains the official PyTorch implementation of the following WACV 2021 paper:

Foreground-aware Semantic Representations for Image Harmonization
Konstantin Sofiiuk, Polina Popenova, Anton Konushin
Samsung AI Center Moscow
https://arxiv.org/abs/2006.00809

Abstract: Image harmonization is an important step in photo editing that achieves visual consistency in composite images by adjusting the appearance of the foreground to make it compatible with the background. Previous approaches to harmonizing composites are based on training encoder-decoder networks from scratch, which makes it challenging for a neural network to learn a high-level representation of objects. We propose a novel architecture that utilizes the space of high-level features learned by a pre-trained classification network. We create our models as a combination of existing encoder-decoder architectures and a pre-trained foreground-aware deep high-resolution network. We extensively evaluate the proposed method on the existing image harmonization benchmark and set a new state of the art in terms of MSE and PSNR metrics.
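The core idea can be summarized as follows: features from a pre-trained, foreground-aware backbone are concatenated to the features of a conventional encoder-decoder, and the network output is blended with the input composite so that only the foreground region is modified. The sketch below illustrates this idea only; it is not the repository's model code, and the layer shapes, channel counts, and single concatenation point are simplifying assumptions.

# Toy sketch of the idea only (not the repository's architecture): features from a
# pre-trained backbone that sees the composite image and the foreground mask are
# concatenated to the features of a small encoder-decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHarmonizer(nn.Module):
    def __init__(self, backbone, backbone_channels=256):
        super().__init__()
        self.backbone = backbone  # pre-trained, foreground-aware feature extractor (e.g. HRNet)
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(64 + backbone_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, composite, mask):
        x = torch.cat([composite, mask], dim=1)      # composite image + foreground mask
        enc = self.encoder(x)                        # low-level encoder features (H/4 x W/4)
        sem = self.backbone(x)                       # high-level semantic features
        sem = F.interpolate(sem, size=enc.shape[2:], mode='bilinear', align_corners=False)
        out = self.decoder(torch.cat([enc, sem], dim=1))
        # output image blending: keep the background, replace only the foreground
        return mask * out + (1 - mask) * composite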

Setting up an environment

This framework is built using Python 3.6 and relies on PyTorch 1.4.0+. The following command installs all necessary packages:

pip3 install -r requirements.txt

You can also use our Dockerfile to build a container with a configured environment.

If you want to run training or testing, you must configure the paths to the datasets in config.yml.
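As a quick sanity check before launching anything, you can load config.yml and verify that every path-like value actually exists. This snippet is not part of the repository; it only assumes that config.yml is ordinary flat YAML whose values include path strings.

# Quick sanity check (not included in the repository): load config.yml and report
# whether every path-like value points to something that exists on disk.
import os
import yaml  # pip3 install pyyaml

with open('config.yml') as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    if isinstance(value, str) and '/' in value:
        print(f"{key}: {value} [{'ok' if os.path.exists(value) else 'MISSING'}]")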

Datasets

We train and evaluate all our models on the iHarmony4 dataset. It contains 65,742 training and 7,404 test samples. Each sample is a triple consisting of a real image, a composite image, and a foreground mask.

Before training, we resize the HAdobe5k subdataset so that each image side is smaller than 1024 pixels. The resizing script is provided in resize_hdataset.ipynb.
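The notebook is the canonical implementation of this step; the snippet below is only a rough equivalent, assuming plain JPEG files and Pillow, and the directory argument is hypothetical.

# Rough equivalent of the resizing step (use resize_hdataset.ipynb for the real one):
# downscale every image whose longer side exceeds 1024 px, preserving the aspect ratio.
from pathlib import Path
from PIL import Image

def resize_images(image_dir, max_side=1024):
    for path in Path(image_dir).glob('*.jpg'):
        img = Image.open(path)
        if max(img.size) > max_side:
            img.thumbnail((max_side, max_side), Image.LANCZOS)
            img.save(path, quality=95)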

Don't forget to change the paths to the datasets in config.yml after downloading and unpacking.

Training

We provide scripts for training our models on images of size 256 and 512. For each experiment, a separate folder is created in ./harmonization_exps with TensorBoard logs, text logs, visualizations, and model checkpoints. You can specify another path in config.yml (see the EXPS_PATH variable).
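Training progress can then be monitored by pointing TensorBoard at that directory (or at whatever EXPS_PATH you configured), for example:

tensorboard --logdir ./harmonization_exps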

Start training with the following commands:

python3 train.py <model-script-path> --gpus=0 --workers=4 --exp-name=first-try

# iDIH: fully convolutional Encoder-Decoder with output image blending and foreground-normalized MSE loss
python3 train.py models/fixed256/improved_dih.py --gpus=0 --workers=4 --exp-name=first-try

# HRNet18s-V2p + iDIH: feature pyramid of 4 HRNet18-small-V2 outputs is concatenated to 4 outputs of the iDIH encoder
python3 train.py models/fixed256/hrnet18_idih.py --gpus=0 --workers=4 --exp-name=first-try

# HRNet18-V2 + iDIH: single output of HRNet18-V2 is concatenated to single output of the iDIH encoder
python3 train.py models/fixed256/hrnet18_idih.py --gpus=0 --workers=4 --exp-name=first-try

# iDIH trained on 512x512
python3 train.py models/crop512/improved_dih.py --gpus=0 --workers=4 --exp-name=first-try

To see all training parameters, run python3 train.py --help.
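The iDIH comment above mentions output image blending and a foreground-normalized MSE loss. As described in the paper, the squared error is normalized by the foreground mask area (with a lower bound) instead of the number of image pixels, so small and large foreground objects contribute comparably. Below is a minimal sketch of such a loss; the minimum-area constant is an arbitrary placeholder, not the value used in the codebase.

# Minimal sketch of a foreground-normalized MSE loss (illustrative only).
import torch

def fn_mse_loss(pred, target, mask, min_area=100.0):
    # sum of squared errors over the image (with output blending the background term is zero),
    # normalized by the foreground area rather than by the number of pixels
    sq_err = ((pred - target) ** 2).sum(dim=(1, 2, 3))
    area = mask.sum(dim=(1, 2, 3)).clamp(min=min_area)
    return (sq_err / area).mean()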

We used pre-trained HRNetV2 models from the official repository. To train one of our models with HRNet backbone, download HRNet weights and specify their path in config.yml (see IMAGENET_PRETRAINED_MODELS variable).

Evaluation

We provide scripts to both evaluate any model and get its predictions. To do that, we specify all model configs in mconfigs. To evaluate a model different from the provided ones, a new config entry should be added.

You can specify the checkpoints directory in config.yml in advance (see the MODELS_PATH variable) and then pass the scripts only a checkpoint name instead of an absolute checkpoint path.
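Presumably the scripts resolve that argument roughly as follows; this is only an assumption about their behavior, shown to clarify the convention, not the actual code.

import os

def resolve_checkpoint(arg, models_path):
    # an existing path is used as-is; otherwise the name is looked up under MODELS_PATH
    return arg if os.path.exists(arg) else os.path.join(models_path, arg)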

Evaluate model

To get the metrics table on the iHarmony4 test set, run the following command:

python3 scripts/evaluate_model.py <model-name> <checkpoint-path> --resize-strategy Fixed256

# iDIH
python3 scripts/evaluate_model.py improved_dih256 /hdd0/harmonization_exps/fixed256/improved_dih/checkpoints/last_checkpoint.pth --resize-strategy Fixed256

To see all evaluation parameters, run python3 scripts/evaluate_model.py --help.
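For reference, the two metrics reported in the Results section can be computed as below on 8-bit RGB arrays. This is a generic sketch, not the repository's evaluation code, which may aggregate the values over the test set differently.

# Reference sketch of the reported metrics: MSE and PSNR between a harmonized
# prediction and the ground-truth real image (both as uint8 RGB arrays).
import numpy as np

def mse(pred, target):
    return float(np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2))

def psnr(pred, target, max_value=255.0):
    err = mse(pred, target)
    return float('inf') if err == 0 else float(10.0 * np.log10(max_value ** 2 / err))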

Get model predictions

To get predictions on a set of images, run the following command:

python3 scripts/predict_for_dir.py <model-name> <checkpoint-path> --images <composite-images-path> --masks <masks-path> --resize 256

# iDIH
python3 scripts/predict_for_dir.py improved_dih256 /hdd0/harmonization_exps/fixed256/improved_dih/checkpoints/last_checkpoint.pth \
--images /hdd0/datasets/ImageHarmonization/test/composite_images --masks /hdd0/datasets/ImageHarmonization/test/masks \
--resize 256

To see all prediction parameters, run python3 scripts/predict_for_dir.py --help.

Jupyter notebook

For interactive model testing with sample visualization, see eval_and_vis_harmonization_model.ipynb.

Results

We provide metrics and pre-trained weights for several models trained on images of size 256x256 augmented with horizontal flip and random resized crop. Metric values may differ slightly from the ones in the paper since all the models were retrained from scratch with the new codebase.

Pre-trained models with the corresponding model config names (see Evaluation):

Model                        Download link            Name in mconfigs
iDIH256                      idih256.pth              improved_dih256
iSSAM256                     issam256.pth             improved_ssam256
DeepLab-ResNet34 + iDIH256   deeplab_idih256.pth      deeplab_r34_idih256
HRNet18s + iDIH256           hrnet18s_idih256.pth     hrnet18s_idih256
HRNet18 + iDIH256            hrnet18_idih256.pth      hrnet18_idih256
HRNet18 pyramid + iDIH256    hrnet18_v2p_idih256.pth  hrnet18_v2p_idih256
HRNet32 + iDIH256            hrnet32_idih256.pth      hrnet32_idih256

Evaluation metrics (MSE / PSNR on each subdataset):

Model                   HCOCO           HAdobe5k        HFlickr         Hday2night      All
Base models
iDIH256                 19.58 / 38.34   30.84 / 36.00   84.74 / 32.58   50.05 / 37.10   30.70 / 36.99
iSSAM256                16.48 / 39.16   22.60 / 37.24   69.67 / 33.56   40.59 / 37.72   24.65 / 37.95
iDIH256 with backbone
DeepLab-ResNet34        17.68 / 38.97   28.13 / 36.33   70.89 / 33.25   56.17 / 37.25   27.37 / 37.53
HRNet18s                14.30 / 39.52   22.57 / 37.18   63.03 / 33.70   51.20 / 37.69   22.82 / 38.15
HRNet18                 13.79 / 39.62   25.44 / 36.91   60.63 / 33.88   44.94 / 37.74   22.99 / 38.16
HRNet18 pyramid         14.10 / 39.56   24.47 / 37.04   62.13 / 33.90   47.74 / 37.46   23.10 / 38.15
HRNet32                 14.00 / 39.71   23.04 / 37.13   57.55 / 34.06   53.70 / 37.70   22.22 / 38.29

License

The code is released under the MPL 2.0 License. MPL is a copyleft license that is easy to comply with. You must make the source code for any of your changes available under MPL, but you can combine the MPL software with proprietary code, as long as you keep the MPL code in separate files.

Citation

If you find this work useful for your research, please cite our paper:

@article{sofiiuk2020harmonization,
  title={Foreground-aware Semantic Representations for Image Harmonization},
  author={Konstantin Sofiiuk and Polina Popenova and Anton Konushin},
  journal={arXiv preprint arXiv:2006.00809},
  year={2020}
}
