ECCV-2022-Papers
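Official website: https://eccv2022.ecva.net/
Submission deadline: March 7, 2022 (9:59PM CET, 11:59AM PST)
Conference dates: October 24-28, 2022
↘️ CV-Surveys: under construction
Categorized collection of survey papers from past years: click here
Categorized collection of 2022 papers: click here
Categorized collection of 2021 papers: click here
Categorized collection of 2020 papers: click here
❣ ❣ ❣ To download all ECCV 2022 papers as a bundle, reply "paper" to the WeChat official account 我爱计算机视觉. 1,645 papers in total. Categorization is complete.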
🏆 🏆 🏆 Award Papers
- Best Paper Award
- Best Paper Honorable Mention
- Koenderink Prize (test of time)
- Best Demo Award
- Using a Smartphone for Augmented Reality in a Classroom
📺 video
- Everingham Prize
- The UCF101 and HMDB51 dataset teams & Walter J. Scheirer
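61.Light Field (Optics, Geometry, Light Field Imaging)
- Camera-related
- Camera pose
- Camera estimation
- Camera auto-calibration
- Event cameras
- Camera re-identification
- Camera localization
- Light field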
60.Data Augmentation
- TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers
⭐ code
- Neuromorphic Data Augmentation for Training Spiking Neural Networks
- 3D Random Occlusion and Multi-layer Projection for Deep Multi-Camera Pedestrian Localization
⭐ code
59.Image Matching
- ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer
🏠 project
- ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement
⭐ code 🏠 project
58.Human Motion Prediction
- ERA: Expert Retrieval and Assembly for Early Action Prediction
- Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction
- GIMO: Gaze-Informed Human Motion Prediction in Context
⭐ code
- Diverse Human Motion Prediction Guided by Multi-level Spatial-Temporal Anchors
⭐ code
- Action prediction
- Motion estimation
- Human motion synthesis
57.Scene Graph Generation
- Panoptic Scene Graph Generation
⭐ code 🏠 project
- Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
- Hierarchical Memory Learning for Fine-Grained Scene Graph Generation
- Fine-Grained Scene Graph Generation with Data Transfer
⭐ code
- Towards Open-Vocabulary Scene Graph Generation with Prompt-Based Finetuning
56.Sound
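- Learning Visual Styles from Audio-Visual Associations
🏠 project
- Active Audio-Visual Separation of Dynamic Sound Sources
🏠 project
- Sound source localization
- Active speaker detection
- Audio-driven video portrait generation
- Audio-visual segmentation
- Speech synthesis
- Sound separation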
55.Style Transfer
- CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer
😮 oral ⭐ code
- Learning Graph Neural Networks for Image Style Transfer
- ARF: Artistic Radiance Fields
🏠 project
- Image stylization
- Hairstyle transfer
54.View Generation
- InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
😮 oral
- CompNVS: Novel View Synthesis with Scene Completion
- HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields
⭐ code
- Neural Radiance Transfer Fields for Relightable Novel-View Synthesis with Global Illumination
- R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
🏠 project
- NeXT: Towards High Quality Neural Radiance Fields via Multi-Skip Transformer
⭐ code
53.Dataset
- The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
🌻 dataset
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline
🌻 dataset
- Online Segmentation of LiDAR Sequences: Dataset and Algorithm
🌻 dataset
- COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts
⭐ code
Comic onomatopoeia dataset for recognizing arbitrary or truncated text
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis
🌻 dataset
Breakdancing competition dataset for dance motion synthesis
- CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
🌻 dataset 🏠 project
A large-scale video facial attributes dataset
- UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture
⭐ code 🏠 project
A new dataset for robust egocentric 3D human motion capture
- BEAT: A Large-Scale Semantic and Emotional Multi-modal Dataset for Conversational Gestures Synthesis
🌻 dataset 📰 ECCV 2022 | 76 hours of motion capture: the largest multimodal digital-human dataset is open-sourced
- MovieCuts: A New Dataset and Benchmark for Cut Type Recognition
🌻 dataset
Cut type recognition
- A Real World Dataset for Multi-View 3D Reconstruction
🌻 dataset
3D reconstruction
- Capturing, Reconstructing, and Simulating: The UrbanScene3D Dataset
🌻 dataset
Urban scene reconstruction
- PartImageNet: A Large, High-Quality Dataset of Parts
Segmentation
- A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge
🌻 dataset
VQA
- OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
- The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
🌻 dataset
Video editing
- ClearPose: Large-Scale Transparent Object Dataset and Benchmark
🌻 dataset
Depth estimation
- AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment
🌻 dataset
Animation celebrity head dataset
- A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing
Dense material segmentation dataset for indoor and outdoor scene parsing
- MimicME: A Large Scale Diverse 4D Database for Facial Expression Analysis
A large-scale, diverse 4D database for facial expression analysis
- Delving into Universal Lesion Segmentation: Method, Dataset, and Benchmark
🌻 dataset
Lesion segmentation
52.Scene Flow Estimation
- Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation
⭐ code
- What Matters for 3D Scene Flow Network
⭐ code
- MonoPLFlowNet: Permutohedral Lattice FlowNet for Real-Scale 3D Scene Flow Estimation with Monocular Images
51.Anomaly Detection
- Registration based Few-Shot Anomaly Detection
😮 oral ⭐ code
- Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection
⭐ code
- DSR – A Dual Subspace Re-Projection Network for Surface Anomaly Detection
⭐ code
- Locally Varying Distance Transform for Unsupervised Visual Anomaly Detection
- SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation
⭐ code
- HaloAE: An HaloNet based Local Transformer Auto-Encoder for Anomaly Detection and Localization
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection
⭐ code
- Surface anomaly detection
50.Neural Rendering
- Relighting4D: Neural Relightable Human from Videos
⭐ code 🏠 project 📺 video
- MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects
⭐ code 🏠 project
- NeuMan: Neural Human Radiance Field from a Single Video
⭐ code
- Approximate Differentiable Rendering with Algebraic Surfaces
⭐ code 🏠 project
- AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields
⭐ code 🏠 project
- Generalizable Patch-Based Neural Rendering
😮 oral ⭐ code 🏠 project
- Deforming Radiance Fields with Cages
⭐ code 🏠 project
- NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing
😮 oral ⭐ code 🏠 project
- ActiveNeRF: Learning where to See with Uncertainty Estimation
⭐ code
- ARAH: Animatable Volume Rendering of Articulated Human SDFs
⭐ code 🏠 project
- LaTeRF: Label and Text Driven Object Radiance Fields
- MoFaNeRF: Morphable Facial Neural Radiance Field
⭐ code
- Conditional-Flow NeRF: Accurate 3D Modelling with Reliable Uncertainty Quantification
- Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
⭐ code
- KeypointNeRF: Generalizing Image-Based Volumetric Avatars Using Relative Spatial Encoding of Keypoints
🏠 project
- ViewFormer: NeRF-Free Neural Rendering from Few Images Using Transformers
⭐ code
- GeoAug: Data Augmentation for Few-Shot NeRF with Geometry Constraints
- SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image
⭐ code
- BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering
49.Few/Zero-Shot Learning/Domain Generalization/Adaptation
- Few-shot
- Cross-Domain Cross-Set Few-Shot Learning via Learning Compact and Aligned Representations
⭐ code
- Self-Supervision Can Be a Good Few-Shot Learner
⭐ code
- VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments
🏠 project
- Contrastive Prototypical Network with Wasserstein Confidence Penalty
⭐ code
- tSF: Transformer-Based Semantic Filter for Few-Shot Learning
- Worst Case Matters for Few-Shot Recognition
- Learning Instance and Task-Aware Dynamic Kernels for Few-Shot Learning
- Self-Promoted Supervision for Few-Shot Transformer
⭐ code
- Coarse-to-Fine Incremental Few-Shot Learning
- Improving Few-Shot Learning through Multi-task Representation Learning Theory
- TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning
- Kernel Relative-Prototype Spectral Filtering for Few-Shot Learning
⭐ code
- Uncertainty-DTW for Time Series and Sequences
- Zero-shot
- Domain adaptation
- Prior Knowledge Guided Unsupervised Domain Adaptation
⭐ code
- MoDA: Map Style Transfer for Self-Supervised Domain Adaptation of Embodied Agents
- CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation
⭐ code
- GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation
⭐ code
- Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation
⭐ code
- MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation
⭐ code 🏠 project
- Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation
⭐ code 🏠 project
- Combating Label Distribution Shift for Active Domain Adaptation
- Uncertainty-guided Source-free Domain Adaptation
⭐ code
- Learning Unbiased Transferability for Domain Adaptation by Uncertainty Modeling
- Unknown-Oriented Learning for Open Set Domain Adaptation
- Burn after Reading: Online Adaptation for Cross-Domain Streaming Data
⭐ code
- Adversarial Partial Domain Adaptation by Cycle Inconsistency
- A Broad Study of Pre-training for Domain Generalization and Adaptation
- Interpretable Open-Set Domain Adaptation via Angular Margin Separation
- Contrastive Vicinal Space for Unsupervised Domain Adaptation
⭐ code
- Incomplete Multi-View Domain Adaptation via Channel Enhancement and Knowledge Transfer
- BMD: A General Class-Balanced Multicentric Dynamic Prototype Strategy for Source-Free Domain Adaptation
⭐ code
- Domain generalization
- Grounding Visual Representations with Texts for Domain Generalization
⭐ code
- Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes
- Attention Diversification for Domain Generalization
⭐ code
- Cross-Domain Ensemble Distillation for Domain Generalization
- Domain Generalization by Mutual-Information Regularization with Pre-trained Models
⭐ code
- MVDG: A Unified Multi-View Framework for Domain Generalization
⭐ code
48.Semantic Correspondence
- Demystifying Unsupervised Semantic Correspondence Estimation
⭐ code 🏠 project
- Learning Semantic Correspondence with Sparse Annotations
47.GNN/GCN (Graph Neural Networks)
- GCN
- GNN
46.Continual Learning
- Balancing Stability and Plasticity through Advanced Null Space in Continual Learning
😮 oral
- Online Continual Learning with Contrastive Vision Transformer
- Helpful or Harmful: Inter-Task Association in Continual Learning
- Theoretical Understanding of the Information Flow on Continual Learning Performance
⭐ code
- Transfer without Forgetting
⭐ code
- incDFM: Incremental Deep Feature Modeling for Continual Novelty Detection
- Online Task-Free Continual Learning with Dynamic Sparse Distributed Memory
⭐ code
- Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective
- CoSCL: Cooperation of Small Continual Learners Is Stronger than a Big One
⭐ code
- DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning
⭐ code
45.Metric Learning
- DAS: Densely-Anchored Sampling for Deep Metric Learning
⭐ code
- Posterior Refinement on Metric Matrix Improves Generalization Bound in Metric Learning
- A Non-Isotropic Probabilistic Take On Proxy-Based Deep Metric Learning
⭐ code
44.Active Learning
- When Active Learning Meets Implicit Semantic Data Augmentation
- PT4AL: Using Self-Supervised Pretext Tasks for Active Learning
⭐ code
43.Lifelong Learning
42.Reinforcement Learning
- Style-Agnostic Reinforcement Learning
⭐ code
- StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
⭐ code
- Learning Efficient Multi-agent Cooperative Visual Exploration
🏠 project
- DexMV: Imitation Learning for Dexterous Manipulation from Human Videos
🏠 project
41.Incremental Learning
- Learning with Recoverable Forgetting
- Incremental Task Learning with Incremental Rank Updates
⭐ code
- DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning
- Class-incremental
- Class-incremental Novel Class Discovery
⭐ code
- Long-Tailed Class Incremental Learning
⭐ code
- Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay
- Few-Shot Class-Incremental Learning from an Open-Set Perspective
⭐ code
- Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
⭐ code 🏠 project
- R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning
⭐ code
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning
- S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning
⭐ code
40.Adversarial Learning
- Prior-Guided Adversarial Initialization for Fast Adversarial Training
⭐ code 📰 ECCV 2022 | A prior-guided adversarial example initialization method
- BIPS: Bi-modal Indoor Panorama Synthesis via Residual Depth-Aided Adversarial Learning
⭐ code
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness
😮 oral ⭐ code
- RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN
⭐ code
- Adversarial Coreset Selection for Efficient Robust Training
- Shape Matters: Deformable Patch Attack
- Enhanced Accuracy and Robustness via Multi-Teacher Adversarial Distillation
⭐ code
- GradAuto: Energy-Oriented Attack on Dynamic Neural Networks
⭐ code
- Learning Energy-Based Models with Adversarial Training
- Revisiting Outer Optimization in Adversarial Training
- One Size Does NOT Fit All: Data-Adaptive Adversarial Training
⭐ code
- UniCR: Universally Approximated Certified Robustness via Randomized Smoothing
- ℓ∞-Robustness and Beyond: Unleashing Efficient Adversarial Training
- Towards Efficient Adversarial Training on Vision Transformers
- FrequencyLowCut Pooling - Plug & Play against Catastrophic Overfitting
⭐ code
- TAFIM: Targeted Adversarial Attacks against Facial Image Manipulations
🏠 project
- Adversarial attacks
- Frequency Domain Model Augmentation for Adversarial Attack
⭐ code
- Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal
⭐ code
- SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness
- Scaling Adversarial Training to Large Perturbation Bounds
⭐ code
- Towards Effective and Robust Neural Trojan Defenses via Input Filtering
- Exploiting the Local Parabolic Landscapes of Adversarial Losses to Accelerate Black-Box Adversarial Attack
⭐ code
- Robust Network Architecture Search via Feature Distortion Restraining
- Triangle Attack: A Query-Efficient Decision-Based Adversarial Attack
⭐ code
- Adaptive Image Transformations for Transfer-Based Adversarial Attack
- Black-box
- White-box
- Adversarial examples
39.Transfer Learning
- Factorizing Knowledge in Neural Networks
⭐ code
- SecretGen: Privacy Recovery on Pre-trained Models via Distribution Discrimination
⭐ code
- How Stable Are Transferability Metrics Evaluations?
- Language-Driven Artistic Style Transfer
- MultiMAE: Multi-modal Multi-task Masked Autoencoders
🏠 project
38.Contrastive Learning
- Network Binarization via Contrastive Learning
⭐ code
- Adversarial Contrastive Learning via Asymmetric InfoNCE
⭐ code
- Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches
⭐ code
- Contrastive Learning for Diverse Disentangled Foreground Generation
- Decoupled Contrastive Learning
- Joint Learning of Localized Representations from Medical Images and Reports
- Contrasting Quadratic Assignments for Set-Based Representation Learning
- Generative Subgraph Contrast for Self-Supervised Graph Representation Learning
⭐ code
37.Open-set Recognition
- DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition
- Difficulty-Aware Simulator for Open Set Recognition
⭐ code
36.Machine Learning
35.Federated Learning
- SphereFed: Hyperspherical Federated Learning
- Image Coding for Machines with Omnipotent Feature Learning
- Addressing Heterogeneity in Federated Learning via Distributional Transformation
⭐ code
- FedLTN: Federated Learning for Sparse and Personalized Lottery Ticket Networks
- Improving Generalization in Federated Learning by Seeking Flat Minima
- AdaBest: Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation
34.Meta-Learning
- Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach
- Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
- Learning to Weight Samples for Dynamic Early-exiting Networks
⭐ code
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning
⭐ code
33.Model Compression/Knowledge Distillation/Pruning
- Knowledge distillation
- Knowledge Condensation Distillation
⭐ code
- FedX: Unsupervised Federated Learning with Cross Knowledge Distillation
⭐ code
- Black-box Few-shot Knowledge Distillation
⭐ code
- Efficient One Pass Self-distillation with Zipf's Label Smoothing
⭐ code
- MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition
⭐ code
- Switchable Online Knowledge Distillation
⭐ code
- Distilling the Undistillable: Learning from a Nasty Teacher
⭐ code
- Masked Generative Distillation
⭐ code
- DistPro: Searching a Fast Knowledge Distillation Process via Meta Optimization
⭐ code
- Personalized Education: Blind Knowledge Distillation
⭐ code
- Prune Your Model before Distill It
⭐ code
- IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors
⭐ code
- Deep Ensemble Learning by Diverse Knowledge Distillation for Fine-Grained Object Classification
- A Fast Knowledge Distillation Framework for Visual Recognition
🏠 project
- Self-Regulated Feature Learning via Teacher-Free Feature Distillation
🏠 project
- Quantization
- Synergistic Self-supervised and Quantization Learning
😮 oral ⭐ code
- PalQuant: Accelerating High-precision Networks on Low-precision Accelerators
⭐ code
- Fine-Grained Data Distribution Alignment for Post-Training Quantization
⭐ code
- Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
- Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance
- Non-uniform Step Size Quantization for Accurate Post-Training Quantization
⭐ code
- Towards Accurate Network Quantization with Equivalent Smooth Regularizer
- Explicit Model Size Control and Relaxation via Smooth Regularization for Mixed-Precision Quantization
- BASQ: Branch-Wise Activation-Clipping Search Quantization for Sub-4-Bit Neural Networks
⭐ code
- RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization
- PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization
- Pruning
- FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification
⭐ code
- Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning
⭐ code
- Trainability Preserving Neural Structured Pruning
⭐ code
- Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps
⭐ code
- Data-Free Backdoor Removal Based on Channel Lipschitzness
⭐ code
- Multi-Granularity Pruning for Model Acceleration on Mobile Devices
- Ensemble Knowledge Guided Sub-network Search and Fine-Tuning for Filter Pruning
- Soft Masking for Cost-Constrained Channel Pruning
⭐ code
- Towards Ultra Low Latency Spiking Neural Networks for Vision and Sequential Tasks Using Temporal Pruning
- CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution
- Filter Pruning via Feature Discrimination in Deep Neural Networks
- Lightweight
- MC
32.Point Cloud
- Few 'Zero Level Set'-Shot Learning of Shape Signed Distance Functions in Feature Space
- FH-Net: A Fast Hierarchical Network for Scene Flow Estimation on Real-World Point Clouds
⭐ code
- Dynamic 3D Scene Analysis by Point Cloud Accumulation
⭐ code 🏠 project
- Point Cloud Compression with Sibling Context and Surface Priors
- LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds
- Point MixSwap: Attentional Point Cloud Mixing via Swapping Matched Structural Divisions
⭐ code
- MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
⭐ code
- Bottom Up Top down Detection Transformers for Language Grounding in Images and Point Clouds
🏠 project
- PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees
⭐ code
- Learning to Generate Realistic LiDAR Point Clouds
🏠 project
- PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds
- SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement
⭐ code
- Resolution-Free Point Cloud Sampling Network with Data Distillation
⭐ code
- diffConv: Analyzing Irregular Point Clouds with an Irregular View
⭐ code
- GraphFit: Learning Multi-Scale Graph-Convolutional Representation for Point Cloud Normal Estimation
⭐ code
- Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap
⭐ code
- PD-Flow: A Point Cloud Denoising Framework with Normalizing Flows
⭐ code
- Shape-Pose Disentanglement Using SE(3)-Equivariant Vector Neurons
- Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach
- Masked Autoencoders for Point Cloud Self-Supervised Learning
⭐ code
- Masked Discrimination for Self-Supervised Learning on Point Clouds
⭐ code
- Meta-Sampler: Almost-Universal yet Task-Oriented Sampling for Point Clouds
⭐ code
- Efficient Point Cloud Analysis Using Hilbert Curve
- RFNet-4D: Joint Object Reconstruction and Flow Estimation from 4D Point Clouds
⭐ code
- 3D point clouds
- Autoregressive 3D Shape Generation via Canonical Mapping
⭐ code
- Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction
⭐ code
- Exploring the Devil in Graph Spectral Domain for 3D Point Cloud Attacks
⭐ code
- Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction
⭐ code
- Few-Shot Class-Incremental Learning for 3D Point Cloud Objects
⭐ code
- Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models
⭐ code
- Manifold Adversarial Learning for Cross-Domain 3D Shape Representation
- Point cloud localization
- Point cloud segmentation
- Point cloud completion
- Point cloud registration
- SuperLine3D: Self-supervised Line Segmentation and Description for LiDAR Point Cloud
⭐ code
- Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation
⭐ code
- PointCLM: A Contrastive Learning-Based Framework for Multi-Instance Point Cloud Registration
⭐ code
- PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry
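- Point cloud reconstruction
- Point cloud classification
- Point cloud understanding
31.SLAM/Augmented Reality/Virtual Reality/Robotics
- Augmented reality
- VR
- Virtual try-on
- Visual localization (camera pose estimation)
- Robotics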
30.Optical Flow
- Secrets of Event-Based Optical Flow
⭐ code
- Deep 360° Optical Flow Estimation Based on Multi-Projection Fusion
- Learning Omnidirectional Flow in 360-degree Video via Siamese Representation
🏠 project
- Video Interpolation by Event-driven Anisotropic Adjustment of Optical Flow
- FlowFormer: A Transformer Architecture for Optical Flow
- Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction
- Disentangling Architecture and Training for Optical Flow
🏠 project
- A Perturbation-Constrained Adversarial Attack for Evaluating the Robustness of Optical Flow
😮 oral ⭐ code
- Optical Flow Training under Limited Label Budget via Active Learning
⭐ code
- S2F2: Single-Stage Flow Forecasting for Future Multiple Trajectories Prediction
- Semi-Supervised Learning of Optical Flow by Flow Supervisor
⭐ code
29.Re-identification
- Re-identification
- Negative Samples are at Large: Leveraging Hard-distance Elastic Loss for Re-identification
- PASS: Part-Aware Self-Supervised Pre-training for Person Re-identification
⭐ code
- Adaptive Cross-Domain Learning for Generalizable Person Re-identification
⭐ code
- Dynamically Transformed Instance Normalization Network for Generalizable Person Re-identification
- Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification
⭐ code
- Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-identification
⭐ code
- Cross-Modality Transformer for Visible-Infrared Person Re-identification
- Optimal Transport for Label-Efficient Visible-Infrared Person Re-identification
⭐ code
- Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification
- Person search
- Crowd counting
- Visual Search
- Gait recognition
28.Neural Architecture Search
- SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
⭐ code
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
⭐ code
- ScaleNet: Searching for the Model to Scale
⭐ code
- CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS
⭐ code
- Towards Regression-Free Neural Networks for Diverse Compute Platforms
- LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search
⭐ code
- A Max-Flow Based Approach for Neural Architecture Search
- ViTAS: Vision Transformer Architecture Search
- Learning Where to Look – Generative NAS Is Surprisingly Efficient
⭐ code
- Neural Architecture Search for Spiking Neural Networks
- Data-Free Neural Architecture Search via Recursive Label Calibration
⭐ code
27.Image Classification
- Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
⭐ code
- Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels
⭐ code
- Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
⭐ code
- Constructing Balance from Imbalance for Long-tailed Image Recognition
⭐ code
- No Token Left Behind: Explainability-Aided Image Classification and Generation
⭐ code
- Interpretable Image Classification with Differentiable Prototypes Assignment
⭐ code
- Rotation Regularization without Rotation
⭐ code
- Revisiting a kNN-based Image Classification System with High-capacity Storage
- In Defense of Image Pre-training for Spatiotemporal Recognition
⭐ code
- Augmenting Deep Classifiers with Polynomial Neural Networks
- A Dataset Generation Framework for Evaluating Megapixel Image Classifiers & their Explanations
- Cartoon Explanations of Image Classifiers
- Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification
🏠 project
- SSBNet: Improving Visual Recognition Efficiency by Adaptive Sampling
- AutoMix: Unveiling the Power of Mixup for Stronger Classifiers
- MaxViT: Multi-axis Vision Transformer
⭐ code
- Self-Feature Distillation with Uncertainty Modeling for Degraded Image Recognition
- Three Things Everyone Should Know about Vision Transformers
- RealPatch: A Statistical Matching Framework for Model Patching with Real Samples
⭐ code
- TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs
⭐ code
- Automatic Check-Out via Prototype-Based Classifier Learning from Single-Product Exemplars
⭐ code
- Embedding Contrastive Unsupervised Features to Cluster in- and Out-of-Distribution Noise in Corrupted Image Datasets
- Unsupervised Few-Shot Image Classification by Learning Features into Clustering Space
- Few-shot image classification
- Multi-label classification
- Long-tailed classification
- SAFA: Sample-Adaptive Feature Augmentation for Long-Tailed Image Classification
- Invariant Feature Learning for Generalized Long-Tailed Classification
⭐ code
- Tackling Long-Tailed Category Distribution Under Domain Shifts
⭐ code 🏠 project
- Identifying Hard Noise in Long-Tailed Sample Distribution
😮 oral ⭐ code
- On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond
⭐ code
- Visual classification
- Fine-grained recognition
- Long-tailed recognition
26.Video/Image Super-Resolution
- Cross-modal super-resolution
- Image super-resolution
- Image Super-Resolution with Deep Dictionary
⭐ code
- CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution
⭐ code
- Reference-based Image Super-Resolution with Deformable Attention Transformer
⭐ code
- KXNet: A Model-Driven Deep Neural Network for Blind Super-Resolution
⭐ code
- Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images
- Boosting Event Stream Super-Resolution with a Recurrent Neural Network
- Learning Series-Parallel Lookup Tables for Efficient Image Super-Resolution
⭐ code
- Efficient Long-Range Attention Network for Image Super-Resolution
⭐ code
- Metric Learning Based Interactive Modulation for Real-World Super-Resolution
⭐ code
- Dynamic Dual Trainable Bounds for Ultra-Low Precision Super-Resolution Networks
⭐ code
- Perception-Distortion Balanced ADMM Optimization for Single-Image Super-Resolution
⭐ code
- Uncertainty Learning in Kernel Estimation for Multi-stage Blind Image Super-Resolution
- MuLUT: Cooperating Multiple Look-Up Tables for Efficient Image Super-Resolution
- Adaptive Patch Exiting for Scalable Single Image Super-Resolution
⭐ code
- From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution
⭐ code
- Unfolded Deep Kernel Estimation for Blind Image Super-Resolution
⭐ code
- Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution
⭐ code
- Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations
⭐ code
- Restore Globally, Refine Locally: A Mask-Guided Scheme to Accelerate Super-Resolution Networks
⭐ code
- Compiler-Aware Neural Architecture Search for On-Mobile Real-Time Super-Resolution
⭐ code
- ARM: Any-Time Super-Resolution Method
⭐ code
- D2C-SR: A Divergence to Convergence Approach for Real-World Image Super-Resolution
⭐ code
- RRSR: Reciprocal Reference-Based Image Super-Resolution with Progressive Feature Alignment and Selection
- Video super-resolution
- Towards Interpretable Video Super-Resolution via Alternating Optimization
⭐ code
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution
⭐ code
- Real-RawVSR: Real-World Raw Video Super-Resolution with a Benchmark Dataset
⭐ code
- A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution
25.Autonomous vehicles
- Vehicle trajectory prediction
- Autonomous driving
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning
⭐ code
- Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction
- Differentiable Raycasting for Self-supervised Occupancy Forecasting
⭐ code
- Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving
⭐ code
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
⭐ code
- Radatron: Accurate Detection Using Multi-Resolution Cascaded MIMO Radar
🏠 project
- Rethinking Closed-Loop Training for Autonomous Driving
- Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving
- KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients
😮 oral 🏠 project
- InAction: Interpretable Action Decision Making for Autonomous Driving
⭐ code
- CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
🏠 project
- Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation
😮 oral 🏠 project
- StretchBEV: Stretching Future Instance Prediction Spatially and Temporally
🏠 project
- BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
⭐ code
- Point Cloud Compression with Range Image-Based Entropy Model for Autonomous Driving
- Trajectory Prediction
- Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction
⭐ code - Aware of the History: Trajectory Forecasting with the Local Behavior Data
⭐ code - Social-SSL: Self-Supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction
⭐ code - Sequential Multi-View Fusion Network for Fast LiDAR Point Motion Estimation
- Social-Implicit: Rethinking Trajectory Prediction Evaluation and the Effectiveness of Implicit Maximum Likelihood Estimation
⭐ code - View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums
⭐ code - PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map
⭐ code
- Lane Detection
- Pedestrian Trajectory Prediction
- Vehicle Re-Identification
24.UAV/Remote Sensing/Satellite Image
- Remote Sensing
- Aerial Video Recognition
- FAR: Fourier Aerial Video Recognition
🏠 project
23.Medical Image(Medical Imaging)
- The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis
⭐ code🏠 project - Medical Image Segmentation
- Personalizing Federated Medical Image Segmentation via Local Calibration
⭐ code - Learning Topological Interactions for Multi-Class Medical Image Segmentation
😮 oral⭐ code - Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration
⭐ code - PointScatter: Point Set Representation for Tubular Structure Extraction
😮 oral⭐ code - Dual Contrastive Learning with Anatomical Auxiliary Supervision for Few-Shot Medical Image Segmentation
⭐ code - Auto-FedRL: Federated Hyperparameter Optimization for Multi-Institutional Medical Image Segmentation
⭐ code - Med-DANet: Dynamic Architecture Network for Efficient Medical Volumetric Segmentation
⭐ code - CXR Segmentation by AdaIN-Based Domain Adaptation and Knowledge Distillation
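- Radiology Report Generation
- Dense Prediction
- Retinal Image Matching
- Stent Tracking
- Lesion Detection
- Medical Image Analysis
- Medical Image Classification
- Medical Keypoint Localization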
22.OCR
- Levenshtein OCR
- Text Recognition
- Handwritten Mathematical Expression Recognition
- Scene Text Detection
- Scene Text Recognition with Permuted Autoregressive Sequence Models
⭐ code - Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting
⭐ code - SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
- Contextual Text Block Detection towards Scene Text Understanding
🏠 project - Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
😮 oral⭐ code - GLASS: Global to Local Attention for Scene-Text Spotting
⭐ code - Multi-Granularity Prediction for Scene Text Recognition
- Pure Transformer with Integrated Experts for Scene Text Recognition
- Background-Insensitive Scene Text Recognition with Text Semantic Segmentation
- Detecting Tampered Scene Text in the Wild
⭐ code - Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers
- Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features
⭐ code - OCR-Free Document Understanding Transformer
⭐ code
- Video Text Detection
- Text Detection
- Document Image Rectification
- Document Unwarping
21.Semi/self-supervised learning
- Unsupervised
- Contrastive Positive Mining for Unsupervised 3D Action Representation Learning
- Dense Siamese Network for Dense Unsupervised Learning
⭐ code
- Relative Contrastive Loss for Unsupervised Representation Learning
- DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model
- Weakly Supervised
- Self-Supervised
- GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
⭐ code - RegionCL: Exploring Contrastive Region Pairs for Self-Supervised Representation Learning
⭐ code - Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment
- Differentiable Raycasting for Self-Supervised Occupancy Forecasting
- How Severe Is Benchmark-Sensitivity in Video Self-Supervised Learning?
⭐ code - MaCLR: Motion-Aware Contrastive Learning of Representations for Videos
- Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing
- Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training
⭐ code - What to Hide from Your Students: Attention-Guided Masked Image Modeling
⭐ code - Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning
⭐ code - Semantic-Aware Fine-Grained Correspondence
- Self-Supervised Classification Network
⭐ code - Dual-Domain Self-Supervised Learning and Model Adaption for Deep Compressive Imaging
- SdAE: Self-distillated Masked Autoencoder
⭐ code - RDA: Reciprocal Distribution Alignment for Robust SSL
⭐ code - Motion Sensitive Contrastive Learning for Self-supervised Video Representation
- Towards Efficient and Effective Self-Supervised Learning of Visual Representations
⭐ code - Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective
- The Challenges of Continuous Self-Supervised Learning
- GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping
- Fusion from Decomposition: A Self-Supervised Decomposition Approach for Image Fusion
- DNA: Improving Few-Shot Transfer Learning with Low-Rank Decomposition and Alignment
⭐ code - Self-Supervised Learning of Visual Graph Matching
⭐ code - DisCo: Remedying Self-Supervised Learning on Lightweight Models with Distilled Contrastive Learning
⭐ code - SLIP: Self-Supervision Meets Language-Image Pre-training
⭐ code - Domain Invariant Masked Autoencoders for Self-Supervised Learning from Multi-Domains
- Improving Self-Supervised Lightweight Model Learning via Hard-Aware Metric Distillation
⭐ code - Masked Siamese Networks for Label-Efficient Learning
⭐ code - Natural Synthetic Anomalies for Self-Supervised Anomaly Detection and Localization
⭐ code - Understanding Collapse in Non-Contrastive Siamese Representation Learning
- Discovering Deformable Keypoint Pyramids
⭐ code
- Semi-Supervised
- Towards Realistic Semi-Supervised Learning
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
- Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning
⭐ code - ConMatch: Semi-Supervised Learning with Confidence-Guided Consistency Regularization
⭐ code - Vibration-Based Uncertainty Estimation for Learning from Limited Supervision
- Unsupervised Selective Labeling for More Effective Semi-Supervised Learning
- RDA: Reciprocal Distribution Alignment for Robust Semi-Supervised Learning
⭐ code - Semi-Supervised Vision Transformers
⭐ code - CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
- RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning
⭐ code - Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching
- PSS: Progressive Sample Selection for Open-World Visual Representation Learning
⭐ code - Stochastic Consensus: Enhancing Semi-Supervised Learning with Consistency of Stochastic Classifiers
- Supervised Learning
20.Face
- Effective Presentation Attack Detection Driven by Face Related Task
⭐ code - Facial Depth and Normal Estimation Using Single Dual-Pixel Camera
⭐ code - StyleFace: Towards Identity-Disentangled Face Generation on Megapixels
- Augmentation of rPPG Benchmark Datasets: Learning to Remove and Embed rPPG Signals via Double Cycle Consistent Learning from Unpaired Facial Videos
⭐ code - Custom Structure Preservation in Face Aging
- Deepfake Detection
- 3D Face
- Liveness Detection
- Generative Domain Adaptation for Face Anti-Spoofing
- Multi-domain Learning for Updating Face Anti-spoofing Models
⭐ code - Source-Free Domain Adaptation with Contrastive Domain Alignment and Self-Supervised Exploration for Face Anti-Spoofing
⭐ code - Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing
- Face Recognition
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition
⭐ code🏠 project - Towards Robust Face Recognition with Comprehensive Search
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition
⭐ code - Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
⭐ code - OneFace: One Threshold for All
- AgeTransGAN for Facial Age Transformation with Rectified Performance Metrics
⭐ code - Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition
⭐ code - CoupleFace: Relation Matters for Face Recognition Distillation
- Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation
🏠 project - Pre-training Strategies and Datasets for Facial Representation Learning
- Unsupervised and Semi-Supervised Bias Benchmarking in Face Recognition
- Face Clustering
- Talking Face Synthesis
- Talking Head Synthesis
- Face Pose Estimation
- Face Swapping
- Fake Face Detection
- UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection
😮 oral - An Information Theoretic Approach for Attention-Driven Face Forgery Detection
- Exploring Disentangled Content Information for Face Forgery Detection
- Adaptive Face Forgery Detection in Cross Domain
- Face Capture
- Facial Expression Recognition
- How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?
⭐ code - Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions
⭐ code - Order Learning Using Partially Ordered Data via Chainization
⭐ code - Emotion-Aware Multi-View Contrastive Learning for Facial Emotion Recognition
⭐ code - Learn-to-Decompose: Cascaded Decomposition Network for Cross-Domain Few-Shot Facial Expression Recognition
⭐ code - Learn from All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition
⭐ code
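- 3D Face Reconstruction
- Face Reenactment
- Face Identity Manipulation
- Face Texture Synthesis and Reconstruction
- Face Restoration
- Expression Recognition
19.Image Synthesis/Generation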
- Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis
⭐ code🏠 project - GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing
- Generalized Brain Image Synthesis with Transferable Convolutional Sparse Coding Networks
- Auto-regressive Image Synthesis with Integrated Quantization
😮 oral - Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing
⭐ code - Improved Masked Image Generation with Token-Critic
- Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation
- SCAM! Transferring humans between images with Semantic Cross Attention Modulation
🏠 project - PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
⭐ code - Adaptive Feature Interpolation for Low-Shot Image Generation
- Few-Shot Image Generation with Mixup-Based Distance Learning
⭐ code - Multimodal Conditional Image Synthesis with Product-of-Experts GANs
🏠 project - Any-Resolution Training for High-Resolution Image Synthesis
🏠 project - 3D-Aware Indoor Scene Synthesis with Depth Priors
🏠 project - Image Generation
- DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta
⭐ code - Scraping Textures from Natural Images for Synthesis and Editing
🏠 project - Word-Level Fine-Grained Story Visualization
- CoGS: Controllable Generation and Search from Sketch and Style
- Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations
- Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
⭐ code
- Exemplar-Guided Image Generation
- Text-to-Image Synthesis
- Generating Diverse Human Motions from Text Descriptions
18.Image-to-Image Translation
- VecGAN: Image-to-Image Translation with Interpretable Latent Directions
- Vector Quantized Image-to-Image Translation
⭐ code🏠 project - Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization
⭐ code - Unpaired Image Translation via Vector Symbolic Architectures
😮 oral⭐ code - Bi-Level Feature Alignment for Versatile Image Translation and Manipulation
- ManiFest: Manifold Deformation for Few-Shot Image Translation
⭐ code - Image Translation
17.GAN
- VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
- Quantized GAN for Complex Music Generation from Dance Videos
⭐ code - RepMix: Representation Mixing for Robust Attribution of Synthesized Images
⭐ code - FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs
⭐ code - Generative Multiplane Images: Making a 2D GAN 3D-Aware
⭐ code🏠 project - Generator Knows What Discriminator Should Learn in Unconditional GANs
⭐ code - Hierarchical Semantic Regularization of Latent Spaces in StyleGANs
⭐ code🏠 project - Mind the Gap in Distilling StyleGANs
⭐ code - FurryGAN: High Quality Foreground-aware Image Synthesis
🏠 project - Improving GANs for Long-Tailed Data through Group Spectral Regularization
⭐ code🏠 project - 3D-FM GAN: Towards 3D-Controllable Face Manipulation
🏠 project - Exploring Gradient-based Multi-directional Controls in GANs
⭐ code - Studying Bias in GANs through the Lens of Race
- FairStyle: Debiasing StyleGAN2 with Style Channel Manipulations
🏠 project - FingerprintNet: Synthesized Fingerprints for Generated Image Detection
- Detecting Generated Images by Real Images
⭐ code - High-Fidelity GAN Inversion with Padding Space
🏠 project - A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos
⭐ code - BlobGAN: Spatially Disentangled Scene Representations
🏠 project - GAN with Multivariate Disentangling for Controllable Hair Editing
⭐ code - StyleGAN-Human: A Data-Centric Odyssey of Human Generation
⭐ code - EAGAN: Efficient Two-Stage Evolutionary Architecture Search for GANs
⭐ code - JoJoGAN: One Shot Face Stylization
- HairNet: Hairstyle Transfer with Pose Changes
- EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
⭐ code - Editing Out-of-Domain GAN Inversion via Differential Activations
⭐ code - On the Robustness of Quality Measures for GANs
⭐ code - Diverse Generation from a Single Video Made Possible
🏠 project - Rayleigh EigenDirections (REDs): Nonlinear GAN Latent Space Traversals for Multidimensional Features
- Generating Natural Images with Direct Patch Distributions Matching
⭐ code - TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation
- Neural Scene Decoration from a Single Photograph
⭐ code - ChunkyGAN: Real Image Inversion via Segments
- GAN Cocktail: Mixing GANs without Dataset Access
🏠 project - DuelGAN: A Duel between Two Discriminators Stabilizes the GAN Training
⭐ code - Line Art Colorization
- Image Generation
- GAN Inversion
- Makeup and Hair Transfer
- Text Removal
16.Transformer
- k-means Mask Transformer
⭐ code - Outpainting by Queries
⭐ code - Laplacian Mesh Transformer: Dual Attention and Topology Aware Network for 3D Mesh Classification and Segmentation
- Locality Guidance for Improving Vision Transformers on Tiny Datasets
⭐ code - ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer
⭐ code - MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning
- TinyViT: Fast Pretraining Distillation for Small Vision Transformers
⭐ code - MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis
- An Impartial Take to the CNN vs Transformer Robustness Contest
- Ghost-free High Dynamic Range Imaging with Context-aware Transformer
⭐ code - EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers
⭐ code - Adaptive Token Sampling for Efficient Vision Transformers
😮 oral🏠 project - Self-Slimmed Vision Transformer
⭐ code - Are Vision Transformers Robust to Patch Perturbations?
- Selective TransHDR: Transformer-Based Selective HDR Imaging Using Ghost Region Mask
- BLT: Bidirectional Layout Transformer for Controllable Layout Generation
🏠 project - Convolutional Embedding Makes Hierarchical Vision Transformer Stronger
- AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers
⭐ code - Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation
- VIP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers
- Improving Vision Transformers by Revisiting High-Frequency Components
⭐ code - VSA: Learning Varied-Size Window Attention in Vision Transformers
⭐ code - DaViT: Dual Attention Vision Transformers
⭐ code - KVT: k-NN Attention for Boosting Vision Transformers
⭐ code - ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer
⭐ code - DeiT III: Revenge of the ViT
- Sliced Recursive Transformer
⭐ code - Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers
🏠 project - Training Vision Transformers with Only 2040 Images
15.Vision-Language
- FashionViL: Fashion-Focused Vision-and-Language Representation Learning
⭐ code - NewsStories: Illustrating articles with visual summaries
⭐ code - Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding
⭐ code - Frozen CLIP Models are Efficient Video Learners
⭐ code - Generative Negative Text Replay for Continual Vision-Language Pretraining
- “This Is My Unicorn, Fluffy”: Personalizing Frozen Vision-Language Representations
⭐ code - Contrastive Vision-Language Pre-training with Limited Resources
⭐ code - ASSISTER: Assistive Navigation via Conditional Instruction Generation
- X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks
⭐ code - UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
- Single-Stream Multi-level Alignment for Vision-Language Pretraining
- Most and Least Retrievable Images in Visual-Language Query Systems
- Visual Representation Learning
- VLN
- Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
🏠 project - Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments
🏠 project - Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions
- A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility
- Learning Disentanglement with Decoupled Labels for Vision-Language Navigation
⭐ code - Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation
⭐ code - FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation
⭐ code
- Visual Relocalization
14.Visual Question Answering
- Weakly Supervised Grounding for VQA in Vision-Language Transformers
⭐ code - Rethinking Data Augmentation for Robust Visual Question Answering
⭐ code - Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
⭐ code - New Datasets and Models for Contextual Reasoning in Visual Dialog
⭐ code - Classification-Regression for Chart Comprehension
- AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant
🏠 project - Video-QA
13.Human-Object Interaction
- Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
⭐ code - Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos
⭐ code - IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition
- Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection
⭐ code - Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
- SAGA: Stochastic Whole-Body Grasping with Contact
🏠 project - Chairs Can Be Stood On: Overcoming Object Bias in Human-Object Interaction Detection
- Discovering Human-Object Interaction Concepts via Self-Compositional Learning
⭐ code - Interactive Object Segmentation
- HOS
- Hand-Object Interaction
- Human-Chair Interaction
12.Action Detection(Human Action Detection and Recognition)
- Action Recognition
- PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens
🏠 project - Source-Free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition
⭐ code - Efficient Video Transformers with Spatial-Temporal Token Selection
⭐ code - Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition
🏠 project - Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning
⭐ code - An Efficient Spatio-Temporal Pyramid Transformer for Action Detection
- Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
⭐ code - Privacy-Preserving Action Recognition via Motion Difference Quantization
⭐ code - SOS! Self-Supervised Learning over Sets of Handled Objects in Egocentric Action Recognition
- Real-time Online Video Detection with Temporal Smoothing Transformers
⭐ code - CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video
- Uncertainty-Based Spatial-Temporal Attention for Online Action Detection
- Is Appearance Free Action Recognition Possible?
- Panoramic Human Activity Recognition
- Delving into Details: Synopsis-to-Detail Networks for Video Recognition
⭐ code - Fine-Grained Action Recognition
- Zero-Shot Action Recognition
- Few-Shot Action Recognition
- 3D Action Recognition
- Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
😮 oral⭐ code - Continual 3D Convolutional Neural Networks for Real-Time Processing of Videos
⭐ code - Egocentric Activity Recognition and Localization on a 3D Map
- Skeleton-Based Action Recognition
- Social Group Activity Recognition
- Hunting Group Clues with Transformers for Social Group Activity Recognition
- Entry-Flipped Transformer for Inference and Prediction of Participant Behavior
- Self-Supervised Social Relation Representation for Human Group Detection
⭐ code - COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
⭐ code
- Temporal Action Detection
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking
⭐ code - Temporal Action Detection with Global Segmentation Mask Learning
⭐ code - ReAct: Temporal Action Detection with Relational Queries
⭐ code - Zero-Shot Temporal Action Detection via Vision-Language Prompting
⭐ code - Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions
⭐ code - TALLFormer: Temporal Action Localization with a Long-Memory Transformer
⭐ code - A Sliding Window Scheme for Online Temporal Action Localization
- Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning
⭐ code
- Temporal Action Localization
- Temporal Action Segmentation
- Action Quality Assessment
- Action Localization
11.Video
- Dynamic Temporal Filtering in Video Models
⭐ code - Delta Distillation for Efficient Video Processing
- TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
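⭐ code - Video Synthesis
- Video-to-Video Synthesis
- Video Frame Interpolation
- Video Generation
- Video Quality Assessment
- Video Inpainting
- Video Deblurring
- Video Dialog
- Active Speaker Detection (Video Conferencing)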
- VOS
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
⭐ code🏠 project📺 video - Tackling Background Distraction in Video Object Segmentation
⭐ code - BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation
- Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation
⭐ code - Learning Quality-aware Dynamic Memory for Video Object Segmentation
⭐ code - Global Spectral Filter Memory Network for Video Object Segmentation
⭐ code
- VIS
- In Defense of Online Models for Video Instance Segmentation
😮 oral⭐ code - Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation
⭐ code - Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
- Less than Few: Self-Shot Video Instance Segmentation
⭐ code - Video Mask Transfiner for High-Quality Video Instance Segmentation
- SeqFormer: Sequential Transformer for Video Instance Segmentation
⭐ code
- VSS
- VPS
- Video Matting
- Video Representation
- Video Transmission
- Motion Segmentation
- Video Anomaly Detection
- Video Recognition
- Temporal Saliency Query Network for Efficient Video Recognition
🏠 project - NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
🏠 project - Expanding Language-Image Pretrained Models for General Video Recognition
😮 oral⭐ code - AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition
- DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
⭐ code - K-Centered Patch Sampling for Efficient Video Recognition
- Video Understanding
- Spotting Temporally Precise, Fine-Grained Events in Video
⭐ code🏠 project - Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding
- Panoramic Vision Transformer for Saliency Detection in 360° Videos
⭐ code - Streaming Multiscale Deep Equilibrium Models
🏠 project - Learning Shadow Correspondence for Video Shadow Detection
- Federated Self-Supervised Learning for Video Understanding
⭐ code - Prompting Visual-Language Models for Efficient Video Understanding
- GraphVid: It Only Takes a Few Nodes to Understand a Video
- Video Classification
- Video Rolling Shutter
- Video Transition Effects
- Image and Video Coding
- AlphaVC: High-Performance and Efficient Learned Video Compression
- A Cloud 3D Dataset and Application-Specific Learned Image Compression in Cloud 3D
⭐ code - CANF-VC: Conditional Augmented Normalizing Flows for Video Compression
⭐ code - Expanded Adaptive Scaling Normalization for End to End Image Compression
- Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction
⭐ code - Content Adaptive Latents and Decoder for Neural Image Compression
- Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression
- RAWtoBit: A Fully End-to-End Camera ISP Network
- Content-Oriented Learned Image Compression
- Implicit Neural Representations for Image Compression
- Neural Video Compression Using GANs for Detail Synthesis and Propagation
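- Video Summarization
- Video Grounding
- Frame Interpolation
- Video Analysis
- Video Editing
- Video Enhancement
- Video Object Re-Identification
- Image and Video Editing
- Video Frame-Rate Up-Conversion
- Video Color Propagation
- Audio-Visual Event Localization
- Video Activity Localization
- Audio-Visual Video Parsing
- Video Highlight Detection
- Video Clip Classification
- Video Relation Grounding
- Video Clip Retrieval
10.Pose Estimation(Object Pose Estimation)
- Object Pose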
- Neural Correspondence Field for Object Pose Estimation
⭐ code🏠 project - Zero-Shot Category-Level Object Pose Estimation
⭐ code - Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects
⭐ code🏠 project - A Visual Navigation Perspective for Category-Level Object Pose Estimation
⭐ code - Polarimetric Pose Prediction
- RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers
- Gaussian Activated Neural Radiance Fields for High Fidelity Reconstruction & Pose Estimation
- Object Pose Transformation
- Pose Estimation for Object Grasping
- 4D
- 6D
- Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks
⭐ code - Sim-to-Real 6D Object Pose Estimation via Iterative Self-Training for Robotic Bin Picking
🏠 project - Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images
🏠 project - Affine Correspondences between Multi-Camera Systems for 6DOF Relative Pose Estimation
⭐ code - ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization
🏠 project - RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation
⭐ code - Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features
- Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World
⭐ code - Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation
⭐ code - Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image
- DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation
⭐ code - WeLSA: Learning to Predict 6D Pose from Weakly Labeled Data Using Shape Alignment
- DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation
⭐ code - DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation
⭐ code - Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting
⭐ code
- 9D
9.Human Pose Estimation
- Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation
- Pose for Everything: Towards Category-Agnostic Pose Estimation
😮 oral⭐ code - BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking
- PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation
- Learning Visibility for Robust Dense Human Body Estimation
⭐ code - D&D: Learning Human Dynamics from Dynamic Camera
😮 oral⭐ code - PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
⭐ code - DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation
⭐ code - SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos
⭐ code - Poseur: Direct Human Pose Regression with Transformers
⭐ code - SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation
⭐ code - Regularizing Vector Embedding in Bottom-Up Human Pose Estimation
⭐ code - Hallucinating Pose-Compatible Scenes
- A Unified Framework for Domain Adaptive Pose Estimation
⭐ code - Motion Capture
- Point-Based Clothed Human Modeling
- Dynamic Human Digitization
- Human Pose and Shape Estimation
- 3D Human Pose Estimation
- DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation
⭐ code - Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
- Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation
⭐ code - PoseScript: 3D Human Poses from Natural Language
🏠 project - Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement
- 3D Human Pose Estimation Using Möbius Graph Convolutional Networks
- P-STMO: Pre-trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation
⭐ code - C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation
⭐ code - Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation
⭐ code - VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data
⭐ code - Learning to Fit Morphable Models
- EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices
🏠 project - AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling
🏠 project - FLEX: Extrinsic Parameters-Free Multi-View 3D Human Motion Reconstruction
🏠 project
- Mul-Pose
- 3D Human Reconstruction
- 3D Clothed Human Reconstruction in the Wild
⭐ code - DiffuStereo: High Quality Human Reconstruction via Diffusion-Based Stereo Using Sparse Cameras
- UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation
⭐ code - The One Where They Reconstructed 3D Humans and Environments in TV Shows
⭐ code🏠 project - Neural Capture of Animatable 3D Human from Monocular Video
- SUPR: A Sparse Unified Part-Based Human Representation
⭐ code🏠 project - IntegratedPIFu: Integrated Pixel Aligned Implicit Function for Single-view Human Reconstruction
⭐ code - Learned Vertex Descent: A New Direction for 3D Human Model Fitting
⭐ code🏠 project
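- 3D Interacting Hand Pose Estimation
- Pose Synthesis
- Hand-Object Reconstruction
- Human-Scene Interaction
- Human Pose Modeling
- Pose Tracking
- 3D Human Mesh Recovery
- 3D Human Motion Prediction and Generation
- Pose Transfer
- Human Pose Prediction
- 4D
- Human Mesh Recovery
- Hand Mesh Estimation
- Head Mesh Reconstruction
- Human Mesh Animation
- Audio-Driven Stylized Gesture Generation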
8.3D Vision
- DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images
⭐ code - Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes
⭐ code - Self-calibrating Photometric Stereo by Neural Inverse Rendering
⭐ code - 3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching
⭐ code - Learning Online Multi-sensor Depth Fusion
⭐ code - Stereo Matching
- MVS
- 3D Scene Synthesis
- Scene Reconstruction
- Depth Estimation
- Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches
- Relationship Spatialization for Depth Estimation
- BRNet: Exploring Comprehensive Features for Monocular Depth Estimation
⭐ code - Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics
⭐ code - Stereo Depth Estimation with Echoes
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes
⭐ code - RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation
⭐ code - PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation
- Depth Field Networks for Generalizable Multi-view Scene Representation
🏠 project - Structure and Motion from Casual Videos
- MODE: Multi-View Omnidirectional Depth Estimation with 360° Cameras
⭐ code - Gradient-based Uncertainty for Monocular Depth Estimation
⭐ code - DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction
⭐ code - Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation
⭐ code - 3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling
⭐ code - DELTAR: Depth Estimation from a Light-weight ToF Sensor and RGB Image
⭐ code🏠 project - FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras
- Context-Enhanced Stereo Transformer
- Adaptive Co-Teaching for Unsupervised Monocular Depth Estimation
⭐ code - PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation
⭐ code - Towards Comprehensive Representation Enhancement in Semantics-Guided Self-Supervised Monocular Depth Estimation
- LocalBins: Improving Depth Estimation by Learning Local Distributions
⭐ code - Depth Map Decomposition for Monocular Depth Estimation
- Uncertainty Quantification in Depth Estimation via Constrained Ordinal Regression
⭐ code - Spike Transformer: Monocular Depth Estimation for Spiking Camera
⭐ code - Learning Phase Mask for Privacy-Preserving Passive Depth Estimation
- Depth Completion
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs
⭐ code - RigNet: Repetitive Image Guided Network for Depth Completion
- Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion
- Monitored Distillation for Positive Congruent Depth Completion
⭐ code - CostDCNet: Cost Volume Based Depth Completion for a Single RGB-D Image
⭐ code
- 3D Vision
- A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision
- Neural Density-Distance Fields
⭐ code - DANBO: Disentangled Articulated Neural Body Representations via Graph Neural Networks
🏠 project - CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images
- 3D Room Layout
- 3D Reconstruction
- Object-Compositional Neural Implicit Surfaces
⭐ code🏠 project📺 video - Perspective Phase Angle Model for Polarimetric 3D Reconstruction
⭐ code - Monocular 3D Object Reconstruction with GAN Inversion
⭐ code🏠 project - Structural Causal 3D Reconstruction
- 2D GANs Meet Unsupervised Single-view 3D Reconstruction
⭐ code🏠 project - Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network
- NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors
🏠 project - SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views
🏠 project - Disentangling Object Motion and Occlusion for Unsupervised Multi-Frame Monocular Depth
⭐ code - SNeS: Learning Probably Symmetric Neural Surfaces from Incomplete Data
- CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-Scale Indoor Scene
- IS-MVSNet: Importance Sampling-Based MVSNet
⭐ code - Unbiased Gradient Estimation for Differentiable Surface Splatting via Poisson Sampling
⭐ code - Towards Learning Neural Representations from Shadows
- PlaneFormers: From Sparse View Planes to 3D Reconstruction
⭐ code🏠 project📺 video - SimpleRecon: 3D Reconstruction Without 3D Convolutions
⭐ code - Share with Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency
- SketchSampler: Sketch-Based 3D Reconstruction via View-Dependent Depth Sampling
- Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors
⭐ code - Bilateral Normal Integration
⭐ code - CHORE: Contact, Human and Object REconstruction from a Single RGB Image
⭐ code🏠 project - Directed Ray Distance Functions for 3D Scene Reconstruction
🏠 project - Object Wake-Up: 3D Object Rigging from a Single Image
⭐ code🏠 project - Latent Partition Implicit with Surface Codes for 3D Representation
⭐ code - 3D Equivariant Graph Implicit Functions
- Projective Parallel Single-Pixel Imaging to Overcome Global Illumination in 3D Structure Light Scanning
- EvAC3D: From Event-Based Apparent Contours to 3D Models via Continuous Visual Hulls
🏠 project - 3D CoMPaT: Composition of Materials on Parts of 3D Things
🏠 project
- 3D Shape
- 3D Shape Matching
- 3D Shape Synthesis
- Shape Completion
- Shape Parsing
- Shape Inpainting
- depth restoration
- Scene Understanding
- Spatially Invariant Unsupervised 3D Object-Centric Learning and Scene Decomposition
- Pose2Room: Understanding 3D Scenes from Human Activities
- Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
⭐ code - Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation
⭐ code - Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination
- 4DContrast: Contrastive Learning with Dynamic Correspondences for 3D Scene Understanding
- Point Scene Understanding via Disentangled Instance Mesh Reconstruction
⭐ code - PIP: Physical Interaction Prediction via Mental Simulation with Span Selection
🏠 project
7.Object Tracking
- Towards Grand Unification of Object Tracking
😮 oral⭐ code📰 ECCV 2022 Oral: Unicorn is the first to unify the network architecture and learning paradigm of four object tracking tasks, reaching SOTA on 8 challenging datasets - HVC-Net: Unifying Homography, Visibility, and Confidence Learning for Planar Object Tracking
- Tracking by Associating Clips
- ByteTrack: Multi-Object Tracking by Associating Every Detection Box
⭐ code - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework
⭐ code - Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking
⭐ code - Robust Visual Tracking by Segmentation
⭐ code - FEAR: Fast, Efficient, Accurate and Robust Visual Tracker
⭐ code - 3D Tracking
- 3D Siamese Transformer Network for Single Object Tracking on Point Clouds
⭐ code - SpOT: Spatiotemporal Modeling for 3D Object Tracking
- Large-displacement 3D Object Tracking with Hybrid Non-local Optimization
⭐ code - CMT: Context-Matching-Guided Transformer for 3D Tracking in Point Clouds
- Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline
⭐ code
- Multi-Object Tracking
- Tracking Objects as Pixel-wise Distributions
😮 oral - The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
- MOTCOM: The Multi-Object Tracking Dataset Complexity Metric
⭐ code🏠 project - Tracking Every Thing in the Wild
- PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?
- SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset
- Robust Multi-Object Tracking by Marginal Inference
- MOTR: End-to-End Multiple-Object Tracking with TRansformer
⭐ code - Large Scale Real-World Multi-person Tracking
⭐ code - Particle Video Revisited: Tracking through Occlusions Using Point Trajectories
🏠 project
- Visual Tracking
- Cell Tracking
6.Object Detection
- Should All Proposals be Treated Equally in Object Detection?
⭐ code - TIDEE: Tidying Up Novel Rooms Using Visuo-Semantic Commonsense Priors
🏠 project - TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices Using Submodular Mutual Information
⭐ code - HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
⭐ code - Adversarially-Aware Robust Object Detector
😮 oral⭐ code - ObjectBox: From Centers to Boxes for Anchor-Free Object Detection
😮 oral⭐ code - Point-to-Box Network for Accurate Object Detection via Single Point Supervision
⭐ code - You Should Look at All Objects
⭐ code - Class-agnostic Object Detection with Multi-modal Transformer
⭐ code
Uses multi-modal ViTs and human-understandable text queries to generate high-quality object proposals - Exploiting Unlabeled Data with Vision and Language Models for Object Detection
⭐ code - PoserNet: Refining Relative Camera Poses Exploiting Object Detections
⭐ code - Robust Object Detection With Inaccurate Bounding Boxes
⭐ code - UC-OWOD: Unknown-Classified Open World Object Detection
⭐ code - Unifying Visual Perception by Dispersible Points Learning
⭐ code - A Large-scale Multiple-objective Method for Black-box Attack against Object Detection
⭐ code - Distilling Object Detectors With Global Knowledge
⭐ code - PANDORA: A Panoramic Detection Dataset for Object with Orientation
⭐ code - Exploring Plain Vision Transformer Backbones for Object Detection
⭐ code - Long-Tail Detection with Effective Class-Margins
⭐ code - Detecting Twenty-Thousand Classes Using Image-Level Supervision
⭐ code - Exploring Resolution and Degradation Clues As Self-Supervised Signal for Low Quality Object Detection
⭐ code - Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection
- MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer
- PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images
🏠 project - Cornerformer: Purifying Instances for Corner-Based Detectors
- Efficient Decoder-Free Object Detection with Transformers
⭐ code - W2N: Switching from Weak Supervision to Noisy Supervision for Object Detection
⭐ code - Towards Data-Efficient Detection Transformers
⭐ code - Open-Vocabulary DETR with Conditional Matching
⭐ code - Prediction-Guided Distillation for Dense Object Detection
⭐ code - Multimodal Object Detection via Probabilistic Ensembling
⭐ code - Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
- GLAMD: Global and Local Attention Mask Distillation for Object Detectors
- Object Detection As Probabilistic Set Prediction
- Out-of-Distribution Identification: Let Detector Tell Which I Am Not Sure
- Simple Open-Vocabulary Object Detection with Vision Transformers
⭐ code - A Simple Approach and Benchmark for 21,000-Category Object Detection
⭐ code - EAutoDet: Efficient Architecture Search for Object Detection
⭐ code - Few-Shot End-to-End Object Detection via Constantly Concentrated Encoding across Heads
- 3D Object Detection
- DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection
⭐ code - Rethinking IoU-based Optimization for Single-stage 3D Object Detection
⭐ code - Densely Constrained Depth Estimator for Monocular 3D Object Detection
⭐ code - Learning Ego 3D Representation As Ray Tracing
🏠 project - LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection
⭐ code - SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention
⭐ code - AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection
⭐ code - DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection
⭐ code - Label-Guided Auxiliary Training Improves 3D Object Detector
⭐ code - Monocular 3D Object Detection with Depth from Motion
😮 oral⭐ code - MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones
😮 oral⭐ code - Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph
😮 oral⭐ code - Multimodal Transformer for Automatic 3D Annotation and Object Detection
⭐ code - Semi-Supervised 3D Object Detection with Proficient Teachers
⭐ code - ProposalContrast: Unsupervised Pre-training for LiDAR-Based 3D Object Detection
⭐ code - CenterFormer: Center-based Transformer for 3D Object Detection
😮 oral⭐ code - SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds
- Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction
⭐ code - CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection
- Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
- Plausibility Verification For 3D Object Detectors Using Energy-Based Optimization
- Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection
⭐ code - PETR: Position Embedding Transformation for Multi-View 3D Object Detection
⭐ code - Lidar Point Cloud Guided Monocular 3D Object Detection
⭐ code - INT: Towards Infinite-Frames 3D Detection with an Efficient Framework
- Semi-Supervised Monocular 3D Object Detection by Multi-View Consistency
- Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training
⭐ code - MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection
⭐ code - PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection
⭐ code - Improving the Intra-Class Long-Tail in 3D Detection via Rare Example Mining
- 3D Object Detection with a Self-Supervised Lidar Scene Flow Backbone
⭐ code - DetMatch: Two Teachers Are Better than One for Joint 2D and 3D Semi-Supervised Object Detection
⭐ code - FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection
⭐ code - Enhancing Multi-modal Features Using Local Self-Attention for 3D Object Detection
- Semi-Supervised Object Detection
- Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection
⭐ code - Semi-Supervised Object Detection via Virtual Category Learning
⭐ code - Open-Set Semi-Supervised Object Detection
⭐ code🏠 project - PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection
⭐ code - Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection
- Few-Shot Object Detection
- Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark
⭐ code - Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection
⭐ code - AcroFOD: An Adaptive Method for Cross-domain Few-shot Object Detection
⭐ code - Time-rEversed diffusioN tEnsor Transformer: A new TENET of Few-Shot Object Detection
⭐ code - AirDet: Few-Shot Detection without Fine-Tuning for Autonomous Exploration
⭐ code - Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations
- Few-Shot Object Detection with Model Calibration
⭐ code - Few-Shot Video Object Detection
⭐ code - Mutually Reinforcing Structure with Proposal Contrastive Consistency for Few-Shot Object Detection
⭐ code
- Salient Object Detection
- SESS: Saliency Enhancing with Scaling and Sliding
⭐ code - SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection
⭐ code - Salient Object Detection for Point Clouds
⭐ code - KD-SCFNet: Towards More Accurate and Efficient Salient Object Detection via Knowledge Distillation
⭐ code - Saliency Hierarchy Modeling via Generative Kernels for Salient Object Detection
- MVSalNet: Multi-View Augmentation for RGB-D Salient Object Detection
- Weakly Supervised Object Detection
- Active Learning Strategies for Weakly-supervised Object Detection
⭐ code - W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection
⭐ code - Object Discovery via Contrastive Learning for Weakly Supervised Object Detection
⭐ code - End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution
- Object Localization
- Object Manipulation via Visual Target Localization
🏠 project - On Label Granularity and Object Localization
- Weakly Supervised Object Localization
- Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization
⭐ code - Weakly Supervised Object Localization through Inter-class Feature Similarity and Intra-Class Appearance Consistency
- Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration
⭐ code
- One-Stage Object Detection
- Object Counting
- OOD
- Out-of-Distribution Detection with Semantic Mismatch under Masking
⭐ code - Out-of-Distribution Detection with Boundary Aware Learning
- DICE: Leveraging Sparsification for Out-of-Distribution Detection
⭐ code - Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-of-Distribution Generalization
⭐ code - Data Invariants to Understand Unsupervised Out-of-Distribution Detection
- VOD
- PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer towards Video Object Detection
⭐ code - SALISA: Saliency-Based Input Sampling for Efficient Video Object Detection
- Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection
- Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency
⭐ code
- Small Object Detection
- Image Detection
- Object Discovery
- Change Detection
5.Image/Video Retrieval
- Text-Based Temporal Localization of Novel Events
- Cross-Domain Retrieval
- Image Retrieval
- Hierarchical Average Precision Training for Pertinent Image Retrieval
⭐ code - Adaptive Fine-Grained Sketch-Based Image Retrieval
⭐ code - A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch
⭐ code🏠 project - Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval
⭐ code - StyleBabel: Artistic Style Tagging and Captioning
- Deep Hash Distillation for Image Retrieval
⭐ code - Conditional Stroke Recovery for Fine-Grained Sketch-Based Image Retrieval
- Fine-Grained Fashion Representation Learning by Online Deep Clustering
- Video Retrieval
- LocVTP: Video-Text Pre-training for Temporal Localization
⭐ code - Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval
- Multi-Query Video Retrieval
⭐ code - Learning Audio-Video Modalities from Image Captions
🏠 project - Audio-Visual Mismatch-Aware Video Retrieval via Association and Adjustment
- ECLIPSE: Efficient Long-Range Video Retrieval Using Sight and Sound
⭐ code - Video Geo-localization (Retrieval)
- Text-Video Retrieval
- Image-Text Retrieval
- Fine-Grained Image Retrieval
- Video Moment Retrieval
- Video-Text Retrieval
- Nearest Neighbor Search
4.Video/Image Captioning
- D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
🏠 project - Image Captioning
- GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
⭐ code - Explicit Image Caption Editing
⭐ code - ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-Verified Image-Caption Associations for MS-COCO
⭐ code - GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
⭐ code - Object-Centric Unsupervised Image Captioning
⭐ code - Unifying Event Detection and Captioning as Sequence Generation via Pre-training
⭐ code
3.Image Processing
- Image Quality Assessment
- Image Retouching
- Image Warping
- Image Restoration
- D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration
⭐ code - Simple Baselines for Image Restoration
⭐ code - Improving Image Restoration by Revisiting Global Information Aggregation
⭐ code - Seeing through a Black Box: Toward High-Quality Terahertz Imaging via Subspace-and-Attention Guided Restoration
- JPEG Artifacts Removal via Contrastive Representation Learning
⭐ code - TAPE: Task-Agnostic Prior Embedding for Image Restoration
- Spectrum-Aware and Transferable Architecture Search for Hyperspectral Image Restoration
- DRCNet: Dynamic Image Restoration Contrastive Network
- Image Inpainting
- Learning Prior Feature and Attention Enhanced Image Inpainting
⭐ code - Inpainting at Modern Camera Resolution by Guided PatchMatch with Auto-Curation
⭐ code - High-Fidelity Image Inpainting with GAN Inversion
- Unbiased Multi-Modality Guidance for Image Inpainting
- Image Inpainting with Cascaded Modulation GAN and Object-Aware Training
⭐ code - Perceptual Artifacts Localization for Inpainting
⭐ code - Hourglass Attention Network for Image Inpainting
⭐ code - Diverse Image Inpainting with Normalizing Flow
- Image Enhancement
- SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement
- Uncertainty Inspired Underwater Image Enhancement
⭐ code - Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression
⭐ code - LEDNet: Joint Low-Light Enhancement and Deblurring in the Dark
⭐ code🏠 project - NEST: Neural Event Stack for Event-Based Image Enhancement
⭐ code - Seeing Far in the Dark with Patterned Flash
⭐ code - Local Color Distributions Prior for Image Enhancement
🏠 project - SemAug: Semantically Meaningful Image Augmentations for Object Detection through Language Grounding
- Image Harmonization
- Image Deconvolution
- Dehazing
- Denoising
- Deep Semantic Statistics Matching (D2SM) Denoising Network
⭐ code🏠 project - Optimizing Image Compression via Joint Learning with Denoising
⭐ code - Fast and High Quality Image Denoising via Malleable Convolution
🏠 project - Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-Ahead Forward Ones
⭐ code - TempFormer: Temporally Consistent Transformer for Video Denoising
- Desnowing
- Deraining
- Not Just Streaks: Towards Ground Truth for Single Image Deraining
🏠 project - Blind Image Decomposition
⭐ code - ART-SS: An Adaptive Rejection Technique for Semi-Supervised Restoration for Adverse Weather-Affected Images
⭐ code - Rethinking Video Rain Streak Removal: A New Synthesis Model and a Deraining Network with Video Rain Prior
⭐ code
- Deblurring
- Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance
⭐ code - United Defocus Blur Detection and Deblurring via Adversarial Promoting Learning
⭐ code - Learning Degradation Representations for Image Deblurring
⭐ code - Learning Deep Non-Blind Image Deconvolution without Ground Truths
- DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting
⭐ code - Realistic Blur Synthesis for Learning Image Deblurring
🏠 project - Stripformer: Strip Transformer for Fast Image Deblurring
⭐ code - Event-Based Fusion for Motion Deblurring with Cross-Modal Attention
🏠 project - ERDN: Equivalent Receptive Field Deformable Network for Video Deblurring
⭐ code - Event-Guided Deblurring of Unknown Exposure Time Videos
🏠 project
- Demoiréing
- Reflection Removal
- Shadow Removal
- Semantic Image Editing
- Image Colorization
- PalGAN: Image Colorization with Palette Generative Adversarial Networks
⭐ code - Semantic-Sparse Colorization Network for Deep Exemplar-Based Colorization
- CT2: Colorization Transformer via Color Tokens
- BigColor: Colorization Using a Generative Color Prior for Natural Images
- Colorization for In Situ Marine Plankton Images
- ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer
⭐ code - Bridging the Domain Gap towards Generalization in Automatic Colorization
⭐ code - L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer
- Image Cropping
- Image Fusion
- Rolling Shutter (Jello Effect)
2.Image Segmentation
- PseudoClick: Interactive Image Segmentation with Click Imitation
- GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation
- Pixel-Wise Energy-Biased Abstention Learning for Anomaly Segmentation on Complex Urban Driving Scenes
😮 oral⭐ code - Highly Accurate Dichotomous Image Segmentation
🏠 project - Graph-Constrained Contrastive Regularization for Semi-Weakly Volumetric Segmentation
- Slim Scissors: Segmenting Thin Object from Synthetic Background
⭐ code - RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation
⭐ code - Unsupervised Segmentation in Real-World Images via Spelke Object Inference
- Learning Instance-Specific Adaptation for Cross-Domain Segmentation
🏠 project - Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
- Uncertainty-Aware Multi-modal Learning via Cross-Modal Random Network Prediction
- Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition
⭐ code - Semantic Segmentation
- Multi-Exit Semantic Segmentation Networks
- Language-Grounded Indoor 3D Semantic Segmentation in the Wild
🏠 project - Where in the World Is This Image? Transformer-Based Geo-Localization in the Wild
- Open-World Semantic Segmentation for LIDAR Point Clouds
⭐ code - SiamDoGe: Domain Generalizable Semantic Segmentation Using Siamese Network
⭐ code - TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation
⭐ code - Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation
⭐ code - RBC: Rectifying the Biased Context in Continual Semantic Segmentation
⭐ code - ESS: Learning Event-Based Semantic Segmentation from Still Images
🏠 project - Learning Implicit Feature Alignment Function for Semantic Segmentation
⭐ code - Data Efficient 3D Learner via Knowledge Transferred from 2D Model
⭐ code - Multi-Scale and Cross-Scale Contrastive Learning for Semantic Segmentation
⭐ code - 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
⭐ code - Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
- ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation
- Union-Set Multi-source Model Adaptation for Semantic Segmentation
- Continual Semantic Segmentation via Structure Preserving and Projected Feature Alignment
- Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation
⭐ code - LiDAL: Inter-Frame Uncertainty Based Active Learning for 3D LiDAR Semantic Segmentation
⭐ code - DODA: Data-Oriented Sim-to-Real Domain Adaptation for 3D Semantic Segmentation
⭐ code - SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds
⭐ code - Learning Semantic Segmentation from Multiple Datasets with Label Shifts
🏠 project - CAR: Class-Aware Regularizations for Semantic Segmentation
⭐ code - Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation
⭐ code - A Transformer-Based Decoder for Semantic Segmentation with Multi-level Context Mining
⭐ code - Extract Free Dense Labels from CLIP
- A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model
⭐ code - UCTNet: Uncertainty-Aware Cross-Modal Transformer Network for Indoor RGB-D Semantic Segmentation
- CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation
⭐ code - Domain Adaptive Semantic Segmentation
- DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation
⭐ code - HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation
⭐ code - D2ADA: Dynamic Density-Aware Active Domain Adaptation for Semantic Segmentation
- Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions
⭐ code - Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation
⭐ code🏠 project
- Few-Shot Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Adversarial Erasing Framework via Triplet with Gated Pyramid Pooling Layer for Weakly Supervised Semantic Segmentation
⭐ code - Unsupervised Semantic Segmentation
- Instance Segmentation
- OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers
⭐ code - Geodesic-Former: A Geodesic-Guided Few-Shot 3D Point Cloud Instance Segmenter
⭐ code - Learning Regional Purity for Instance Segmentation on 3D Point Clouds
- 3D Instances as 1D Kernels
⭐ code - 2D Amodal Instance Segmentation Guided by 3D Shape Prior
- Box-supervised Instance Segmentation with Level Set Evolution
⭐ code - Long-tailed Instance Segmentation using Gumbel Optimized Loss
⭐ code - Active Pointly-Supervised Instance Segmentation
- Trapped in Texture Bias? A Large Scale Comparison of Deep Instance Segmentation
⭐ code - Learning with Free Object Segments for Long-Tailed Instance Segmentation
⭐ code - A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation
⭐ code - Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes
🏠 project - Learning to Detect Every Thing in an Open World
🏠 project
- Panoptic Segmentation
- Motion Segmentation
- Few-Shot Segmentation
- Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation
⭐ code - Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
⭐ code🏠 project - Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation
⭐ code - Interclass Prototype Relation for Few-Shot Segmentation
- HM: Hybrid Masking for Few-Shot Segmentation
⭐ code - Adaptive Agent Transformer for Few-Shot Segmentation
- Dense Gaussian Processes for Few-Shot Segmentation
⭐ code
- Matting
- 3D Segmentation
- Hand Segmentation
- Part Segmentation
- Scene Segmentation
1.Others
- Generative Meta-Adversarial Network for Unseen Object Navigation
⭐ code - Housekeep: Tidying Virtual Households Using Commonsense Reasoning
🏠 project - OPD: Single-View 3D Openable Part Detection
🏠 project - Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance
- Webly Supervised Concept Expansion for General Purpose Vision Models
🏠 project - PACS: A Dataset for Physical Audiovisual Commonsense Reasoning
- Fabric Material Recovery from Video Using Multi-Scale Geometric Auto-Encoder
- MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
⭐ code - Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT
- Efficient Deep Visual and Inertial Odometry with Adaptive Visual Modality Selection
⭐ code - Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization
⭐ code - Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration
- SeqTR: A Simple Yet Universal Network for Visual Grounding
⭐ code - Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
⭐ code - Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
- Fine-Grained Visual Entailment
⭐ code - FindIt: Generalized Localization with Natural Language Queries
🏠 project - Decomposing the Tangent of Occluding Boundaries according to Curvatures and Torsions
- Real-Time Neural Character Rendering with Pose-Guided Multiplane Images
- TensoRF: Tensorial Radiance Fields
🏠 project - TAVA: Template-Free Animatable Volumetric Actors
⭐ code - Relative Pose from SIFT Features
⭐ code - Solution Space Analysis of Essential Matrix Based on Algebraic Error Minimization
- CoVisPose: Co-Visibility Pose Transformer for Wide-Baseline Relative Pose Estimation in 360° Indoor Panoramas
- Space-Partitioning RANSAC
⭐ code - Correspondence Reweighted Translation Averaging
- Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs
⭐ code - GigaDepth: Learning Depth from Structured Light with Branching Neural Networks
- Visual Prompt Tuning
⭐ code - Cross-Modal Knowledge Transfer without Task-Relevant Source Data
- PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
- CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation
⭐ code - SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas
- MVP: Multimodality-Guided Visual Pre-training
- Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization
- Learning to Learn with Smooth Regularization
⭐ code - Ensemble Learning Priors Driven Deep Unfolding for Scalable Video Snapshot Compressive Imaging
⭐ code - Approximate Discrete Optimal Transport Plan with Auxiliary Measure Method
- A Comparative Study of Graph Matching Algorithms in Computer Vision
- Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search: Tight or Not
- Dynamic Metric Learning with Cross-Level Concept Distillation
⭐ code - MENet: A Memory-Based Network with Dual-Branch for Efficient Event Stream Processing
- Improving Robustness by Enhancing Weak Subnets
- Learning from Multiple Annotator Noisy Labels via Sample-Wise Label Fusion
⭐ code - Unbiased Manifold Augmentation for Coarse Class Subdivision
⭐ code - OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
⭐ code - ERA: Enhanced Rational Activations
⭐ code - Active Label Correction Using Robust Parameter Update and Entropy Propagation
- Revisiting Batch Norm Initialization
⭐ code - Differentiable Rendering for Synthetic Aperture Radar Imagery
- Batch-efficient EigenDecomposition for Small and Medium Matrices
⭐ code - Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling
- Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality
⭐ code - Contrastive Deep Supervision
⭐ code - Organic Priors in Non-Rigid Structure from Motion
😮 oral - Bootstrapped Masked Autoencoders for Vision BERT Pretraining
⭐ code - Lipschitz Continuity Retained Binary Neural Network
⭐ code - NeFSAC: Neurally Filtered Minimal Samples
⭐ code - Towards Understanding The Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search
- Latency-Aware Collaborative Perception
⭐ code - MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views
- SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data
- Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain
⭐ code - Discrete-Constrained Regression for Local Counting Models
- Streamable Neural Fields
⭐ code - Contributions of Shape, Texture, and Color in Visual Recognition
⭐ code - Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model
- Latent Discriminant deterministic Uncertainty
⭐ code - SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks
⭐ code - UFO: Unified Feature Optimization
⭐ code - POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion
⭐ code - My View is the Best View: Procedure Learning from Egocentric Videos
⭐ code🏠 project - Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
⭐ code - Contrastive Monotonic Pixel-Level Modulation
😮 oral⭐ code - Neural-Sim: Learning to Generate Training Data with NeRF
⭐ code - Learning Hierarchy Aware Features for Reducing Mistake Severity
⭐ code - Translating a Visual LEGO Manual to a Machine-Executable Plan
⭐ code🏠 project - Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips
⭐ code - LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
- MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud
⭐ code🏠 project - Neural Strands: Learning Hair Geometry and Appearance from Multi-View Images
🏠 project - A Repulsive Force Unit for Garment Collision Handling in Neural Networks
🏠 project - Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion
⭐ code - Revisiting the Critical Factors of Augmentation-Invariant Representation Learning
⭐ code - Fast Two-step Blind Optical Aberration Correction
⭐ code - Transformers as Meta-Learners for Implicit Neural Representations
⭐ code🏠 project - Neighborhood Collective Estimation for Noisy Label Identification and Correction
⭐ code - Rethinking Robust Representation Learning Under Fine-grained Noisy Faces
- Contrast-Phys: Unsupervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast
⭐ code - RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild
🏠 project - PRIF: Primary Ray-based Implicit Function
🏠 project - Context-Aware Streaming Perception in Dynamic Environments
⭐ code - AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments
⭐ code - L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training
⭐ code - GCISG: Guided Causal Invariant Learning for Improved Syn-to-real Generalization
- Learning Continuous Implicit Representation for Near-Periodic Patterns
⭐ code🏠 project - A Deep Moving-camera Background Model
⭐ code - NashAE: Disentangling Representations through Adversarial Covariance Minimization
⭐ code - FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion
- Diversified Dynamic Routing for Vision Tasks
- Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs
- Improving the Reliability for Confidence Estimation
- Attaining Class-level Forgetting in Pretrained Model using Few Samples
- Overexposure Mask Fusion: Generalizable Reverse ISP Multi-Step Refinement
- Photo-realistic Neural Domain Randomization
- Editable indoor lighting estimation
🏠 project - A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks
- DeepShadow: Neural Shape from Shadow
⭐ code🏠 project - Intrinsic Neural Fields: Learning Functions on Manifolds
- Unsupervised Pose-Aware Part Decomposition for Man-Made Articulated Objects
- MeshUDF: Fast and Differentiable Meshing of Unsigned Distance Field Networks
- S2N: Suppression-Strengthen Network for Event-Based Recognition under Variant Illuminations
⭐ code - A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness
- Transform Your Smartphone into a DSLR Camera: Learning the ISP in the Wild
⭐ code - Data Association between Event Streams and Intensity Frames under Diverse Baselines
- Instance Contour Adjustment via Structure-Driven CNN
- 3D Scene Inference from Transient Histograms
- Neural Space-Filling Curves
🏠 project - LWGNet – Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval
⭐ code - PANDORA: Polarization-Aided Neural Decomposition of Radiance
- Benchmarking Omni-Vision Representation through the Lens of Visual Realms
🏠 project - When Deep Classifiers Agree: Analyzing Correlations between Learning Order and Image Statistics
- MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
🏠 project - The Missing Link: Finding Label Relations across Datasets
- Domain Adaptive Hand Keypoint and Pixel Localization in the Wild
🏠 project - DFNet: Enhance Absolute Pose Regression with Direct Feature Matching
🏠 project - GTCaR: Graph Transformer for Camera Re-Localization
- Is Geometry Enough for Matching in Visual Localization?
⭐ code - Reducing Information Loss for Spiking Neural Networks
- Deep Partial Updating: Towards Communication Efficient Updating for On-Device Inference
- SP-Net: Slowly Progressing Dynamic Inference Networks
- Meta-GF: Training Dynamic-Depth Neural Networks Harmoniously
⭐ code - You Already Have It: A Generator-Free Low-Precision DNN Training Framework Using Stochastic Rounding
- Real Spike: Learning Real-Valued Spikes for Spiking Neural Networks
- Exploring Lottery Ticket Hypothesis in Spiking Neural Networks
⭐ code - On the Angular Update and Hyperparameter Tuning of a Scale-Invariant Network
- LANA: Latency Aware Network Acceleration
- Understanding the Dynamics of DNNs Using Graph Modularity
⭐ code - MIME: Minority Inclusion for Majority Group Enhancement of AI Performance
🏠 project - Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness
⭐ code - Learning to Censor by Noisy Sampling
- Anti-Neuron Watermarking: Protecting Personal Data against Unauthorized Neural Networks
- Recover Fair Deep Classification Models via Altering Pre-trained Structure
- Decouple-and-Sample: Protecting Sensitive Information in Task Agnostic Data Release
⭐ code - Latent Space Smoothing for Individually Fair Representations
- Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration
⭐ code - Image-Based CLIP-Guided Essence Transfer
⭐ code - End-to-End Visual Editing with a Generatively Pre-trained Artist
🏠 project - Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives
⭐ code - L-Tracing: Fast Light Visibility Estimation on Neural Surfaces by Sphere Tracing
- Temporal-MPI: Enabling Multi-Plane Images for Dynamic Scene Modelling via Temporal Basis Learning
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis
🏠 project - Unified Implicit Neural Stylization
🏠 project - Deep Portrait Delighting
🏠 project - Free-Viewpoint RGB-D Human Performance Capture and Rendering
🏠 project - Multiview Regenerative Morphing with Dual Flows
⭐ code - NeRF for Outdoor Scene Relighting
🏠 project - Intelli-Paint: Towards Developing More Human-Intelligible Painting Agents
- Motion Transformer for Unsupervised Image Animation
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
- Implicit Neural Representations for Variable Length Human Motion Generation
⭐ code - Learning Object Placement via Dual-Path Graph Completion
- Compositional Visual Generation with Composable Diffusion Models
🏠 project - Spatial-Frequency Domain Information Integration for Pan-Sharpening
- ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-Modality Image Fusion
⭐ code - Rethinking Generic Camera Models for Deep Single Image Camera Calibration to Recover Rotation and Fisheye Distortion
- Modeling Mask Uncertainty in Hyperspectral Image Reconstruction
⭐ code - Deep Fourier-Based Exposure Correction Network with Spatial-Frequency Interaction
⭐ code - Towards Real-World HDRTV Reconstruction: A Data Synthesis-Based Approach
- Attention-Aware Learning for Hyperparameter Prediction in Image Processing Pipelines
- Memory-Augmented Model-Driven Network for Pansharpening
⭐ code - All You Need Is RAW: Defending against Adversarial Attacks with Camera Image Pipelines
- GRIT-VLP: Grouped Mini-Batch Sampling for Efficient Vision and Language Pre-training
⭐ code - Transformer with Implicit Edges for Particle-Based Physics Simulation
⭐ code - LA3: Efficient Label-Aware AutoAugment
- BA-Net: Bridge Attention for Deep Convolutional Neural Networks
⭐ code - SAU: Smooth Activation Function Using Convolution with Approximate Identities
- Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks
⭐ code - DLME: Deep Local-Flatness Manifold Embedding
- Accurate Detection of Proteins in Cryo-Electron Tomograms from Sparse Labels
- Social ODE: Multi-agent Trajectory Forecasting with Neural Ordinary Differential Equations
- Entropy-Driven Sampling and Training Scheme for Conditional Diffusion Generation
⭐ code - Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering
- Controllable Shadow Generation Using Pixel Height Maps
- Subspace Diffusion Generative Models
⭐ code - MINER: Multiscale Implicit Neural Representation
🏠 project - An Embedded Feature Whitening Approach to Deep Neural Network Optimization
⭐ code - Q-FW: A Hybrid Classical-Quantum Frank-Wolfe for Quadratic Binary Optimization
- Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models
⭐ code - QISTA-ImageNet: A Deep Compressive Image Sensing Framework Solving ℓq-Norm Optimization Problem
- Rethinking Confidence Calibration for Failure Prediction
⭐ code - PRIME: A Few Primitives Can Boost Robustness to Common Corruptions
⭐ code - Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection
- Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining
🏠 project - Balancing between Forgetting and Acquisition in Incremental Subpopulation Learning
⭐ code - Sound Localization by Self-Supervised Time Delay Estimation
🏠 project - X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
- A Contrastive Objective for Learning Disentangled Representations
⭐ code - A Gyrovector Space Approach for Symmetric Positive Semi-Definite Matrix Learning
- Trading Positional Complexity vs Deepness in Coordinate Networks
🏠 project - TO-Scene: A Large-Scale Dataset for Understanding 3D Tabletop Scenes
⭐ code - Primitive-Based Shape Abstraction via Nonparametric Bayesian Inference
- S2Net: Stochastic Sequential Pointcloud Forecasting
- LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments
- Variance-Aware Weight Initialization for Point Convolutional Neural Networks
- AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-Shot Interactions
🏠 project - human relighting
- Novelty Detection
- Multi-attribute Learning
- Bias Identification
- Novel Class Discovery
- Dense Prediction
- Variational Autoencoders (VAEs)
- Open-Set Recognition
- Sketch
- Clustering
- Visual Grounding
- Interactive Structure Understanding
- HDR Panorama Generation
- Sign Language Recognition
- Lip Reading
- BNN
- Image Forensics
- Image Alignment
- visual hand pressure estimation
- Lighting Estimation
- Indoor Scene Lighting Editing
- HDR
- 关键点定位
- XAI
- STEEX: Steering Counterfactual Explanations with Semantics
⭐ code - Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals
⭐ code - HIVE: Evaluating the Human Interpretability of Visual Explanations
🏠 project - Shap-CAM: Visual Explanations for Convolutional Neural Networks Based on Shapley Value
- Palmprint Recognition
- Gaze Estimation
- Motion Transfer
- Remote Respiration Monitoring
- Image-to-Graphics Generation