CVPR-2023-Papers
❣❣❣ CVPR 2023 paper categorization is complete
📢📢📢 Award-winning papers
🏆Best Paper
- Planning-oriented Autonomous Driving
🏠project - Visual Programming: Compositional visual reasoning without training
🏆Best student Paper
🏆Honorable Mention
🏆Honorable Mention(Student)
↘️ CV-Surveys (under construction)
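Survey papers from previous years, by category: click here
2023 papers by category: click here
2022 papers by category: click here
2021 papers by category: click here
2020 papers by category: click here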
Contents
80.Computer Graphics(计算机图形学)
- Learning Anchor Transformations for 3D Garment Animation
⭐code - Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
⭐code - CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition
🏠project - FLEX: Full-Body Grasping Without Full-Body Grasps
🏠project
79.thermal imaging technology(热敏成像技术)
78.Image/Video Editing(图像/视频编辑)
- PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image
🏠project - Text-driven video editing
- Image Editing(图像编辑)
- CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
- SIEDOB: Semantic Image Editing by Disentangling Object and Background
- NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models
- InstructPix2Pix: Learning To Follow Image Editing Instructions
🏠project - Local 3D Editing via 3D Distillation of CLIP Knowledge
- Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model
- Imagic: Text-Based Real Image Editing With Diffusion Models
- Exemplar-based image editing
77.sketch(草图)
- Photo Pre-Training, but for Sketch
⭐code - Restoration of Hand-Drawn Architectural Drawings Using Latent Space Mapping With Degradation Generator
- SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations
⭐code
76.IP protection(知识产权保护)
- Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection
- Effective Ambiguity Attack Against Passport-Based DNN Intellectual Property Protection Schemes Through Fully Connected Layer Substitution
75.Semantic Scene Completion(语义场景补全)
- Semantic Scene Completion With Cleaner Self
- VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion
⭐code
74.Machine Learning(机器学习)
- Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets
- Multi-Agent Automated Machine Learning
- Towards Better Decision Forests: Forest Alternating Optimization
- ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer
⭐code - A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
⭐code - Novel category discovery
- Transfer learning
73.Neural Radiance Fields(神经辐射场)
- Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization
- Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder
- Occlusion-Free Scene Recovery via Neural Radiance Fields
- Grid-guided Neural Radiance Fields for Large Urban Scenes
🏠project - NeRFLight: Fast and Light Neural Radiance Fields using a Shared Feature Grid
- GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields
⭐code - SPARF: Neural Radiance Fields from Sparse and Noisy Poses
⭐code - Masked Wavelet Representation for Compact Neural Radiance Fields
⭐code - MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
⭐code - AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training
🏠project - JacobiNeRF: NeRF Shaping With Mutual Information Gradients
- Robust Dynamic Radiance Fields
🏠project - Exact-NeRF: An Exploration of a Precise Volumetric Parameterization for Neural Radiance Fields
- PaletteNeRF: Palette-Based Appearance Editing of Neural Radiance Fields
- EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points
🏠project - SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene
🏠project - ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision
⭐code - Flow supervision for Deformable NeRF
- Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
🏠project - EventNeRF: Neural Radiance Fields From a Single Colour Event Camera
🏠project - SeaThru-NeRF: Neural Radiance Fields in Scattering Media
- SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory
- Complementary Intrinsics From Neural Radiance Fields and CNNs for Outdoor Scene Relighting
- Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields
- Removing Objects From Neural Radiance Fields
- Grid-guided Neural Radiance Fields for Large Urban Scenes
⭐code - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
⭐code - JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
🏠project - Multi-Space Neural Radiance Fields
⭐code - DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields
⭐code - StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
🏠project - Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields
🏠project - SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields
- F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories
🏠project - Clothed Human Performance Capture with a Double-layer Neural Radiance Fields
- DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
- Deblurring
72.open-set recognition(开集识别)
71.visual reasoning(视觉推理)
- Visual Programming: Compositional visual reasoning without training
🏆Best Paper - Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
⭐code - Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning
⭐code - Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge
70.Image Forgery Detection
- Hierarchical Fine-Grained Image Forgery Detection and Localization
⭐code - Detecting and Grounding Multi-Modal Media Manipulation
⭐code
⭐code Misinformation detection - Evading DeepFake Detectors via Adversarial Statistical Consistency
- Edge-Aware Regional Message Passing Controller for Image Forgery Localization
- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization
🏠project - Towards Universal Fake Image Detectors That Generalize Across Generative Models
- Deepfake Detection
69.Reinforcement learning(强化学习)
- PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav
- Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning
- Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning
⭐code - Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Second
⭐code - Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning
🏠project
68.Lifelong Learning(终身学习)
67.Active Learning(主动学习)
- Re-thinking Federated Active Learning based on Inter-class Diversity
- Box-Level Active Detection
⭐code - Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning
⭐code
66.Clustering(聚类)
- DivClust: Controlling Diversity in Deep Clustering
- MVC
- On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
⭐code - GCFAgg: Global and Cross-View Feature Aggregation for Multi-View Clustering
- Sample-Level Multi-View Graph Clustering
- Deep Incomplete Multi-View Clustering With Cross-View Partial Sample and Prototype Alignment
- Highly Confident Local Structure Based Consensus Graph Learning for Incomplete Multi-View Clustering
65.Scene flow estimation(场景流估计)
- Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
⭐code - Self-Supervised 3D Scene Flow Estimation Guided by Superpoints
- Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow
64.Motion Retargeting(动作重定向)
63.edge detection(边缘检测)
62.Object Counting(物体计数)
61.Object Re-identification(物体重识别)
- MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
⭐code - Large-scale Training Data Search for Object Re-identification
⭐code - Adaptive Sparse Pairwise Loss for Object Re-Identification
⭐code
60.Industrial Anomaly Detection(工业缺陷检测)
- Defect localization
- Industrial anomaly detection
- Anomaly segmentation
59.Image/Video Compression(图像/视频压缩)
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
- Context-Based Trit-Plane Coding for Progressive Image Compression
⭐code - Learned Image Compression with Mixed Transformer-CNN Architectures
⭐code - LVQAC: Lattice Vector Quantization Coupled with Spatially Adaptive Companding for Efficient Learned Image Compression
- Optimization-Inspired Cross-Attention Transformer for Compressive Sensing
⭐code - Multi-Realism Image Compression With a Conditional Generator
- AccelIR: Task-aware Image Compression for Accelerating Neural Restoration
- Video compression
- Towards Scalable Neural Representation for Diverse Videos
- HNeRV: A Hybrid Neural Representation for Videos
⭐code
⭐code - Video Compression With Entropy-Constrained Neural Representations
- Complexity-Guided Slimmable Decoder for Efficient Deep Video Compression
- EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging
⭐code - MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding
- Neural Video Compression With Diverse Contexts
⭐code - Motion Information Propagation for Neural Video Compression
- Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding
- Vector quantization
58.Neural rendering(神经渲染)
- TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering
- Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering
- Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur
🏠project - NeUDF: Leaning Neural Unsigned Distance Fields With Volume Rendering
- DiffRF: Rendering-Guided 3D Radiance Field Diffusion
🏠project - Unsupervised Continual Semantic Adaptation Through Neural Rendering
- Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes
🏠project - UV Volumes for Real-Time Rendering of Editable Free-View Human Performance
🏠project - Inverse Rendering of Translucent Objects Using Physical and Neural Renderers
- ORCa: Glossy Objects As Radiance-Field Cameras
🏠project - MAIR: Multi-View Attention Inverse Rendering With 3D Spatially-Varying Lighting Estimation
🏠project - FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
🏠project - Learning To Render Novel Views From Wide-Baseline Stereo Pairs
🏠project - NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
🏠project - FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization
🏠project - Local Implicit Ray Function for Generalizable Radiance Field Representation
⭐code - FitMe: Deep Photorealistic 3D Morphable Model Avatars
⭐code - Pointersect: Neural Rendering with Cloud-Ray Intersection
- Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
⭐code - ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
- WildLight: In-the-wild Inverse Rendering with a Flashlight
⭐code - FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
⭐code - NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
- MonoHuman: Animatable Human Neural Field from Monocular Video
⭐code - Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
⭐code - PlenVDB: Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering
Runs at 30 frames per second when outputting 1280x720 frames on an iPhone 12.
57.Gaze Estimation(视线估计)
- NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation
- Source-free Adaptive Gaze Estimation by Uncertainty Reduction
⭐code - ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection
56.Sound + Vision(声音与视觉)
- Conditional Generation of Audio from Video via Foley Analogies
⭐code - Vision Transformers Are Parameter-Efficient Audio-Visual Learners
- Speaker detection
- Audio-visual speech recognition
- Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
⭐code - Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
⭐code - AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
- SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
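- Audio-visual localization
- Audio source separation
- Sound synthesis
- Movie audio description
- Scene image generation from sound
- Audio-visual anomaly detection
- Movie dubbing
- Dance generation
- Video saliency prediction
- Audio-driven portrait animation
- Sound localization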
55.Novel View Synthesis(视图合成)
- Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views
- Consistent View Synthesis With Pose-Guided Diffusion Models
- MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs
🏠project - NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
🏠project - NeRDi: Single-View NeRF Synthesis With Language-Guided Diffusion As General Image Priors
- Novel-View Acoustic Synthesis
🏠project - Cross-Guided Optimization of Radiance Fields With Multi-View Image Super-Resolution for High-Resolution Novel View Synthesis
- Frequency-Modulated Point Cloud Rendering with Easy Editing
⭐code - Learning Neural Duplex Radiance Fields for Real-Time View Synthesis
🏠project - ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects
⭐code - Balanced Spherical Grid for Egocentric View Synthesis
- Progressively Optimized Local Radiance Fields for Robust View Synthesis
⭐code - F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
⭐code - Enhanced Stable View Synthesis
- Consistent View Synthesis with Pose-Guided Diffusion Models
⭐code - Learning to Render Novel Views from Wide-Baseline Stereo Pairs
⭐code - Painting 3D Nature in 2D: View Synthesis of Natural Scenes From a Single Semantic Mask
🏠project - NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior
🏠project - Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis
⭐code - Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Feature Representations
- NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds
- DINER: Depth-aware Image-based NEural Radiance fields
🏠project - RefSR-NeRF: Towards High Fidelity and Super Resolution View Synthesis
⭐code - VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
⭐code - DynIBaR: Neural Dynamic Image-Based Rendering
🏠project
🏆Honorable Mention - Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories
54.Benchmark/Dataset(基准/数据集)
- Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset
- A New Dataset Based on Images Taken by Blind People for Testing the Robustness of Image Classification Models Trained for ImageNet Categories
- Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
- Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline
- Towards Artistic Image Aesthetics Assessment: A Large-Scale Dataset and a New Method
- ScaleDet: A Scalable Multi-Dataset Object Detector
- JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking
🌻dataset - Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning
- DF-Platter: Multi-Face Heterogeneous Deepfake Dataset
🌻dataset - HandsOff: Labeled Dataset Generation With No Additional Human Annotations
🌻dataset - M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis
⭐code - ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations
🌻dataset - NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation
⭐code - MISC210K: A Large-Scale Dataset for Multi-Instance Semantic Correspondence
⭐code - StarCraftImage: A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent Environments
🏠project - Habitat-Matterport 3D Semantics Dataset
- CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset
⭐code
Large-scale public Chinese video-text dataset - FLAG3D: A 3D Fitness Activity Dataset With Language Instruction
🏠project - Multi-Label Compound Expression Recognition: C-EXPR Database & Network
- ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
🏠project
Dataset for hand-object manipulation - xFBD: Focused Building Damage Dataset and Analysis
Building damage dataset - Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo
🌻dataset - Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
- HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
🌻dataset - CUDA: Convolution-based Unlearnable Datasets
🌻dataset - MVImgNet: A Large-scale Dataset of Multi-view Images
🌻dataset - V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception
🌻dataset
Vehicle-to-Vehicle (V2V) perception - Polynomial Implicit Neural Representations For Large Diverse Datasets
🌻dataset - MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset
🌻dataset - RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset
🌻dataset - Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts
⭐code - ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data
⭐code - CelebV-Text: A Large-Scale Facial Text-Video Dataset
⭐code
Facial text-to-video generation - Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
⭐code
Artistic image aesthetics assessment - CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions
🏠project
Climbing motion dataset - Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
- AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection
⭐code
🏠project Public short-video shot boundary detection dataset - V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting
⭐code - WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models
⭐code Synthetic dataset for object detection and weather classification under extreme weather conditions - CLOTH4D: A Dataset for Clothed Human Reconstruction
🌻dataset
Dataset for clothed human reconstruction - OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images
🌻dataset
A new dataset for omnipotent city understanding from multi-level and multi-view images. - RealImpact: A Dataset of Impact Sound Fields for Real Objects
⭐code - BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion
🏠project - GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments
🏠project - Benchmark(基准)
- A Soma Segmentation Benchmark in Full Adult Fly Brain
⭐code - A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection and Anticipation
- A Large-Scale Homography Benchmark
- Toward RAW Object Detection: A New Benchmark and a New Model
- MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding
- Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
🏠project - Advancing Visual Grounding With Scene Knowledge: Benchmark and Method
⭐code - The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects
⭐code - Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn
⭐code - A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
⭐code - GeoNet: Benchmarking Unsupervised Adaptation across Geographies
⭐code - PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout
⭐code - Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
- ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
🏠project - Image Similarity
- ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
⭐code - Ultra-High Resolution Segmentation with Ultra-Rich Context: A Novel Benchmark
⭐code - NewsNet: A Novel Benchmark for Hierarchical Temporal Segmentation
⭐code - RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
🏠project
53.Sign Language (手语)
- Ham2Pose: Animating Sign Language Notation Into Pose Sequences
🏠project - Sign language translation
- Sign language recognition
- Continuous Sign Language Recognition with Correlation Network
⭐code - Reconstructing Signing Avatars From Video Using Linguistic Priors
🏠project - Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition
- CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment
⭐code - Natural Language-Assisted Sign Language Recognition
⭐code
- Sign language retrieval
52.Human Motion(人体运动)
- Semi-Weakly Supervised Object Kinematic Motion Prediction
- The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
- MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion
- Human motion prediction
- Human motion synthesis
- 3D HM
51.Computed Imaging(计算成像,如光学、几何、光场成像等)
- Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography
⭐code - TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments
⭐code - High-Fidelity Event-Radiance Recovery via Transient Event Frequency
⭐code - Real-Time Neural Light Field on Mobile Devices
🏠project - Accidental Light Probes
🏠project - DyLiN: Making Light Field Networks Dynamic
⭐code - Learning Rotation-Equivariant Features for Visual Correspondence
🏠project - Role of Transients in Two-Bounce Non-Line-of-Sight Imaging
- Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution
- Camera pose estimation
- Shutter correction
- Camera calibration
- Geometry estimation
- Camera localization
- NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization
⭐code
50.Anomaly Detection(异常检测)
- Revisiting Reverse Distillation for Anomaly Detection
- SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection
- Prototypical Residual Networks for Anomaly Detection and Localization
- OpenMix: Exploring Outlier Samples for Misclassification Detection
⭐code - Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection
⭐code - Diversity-Measurable Anomaly Detection
- SimpleNet: A Simple Network for Image Anomaly Detection and Localization
⭐code - DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
- WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
- OOD
- Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
⭐code - Mind the Label Shift of Augmentation-Based Graph OOD Generalization
- Block Selection Method for Using Feature Norm in Out-of-Distribution Detection
⭐code - Distribution Shift Inversion for Out-of-Distribution Prediction
⭐code - Are Data-Driven Explanations Robust Against Out-of-Distribution Data?
- LINe: Out-of-Distribution Detection by Leveraging Important Neurons
- Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need
⭐code - Balanced Energy Regularization Loss for Out-of-Distribution Detection
- Decoupling MaxLogit for Out-of-Distribution Detection
- Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns
- GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection
⭐code
49.Image Geo-localization(图像地理位置识别)
48.NLP(自然语言处理)
- Images Speak in Images: A Generalist Painter for In-Context Visual Learning
⭐code - CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural Language
- Sarcasm detection (detecting whether sarcasm is present in text, or in other modalities such as images and comics)
- NLQ
- Visual Grounding(视觉指代)
- Referring Expression Comprehension(指代表达理解)
47.Few/Zero-Shot Learning/Domain Generalization/Adaptation(小/零样本/域泛化/域适应)
- DG
- Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
- Meta-Causal Learning for Single Domain Generalization
- Bi-Level Meta-Learning for Few-Shot Domain Generalization
- Promoting Semantic Connectivity: Dual Nearest Neighbors Contrastive Learning for Unsupervised Domain Generalization
- Federated Domain Generalization With Generalization Adjustment
⭐code - Decompose, Adjust, Compose: Effective Normalization by Playing With Frequency for Domain Generalization
- NICO++: Towards Better Benchmarking for Domain Generalization
⭐code - Improved Test-Time Adaptation for Domain Generalization
⭐code - Modality-Agnostic Debiasing for Single Domain Generalization
- Neuron Structure Modeling for Generalizable Remote Physiological Measurement
⭐code - Sharpness-Aware Gradient Matching for Domain Generalization
⭐code - Improving Generalization with Domain Convex Game
- Generalist: Decoupling Natural and Robust Generalization
⭐code - ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization
⭐code - Deep Frequency Filtering for Domain Generalization
- Progressive Random Convolutions for Single Domain Generalization
- DA
- Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation
⭐code - Class Relationship Embedded Learning for Source-Free Unsupervised Domain Adaptation
- Semi-Supervised Domain Adaptation With Source Label Adaptation
- SCoDA: Domain Adaptive Shape Completion for Real Scans
- Divide and Adapt: Active Domain Adaptation via Customized Learning
- Source-Free Video Domain Adaptation With Spatial-Temporal-Historical Consistency Learning
- DARE-GRAM: Unsupervised Domain Adaptation Regression by Aligning Inverse Gram Matrices
⭐code - Dual-Bridging With Adversarial Noise Generation for Domain Adaptive rPPG Estimation
- MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
⭐code - DATE: Domain Adaptive Product Seeker for E-commerce
⭐code - Adjustment and Alignment for Unbiased Open Set Domain Adaptation
⭐code - Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
- MHPL: Minimum Happy Points Learning for Active Source Free Domain Adaptation
- COT: Unsupervised Domain Adaptation with Clustering and Optimal Transport
- Upcycling Models under Domain and Category Shift
⭐code - C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
- TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
⭐code - OSAN: A One-Stage Alignment Network to Unify Multimodal Alignment and Unsupervised Domain Adaptation
- MOT: Masked Optimal Transport for Partial Domain Adaptation
- Feature Alignment and Uniformity for Test Time Adaptation
- ZSL
- Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning
⭐code - Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
⭐code - Learning Attention as Disentangler for Compositional Zero-shot Learning
⭐code - Zero-shot Model Diagnosis
- Learning Conditional Attributes for Compositional Zero-Shot Learning
⭐code - (ML)$^2$P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning
⭐code - Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning
- FSL
- Transductive Few-shot Learning with Prototype-based Label Propagation by Iterative Graph Refinement
- Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
⭐code - Revisiting Prototypical Network for Cross Domain Few-Shot Learning
⭐code - Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning With Multimodal Models
🏠project - Open-Set Likelihood Maximization for Few-Shot Learning
- StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning
⭐code
46.Scene Graph Generation(场景图生成)
- Unbiased Scene Graph Generation in Videos
- IS-GGT: Iterative Scene Graph Generation With Generative Transformers
- Prototype-based Embedding Network for Scene Graph Generation
⭐code - Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
🏠project - Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space
- Panoptic Video Scene Graph Generation
- Fast Contextual Scene Graph Generation With Unbiased Context Augmentation
45.Dense Prediction(密集预测)
- Ensemble-Based Blackbox Attacks on Dense Prediction
⭐code - DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
- Probabilistic Prompt Learning for Dense Prediction
- 1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions
- DPF: Learning Dense Prediction Fields With Weak Supervision
⭐code - Dense detection
- Dense object localization
44.Federated Learning(联邦学习)
- Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization
- Federated Learning With Data-Agnostic Distribution Fusion
- How To Prevent the Poor Performance Clients for Personalized Federated Learning?
- GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting
- Bias-Eliminating Augmentation Learning for Debiased Federated Learning
- Make Landscape Flatter in Differentially Private Federated Learning
- The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
- Rethinking Federated Learning With Domain Shift: A Prototype View
⭐code - On the Effectiveness of Partial Variance Reduction in Federated Learning With Heterogeneous Data
- Elastic Aggregation for Federated Optimization
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning
- Adaptive Channel Sparsity for Federated Learning Under System Heterogeneity
- ScaleFL: Resource-Adaptive Federated Learning With Heterogeneous Clients
- Reliable and Interpretable Personalized Federated Learning
43.Multi-Task Learning(多任务学习)
- Independent Component Alignment for Multi-Task Learning
- Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network Topologies
- AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning
⭐code - Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
🏠project - Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing With Non-Learnable Primitives
⭐code - Hierarchical Prompt Learning for Multi-Task Learning
42.Metric Learning(度量学习)
- Advancing Deep Metric Learning Through Multiple Batch Norms And Multi-Targeted Adversarial Examples
- Deep Factorized Metric Learning
⭐code - Deep Semi-Supervised Metric Learning With Mixed Label Propagation
- Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning
41.Incremental Learning
- Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning
⭐code - AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning
⭐code - GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task
- Class-Incremental Learning
- Dense Network Expansion for Class Incremental Learning
- Class-Incremental Exemplar Compression for Class-Incremental Learning
⭐code - Rebalancing Batch Normalization for Exemplar-based Class-Incremental Learning
- Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning
⭐code - On the Stability-Plasticity Dilemma of Class-Incremental Learning
- Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation
⭐code - Multi-Centroid Task Descriptor for Dynamic Class Incremental Inference
- DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning
⭐code
- CafeBoost: Causal Feature Boost To Eliminate Task-Induced Bias for Class Incremental Learning
40.Adversarial Learning
- Adversarial Robustness via Random Projection Filters
- Seasoning Model Soups for Robustness to Adversarial and Natural Distribution Shifts
- Dynamic Generative Targeted Attacks With Pattern Injection
- FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits
- Enhancing the Self-Universality for Transferable Targeted Attacks
⭐code - Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization
🏠project - Revisiting Residual Networks for Adversarial Robustness
⭐code - Feature Separation and Recalibration for Adversarial Robustness
⭐code - CFA: Class-wise Calibrated Fair Adversarial Training
⭐code - Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
🏠project - Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks
- Black-Box
- Adversarial Examples
- Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
- Introducing Competition To Boost the Transferability of Targeted Adversarial Examples Through Clean Feature Mixup
⭐code - Towards Transferable Targeted Adversarial Examples
- Improving the Transferability of Adversarial Samples by Path-Augmented Method
- Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples
⭐code
- Backdoor Attacks
- Single Image Backdoor Inversion via Robust Smoothed Classifiers
⭐code - Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency
- You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks?
- MEDIC: Remove Model Backdoors via Importance Driven Cloning
⭐code - Backdoor Defense via Adaptively Splitting Poisoned Dataset
⭐code - Detecting Backdoors in Pre-trained Encoders
⭐code - Color Backdoor: A Robust Poisoning Attack in Color Space
- Adversarial Attacks
- Adversarial Attack with Raindrops
- Progressive Backdoor Erasing via Connecting Backdoor and Adversarial Attacks
- Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks
- The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks
- Robust Single Image Reflection Removal Against Adversarial Attacks
- Transferable Adversarial Attacks on Vision Transformers With Token Gradient Regularization
⭐code - StyLess: Boosting the Transferability of Adversarial Examples
- Re-thinking Model Inversion Attacks Against Deep Neural Networks
⭐code - Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
⭐code - Jedi: Entropy-based Localization and Removal of Adversarial Patches
- Backdoor Defense
- Adversarial Training
39.Continual Learning
- Dealing With Cross-Task Class Discrimination in Online Continual Learning
- Heterogeneous Continual Learning
- Batch Model Consolidation: A Multi-Task Model Consolidation Framework
- CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning
⭐code - Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
⭐code - Computationally Budgeted Continual Learning: What Does Matter?
⭐code
- Preserving Linear Separability in Continual Learning by Backward Feature Projection
- Regularizing Second-Order Influences for Continual Learning
⭐code - Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling
- MetaMix: Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation
- Exploring Data Geometry for Continual Learning
- PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
- Bilateral Memory Consolidation for Continual Learning
- Adaptive Plasticity Improvement for Continual Learning
- Real-Time Evaluation in Online Continual Learning: A New Hope
- PIVOT: Prompting for Video Continual Learning
38.Meta-Learning
- Meta-Learning with a Geometry-Adaptive Preconditioner
⭐code - Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level
- Ground-Truth Free Meta-Learning for Deep Compressive Sampling
- HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization
- Panoptic Compositional Feature Field for Editable Scene Rendering With Network-Inferred Labels via Metric Learning
37.Contrastive Learning
- Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning
- Difficulty-Based Sampling for Debiased Contrastive Representation Learning
- MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
⭐code - Twin Contrastive Learning with Noisy Labels
⭐code - Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
- Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data
- CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose
- ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis via Contrastive Learning
- Hyperbolic Contrastive Learning for Visual Representations beyond Objects
⭐code - Non-Contrastive Learning
36.Optical Flow
- Rethinking Optical Flow from Geometric Matching Consistent Perspective
⭐code - DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
- AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation
- TransFlow: Transformer as Flow Learner
- Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow Estimation
⭐code - FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
35.OCR
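- Text Recognition
- Scene Text Detection
- Table Structure Recognition
- Font Generation
- Handwritten Text Generation
- Vector Font Synthesis
- Graphic Document Generation
- Text Detection
- Document Processing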
- Scene Text Spotting
34.Model Compression/Knowledge Distillation/Pruning
- Network Expansion for Practical Training Acceleration
⭐code - Accelerating Dataset Distillation via Model Augmentation
- Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks
⭐code - Quantization
- Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
⭐code - Adaptive Data-Free Quantization
⭐code - Defining and Quantifying the Emergence of Sparse Concepts in DNNs
- NIPQ: Noise Proxy-Based Integrated Pseudo-Quantization
- Bit-Shrinking: Limiting Instantaneous Sharpness for Improving Post-Training Quantization
- Genie: Show Me the Data for Quantization
- One-Shot Model for Mixed-Precision Quantization
- Post-training Quantization on Diffusion Models
⭐code - Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
⭐code - NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
- PD-Quant: Post-Training Quantization Based on Prediction Difference Metric
⭐code - Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
- Pruning
- CP$^3$: Channel Pruning Plug-in for Point-based Networks
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures
- Global Vision Transformer Pruning With Hessian-Aware Saliency
- X-Pruner: eXplainable Pruning for Vision Transformers
⭐code - DepGraph: Towards Any Structural Pruning
- Progressive Neighbor Consistency Mining for Correspondence Pruning
⭐code - Training Debiased Subnetworks With Contrastive Weight Pruning
- MC
- KD
- DisWOT: Student Architecture Search for Distillation WithOut Training
⭐code - Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint
- Supervised Masked Knowledge Distillation for Few-Shot Transformers
⭐code - Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
⭐code - KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
- TinyMIM: An Empirical Study of Distilling MIM Pre-Trained Models (https://github.com/OliverRensu/TinyMIM)
- Masked Autoencoders Enable Efficient Knowledge Distillers
⭐code - Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
- Class Attention Transfer Based Knowledge Distillation
⭐code - DaFKD: Domain-Aware Federated Knowledge Distillation
- Multi-Level Logit Distillation
⭐code - A Unified Knowledge Distillation Framework for Deep Directed Graphical Models
⭐code - Enhanced Multimodal Representation Learning with Cross-modal KD
- Constructing Deep Spiking Neural Networks From Artificial Neural Networks With Knowledge Distillation
- Learning To Retain While Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
- Adversarial Distillation
- Lightweight Networks
- Dequantization
33.Human-Object Interaction
- Affordance Diffusion: Synthesizing Hand-Object Interactions
- HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models
⭐code - ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection
⭐code - Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
- Detecting Human-Object Contact in Images
🏠project - Category Query Learning for Human-Object Interaction Classification
- Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
- Relational Context Learning for Human-Object Interaction Detection
- Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
🏠project - Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
- A Neural Modeling Pipeline on Multi-View Human-Object Interactions
- Two-Hand Interaction
- Hand-Object Interaction
32.Data Augmentation
- Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns
- SLACK: Stable Learning of Augmentations With Cold-Start and KL Regularization
⭐code - Learning Library
- Keypoint Localization
- Keypoint Detection
31.Vision-Language
- Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
- InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
- GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods
- Learning To Exploit Temporal Structure for Biomedical Vision-Language Processing
- REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory
- Policy Adaptation from Foundation Model Feedback
🏠project - Learning Visual Representations via Language-Guided Sampling
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
- Scaling Language-Image Pre-Training via Masking
- MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model
- Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
⭐code - Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment
⭐code - ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
⭐code - Teaching Structured Vision & Language Concepts to Vision & Language Models
⭐code - Leveraging per Image-Token Consistency for Vision-Language Pre-Training
- Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks
🏠project - CREPE: Can Vision-Language Foundation Models Reason Compositionally?
- Open-vocabulary Attribute Detection
🏠project - Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
⭐code - FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training
- Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language
- Task Residual for Tuning Vision-Language Models
⭐code - Masked Autoencoding Does Not Help Natural Language Supervision at Scale
- Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator
- Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization
- Position-Guided Text Prompt for Vision-Language Pre-Training
⭐code - RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
⭐code - Seeing What You Miss: Vision-Language Pre-Training With Semantic Completion Learning
- You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
- DeAR: Debiasing Vision-Language Models with Additive Residuals
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
- Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
⭐code
- MAGVLT: Masked Generative Vision-and-Language Transformer
- Top-Down Visual Attention from Analysis by Synthesis
🏠project - Accelerating Vision-Language Pretraining with Free Language Modeling
⭐code - Multi-Modal Representation Learning with Text-Driven Soft Masks
- Fine-tuned CLIP models are efficient video learners
⭐code - MaPLe: Multi-modal Prompt Learning
⭐code - Learning to Name Classes for Vision and Language Models
- Dynamic Inference With Grounding Based Vision and Language Models
- Connecting Vision and Language with Video Localized Narratives
🏠project - Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models
⭐code - Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
⭐code - VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining
⭐code - VLN
- Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
🏠project - Lana: A Language-Capable Navigator for Instruction Following and Generation
⭐code
- KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
⭐code - Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
⭐code - Iterative Vision-and-Language Navigation
- Behavioral Analysis of Vision-and-Language Navigation Agents
- Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation
- GeoVLN: Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Language Navigation
- A New Path: Scaling Vision-and-Language Navigation With Synthetic Instructions and Imitation Learning
- Layout-Based Causal Inference for Object Navigation
⭐code
- Video-Language
- Test of Time: Instilling Video-Language Models with a Sense of Time
🏠project - All in One: Exploring Unified Video-Language Pre-Training
⭐code - HierVL: Learning Hierarchical Video-Language Embeddings
- An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
⭐code - Clover: Towards A Unified Video-Language Alignment and Fusion Model
⭐code
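The Clover video-text pre-training model achieves the best zero-shot and fine-tuned performance on three text-video retrieval tasks (DiDeMo, MSRVTT, and LSMDC), and sets a new state of the art on eight mainstream video question answering benchmarks.
- VindLU: A Recipe for Effective Video-and-Language Pretraining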
⭐code
- LLM
- visual grounding
- Visual Dialog
30.Visual Question Answering
- VQA
- SimVQA: Exploring Simulated Environments for Visual Question Answering
🏠project - From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models
- Logical Implications for Visual Question Answering Consistency
- S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
- RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases
⭐code - VQACL: A Novel Visual Question Answering Continual Learning Setting
⭐code - Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
⭐code - Improving Selective Visual Question Answering by Learning From Your Peers
⭐code - MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
- Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering
⭐code - MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
- Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
⭐code - Generative Bias for Robust Visual Question Answering
- Video-QA
29.SLAM/Augmented Reality/Virtual Reality/Robotics
- Robotics
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
- Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer
- Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation From Image Sequence
🏠project - Phone2Proc: Bringing Robust Robots Into Our Chaotic World
🏠project - DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects
🏠project - Learning Human-to-Robot Handovers from Point Clouds
⭐code - Neural Volumetric Memory for Visual Locomotion Control
⭐code - Affordances from Human Videos as a Versatile Representation for Robotics
⭐code - NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
- Robotic Grasping
- UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
🏠project - UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
- Target-Referenced Reactive Grasping for Dynamic Objects
🏠project
- Visual Navigation
- SLAM
- Efficient Map Sparsification Based on 2D and 3D Discretized Grids
⭐code - Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
⭐code - ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields
🏠project - ObjectMatch: Robust Registration Using Canonical Object Correspondences
🏠project - vMAP: Vectorised Object Mapping for Neural Field SLAM
🏠project
- Virtual Try-On
- GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning
⭐code - TryOnDiffusion: A Tale of Two UNets
- Linking Garment With Person via Semantically Associated Landmarks for Virtual Try-On
🏠project - Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement
- AR/VR
- Affordance Grounding from Demonstration Video to Target Image
⭐code - GarmentTracking: Category-Level Garment Pose Tracking
🏠project - Object Pop-Up: Can We Infer 3D Objects and Their Poses From Human Interactions Alone?
- Learning to Zoom and Unzoom
⭐code - Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
⭐code - Hand Avatar: Free-Pose Hand Animation and Rendering from Monocular Video
🏠project
- Mixed Reality
- Visual Localization
- OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
- Visual Localization using Imperfect 3D Models from the Internet
- SFD2: Semantic-Guided Feature Detection and Description
⭐code - SegLoc: Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization
- Long-term Visual Localization with Mobile Sensors
- Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization
- VPR(Visual Place Recognition)
- Visual Odometry
28.Style Transfer
- CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
- StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
⭐code - Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer
⭐code - Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
- Neural Preset for Color Style Transfer
🏠project - Learning Dynamic Style Kernels for Artistic Style Transfer
- Inversion-Based Style Transfer With Diffusion Models
⭐code - QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
⭐code - Text-Driven Indoor Stylization
27.Pose Estimation
- Object Pose Estimation
- Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation
- SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation
- HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose Estimation
- TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation
🏠project - IMP: Iterative Matching and Pose Estimation with Adaptive Pooling
⭐code
- 6D
- Rigidity-Aware Detection for 6D Object Pose Estimation
- Neural Texture Learning for Self-Supervised 6D Object Pose Estimation
- Shape-Constraint Recurrent Flow for 6D Object Pose Estimation
- Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions
- HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
🏠project
- 4D
- Animal Pose Estimation
26.GCN/GNN
- GNN
25.Fine-Grained/Image Classification
- Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification
- Learning Partial Correlation Based Deep Visual Representation for Image Classification
- iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition
- I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
- Soft Augmentation for Image Classification
⭐code - Explaining Image Classifiers With Multiscale Directional Image Representation
- Equiangular Basis Vectors
⭐code - Prefix Conditioning Unifies Language and Label Supervision
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
- Boosting Verified Training for Robust Image Classifications via Abstraction
⭐code - Semantic Prompt for Few-Shot Image Recognition
- Regularization of polynomial networks for image recognition
⭐code - Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
⭐code - Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
⭐code - Learning Bottleneck Concepts in Image Classification
🏠project
⭐code
- PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification
⭐code - Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
- Few-Shot Image Classification
- Few-Shot Classification
- Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
⭐code - Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings
⭐code - Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
- Fine-Grained
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems
⭐code - Fine-Grained Classification with Noisy Labels
- An Erudite Fine-Grained Visual Classification Model
⭐code - Weakly Supervised Posture Mining for Fine-Grained Classification
- Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis
- Visual Recognition
- Long-Tailed Classification
- Long-Tailed Visual Recognition
- SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail
- Balanced Product of Calibrated Experts for Long-Tailed Recognition
⭐code - FCC: Feature Clusters Compression for Long-Tailed Visual Recognition
⭐code - Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment
⭐code - Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions
⭐code - Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
- Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition
⭐code - No One Left Behind: Improving the Worst Categories in Long-Tailed Learning
- Multi-Label Classification
- Multi-Label Recognition
- Multi-View Classification
- Superclass Learning
- Material Classification
24.Super-Resolution
- Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
- Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit
- N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
⭐code - Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation
⭐code - Toward Stable, Interpretable, and Lightweight Hyperspectral Super-Resolution
- CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution
- Zero-Shot Dual-Lens Super-Resolution
⭐code - Non-Line-of-Sight Imaging With Signal Superresolution Network
- Kernel Aware Resampler
- RobustNeRF: Ignoring Distractors With Robust Losses
- Light Field Super-Resolution
- ISR
- OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution
- Activating More Pixels in Image Super-Resolution Transformer
⭐code - Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution
⭐code - Learning Generative Structure Prior for Blind Text Image Super-Resolution
- Human Guided Ground-Truth Generation for Realistic Image Super-Resolution
⭐code - OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer
⭐code - CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input
- Memory-Friendly Scalable Super-Resolution via Rewinding Lottery Ticket Hypothesis
- B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution
⭐code - Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspective
- Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution
⭐code - Toward Accurate Post-Training Quantization for Image Super Resolution
⭐code - Image Super-Resolution Using T-Tetromino Pixels
- Spectral Bayesian Uncertainty for Image Super-Resolution
- Super-Resolution Neural Operator
⭐code - Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
- Better "CMOS" Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution
- Implicit Diffusion Models for Continuous Super-Resolution
- Guided Depth Super-Resolution by Deep Anisotropic Diffusion
⭐code - Omni Aggregation Networks for Lightweight Image Super-Resolution
⭐code
- VSR
- Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
⭐code - Compression-Aware Video Super-Resolution
⭐code - Structured Sparsity Learning for Efficient Video Super-Resolution
⭐code - Consistent Direct Time-of-Flight Video Depth Super-Resolution
⭐code - Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution
- Text Image Super-Resolution
- Image Resampling
23.Image Retrieval
- Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval
- Asymmetric Feature Fusion for Image Retrieval
- Improving Image Recognition by Retrieving From Web-Scale Image-Text Data
- Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval
⭐code - Revisiting Self-Similarity: Structural Embedding for Image Retrieval
⭐code - Train/Test-Time Adaptation With Retrieval
- Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval
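⭐code - Sketch-Based Image Retrieval
- Video-Text Retrieval
- Video-Text
- Multimodal Retrieval
- Cross-Modal Retrieval
- Text-Image Matching
- Image-Text Retrieval
- Text-Video Retrieval
- Video-Language Retrieval
22.Image Synthesis/Generation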
- LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
⭐code - Zero-shot Generative Model Adaptation via Image-specific Prompt Learning
⭐code - TopNet: Transformer-based Object Placement Network for Image Compositing
- Sketch-Based Generation
- Image-to-Video Synthesis
- Poster Generation
- Text-to-Image Synthesis
- Variational Distribution Learning for Unsupervised Text-to-Image Generation
- ReCo: Region-Controlled Text-to-Image Generation
- Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style
- Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
- DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
- Multi-Concept Customization of Text-to-Image Diffusion
🏠project - Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
🏠project - Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
⭐code - GLIGEN: Open-Set Grounded Text-to-Image Generation
- RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts
⭐code - GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
⭐code - Shifted Diffusion for Text-to-image Generation
⭐code - Conditional Text Image Generation With Diffusion Models
- Scaling Up GANs for Text-to-Image Synthesis
🏠project
- prompting
- Image Generation
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
🏠project - Private Image Generation With Dual-Purpose Auxiliary Classifier
- Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation
- SpaText: Spatio-Textual Representation for Controllable Image Generation
🏠project - MaskSketch: Unpaired Structure-Guided Masked Image Generation
- Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization
⭐code - Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation
⭐code - Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
🏠project - NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs
🏠project - Exploring Incompatible Knowledge Transfer in Few-shot Image Generation
- Wavelet Diffusion Models Are Fast and Scalable Image Generators
⭐code - Picture That Sketch: Photorealistic Image Generation From Abstract Sketches
🏠project - DiffCollage: Parallel Generation of Large Content with Diffusion Models
🏠project - Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization
⭐code - LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
⭐code - Domain Expansion of Image Generators
🏠project
- Video Generation
- Image Synthesis
- Learning 3D-aware Image Synthesis with Unknown Pose Distribution
🏠project - Few-Shot Semantic Image Synthesis With Class Affinity Transfer
- Fake It Till You Make It: Learning Transferable Representations From Synthetic ImageNet Clones
- 3D-Aware Conditional Image Synthesis
- SceneComposer: Any-Level Semantic Image Synthesis
🏠project - RWSC-Fusion: Region-Wise Style-Controlled Fusion Network for the Prohibited X-Ray Security Image Synthesis
- Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervised Image Synthesis
- Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis
- Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis
⭐code - MAGE: MAsked Generative Encoder To Unify Representation Learning and Image Synthesis
⭐code - Person Image Synthesis via Denoising Diffusion Model
- Freestyle Layout-to-Image Synthesis
⭐code - Few-shot Semantic Image Synthesis with Class Affinity Transfer
- Regularized Vector Quantization for Tokenized Image Synthesis
- High-Fidelity Guided Image Synthesis with Latent Diffusion Models
🏠project - PixHt-Lab: Pixel Height Based Light Effect Generation for Image Compositing
🏠project
- Text-to-Motion Generation
- Texture Synthesis
21.UAV/Remote Sensing/Satellite Image
- TopDiG: Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images
- Change-Aware Sampling and Contrastive Learning for Satellite Images
- MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection
- ViTs for SITS: Vision Transformers for Satellite Image Time Series
- Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images
⭐code - Image Detection
- Tracking
- Radar Localization
- UAV Object Detection
20.Autonomous vehicles
- Autonomous Driving
- UniSim: A Neural Closed-Loop Sensor Simulator
🏠project - Planning-Oriented Autonomous Driving
- Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving
- TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
⭐code - Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving
- Learning and Aggregating Lane Graphs for Urban Automated Driving
⭐code - RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving
⭐code - Azimuth Super-Resolution for FMCW Radar in Autonomous Driving
- Unsupervised 3D Point Cloud Representation Learning by Triangle Constrained Contrast for Autonomous Driving
🏠project - Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving
- DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
- ReasonNet: End-to-End Driving with Temporal and Global Reasoning
- LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation
⭐code - Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
⭐code - Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving
- MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving
⭐code - Trajectory Prediction
- IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
- ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
- Query-Centric Trajectory Prediction
- Leapfrog Diffusion Model for Stochastic Trajectory Prediction
⭐code - Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction
⭐code - FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
- Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
- Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction
- Place Recognition
- Lane Detection
- Bird's-Eye-View Recognition
19.Neural Architecture Search
- PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
⭐code - Differentiable Architecture Search With Random Features
- Adversarially Robust Neural Architecture Search for Graph Neural Networks
- MDL-NAS: A Joint Multi-Domain Learning Framework for Vision Transformer
- HOTNAS: Hierarchical Optimal Transport for Neural Architecture Search
- EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets
18.Person Re-Identification
- Towards Modality-Agnostic Person Re-Identification With Descriptive Query
- Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
⭐code - Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification
⭐code - TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
⭐code - Person Retrieval
- Cloth-Changing Re-Identification
- Visible-Infrared Person Re-Identification (VIReID)
- Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
⭐code - Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
- PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification
- Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning
- G-ReID
- Pedestrian Detection
- Crowd Counting
- Gait Recognition
- Dynamic Aggregated Network for Gait Recognition
⭐code - LidarGait: Benchmarking 3D Gait Recognition With Point Clouds
🏠project - GaitGCI: Generative Counterfactual Intervention for Gait Recognition
- OpenGait: Revisiting Gait Recognition Towards Better Practicality
- Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion
17.Medical Image
- Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-Training
- Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images
⭐code - Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction
- Flexible-Cm GAN: Towards Precise 3D Dose Prediction in Radiotherapy
- Towards Trustable Skin Cancer Diagnosis via Rewriting Model’s Decision
- Hierarchical discriminative learning improves visual representations of biomedical microscopy
🏠project - Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
- Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
- METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
- Medical Diagnosis
- 3D Medical Imaging
- Image Registration
- Image Classification
- Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
⭐code - RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images With Diverse Sizes and Imbalanced Categories
- Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space
- PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training
⭐code - Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification
⭐code - A Loopback Network for Explainable Microvascular Invasion Classification
- Report Generation
- Medical Image Segmentation
- Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
- SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
- Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation
⭐code - Fair Federated Medical Image Segmentation via Client Contribution Estimation
- Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
⭐code - Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
- Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
- MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
⭐code - Rethinking Few-Shot Medical Segmentation: A Vector Quantization View
- Devil Is in the Queries: Advancing Mask Transformers for Real-World Medical Image Segmentation and Out-of-Distribution Localization
- MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery
⭐code - Ambiguous Medical Image Segmentation Using Diffusion Models
- Directional Connectivity-Based Segmentation of Medical Images
- Medical Image Analysis
- Tumor Segmentation
- Medical Imaging Report Generation
- Slide Analysis
- Cell Detection, Tracking, and Counting
- DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting
- Overlapped Cell on Tissue Dataset for Histopathology
- Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
⭐code - Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurate, Self-Supervised Subcellular Structure Recognition
⭐code
- Monocular Endoscope Tracking
- Skin Cancer Diagnosis
- MRI Reconstruction
- Biomedicine
16.Semi/self-supervised learning
- Unsupervised Learning
- Self-Supervised
- Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
- StepFormer: Self-Supervised Step Discovery and Localization in Instructional Videos
- DLBD: A Self-Supervised Direct-Learned Binary Descriptor
- Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields
- Self-Supervised Learning From Images With a Joint-Embedding Predictive Architecture
- Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning
⭐code - DrapeNet: Garment Generation and Self-Supervised Draping
⭐code - Neural Congealing: Aligning Images to a Joint Semantic Atlas
🏠project - Self-Supervised AutoFlow
- Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need
⭐code - Siamese Image Modeling for Self-Supervised Vision Representation Learning
⭐code - SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow
⭐code - Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning
⭐code - Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
- Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss
- Evolved Part Masking for Self-Supervised Learning
- Towards Professional Level Crowd Annotation of Expert Domain Data
- ALSO: Automotive Lidar Self-Supervision by Occupancy Estimation
⭐code - Correlational Image Modeling for Self-Supervised Visual Pre-Training
- Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
⭐code
👍CVPR 2023: Mining the value of unlabeled data! SOLIDER, a self-supervised learning framework for human-centric vision - Mixed Autoencoder for Self-supervised Visual Representation Learning
- Siamese DETR
⭐code - Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
- Self-Supervised Representation Learning for CAD
- Semi-Supervised
- Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data
⭐code - HyperMatch: Noise-Tolerant Semi-Supervised Learning via Relaxed Contrastive Constraint
- Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
- DualRel: Semi-Supervised Mitochondria Segmentation From a Prototype Perspective
- CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning
⭐code - ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning
- Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning
⭐code - MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins
⭐code - Semi-Supervised Learning Made Simple With Self-Supervised Clustering
- Weakly Supervised
15.Vision Transformers
- Transformer-Based Learned Optimization
- Teaching Matters: Investigating the Role of Supervision in Vision Transformers
- Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
- PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
- NLOST: Non-Line-of-Sight Imaging with Transformer
- SVGformer: Representation Learning for Continuous Vector Graphics Using Transformers
- Adversarial Normalization: I Can visualize Everything (ICE)
⭐code - Hint-Aug: Drawing Hints From Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning
⭐code - PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding
- D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers
- NAR-Former: Neural Architecture Representation Learning Towards Holistic Attributes Prediction
⭐code - DropKey for Vision Transformer
- Integrally Pre-Trained Transformer Pyramid Networks
⭐code - DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets
- Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
- Trade-Off Between Robustness and Accuracy of Vision Transformers
- A Light Touch Approach to Teaching Transformers Multi-view Geometry
- Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
⭐code - RGB no more: Minimally-decoded JPEG Vision Transformers
- Making Vision Transformers Efficient from A Token Sparsification View
⭐code - Blur Interpolation Transformer for Real-World Motion from Blur
⭐code - Neighborhood Attention Transformer
⭐code - MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
⭐code - Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
🏠project - Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions
- Latency Matters: Real-Time Action Forecasting Transformer
⭐code - OmniMAE: Single Model Masked Pretraining on Images and Videos
⭐code - MAGVIT: Masked Generative Video Transformer
🏠project - Learning Imbalanced Data with Vision Transformers
⭐code - Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
🏠project - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
⭐code - Generic-to-Specific Distillation of Masked Autoencoders
⭐code - BiFormer: Vision Transformer with Bi-Level Routing Attention
⭐code - Making Vision Transformers Efficient from A Token Sparsification View
- Dual-path Adaptation from Image to Video Transformers
⭐code - Spherical Transformer for LiDAR-based 3D Recognition
⭐code - MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
⭐code - Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
- Learning Expressive Prompting With Residuals for Vision Transformers
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
🏠project - Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
- Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
- Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
⭐code - RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
⭐code - DropKey
👍CVPR 2023 | Two lines of code efficiently mitigate overfitting in vision Transformers: Meitu and UCAS jointly propose the DropKey regularization method - Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
⭐code - EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
⭐code - TrojViT: Trojan Insertion in Vision Transformers
- Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
🏠project - ResFormer: Scaling ViTs with Multi-Resolution Training
⭐code - Vision Transformer With Super Token Sampling
⭐code - Vision Transformers Are Good Mask Auto-Labelers
14.Video
- PointAvatar: Deformable Point-Based Head Avatars From Videos
- Video Probabilistic Diffusion Models in Projected Latent Space
- Masked Motion Encoding for Self-Supervised Video Representation Learning
⭐code - Modular Memorability: Tiered Representations for Video Memorability Prediction
⭐code - Language-Guided Music Recommendation for Video via Prompt Analogies
- Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
- 1000 FPS HDR Video With a Spike-RGB Hybrid Camera
🏠project - Egocentric Video Task Translation
🏠project - Relational Space-Time Query in Long-Form Videos
- Spatial-Then-Temporal Self-Supervised Learning for Video Correspondence
⭐code - Few-Shot Referring Relationships in Videos
🏠project - Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
🏠project - 3D Video Loops From Asynchronous Input
🏠project - VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking
⭐code - Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
⭐code - StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
- Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
🏠project
📺video - How You Feelin'? Learning Emotions and Mental States in Movie Scenes
🏠project - Video Moment Retrieval
- Video Highlight Detection
- Video Frame Interpolation
- Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
⭐code - AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
⭐code - Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
- Exploring Discontinuity for Video Frame Interpolation
- Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields
⭐code - A Unified Pyramid Recurrent Network for Video Frame Interpolation
⭐code - Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
⭐code - BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
⭐code - Frame Interpolation Transformer and Uncertainty Guidance
- Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation
- Video Synthesis
- Video Prediction
- Video Understanding
- Selective Structured State-Spaces for Long-Form Video Understanding
- How you feelin'? Learning Emotions and Mental States in Movie Scenes
⭐code - System-status-aware Adaptive Network for Online Streaming Video Understanding
- LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
⭐code - System-Status-Aware Adaptive Network for Online Streaming Video Understanding
- Therbligs in Action: Video Understanding Through Motion Primitives
- Streaming Video Model
⭐code - Procedure-Aware Pretraining for Instructional Video Understanding
⭐code - Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
⭐code - Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
⭐code
- Video Classification
- Video Captioning
- Video Summarization
- Video Recognition
- Video Deflickering
- Temporal Sentence Grounding (TSG)
- VAD (Video Anomaly Detection)
- Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
- Video Event Restoration Based on Keyframes for Video Anomaly Detection
- Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping
- Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection
- Look Around for Anomalies: Weakly-Supervised Anomaly Detection via Context-Motion Relational Learning
- Video Anomaly Localization
- Video Mirror Detection
- Learning To Detect Mirrors From Videos via Dual Correspondences
🏠project - Video Representation Learning
- Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos
⭐code - Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations
⭐code - Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
⭐code - Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning
⭐code
- Video Paragraph Grounding
- Video Grounding
- Text-Visual Prompting for Efficient 2D Temporal Video Grounding
- WINNER: Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding
- Iterative Proposal Refinement for Weakly-Supervised Video Grounding
- Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding
- ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding
- Video Shadow Detection
- Video Keypoint Detection
- Video Emotion Detection
- Scene Detection
13.GAN
- AdaptiveMix: Improving GAN Training via Feature Space Shrinkage
- Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond
- Spider GAN: Leveraging Friendly Neighbors To Accelerate GAN Training
- Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection
⭐code - MoStGAN-V: Video Generation With Temporal Motion Styles
⭐code - Sequential Training of GANs Against GAN-Classifiers Reveals Correlated "Knowledge Gaps" Present Among Independently Trained GAN Instances
- Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration
⭐code - HumanGen: Generating Human Radiance Fields With Explicit Priors
- Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining
- GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling
- Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
🏠project - 3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
🏠project - GLeaD: Improving GANs With a Generator-Leading Task
🏠project - Transforming the Residuals for Real Image Editing With StyleGAN
⭐code - Improving GAN Training via Feature Space Shrinkage
⭐code - Spider GAN: Leveraging Friendly Neighbors to Accelerate GAN Training
- NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs
⭐code - Graph Transformer GANs for Graph-Constrained House Generation
- Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
⭐code - Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
⭐code - VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs
⭐code - Discriminator-Cooperated Feature Map Distillation for GAN Compression
⭐code - Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field
⭐code - Image-Text Synthesis
- Diffusion Models
- How to Backdoor Diffusion Models?
⭐code - Diffusion Probabilistic Model Made Slim
- VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
- Seeing Beyond the Brain: Conditional Diffusion Model With Sparse Masked Modeling for Vision Decoding
- Self-Guided Diffusion Models
- ObjectStitch: Object Compositing With Diffusion Model
- Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models
- Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
- RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models
🏠project - Dimensionality-Varying Diffusion Process
- TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets
⭐code - Towards Practical Plug-and-Play Diffusion Models
⭐code - All Are Worth Words: A ViT Backbone for Diffusion Models
- Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
🏠project - Binary Latent Diffusion
- Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
- Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
- EDICT: Exact Diffusion Inversion via Coupled Transformations
⭐code - ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts
- GAN Inversion
12.Image-to-Image Translation
- 3D-Aware Multi-Class Image-to-Image Translation With NeRFs
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
- DSI2I: Dense Style for Unpaired Image-to-Image Translation
- Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
⭐code - 3D-Aware Multi-Class Image-to-Image Translation with NeRFs
- LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data
🏠project - Unpaired Image-to-Image Translation With Shortest Path Regularization
⭐code - BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models
- Image Translation
- Video Translation
11.Face
- Rethinking Feature-Based Knowledge Distillation for Face Recognition
- Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning
- Learning a 3D Morphable Face Reflectance Model From Low-Cost Data
- CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search
- Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis From Monocular Image
🏠project - Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild
- Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces
⭐code - Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues
- Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
⭐code - Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation
⭐code - Privacy-Preserving Adversarial Facial Features
- BioNet: A Biologically-Inspired Network for Face Recognition
⭐code - High-Res Facial Appearance Capture From Polarized Smartphone Images
- MARLIN: Masked Autoencoder for Facial Video Representation LearnINg
⭐code - Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
🏠project - Disentanglement of Pose and Expression for General Video Portrait Editing
- BlendFields: Few-Shot Example-Driven Facial Modeling
⭐code - Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
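👍CVPR 2023 | Face recognition still has a long way to go: Tsinghua, Peking University, and others propose AT3D, an attack method against face recognition systems - Collaborative Diffusion for Multi-Modal Face Generation and Editing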
⭐code
⭐code
👍CVPR 2023 | Collaborative Diffusion: how can different diffusion models cooperate? - Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation
- DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
⭐code - Probabilistic Knowledge Distillation of Face Ensembles
- DCFace: Synthetic Face Generation with Dual Condition Diffusion Model
⭐code - Discrete Point-Wise Attack Is Not Enough: Generalized Manifold Adversarial Attack for Face Recognition
- 3D Face
- Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
- Physical-World Optical Adversarial Attacks on 3D Face Recognition
- Learning a 3D Morphable Face Reflectance Model from Low-cost Data
🏠project - NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images
⭐code - FaceLit: Neural 3D Relightable Faces
- Learning Neural Proto-face Field for Disentangled 3D Face Modeling In the Wild
- High-Fidelity 3D Face Generation From Natural Language Descriptions
⭐code - CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
⭐code - PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
- Face Reconstruction
- A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
🏠project - Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images
- FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
⭐code - Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation
- AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
🏠project
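- Face Restoration
- Face Alignment
- Face Anonymization
- Face Super-Resolution
- Naked-Eye Age Recognition
- Emotion Recognition
- Portrait Lighting
- Face Liveness Detection
- Talking Head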
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering
⭐code - High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Space Learning
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
- LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
⭐code - Implicit Neural Head Synthesis via Controllable Local Deformation Fields
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors
⭐code - Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
- High-Fidelity and Freely Controllable Talking Head Video Generation
🏠project - High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning
- GANHead: Towards Generative Animatable Neural Head Avatars
⭐code - One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
🏠project - MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
🏠project
- Face Segmentation
- Eye Blink Detection
- 3D Head Avatar Generation
- Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
⭐code - Instant Volumetric Head Avatars
🏠project - Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
🏠project - OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
- Facial Expression Recognition
- Micro-Expression Recognition
- Face Synthesis
- Fake Face Detection
- Facial Action Unit Detection
- Face Video Editing
- Face Quality Assessment
- Face Swapping
- 3D-Aware Face Swapping
⭐code - Implicit Identity Driven Deepfake Face Swapping Detection
- StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping
⭐code - Fine-Grained Face Swapping via Regional GAN Inversion
🏠project - DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion
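- Face Clustering
- Face Retouching
- 3D Digital Avatars
- Audio-Driven Face Reenactment
- Privacy Protection
- Facial Landmark Detection
- Head Capture
- Age Estimation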
10.3D(三维重建\三维视觉)
- Structured 3D Features for Reconstructing Controllable Avatars
🏠project - In-Hand 3D Object Scanning from an RGB Sequence
- Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs
⭐code - 3D Concept Learning and Reasoning from Multi-View Images
🏠project - LP-DIF: Learning Local Pattern-Specific Deep Implicit Function for 3D Objects and Scenes
🏠project - DynamicStereo: Consistent Dynamic Depth From Stereo Videos
🏠project - ARO-Net: Learning Implicit Fields from Anchored Radial Observations
- G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors
⭐code - Magic3D: High-Resolution Text-to-3D Content Creation
🏠project - PointListNet: Deep Learning on 3D Point Lists
- Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video
- HexPlane: A Fast Representation for Dynamic Scenes
🏠project - Energy-Efficient Adaptive 3D Sensing
🏠project - Objaverse: A Universe of Annotated 3D Objects
🏠project - Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces
🏠project - 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions
⭐code - OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
🏠project
👍CVPR 2023 Award Candidate | OmniObject3D, a real-world high-precision 3D object dataset - Neural Scene Chronology
🏠project - 3D Neural Field Generation Using Triplane Diffusion
🏠project - Learning Adaptive Dense Event Stereo From the Image Domain
- GANmouflage: 3D Object Nondetection With Texture Fields
🏠project - Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging
- Sphere-Guided Training of Neural Implicit Surfaces
🏠project - PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
🏠project - Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
⭐code - Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
⭐code - SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
⭐code - 3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
⭐code - DynamicStereo: Consistent Dynamic Depth from Stereo Videos
🏠project - 3D Concept Learning and Reasoning from Multi-View Images
🏠project - PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
⭐code - Persistent Nature: A Generative Model of Unbounded 3D Worlds
🏠project - TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
- Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
- Robust Outlier Rejection for 3D Registration With Variational Bayes
⭐code - On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
⭐code - SUDS: Scalable Urban Dynamic Scenes
🏠project - Understanding and Improving Features Learned in Deep Functional Maps
⭐code - TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering
⭐code - Generalizable Local Feature Pre-training for Deformable Shape Analysis
⭐code - CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
🏠project - CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes
🏠project - HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
⭐code - Multi-View Azimuth Stereo via Tangent Space Consistency
⭐code - 3D Line Mapping Revisited
⭐code - NeRF-Supervised Deep Stereo
⭐code
⭐code - Robust Outlier Rejection for 3D Registration with Variational Bayes (3D)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
- Stereo Matching
- Iterative Geometry Encoding Volume for Stereo Matching
⭐code - Masked representation learning for domain generalized stereo matching
- Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
- Domain Generalized Stereo Matching via Hierarchical Visual Transformation
- Unsupervised Deep Asymmetric Stereo Matching With Spatially-Adaptive Self-Similarity
- High-frequency Stereo Matching Network
⭐code
- 3D Vision
- 3D Reconstruction
- Neural Lens Modeling
⭐code - Self-Supervised Super-Plane for Neural 3D Reconstruction
⭐code - Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction
- ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction
- Towards Unbiased Volume Rendering of Neural Implicit Surfaces With Geometry Priors
- Multiview Compressive Coding for 3D Reconstruction
🏠project - Multi-View Reconstruction Using Signed Ray Distance Functions (SRDF)
- PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction
🏠project - RealFusion: 360° Reconstruction of Any Object From a Single Image
🏠project - Deep Polarization Reconstruction With PDAVIS Events
⭐code - RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation
⭐code - Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
🏠project - High-Fidelity Clothed Avatar Reconstruction from a Single Image
- Efficient Second-Order Plane Adjustment
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
🏠project - Reconstructing Animatable Categories From Videos
🏠project - OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields
- Learning Articulated Shape with Keypoint Pseudo-labels from Web Images
⭐code - SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction
🏠project - 3D Shape Reconstruction of Semi-Transparent Worms
- Power Bundle Adjustment for Large-Scale 3D Reconstruction
- PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces
⭐code - AutoRecon: Automated 3D Object Discovery and Reconstruction
⭐code - 3D Registration with Maximal Cliques
- 3D shape reconstruction of semi-transparent worms
- VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos
⭐code - NeUDF: Leaning Neural Unsigned Distance Fields with Volume Rendering
🏠project - ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency
🏠project - BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
⭐code - PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
⭐code - Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices
🏠project - Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container
⭐code - SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
⭐code - MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
🏠project - Scalable, Detailed and Mask-Free Universal Photometric Stereo
⭐code - Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
🏠project - Behind the Scenes: Density Fields for Single View Reconstruction
🏠project - VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
- Surface Reconstruction
- NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
- Octree Guided Unoriented Surface Reconstruction
- Neuralangelo: High-Fidelity Neural Surface Reconstruction
🏠project - Neural Kernel Surface Reconstruction
- Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections
⭐code
- Depth Estimation
- Fully Self-Supervised Depth Estimation from Defocus Clue
⭐code - Gated Stereo: Joint Depth Estimation From Gated and Wide-Baseline Active Stereo Cues
🏠project - OmniVidar: Omnidirectional Depth Estimation From Multi-Fisheye Images
- Learning To Fuse Monocular and Multi-View Cues for Multi-Frame Depth Estimation in Dynamic Scenes
⭐code - SfM-TTR: Using Structure From Motion for Test-Time Refinement of Single-View Depth Networks
⭐code - Shakes on a Plane: Unsupervised Depth Estimation From Unstabilized Photography
🏠project - Depth Estimation From Camera Image and mmWave Radar Point Cloud
⭐code - Deep Depth Estimation From Thermal Image
⭐code - LightedDepth: Video Depth Estimation in Light of Limited Inference View Angles
⭐code - Trap Attention: Monocular Depth Estimation With Manual Traps
⭐code - PlaneDepth: Self-supervised Depth Estimation via Orthogonal Planes
⭐code - Depth Estimation From Indoor Panoramas With Neural Scene Representation
⭐code - Polarimetric iToF: Measuring High-Fidelity Depth Through Scattering Media
- SCADE: NeRFs from Space Carving With Ambiguity-Aware Depth Estimates
⭐code - iDisc: Internal Discretization for Monocular Depth Estimation
🏠project - HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions
🏠project - Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
⭐code - Temporally Consistent Online Depth Estimation Using Point-Based Fusion
🏠project - DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
⭐code
⭐code - Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
⭐code
👍CVPR2023 | Lite-Mono, a lightweight and efficient self-supervised depth estimation framework
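- Depth Completion
- Indoor Scene Reconstruction
- Scene Reconstruction
- 3D Scene Generation
- MVS
- 3D Shape Classification
- 3D Images
- 3D Shapes
- 3D Shape Generation
- Diffusion-Based Signed Distance Fields for 3D Shape Generation
- 3D Shape Reconstruction
- 3D Animation
- Indoor Layout
- Video Reconstruction
9.Human Pose Estimation
- Hand Gestures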
- A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
⭐code (3D interacting hand pose estimation) - Neural Voting Field for Camera-Space 3D Hand Pose Estimation
- AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
⭐code - Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos
🏠project - Cross-Domain 3D Hand Pose Estimation with Dual Modalities
- Audio-Driven Co-Speech Gesture Generation
- Gesture Synthesis
- Hand Reconstruction
- ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
⭐code - High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition
⭐code - ACR: Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction
⭐code - HARP: Personalized Hand Reconstruction From a Monocular RGB Video
🏠project - Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction
- A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image
- gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
⭐code - MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction
- POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
⭐code - Handy: Towards a high fidelity 3D hand shape and appearance model
⭐code
- 3D Hand Recovery
- Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild
⭐code - Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding
⭐code - Semi-Supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
🏠project - H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction
- Hand-Object Pose Estimation
- 3D Hand Gesture Prediction
- Human Body
- HPE
- DistilPose: Tokenized Pose Regression with Heatmap Distillation
- Towards Stable Human Pose Estimation via Cross-View Fusion and Foot Stabilization
- Human Pose As Compositional Tokens
⭐code - Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module
⭐code - TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers
⭐code - A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation
- Analyzing and Diagnosing Pose Estimation With Attributions
🏠project - PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
⭐code - Human Pose as Compositional Tokens
⭐code - Unified Pose Sequence Modeling
- Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
- Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
- HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation
⭐code - Human Pose Estimation in Extremely Low-Light Conditions
- 3D HPE
- PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
- PLIKS: A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body Estimation
- NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation
⭐code - DiffPose: Toward More Reliable 3D Pose Estimation
🏠project - Scene-Aware Egocentric 3D Human Pose Estimation
- Self-Supervised 3D Keypoint Discovery From Multi-View Videos
🏠project - Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation
⭐code - 3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention
- Ego-Body Pose Estimation via Ego-Head Pose Estimation
Award candidate - Listening Human Behavior: 3D Human Pose Estimation With Acoustic Signals
- NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation
⭐code - GFPose: Learning 3D Human Pose Prior With Gradient Fields
🏠project - PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
⭐code
⭐code - 3D Human Pose Estimation via Intuitive Physics
🏠project - 3D Human Keypoint Estimation
- 4D HPE
- Mesh Recovery
- POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
⭐code - Deformable Mesh Transformer for 3D Human Mesh Recovery
⭐code - One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer
🏠project - Learning Human Mesh Recovery in 3D Scenes
⭐code - One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer
⭐code
👍CVPR2023 | IDEA and Tsinghua propose OSX, the first one-stage 3D whole-body human mesh reconstruction algorithm - Learning Analytical Posterior Probability for Human Mesh Recovery
⭐code - Implicit 3D Human Mesh Recovery Using Consistency With Pose and Shape From Unseen-View
- 3D Human Mesh Estimation
- 3D Human Mesh Reconstruction
- 3D Human Reconstruction
- High-fidelity 3D Human Digitization from Single 2K Resolution Images
⭐code - Crowd3D: Towards Hundreds of People Reconstruction From a Single Image
- PersonNeRF: Personalized Reconstruction From Photo Collections
🏠project - NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
🏠project - FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER
🏠project - CloSET: Modeling Clothed Humans on Continuous Surface With Explicit Template Decomposition
⭐code - Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting
- FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER
🏠project - Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing
🏠project - Complete 3D Human Reconstruction From a Single Incomplete Image
- High-Fidelity 3D Human Digitization From Single 2K Resolution Images
⭐code - BAAM: Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention Module and Attention-Guided Modeling
⭐code - Humans As Light Bulbs: 3D Human Reconstruction From Thermal Reflection
- Clothed Human Reconstruction
- Human Shape Completion
- HPE
- Multi-Person Pose Prediction
- Human Parsing
- Pose Transfer
- Avatar
8.Action Detection(Human Action Detection and Recognition)
- Video Test-Time Adaptation for Action Recognition
- A Large-Scale Robustness Analysis of Video Action Recognition Models
- How Can Objects Help Action Recognition?
- MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition
⭐code - Dual-Path Adaptation From Image to Video Transformers
⭐code - Hybrid Active Learning via Deep Clustering for Video Action Detection
🏠project - Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
- Learning Action Changes by Measuring Verb-Adverb Textual Relationships
⭐code - STMixer: A One-Stage Sparse Action Detector
- AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
- Search-Map-Search: A Frame Selection Paradigm for Action Recognition
- On the Benefits of 3D Pose and Tracking for Human Action Recognition
⭐code - MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
⭐code - SVFormer: Semi-Supervised Video Transformer for Action Recognition
- Skeleton-Based Action Recognition
- Learning Discriminative Representations for Skeleton Based Action Recognition
- Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition
🏠project - 3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
- HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions
⭐code - Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition
- Keypoint-Based Action Recognition
- Temporal Action Recognition
- TriDet: Temporal Action Detection with Relative Boundary Modeling
⭐code - Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
⭐code - Post-Processing Temporal Action Detection
⭐code - Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
- PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
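- Open-Set Action Recognition
- MoCap-Based Action Recognition
- Few-Shot Action Recognition
- Semi-Supervised Action Recognition
- Temporal Action Localization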
- Boosting Weakly-Supervised Temporal Action Localization with Text Information
⭐code - Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
- Two-Stream Networks for Weakly-Supervised Temporal Action Localization With Semantic-Aware Mechanisms
- Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization
- Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels
- Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization
- AdamsFormer for Spatial Action Localization in the Future
- Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
- Group Action Quality Assessment
- Group Activity Recognition
7.Point Cloud
- FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
- Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent With Learned Distance Functions
- Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
- Unsupervised Inference of Signed Distance Functions From Single Sparse Point Clouds Without Learning Priors
- PointVector: A Vector Representation in Point Cloud Analysis
- CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data
- PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering
- Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation
- Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching Between Parts and Words
- Attention-Based Point Cloud Edge Sampling
- Meta Architecture for Point Cloud Analysis
- Building Rearticulable Models for Arbitrary 3D Objects From 4D Point Clouds
🏠project - Implicit Surface Contrastive Clustering for LiDAR Point Clouds
- Poly-PC: A Polyhedral Network for Multiple Point Cloud Tasks at Once
- TriVol: Point Cloud Rendering via Triple Volumes
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
- PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
- GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds
- Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
⭐code - ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
- SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence
- GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training
- Neural Intrinsic Embedding for Non-rigid Point Cloud Matching
- 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud
⭐code - SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds
- GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
- SCPNet: Semantic Scene Completion on Point Cloud
- NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds
⭐code - Rotation-Invariant Transformer for Point Cloud Matching
- Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
🏠project - PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
- VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
⭐code - Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
⭐code - Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions
⭐code - Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
- Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
⭐code - NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud
⭐code - IterativePFN: True Iterative Point Cloud Filtering
⭐code - Fast Point Cloud Generation With Straight Flows
- GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
- 3D Point Cloud
- Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
⭐code - ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling
- PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models
🏠project - Starting From Non-Parametric Networks for 3D Point Cloud Analysis
⭐code - Learnable Skeleton-Aware 3D Point Cloud Sampling
- GraVoS: Voxel Selection for 3D Point-Cloud Detection
- MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds
⭐code - NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
⭐code
⭐code - Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
⭐code
- Point Cloud Instance Segmentation
- Point Cloud Classification
- Point Cloud Completion
- ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer
⭐code - Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion
- ACL-SPC: Adaptive Closed-Loop system for Self-Supervised Point Cloud Completion
⭐code - AnchorFormer: Point Cloud Completion From Discriminative Nodes
⭐code - Hyperspherical Embedding for Point Cloud Completion
- Point Cloud Registration
- Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
⭐code - PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration
- Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
- Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting
⭐code - BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration
⭐code
- Point Cloud Understanding
- Point Cloud Reconstruction
- Point Cloud Matching
- Point Cloud Segmentation
- Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering
- Point Cloud Compression
6.Object Tracking
- Data-Driven Feature Tracking for Event Cameras
- Autoregressive Visual Tracking
⭐code - Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
🏠project - Unifying Short and Long-Term Tracking With Graph Hierarchies
🏠project - VideoTrack: Learning To Track Objects via Video Transformer
- Tracking Through Containers and Occluders in the Wild
🏠project - Frame-Event Alignment and Fusion Network for High Frame Rate Tracking
- Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
⭐code - Joint Visual Grounding and Tracking with Natural Language Specification
⭐code - Generalized Relation Modeling for Transformer Tracking
⭐code - SeqTrack: Sequence to Sequence Learning for Visual Object Tracking
- Tracking through Containers and Occluders in the Wild
🏠project - DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks
⭐code - CXTrack: Improving 3D Point Cloud Tracking With Contextual Information
- Representation Learning for Visual Object Tracking by Masked Appearance Transfer
⭐code - 3D-POP - An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture
- Multi-Object Tracking
- Referring Multi-Object Tracking
⭐code - Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking
- Simple Cues Lead to a Strong Multi-Object Tracker
- Tracking Multiple Deformable Objects in Egocentric Videos
🏠project - MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
⭐code - UTM: A Unified Multiple Object Tracking Model With Identity-Aware Feature Enhancement
- Focus on Details: Online Multi-Object Tracking With Diverse Fine-Grained Representation
- Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
⭐code - MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
- OVTrack: Open-Vocabulary Multiple Object Tracking
🏠project
- Multimodal Tracking
- RGB-T tracking(object tracking that combines visible-light (RGB) and thermal infrared (T) images)
5.Object Detection
- Angelic Patches for Improving Third-Party Object Detector Performance
- STDLens: Model Hijacking-Resilient Federated Learning for Object Detection
- Enhanced Training of Query-Based Object Detection via Selective Query Recollection
- The Differentiable Lens: Compound Lens Search Over Glass Surfaces and Materials for Object Detection
- Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
⭐code - Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
- NeRF-RPN: A General Framework for Object Detection in NeRFs
⭐code - Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors
- Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration
⭐code - Gaussian Label Distribution Learning for Spherical Image Object Detection
- Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments
- Towards Unsupervised Object Detection From LiDAR Point Clouds
🏠project - Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation
⭐code - T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection
⭐code - Recurrent Vision Transformers for Object Detection With Event Cameras
- Learned Two-Plane Perspective Prior Based Image Resampling for Efficient Object Detection
- Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection
- YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
⭐code - MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection
⭐code - Doubly Right Object Recognition: A Why Prompt for Visual Rationales
- Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection
⭐code - Unbalanced Optimal Transport: A Unified Framework for Object Detection
- CLIP the Gap: A Single Domain Generalization Approach for Object Detection
- Learning Transformations To Reduce the Geometric Shift in Object Detection
- Object Detection With Self-Supervised Scene Adaptation
⭐code - Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
⭐code - SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency
⭐code - Multiclass Confidence and Localization Calibration for Object Detection
⭐code - Mobile User Interface Element Detection Via Adaptively Prompt Tuning
- DynamicDet: A Unified Dynamic Architecture for Object Detection
⭐code - ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
⭐code - Curricular Object Manipulation in LiDAR-based Object Detection
⭐code - STDLens: Model Hijacking-resilient Federated Learning for Object Detection
⭐code - What Can Human Sketches Do for Object Detection?
⭐code - Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects
⭐code - Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
⭐code - Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
- T-SEA: Transfer-based Self-Ensemble Attack on Object Detection
⭐code
👍CVPR 2023 | Peking University proposes T-SEA: a self-ensemble strategy for stronger black-box attack transferability - Knowledge Combination to Learn Rotated Detection Without Rotated Annotation
- Universal Instance Perception as Object Discovery and Retrieval
⭐code - Continual Detection Transformer for Incremental Object Detection (object detection)
- Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
⭐code (object detection) - Open-Vocabulary Object Detection
- Aligning Bag of Regions for Open-Vocabulary Object Detection
⭐code - Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
- OvarNet: Towards Open-vocabulary Object Attribute Recognition
👍CVPR2023 | Xiaohongshu proposes the OvarNet model: a new SOTA for open-set prediction and a new take on "recognize anything" - Learning To Detect and Segment for Open Vocabulary Object Detection
- Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
- Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
⭐code - CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
- Open-World Object Detection
- Annealing-Based Label-Transfer Learning for Open World Object Detection
⭐code - CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
- PROB: Probabilistic Objectness for Open World Object Detection
⭐code - CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection
- Detecting Everything in the Open World: Towards Universal Object Detection
⭐code
👍CVPR 2023 | Annotate 500 classes, detect 7,000! Tsinghua University and collaborators propose UniDetector, a universal object detection algorithm
- Object Localization
- 3D OD
- Virtual Sparse Convolution for Multimodal 3D Object Detection
⭐code - Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection
- BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection
- UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
- PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
⭐code - AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers
⭐code - BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
- 3D Video Object Detection With Learnable Object-Centric Global Optimization
⭐code - ConQueR: Query Contrast Voxel-DETR for 3D Object Detection
🏠project - Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
⭐code - Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection
⭐code - Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection
⭐code - Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
⭐code - Deep Dive Into Gradients: Better Optimization for 3D Object Detection With Gradient-Corrected IoU Supervision
⭐code - AeDet: Azimuth-invariant Multi-view 3D Object Detection
⭐code - FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection
⭐code - PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer
- itKD: Interchange Transfer-Based Knowledge Distillation for 3D Object Detection
- OcTr: Octree-Based Transformer for 3D Object Detection
- MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences
- Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus
- LinK: Linear Kernel for LiDAR-based 3D Perception
⭐code - PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
⭐code - 3D Video Object Detection with Learnable Object-Centric Global Optimization
⭐code - Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
⭐code - X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
⭐code - Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
⭐code - Viewpoint Equivariance for Multi-View 3D Object Detection
⭐code - Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
⭐code - Collaboration Helps Camera Overtake LiDAR in 3D Detection
⭐code
⭐code - OcTr: Octree-based Transformer for 3D Object Detection
- MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
⭐code - MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
⭐code - NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
- VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
⭐code - Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
⭐code - LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
⭐code - PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
⭐code - CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
⭐code - Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection
⭐code - Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
⭐code
- End-to-End Object Detection
- Semi-Supervised Object Detection
- Active Teacher for Semi-Supervised Object Detection
⭐code - Semi-DETR: Semi-Supervised Object Detection With Detection Transformers
- Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection
⭐code - SOOD: Towards Semi-Supervised Oriented Object Detection
⭐code - MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
⭐code
- Weakly Supervised Object Detection
- Few-Shot Object Detection
- NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
- Generating Features with Increased Crop-related Diversity for Few-Shot Object Detection
- Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection
- DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
⭐code
- Domain Adaptive Object Detection
- 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection
- AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection
⭐code - CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection
- Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
🏠project - Domain Adaptive Detection Transformer With Information Fusion
- Harmonious Teacher for Cross-Domain Object Detection
- Contrastive Mean Teacher for Domain Adaptive Object Detectors
- Weak-Sample Object Detection
- Salient Object Detection
- Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
- Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection
- Modeling the Distributional Uncertainty for Salient Object Detection Models
⭐code - Test Time Adaptation With Regularized Loss for Weakly Supervised Salient Object Detection
- Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection
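- Infrared Object Detection
- Camouflaged Object Detection
- Dense Object Detection
- Collaborative Object Detection
- Point Cloud Object Detection
- Object Discovery
- Video Object Detection
- Small Object Detection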
- Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
⭐code - Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision
⭐code - Distilling Scale-Aware Knowledge in Small Object Detector
- LSTFE-Net: Long Short-Term Feature Enhancement Network for Video Small Object Detection
⭐code - Infrared Small Target Detection
- Line Segment Detection
- Object Navigation
4.Image Captioning
- Video Captioning
- Image Captioning
- Cross-Domain Image Captioning with Discriminative Finetuning
- Crossing the Gap: Domain Generalization for Image Captioning
- Model-Agnostic Gender Debiased Image Captioning
- A-CAP: Anticipation Captioning with Commonsense Knowledge
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
⭐code - HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
- Semantic-Conditional Diffusion Networks for Image Captioning
⭐code - ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing
- SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation
- Visual Story Generation
- 3D Dense Captioning
3.Image Processing(low-level image processing, quality assessment)
- Initialization Noise in Image Gradients and Saliency Maps
- Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models
⭐code - Tunable Convolutions with Parametric Multi-Loss Optimization
- Image Colorization
- Shadow Removal
- Image Restoration
- Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
⭐code - Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics Recovery
- Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions
- Generating Aligned Pseudo-Supervision From Non-Aligned Data for Image Restoration in Under-Display Camera
⭐code - Comprehensive and Delicate: An Efficient Transformer for Image Restoration
- Ingredient-Oriented Multi-Degradation Learning for Image Restoration
- All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters for Specific Degradations
- Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank
⭐code - Burstormer: Burst Image Restoration and Enhancement Transformer
- Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera
⭐code - Generative Diffusion Prior for Unified Image Restoration and Enhancement
- Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration
⭐code - Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective
⭐code - Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack
- Robust Unsupervised StyleGAN Image Restoration
🏠project
- Image Inpainting
- Video Restoration
- Video Inpainting
- Image Illumination
- Image Quality Assessment
- Quality-aware Pre-trained Models for Blind Image Quality Assessment
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
⭐code
- An Image Quality Assessment Dataset for Portraits
⭐code - Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
- Dehazing
- Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
⭐code - Curricular Contrastive Regularization for Physics-aware Single Image Dehazing
- Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring
⭐code - RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
- Deraining
- Denoising
- Masked Image Training for Generalizable Deep Image Denoising
- Real-Time Controllable Denoising for Image and Video
- Patch-Craft Self-Supervised Training for Correlated Image Denoising
- Polarized Color Image Denoising
- sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model
⭐code - Zero-Shot Noise2Noise: Efficient Image Denoising Without Any Data
🏠project - HouseDiffusion: Vector Floorplan Generation via a Diffusion Model With Discrete and Continuous Denoising
🏠project - Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising
⭐code - Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
⭐code - Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
⭐code
- LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising
⭐code - Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
- Learning with Noisy labels via Self-supervised Adversarial Noisy Masking
- Learning from Noisy Labels with Decoupled Meta Label Purifier
- Deblurring
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
⭐code - Neumann Network With Recursive Kernels for Single Image Defocus Deblurring
- K3DN: Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring
- Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior
- $\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus
⭐code - Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blind Image Deblurring
🏠project - Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time
⭐code - Event-Based Frame Interpolation With Ad-Hoc Deblurring
- Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring
- Deghosting
- Reflective Flare Removal
- image deweathering
- Image Rescaling
- Instant Restoration and Enhancement
- Image Enhancement
- Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement
- Realistic Saliency Guided Image Enhancement
- Learning a Simple Low-Light Image Enhancer From Paired Low-Light Instances
⭐code - Low-Light Image Enhancement via Structure Modeling and Guidance
- You Do Not Need Additional Priors or Regularizers in Retinex-Based Low-Light Image Enhancement
- Image Harmonization
- Image Exposure Correction
- Object Removal
- Image Decomposition
- Image Reconstruction
- Raw Image Reconstruction With Learned Compact Metadata
⭐code - Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
- High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain Activity
🏠project - PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces using Permutohedral Lattices
🏠project
- Text-Driven Image Processing
- Motion Blur
- Image Cropping
- Image Relighting
- Blurry Frame Interpolation
2.Image Segmentation
- MED-VT: Multiscale Encoder-Decoder Video Transformer With Application To Object Segmentation
- SimpSON: Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network
- Towards Open-World Segmentation of Parts
- Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh Segmentation
- MOVES: Manipulated Objects in Video Enable Segmentation
- Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains
- Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation
- VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation
⭐code - Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
- OneFormer: One Transformer To Rule Universal Image Segmentation
🏠project - PanelNet: Understanding 360 Indoor Environment via Panel Representation
- AutoFocusFormer: Image Segmentation off the Grid
- MP-Former: Mask-Piloted Transformer for Image Segmentation
⭐code - Explicit Visual Prompting for Low-Level Structure Segmentations
⭐code - Focused and Collaborative Feedback Integration for Interactive Image Segmentation
⭐code - FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
🏠project
Across five datasets spanning three downstream video segmentation tasks (VIS, VOS, and MOTS), plugging InstMove into existing SOTA models brings a further improvement of 1 to 5 points.
- Zero-Shot Segmentation
- 3D Segmentation
- Panoptic Segmentation
- Instance Segmentation
- DynaMask: Dynamic Mask Selection for Instance Segmentation
⭐code - Tree Instance Segmentation With Temporal Contour Graph
- Hi4D: 4D Instance Segmentation of Close Human Interaction
- Beyond mAP: Towards Better Evaluation of Instance Segmentation
- Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt
- Cut and Learn for Unsupervised Object Detection and Instance Segmentation
⭐code - PartDistillation: Learning Parts From Instance Segmentation
⭐code - Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections
⭐code - AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation
- DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
⭐code - FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
⭐code - Camouflaged Instance Segmentation via Explicit De-Camouflaging
- Unsupervised Instance Segmentation
- Weakly Supervised Instance Segmentation
- SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
⭐code - BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation
⭐code - The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation
⭐code
- Open-Vocabulary Instance Segmentation
- Zero-Shot Instance Segmentation
- Semantic Segmentation
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model
⭐code - Transformer Scale Gate for Semantic Segmentation
- Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation
- BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation
- Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation
- Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge
- Less Is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
- SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation
- PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers
- Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions
- PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation
⭐code - Understanding Imbalanced Semantic Segmentation Through Neural Collapse
⭐code - Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation
⭐code - Single Domain Generalization for LiDAR Semantic Segmentation
⭐code - FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation
- Proximal Splitting Adversarial Attack for Semantic Segmentation
⭐code - On Calibrating Semantic Segmentation Models: Analyses and an Algorithm
- Incrementer: Transformer for Class-Incremental Semantic Segmentation With Knowledge Distillation Focusing on Old Class
- Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers
- Endpoints Weight Fusion for Class Incremental Semantic Segmentation
- Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures
⭐code - ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
- Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation
- Dynamic Focus-Aware Positional Queries for Semantic Segmentation
⭐code - Continual Semantic Segmentation With Automatic Memory Sample Selection
- Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
- Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation
⭐code - Federated Incremental Semantic Segmentation
⭐code - Delivering Arbitrary-Modal Semantic Segmentation
⭐code - Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
- A Simple Framework for Text-Supervised Semantic Segmentation
⭐code
Clearly outperforms previous state-of-the-art methods on the PASCAL VOC 2012, PASCAL Context, and COCO datasets. - Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Generative Semantic Segmentation
⭐code - Reliability in Semantic Segmentation: Are We on the Right Track?
⭐code - Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
- Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
⭐code - Instant Domain Augmentation for LiDAR Semantic Segmentation
🏠project - Delving into Shape-aware Zero-shot Semantic Segmentation
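⭐code - Open-Vocabulary Semantic Segmentation
- Open-World Semantic Segmentation
- Domain Adaptive Semantic Segmentation
- Domain Generalized Semantic Segmentation
- Unsupervised Semantic Segmentation
- Semi-Supervised Semantic Segmentation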
- Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
⭐code - Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
- Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation
⭐code - Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation
- LaserMix for Semi-Supervised LiDAR Semantic Segmentation
⭐code - Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation
- Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Token Contrast for Weakly-Supervised Semantic Segmentation
⭐code - CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation
- Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation
⭐code - Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
- Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor
⭐code
- Self-Supervised Semantic Segmentation
- Point Cloud Semantic Segmentation
- Zero-Shot Semantic Segmentation
- Few-Shot Semantic Segmentation
- Long-Tailed Semantic Segmentation
- 3D Semantic Segmentation
- Open-Set Semantic Segmentation
- Interactive Segmentation
- Few-Shot Segmentation
- VSS
- Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
⭐code - Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation
- Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
⭐code
- VOS
- InstMove: Instance Motion for Object-centric Video Segmentation
⭐code - Breaking the "Object" in Video Object Segmentation
- Look Before You Match: Instance Understanding Matters in Video Object Segmentation
- MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
- Boosting Video Object Segmentation via Space-time Correspondence Learning
⭐code - Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
⭐code - Two-shot Video Object Segmentation
⭐code
- VIS
- Scene Understanding
- FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
- SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
🏠project - Movies2Scenes: Using Movie Metadata To Learn Scene Representation
- Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding
- Single View Scene Scale Estimation Using Scale Field
- Neural Part Priors: Learning To Optimize Part-Based Object Completion in RGB-D Scans
- 3D Scene Understanding
- OpenScene: 3D Scene Understanding With Open Vocabularies
- Long Range Pooling for 3D Large-Scale Scene Understanding
- Panoptic Lifting for 3D Scene Understanding With Neural Fields
🏠project - FAC: 3D Representation Learning via Foreground Aware Feature Contrast
- Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding
- CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP
⭐code - PLA: Language-driven Open-Vocabulary 3D Scene Understanding
⭐code
🏠project - MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency
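- Image Matting
- Referring Image Segmentation
- Referring Expression Segmentation
- Motion Segmentation
- Video Segmentation
- Action Segmentation
1.other(uncategorized)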
- CIRCLE: Capture in Rich Contextual Environments
- Trainable Projected Gradient Method for Robust Fine-Tuning
- HDR Imaging With Spatially Varying Signal-to-Noise Ratios
- Are Deep Neural Networks SMARTer Than Second Graders?
- Blowing in the Wind: CycleNet for Human Cinemagraphs From Still Images
- Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization
- pCON: Polarimetric Coordinate Networks for Neural Scene Representations
- Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human Mesh From Videos
- Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate
- Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities and Non-Uniform Coordinates
- LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction
- Stare at What You See: Masked Image Modeling Without Reconstruction
- Neural Kaleidoscopic Space Sculpting
- HyperCUT: Video Sequence From a Single Blurry Image Using Unsupervised Ordering
- Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
- Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision
- Improved Distribution Matching for Dataset Condensation
- Slimmable Dataset Condensation
- LEGO-Net: Learning Regular Rearrangements of Objects in Rooms
- Neuralizer: General Neuroimage Analysis Without Re-Training
- DETRs With Hybrid Matching
⭐code - A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization
- A-La-Carte Prompt Tuning (APT): Combining Distinct Data via Composable Prompting
- Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
- Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Image Ensemble
- Decentralized Learning With Multi-Headed Distillation
- On the Convergence of IRLS and Its Variants in Outlier-Robust Estimation
- Learning Joint Latent Space EBM Prior Model for Multi-Layer Generator
- Knowledge Combination To Learn Rotated Detection Without Rotated Annotation
- FlowGrad: Controlling the Output of Generative ODEs With Gradients
- Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transformer
- Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes
- Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability Regularization
- BiasAdv: Bias-Adversarial Augmentation for Model Debiasing
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
- Why Is the Winner the Best?
- HGNet: Learning Hierarchical Geometry From Points, Edges, and Surfaces
- Revisiting the P3P Problem
- RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor
- BASiS: Batch Aligned Spectral Embedding Space
- CRAFT: Concept Recursive Activation FacTorization for Explainability
- Infinite Photorealistic Worlds using Procedural Generation
- All-in-Focus Imaging From Event Focal Stack
- Learning 3D Scene Priors With 2D Supervision
- NeuWigs: A Neural Dynamic Model for Volumetric Hair Capture and Animation
- CLIPPO: Image-and-Language Understanding from Pixels Only
⭐code - Towards Bridging the Performance Gaps of Joint Energy-Based Models
- expOSE: Accurate Initialization-Free Projective Factorization Using Exponential Regularization
- Learning Debiased Representations via Conditional Attribute Interpolation
- Learning Neural Volumetric Representations of Dynamic Humans in Minutes
- Bayesian Posterior Approximation With Stochastic Ensembles
- RILS: Masked Visual Reconstruction in Language Semantic Space
- RepMode: Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction
- Zero-Shot Model Diagnosis
- Improving Visual Grounding by Encouraging Consistent Gradient-Based Explanations
⭐code - AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders
- Understanding and Improving Visual Prompting: A Label-Mapping Perspective
- DegAE: A New Pretraining Paradigm for Low-Level Vision
- LiDAR-in-the-Loop Hyperparameter Optimization
- Understanding Deep Generative Models With Generalized Empirical Likelihoods
- Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
- Compressing Volumetric Radiance Fields to 1 MB
⭐code - Label Information Bottleneck for Label Enhancement
⭐code - DNF: Decouple and Feedback Network for Seeing in the Dark
- Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World
- How To Prevent the Continuous Damage of Noises To Model Training?
- ActMAD: Activation Matching To Align Distributions for Test-Time-Training
🏠project - Leveraging Temporal Context in Low Representational Power Regimes
🏠project - Guided Recommendation for Model Fine-Tuning
- OT-Filter: An Optimal Transport Filter for Learning With Noisy Labels
- E2PN: Efficient SE(3)-Equivariant Point Network
⭐code - Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
- Fine-Tuned CLIP Models Are Efficient Video Learners
⭐code - Visual Recognition by Request
- Stitchable Neural Networks
🏠project - RUST: Latent Neural Scene Representations From Unposed Imagery
⭐code - Spatio-Focal Bidirectional Disparity Estimation From a Dual-Pixel Image
- Four-View Geometry With Unknown Radial Distortion
- Learning Optical Expansion From Scale Matching
⭐code - Don't Lie to Me! Robust and Efficient Explainability With Verified Perturbation Analysis
⭐code - Learning Transformation-Predictive Representations for Detection and Description of Local Features
- Two-Way Multi-Label Loss
⭐code - Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization
⭐code - Dionysus: Recovering Scene Structures by Dividing Into Semantic Pieces
- Noisy Correspondence Learning With Meta Similarity Correction
- HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
- Modeling Entities As Semantic Points for Visual Information Extraction in the Wild
🏠project - NeAT: Learning Neural Implicit Surfaces With Arbitrary Topologies From Multi-View Images
- Learning a Deep Color Difference Metric for Photographic Images
- DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling
⭐code - Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models
⭐code - Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models
⭐code - DynaFed: Tackling Client Data Heterogeneity With Global Dynamics
- CUF: Continuous Upsampling Filters
- Learning Decorrelated Representations Efficiently Using Fast Fourier Transform
- Practical Network Acceleration With Tiny Sets
- AstroNet: When Astrocyte Meets Artificial Neural Network
- NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views
⭐code - Command-Driven Articulated Object Understanding and Manipulation
⭐code - HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization
⭐code - Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction
⭐code - Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning
- Class Adaptive Network Calibration
⭐code - OCTET: Object-Aware Counterfactual Explanations
⭐code - DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos
- FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures
- Open-Set Representation Learning Through Combinatorial Embedding
- A Unified HDR Imaging Method With Pixel and Patch Level
- Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses
⭐code - Switchable Representation Learning Framework With Self-Compatibility
- Exploring and Utilizing Pattern Imbalance
- Top-Down Visual Attention From Analysis by Synthesis
🏠project - Interactive Cartoonization With Controllable Perceptual Factors
- Regularize Implicit Neural Representation by Itself
- Delving Into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
- Re-Basin via Implicit Sinkhorn Differentiation
⭐code - Towards Effective Visual Representations for Partial-Label Learning
- Samples With Low Loss Curvature Improve Data Efficiency
⭐code - Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares
- Tunable Convolutions With Parametric Multi-Loss Optimization
- RelightableHands: Efficient Neural Relighting of Articulated Hand Models
🏠project - DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata
🏠project - Token Turing Machines
⭐code - Probabilistic Debiasing of Scene Graphs
⭐code - Few-Shot Non-Line-of-Sight Imaging With Signal-Surface Collaborative Regularization
- The Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection
- Generalized Decoding for Pixel, Image, and Language
🏠project - EC2: Emergent Communication for Embodied Control
- Generalizable Local Feature Pre-Training for Deformable Shape Analysis
⭐code - On-the-Fly Category Discovery
⭐code - PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow
- Efficient Verification of Neural Networks Against LVM-Based Specifications
- TensoIR: Tensorial Inverse Rendering
🏠project - Learning From Unique Perspectives: User-Aware Saliency Modeling
- LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs
⭐code - Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
⭐code - FFCV: Accelerating Training by Removing Data Bottlenecks
🏠project - Semidefinite Relaxations for Robust Multiview Triangulation
- GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency
⭐code - Polynomial Implicit Neural Representations for Large Diverse Datasets
⭐code - Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption
- Learning To Zoom and Unzoom
🏠project - Masked Image Modeling With Local Multi-Scale Reconstruction
- Neural Vector Fields: Implicit Representation by Explicit Learning
⭐code - Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks
⭐code - Critical Learning Periods for Multisensory Integration in Deep Networks
- Imitation Learning as State Matching via Differentiable Physics
⭐code - Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism
⭐code - Relightable Neural Human Assets From Multi-View Gradient Illuminations
⭐code - DINER: Disorder-Invariant Implicit Neural Representation
- Robust Mean Teacher for Continual and Gradual Test-Time Adaptation
⭐code - A Probabilistic Framework for Lifelong Test-Time Adaptation
⭐code - Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks
- Decoupling Human and Camera Motion From Videos in the Wild
🏠project - DISC: Learning From Noisy Labels via Dynamic Instance-Specific Selection and Correction
⭐code - DC2: Dual-Camera Defocus Control by Learning To Refocus
- FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs
- "Seeing" Electric Network Frequency From Events
🏠project - Confidential and Private Decentralized Learning Based on Encryption-Friendly Distillation Loss
⭐code - Revealing the Dark Secrets of Masked Image Modeling
- RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
- Adaptive Graph Convolutional Subspace Clustering
- Graph Representation for Order-Aware Visual Transformation
- Train-Once-for-All Personalization
- Learning Sample Relationship for Exposure Correction
- EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
🏠project - Gradient norm aware minimization seeks first-order flatness and improves generalization
⭐code
👍CVPR 2023 | Tsinghua University proposes GAM: a "first-order flatness optimizer" for neural networks that significantly improves model generalization
- InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds
- GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
- Deep Deterministic Uncertainty: A New Simple Baseline
- WIRE: Wavelet Implicit Neural Representations
- Learning From Noisy Labels With Decoupled Meta Label Purifier
- Architectural Backdoors in Neural Networks
- Event-Based Shape From Polarization
- Deep Hashing With Minimal-Distance-Separated Hash Centers
- Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation
⭐code - Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
🏠project - MetaCLUE: Towards Comprehensive Visual Metaphors Research
🏠project - EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
⭐code - Sliced Optimal Partial Transport
- Deep Learning of Partial Graph Matching via Differentiable Top-K
⭐code - Unsupervised Volumetric Animation
🏠project - Passive Micron-Scale Time-of-Flight With Sunlight Interferometry
- Generalizable Implicit Neural Representations via Instance Pattern Composers
⭐code - On the Pitfall of Mixup for Uncertainty Calibration
- UMat: Uncertainty-Aware Single Image High Resolution Material Capture
- On Data Scaling in Masked Image Modeling
- End-to-End Vectorized HD-Map Construction With Piecewise Bezier Curve
⭐code - Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary
- MobileOne: An Improved One millisecond Mobile Backbone
⭐code - Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization
- Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning
⭐code - Residual Degradation Learning Unfolding Framework With Mixing Priors Across Spectral and Spatial for Compressive Spectral Imaging
- Robust and Scalable Gaussian Process Regression and Its Applications
⭐code - NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces With Arbitrary Topologies
🏠project - Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations
- Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
⭐code - Multiplicative Fourier Level of Detail
- VGFlow: Visibility guided Flow Network for Human Reposing
- Neural Dependencies Emerging From Learning Massive Categories
- MaLP: Manipulation Localization Using a Proactive Scheme
🏠project - Efficient Robust Principal Component Analysis via Block Krylov Iteration and CUR Decomposition
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
⭐code - Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
⭐code - MEGANE: Morphable Eyeglass and Avatar Network
🏠project - Solving relaxations of MAP-MRF problems: Combinatorial in-face Frank-Wolfe directions
- EXCALIBUR: Encouraging and Evaluating Embodied Exploration
- Learning To Predict Scene-Level Implicit 3D From Posed RGBD Data
- SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
🏠project - Learning Neural Parametric Head Models
🏠project - Integral Neural Networks
- Simulated Annealing in Early Layers Leads to Better Generalization
- Fresnel Microfacet BRDF: Unification of Polari-Radiometric Surface-Body Reflection
- Improving Visual Representation Learning Through Perceptual Understanding
- Probability-Based Global Cross-Modal Upsampling for Pansharpening
⭐code - SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy
- Megahertz Light Steering Without Moving Parts
- TempSAL - Uncovering Temporal Information for Deep Saliency Prediction
🏠project - Affection: Learning Affective Explanations for Real-World Visual Data
🏠project - Metadata-Based RAW Reconstruction via Implicit Neural Functions
- Coaching a Teachable Student
- Progressive Transformation Learning for Leveraging Virtual Images in Training
- NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
- Spatial-Temporal Concept Based Explanation of 3D ConvNets
- Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability
⭐code - Neural Fourier Filter Bank
⭐code - ECON: Explicit Clothed Humans Optimized via Normal Integration
⭐code - Autonomous Manipulation Learning for Similar Deformable Objects via Only One Demonstration
- Plateau-Reduced Differentiable Path Tracing
🏠project - Test Time Adaptation With Transformation Invariance
⭐code - Learning To Exploit the Sequence-Specific Prior Knowledge for Image Processing Pipelines Optimization
- Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric
🏠project - CUDA: Convolution-based Unlearnable Datasets
- Efficient On-Device Training via Gradient Filtering
- Transfer Knowledge From Head to Tail: Uncertainty Calibration Under Long-Tailed Distribution
- Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning
- Disentangled Representation Learning for Unsupervised Neural Quantization
- DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization
⭐code
🏠project - On Distillation of Guided Diffusion Models
- Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes
⭐code - K-Planes: Explicit Radiance Fields in Space, Time, and Appearance
🏠project - Understanding Masked Autoencoders via Hierarchical Latent Variable Models
- Co-Training 2L Submodels for Visual Recognition
- Masked Images Are Counterfactual Samples for Robust Fine-Tuning
⭐code - Learning Customized Visual Models With Retrieval-Augmented Knowledge
- A Unified Spatial-Angular Structured Light for Single-View Acquisition of Shape and Reflectance
- PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery
⭐code - Reproducible Scaling Laws for Contrastive Language-Image Learning
⭐code - Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models
- Invertible Neural Skinning
🏠project - Multi-Object Manipulation via Object-Centric Neural Scattering Functions
- Fair Scratch Tickets: Finding Fair Sparse Networks Without Weight Training
- Backdoor Cleansing With Unlabeled Data
⭐code - Full or Weak Annotations? An Adaptive Strategy for Budget-Constrained Annotation Campaigns
- Executing Your Commands via Motion Diffusion in Latent Space
- Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations
🏠project - Learning To Generate Image Embeddings With User-Level Differential Privacy
- Revisiting the Stack-Based Inverse Tone Mapping
- PACO: Parts and Attributes of Common Objects
⭐code - Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models
- A General Regret Bound of Preconditioned Gradient Method for DNN Training
⭐code - A Practical Upper Bound for the Worst-Case Attribution Deviations
- Perception and Semantic Aware Regularization for Sequential Confidence Calibration
⭐code - Deep Random Projector: Accelerated Deep Image Prior
⭐[code](https://github.com/sun-umn/DeepRandom-Projector) - Bias Mimicking: A Simple Sampling Approach for Bias Mitigation
⭐code - DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-To-Fine Contrastive Ranking
- Structured Kernel Estimation for Photon-Limited Deconvolution
⭐code - FlexiViT: One Model for All Patch Sizes
⭐code - BiasBed - Rigorous Texture Bias Evaluation
⭐code - GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction
⭐code - Finding Geometric Models by Clustering in the Consensus Space
⭐code - Hierarchical Neural Memory Network for Low Latency Event Processing
🏠project - Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries
⭐code - PointConvFormer: Revenge of the Point-Based Convolution
- A Practical Stereo Depth System for Smart Glasses
- Differentiable Shadow Mapping for Efficient Inverse Graphics
- Multi Domain Learning for Motion Magnification
⭐code - Re-Thinking Model Inversion Attacks Against Deep Neural Networks
⭐code - DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects
🏠project - Two-View Geometry Scoring Without Correspondences
🏠project - ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images
⭐code - Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
- Analyzing Physical Impacts Using Transient Surface Wave Imaging
- Adaptive Global Decay Process for Event Cameras
⭐code - Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels
- Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment
⭐code - Swept-Angle Synthetic Wavelength Interferometry
- Shape, Pose, and Appearance From a Single Image via Bootstrapped Radiance Field Inversion
🏠project - Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
⭐code - 3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
- EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization
- Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
- Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation
⭐code - DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis
🏠project - Virtual Occlusions Through Implicit Depth
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator
⭐code - Inverting the Imaging Process by Learning an Implicit Camera Model
⭐code - Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations
⭐code - Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning
- Noisy Correspondence Learning with Meta Similarity Correction
- Efficient Multimodal Fusion via Interactive Prompting
- Representing Volumetric Videos as Dynamic MLP Maps
⭐code - Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
- Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
- A Meta-Learning Approach to Predicting Performance and Data Requirements
- Multimodal Prompting with Missing Modalities for Visual Recognition
⭐code
- UniHCP: A Unified Model for Human-Centric Perceptions
⭐code - DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
⭐code - Progressive Open Space Expansion for Open-Set Model Attribution
⭐code - TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
⭐code - HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
⭐code - 3D Cinemagraphy from a Single Image
🏠project - Masked Image Modeling with Local Multi-Scale Reconstruction
⭐code - Revisiting Rotation Averaging: Uncertainties and Robust Losses
⭐code - Unifying Layout Generation with a Decoupled Diffusion Model
- Adversarial Counterfactual Visual Explanations
⭐code - Trainable Projected Gradient Method for Robust Fine-tuning
⭐code - Partial Network Cloning
⭐code - Extracting Class Activation Maps from Non-Discriminative Features as well
⭐code - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization
⭐code - Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark
⭐code - PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
⭐code - Boundary Unlearning
🏠project - ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
- VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions
- Learning a Depth Covariance Function
⭐code - A Bag-of-Prototypes Representation for Dataset-Level Applications
- CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
⭐code - Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
⭐code - Marching-Primitives: Shape Abstraction from Signed Distance Function
⭐code - Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
- Robust Test-Time Adaptation in Dynamic Scenarios
⭐code - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
⭐code - IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
- Compacting Binary Neural Networks by Sparse Kernel Selection
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
⭐code - Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
⭐code - Quantum Multi-Model Fitting
⭐code - Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
- PMatch: Paired Masked Image Modeling for Dense Geometric Matching
⭐code - ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
⭐code - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
- Why is the winner the best?
- Disorder-invariant Implicit Neural Representation
⭐code - HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion
⭐code - Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints
🏠project - SMPConv: Self-moving Point Representations for Continuous Convolution
⭐code - VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
⭐code
- Wide-Angle Rectification via Content-Aware Conformal Mapping
🏠project - Large-capacity and Flexible Video Steganography via Invertible Neural Network
⭐code - SketchXAI: A First Look at Explainability for Human Sketches
⭐code - Hard Patches Mining for Masked Image Modeling
👍CVPR 2023 | HPM: mining hard samples in masked image modeling for solid performance gains - Learning Geometry-aware Representations by Sketching
- DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
⭐code - Investigating the Nature of 3D Generalization in Deep Neural Networks
⭐code
- Generalizing Dataset Distillation via Deep Generative Prior
⭐code
🏠project - Learning Locally Editable Virtual Humans
🏠project - Class-Balancing Diffusion Models
- SFD2: Semantic-guided Feature Detection and Description
⭐code - Computational Flash Photography Through Intrinsics
- Deep Graph Reprogramming
- LayoutDM: Transformer-based Diffusion Model for Layout Generation
- MetaViewer: Towards a Unified Multi-View Representation
- Learning Compact Representations for LiDAR Completion and Generation
🏠project - Multimodal
- Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning
- PMR: Prototypical Modal Rebalance for Multimodal Learning
- Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling
- Towards Flexible Multi-Modal Document Models
- Multi-Modal Representation Learning With Text-Driven Soft Masks
- Align and Attend: Multimodal Summarization With Dual Contrastive Losses
🏠project - Improving Zero-Shot Generalization and Robustness of Multi-Modal Models
⭐code - BEV-Guided Multi-Modality Fusion for Driving Perception
⭐code - BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
- Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
⭐code - Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce (multimodal pre-training)
- MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning
⭐code
- Affordance Learning
- Feature Matching
- PATS: Patch Area Transportation with Subdivision for Local Feature Matching
🏠project - Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
⭐code
⭐code - Adaptive Assignment for Geometry Aware Local Feature Matching
⭐code (feature matching) - DKM: Dense Kernelized Feature Matching for Geometry Estimation
⭐code
- Ultraviolet Prediction
- Vector Quantization