ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
ICASSP-2023-Papers
ICASSP 2023 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 conference. Explore the latest advancements in acoustics, speech and signal processing. Code included. ⭐ the repository to support the advancement of audio and signal processing!
Contributions to improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to create pull requests, open issues or contact me via email. Your participation is crucial to making this repository even better.
HPFTN: Hierarchical Progressive Fusion Transformer Network for Video Denoising
➖
398
Subspace Modeling enabled High-Sensitivity X-Ray Chemical Imaging
➖
274
MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing
➖
117
Hyperspectral Image Denoising via Nonlocal Rank Residual Modeling
Semantic Segmentation
🆔
Title
Repo
Paper
190
LoG-CAN: Local-Global Class-aware Network for Semantic Segmentation of Remote Sensing Images
406
WUDA: Unsupervised Domain Adaptation based on Weak Source Domain Labels
555
Class-aware Contextual Information for Semantic Segmentation
➖
1132
Semi-Supervised Semantic Segmentation with Structured Output Space Adaption
➖
1170
PRRD: Pixel-Region Relation Distillation for Efficient Semantic Segmentation
➖
2521
Spatial Correlation Fusion Network for Few-Shot Segmentation
➖
3306
Exploring Vision Transformer Layer Choosing for Semantic Segmentation
➖
3941
Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation
➖
6357
Progressive Refinement Learning based on Feature Cross Perception for Residential Areas Semantic Segmentation
➖
1599
Lightweight Portrait Segmentation via Edge-optimized Attention
3857
A Dynamic Cross-Scale Transformer with Dual-Compound Representation for 3D Medical Image Segmentation
➖
3793
LABANet: Lead-Assisting Backbone Attention Network for Oral Multi-Pathology Segmentation
➖
Object Segmentation
🆔
Title
Repo
Paper
3473
Robust Video Object Segmentation with Restricted Attention
➖
3501
Stacking-based Attention Temporal Convolutional Network for Action Segmentation
➖
2436
VLKP: Video Instance Segmentation with Visual-Linguistic Knowledge Prompts
➖
4867
Automatic Error Detection in Integrated Circuits Image Segmentation: A Data-Driven Approach
➖
3745
TransWnet: Integrating Transformers Into CNNs via Row and Column Attention for Abdominal Multi-Organ Segmentation
➖
5844
Active Perception System for Enhanced Visual Signal Recovery using Deep Reinforcement Learning
➖
302
OAFormer: Learning Occlusion Distinguishable Feature for Amodal Instance Segmentation
➖
698
Encoder-Decoder Graph Convolutional Network for Automatic Timed-Up-and-Go and Sit-to-Stand Segmentation
➖
758
Meta++ Network for Few-Shot Aerospace Crack Segmentation
➖
1764
IAST: Instance Association Relying on Spatio-Temporal Features for Video Instance Segmentation
2469
Continual Cell Instance Segmentation of Microscopy Images
➖
Deep Learning for Image and Video Processing
🆔
Title
Repo
Paper
5397
Spammer Detection on Short Video Applications: A New Challenge and Baselines
➖
814
Weakly- and Semi-Supervised Object Localization
➖
2503
Balanced Mixup Loss for Long-Tailed Visual Recognition
➖
4130
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks
➖
2813
Invariant Adversarial Imitation Learning from Visual Inputs
➖
6423
SPECTRANET-SO(3): Learning Satellite Orientation from Optical Spectra by Implicitly Modeling Mutually Exclusive Probability Distributions on the Rotation Manifold
➖
3097
Structured-Anchor Projected Clustering for Hyperspectral Images
➖
140
Learning Sparse Auto-Encoders for Green AI Image Coding
➖
643
Learning to Generate 3D Representations of Building Roofs using Single-View Aerial Imagery
➖
4843
Robust Monocular Localization of Drones by Adapting Domain Maps to Depth Prediction Inaccuracies
➖
5940
Large Dimensional Analysis of LS-SVM Transfer Learning: Application to PolSAR Classification
➖
5062
SMUG: Towards Robust MRI Reconstruction by Smoothed Unrolling
Graph based Learning
🆔
Title
Repo
Paper
715
Graph-Graph Context Dependency Attention for Graph Edit Distance
➖
3882
Topology Uncertainty Modeling for Imbalanced Node Classification on Graphs
➖
589
CPD-GAN: Cascaded Pyramid Deformation GAN for Pose Transfer
➖
5321
Space-Time Graph Neural Networks with Stochastic Graph Perturbations
➖
6793
Untrained Graph Neural Networks for Denoising
➖
5846
Learning on Graphs under Label Noise
➖
2906
Select the Best: Enhancing Graph Representation with Adaptive Negative Sample Selection
➖
2586
Learning with Multigraph Convolutional Filters
➖
2164
Self-Supervised Guided Hypergraph Feature Propagation for Semi-Supervised Classification with Missing Node Features
➖
3752
Incorporating Reliability in Graph Information Propagation by Fluid Dynamics Diffusion: a Case of Multimodal Semi-Supervised Deep Learning
➖
5159
GraphMAD: Graph Mixup for Data Augmentation using Data-Driven Convex Clustering
3724
Time-Varying Signals Recovery via Graph Neural Networks
➖
Learning from Multimodal Data
🆔
Title
Repo
Paper
3546
Multimodal Knowledge Distillation for Arbitrary-Oriented Object Detection in Aerial Images
➖
1234
Hierarchical Spatial-Temporal Transformer with Motion Trajectory for Individual Action and Group Activity Recognition
➖
693
Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-Linked Inputs
➖
1571
Towards Robust Audio-based Vehicle Detection via Importance-Aware Audio-Visual Learning
➖
841
Hierarchical Multi-Task Learning for Fabric Component Analysis Based on NIR Spectral Signals
➖
1706
Cross Modality Knowledge Distillation for Robust Pedestrian Detection in Low Light and Adverse Weather Conditions
➖
6375
Data Leakage in Cross-Modal Retrieval Training: A Case Study
➖
5825
Difficulty-Aware Data Augmentor for Scene Text Recognition
➖
461
TinyOOD: Effective Out-of-Distribution Detection for TinyML
➖
4211
A Principled Approach to Model Validation in Domain Generalization
4220
Scale-Adaptive Tiny Object Detection Enhanced by Across-Scale and Shape-Preserved Semantic Location
➖
3735
Audio-Visual Inpainting: Reconstructing Missing Visual Information with Sound
➖
Matrix/Tensor Factorization and Completion
🆔
Title
Repo
Paper
507
Learn Topological Representation with Flexible Manifold Layer
1438
Tensorized LSSVMs for Multitask Regression
➖
3571
A Bayesian Perspective for Determinant Minimization based Robust Structured Matrix Factorization
➖
5045
Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees
➖
687
Transductive Matrix Completion with Calibration for Multi-Task Learning
➖
1668
Projected Hierarchical ALS for Generalized Boolean Matrix Factorization
➖
2934
Robust Binary Component Decompositions
➖
3897
Multi-Resolution Convolutional Dictionary Learning for Riverbed Dynamics Modeling
➖
2388
PARAFAC2-based Coupled Matrix and Tensor Factorizations
6088
Deep Plug-and-Play for Tensor Robust Principal Component Analysis
➖
6125
Geometric Matrix Completion with Collaborative Routing between Capsules
➖
3256
Enrollment Rate Prediction in Clinical Trials based on CDF Sketching and Tensor Factorization Tools
➖
ASR - Improve Latency, Efficiency, and Accuracy
🆔
Title
Repo
Paper
900
Multi-blank Transducers for Speech Recognition
1642
Diagonal State Space Augmented Transformers for Speech Recognition
➖
1661
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
➖
3385
Towards Accurate and Real-Time End-of-Speech Estimation
➖
3999
Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization
➖
4330
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding
5058
Powerful and Extensible WFST Framework for RNN-Transducer Losses
➖
5337
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
➖
5434
Improving Non-Autoregressive Speech Recognition with Autoregressive Pretraining
➖
5558
Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture
➖
5607
Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition
➖
5824
Fast and Parallel Decoding for Transducer
ASR: Domain Adaptation and Robust Training
🆔
Title
Repo
Paper
505
SAN: A Robust End-to-End ASR Model Architecture
➖
1604
Explanations for Automatic Speech Recognition
➖
1674
On-the-Fly Text Retrieval for End-to-End ASR Adaptation
➖
2397
Unsupervised Model-based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition
➖
3258
Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-To-End Automated Speech Recognition
➖
3600
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
➖
3973
WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-aware Weaving
➖
4139
Joint Discriminator and Transfer based Fast Domain Adaptation for End-to-End Speech Recognition
➖
5424
Improving Fairness and Robustness in End-to-End Speech Recognition Through Unsupervised Clustering
➖
5491
Improving Fast-Slow Encoder based Transducer with Streaming Deliberation
➖
5496
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
➖
5902
Improving Accented Speech Recognition with Multi-Domain Training
➖
ASR: New Models
🆔
Title
Repo
Paper
179
UCONV-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
876
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale
➖
1356
Improving Contextual Biasing with Text Injection
➖
1655
Structured State Space Decoder for Speech Recognition and Synthesis
➖
3365
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
➖
3368
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
➖
3499
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
➖
3926
Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
➖
4365
Understanding Shared Speech-Text Representations
➖
4534
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
➖
2237
Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR
➖
5384
Modular Conformer Training for Flexible End-to-End ASR
➖
ASR: Noise Robustness
🆔
Title
Repo
Paper
1897
On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems
1919
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition
1929
MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition
➖
1971
Robust Data2vec: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
➖
2040
Robust Audio-Visual ASR with Unified Cross-Modal Attention
➖
3292
HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit BERT for Robust Speech Recognition
➖
4124
Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition
4680
RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness
➖
5455
Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers
➖
5504
On the Effectiveness of Monoaural Target Source Extraction for Distant End-to-End Automatic Speech Recognition
➖
6389
Noise-aware Target Extension with Self-Distillation for Robust Speech Recognition
➖
Audio Signal Restoration and Editing
🆔
Title
Repo
Paper
5003
AERO: Audio Super Resolution in the Spectral Domain
1768
UPGLADE: Unplugged Plug-and-Play Audio Declipper based on Consensus Equilibrium of DNN and Sparse Optimization
➖
2121
Improving Performance of Real-Time Full-Band Blind Packet-Loss Concealment with Predictive Network
4388
Faster than Fast: Accelerating the Griffin-Lim Algorithm
➖
3726
Improving Phase-Vocoder-based Time Stretching by Time-Directional Spectrogram Squeezing
6288
Extreme Audio Time Stretching using Neural Synthesis
➖
Epilepsy Detection Grand Challenge
🆔
Title
Repo
Paper
7015
Lightweight Machine Learning for Seizure Detection on Wearable Devices
➖
7021
Pretrained Transformers for Seizure Detection
➖
7022
Towards Interpretable Seizure Detection using Wearables
➖
7033
Optimization of the Deep Neural Networks for Seizure Detection
➖
Deep Learning Theory
🆔
Title
Repo
Paper
2465
MSFormer: Multi-Scale Transformer with Neighborhood Consensus for Feature Matching
➖
3498
Decoupled Visual Causality for Robust Detection
➖
2500
Semantics-Disentangled Contrastive Embedding for Generalized Zero-Shot Learning
➖
4730
Dynamic Scalable Self-Attention Ensemble for Task-Free Continual Learning
➖
2125
Ultimate Negative Sampling for Contrastive Learning
➖
3936
An Application of Quantum Mechanics to Attention Methods in Computer Vision
➖
Neural Architecture Search
🆔
Title
Repo
Paper
3492
Search for Efficient Deep Visual-Inertial Odometry Through Neural Architecture Search
4072
Receptive Field Reliant Zero-Cost Proxies for Neural Architecture Search
➖
4346
ZO-DARTS: Differentiable Architecture Search with Zeroth-Order Approximation
➖
2675
Performing Neural Architecture Search without Gradients
796
Neural Architecture of Speech
➖
1461
BHE-DARTS: Bilevel Optimization based on Hypergradient Estimation for Differentiable Architecture Search
➖
Expressive and Controllable TTS
🆔
Title
Repo
Paper
2625
Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts
4768
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis
4776
Ensemble Prosody Prediction for Expressive Speech Synthesis
5782
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features
5970
High-Acoustic Fidelity Text to Speech Synthesis with Fine-Grained Control of Speech Attributes
➖
6203
Embedding a Differentiable Mel-Cepstral Synthesis Filter to a Neural Speech Synthesis System
Keyword Spotting
🆔
Title
Repo
Paper
1848
Disentangled Training with Adversarial Examples for Robust Small-Footprint Keyword Spotting
➖
3578
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
➖
5025
Fixed-Point Quantization Aware Training for On-Device Keyword-Spotting
➖
5106
To Wake-Up or Not to Wake-Up: Reducing Keyword False Alarm by Successive Refinement
➖
5584
Transcription Free Filler Word Detection with Neural Semi-CRFs
6078
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis
Detection and Classification
🆔
Title
Repo
Paper
657
Passive Detection of Rank-One Gaussian Signals for Known Channel Subspaces and Arbitrary Noise
➖
2389
False Alarm Regulation for Off-Grid Target Detection with the Matched Filter
➖
2536
Data-Driven Quickest Change Detection in Markov Models
➖
3510
Quickest Change Detection with Leave-one-Out Density Estimation
➖
4778
Identifying Coordination in a Cognitive Radar Network - A Multi-Objective Inverse Reinforcement Learning Approach
4815
Improved Small Sample Hypothesis Testing using the Uncertain Likelihood Ratio
➖
Advances in Signal Processing and Machine Learning for Non-Intrusive Load Monitoring
🆔
Title
Repo
Paper
2170
A Wavelet Scattering Approach for Load Identification with Limited Amount of Training Data
➖
2653
Applying Symmetrical Component Transform for Industrial Appliance Classification in Non-Intrusive Load Monitoring
➖
3326
ContiNILM: A Continual Learning Scheme for Non-Intrusive Load Monitoring
➖
5853
Improving Knowledge Distillation for Non-Intrusive Load Monitoring through Explainability Guided Learning
➖
6414
Improved Appliance Transient Feature Extraction via Template Matching
➖
Machine Learning Applications
🆔
Title
Repo
Paper
6355
Causal Discovery and Causal Inference based Counterfactual Fairness in Machine Learning
➖
4965
Benchmarking Convolutional Neural Network Inference on Low-Power Edge Devices
➖
1115
Code-Enhanced Fine-Grained Semantic Matching for Tag Recommendation in Software Information Sites
➖
394
Robust Dominant Periodicity Detection for Time Series with Missing Data
➖
3994
Dynamic Split Computing for Efficient Deep Edge Intelligence
5723
Dense Adversarial Transfer Learning based on Class-Invariance
➖
4620
VAN-ICP: GPU-Accelerated Approximate Nearest Neighbor Search for ICP Registration via Voxel Dilation
5776
Clustering-based Supervised Contrastive Learning for Identifying Risk Items on Heterogeneous Graph
➖
4052
Multiresolution Signal Processing of Financial Market Objects
➖
1752
Hierarchical Multi-Agent Reinforcement Learning with Intrinsic Reward Rectification
➖
3493
An Antispoofing Approach in Biometric Authentication System for a Smartcard
➖
3576
Unsupervised Domain Adaptation via Subspace Interpolating Deep Dictionary Learning: A Case Study in Machine Inspection
➖
Classification
🆔
Title
Repo
Paper
283
Multi-Modal Domain Generalization for Cross-Scene Hyperspectral Image Classification
➖
1056
Hierarchical Transformer for Multi-Label Trailer Genre Classification
➖
1236
S3I-PointHop: SO(3)-Invariant PointHop for 3D Point Cloud Classification
➖
1302
Sample-Aware Knowledge Distillation for Long-Tailed Learning
➖
1562
Laryngeal Leukoplakia Classification via Dense Multiscale Feature Extraction in White Light Endoscopy Images
➖
1904
Long-Tailed Recognition with Causal Invariant Transformation
➖
2199
STACKMAPS: A Visualization Technique for Diabetic Retinopathy Grading
➖
2904
Gender-Cartoon: Image Cartoonization Method based on Gender Classification
➖
3167
Extracting the Brain-Like Representation by an Improved Self-Organizing Map for Image Classification
3888
DDN: Dynamic Aggregation Enhanced Dual-Stream Network for Medical Image Classification
➖
4696
LGViT: Local-Global Vision Transformer for Breast Cancer Histopathological Image Classification
➖
5583
Learning a Weight Map for Weakly-Supervised Localization
➖
Human Posture Estimation
🆔
Title
Repo
Paper
301
Interweaved Graph and Attention Network for 3D Human Pose Estimation
3696
Learning 3D Human Pose and Shape Estimation using Uncertainty-Aware Body Part Segmentation
➖
3841
Monocular 3D Human Pose Estimation based on Global Temporal-Attentive and Joints-Attention in Video
4380
EVOPOSE: A Recursive Transformer for 3D Human Pose Estimation with Kinematic Structure Priors
➖
142
HTNet: Human Topology Aware Network for 3D Human Pose Estimation
1107
Improving Occluded Human Pose Estimation via Linked Joints
➖
5121
Efficient and Effective Multi-Camera Pose Estimation with Weighted M-Estimate Sample Consensus
➖
5668
AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation
5750
FlowPose: Conditional Normalizing Flows for 3D Human Pose and Shape Estimation from Monocular Videos
➖
6050
Animal Re-Identification Algorithm for Posture Diversity
6322
Retrieval-based Natural 3D Human Motion Generation
➖
2453
Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-Temporal Masked Transformers
➖
Human Reconstruction
🆔
Title
Repo
Paper
4237
Time-Frequency Awareness Network for Human Mesh Recovery from Videos
2028
Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model
➖
4667
GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose
5538
Real-Time Human Reconstruction based on Human Pose Prior and Epipolar Refinement
➖
642
Efficient Feature Fusion for Learning-based Photometric Stereo
➖
2442
Volumetric 3D Reconstruction with Window-Wise Global Feature Aggregation
➖
4008
Stereoscopic Video Retargeting based on Camera Motion Classification
➖
4893
Detail-Aware Uncalibrated Photometric Stereo
➖
5712
SDRNet: Shape Decoupled Regression Network for 3D Face Reconstruction
➖
1119
Binary Image Fast Perfect Recovery from Sparse 2D-DFT Coefficients
➖
1175
HQP-MVS: High-Quality Plane Priors Assisted Multi-View Stereo for Low-Textured Areas
➖
3183
Dynamic Multi-View Scene Reconstruction using Neural Implicit Surface
➖
Face Recognition
🆔
Title
Repo
Paper
3959
LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition
➖
4254
Quaternion Orthogonal Transformer for Facial Expression Recognition in the Wild
3490
Privacy Preserving Face Recognition with Lensless Camera
➖
3649
MaskDUL: Data Uncertainty Learning in Masked Face Recognition
4814
Cov Loss: Covariance-based Loss for Deep Face Recognition
➖
5674
Boosting Face Recognition Performance with Synthetic Data and Limited Real Data
➖
2762
A Dual-Branch Adaptive Distribution Fusion Framework for Real-World Facial Expression Recognition
4199
Efficient Practices for Profile-to-Frontal Face Synthesis and Recognition
➖
4208
Learning Causal Representations for Generalizable Face Anti-Spoofing
➖
2767
Self-Paced Partial Domain-aware Learning for Face Anti-Spoofing
➖
746
Context-aware Face Clustering with Graph Convolutional Networks
➖
Source Separation, ICA, and Sparsity
🆔
Title
Repo
Paper
193
A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments
➖
524
On the Minimum Perimeter Criterion for Bounded Component Analysis
➖
4129
Joint Unmixing and Demosaicing Methods for Snapshot Spectral Images
➖
5036
Identifiable Bounded Component Analysis via Minimum Volume Enclosing Parallelotope
➖
5587
Balanced Deep CCA for Bird Vocalization Detection
1692
Independent Vector Analysis with Multivariate Gaussian Model: A Scalable Method by Multilinear Regression
➖
3184
Activity-Informed Industrial Audio Anomaly Detection via Source Separation
➖
6717
Double Nonstationarity: Blind Extraction of Independent Nonstationary Vector/Component from Nonstationary Mixtures - Algorithms
➖
6798
Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning using the Generalized Hyperbolic Prior
➖
5426
MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation
674
Hybrid Transformers for Music Source Separation
5141
Dictionary Learning on Graph Data with Weisfieler-Lehman Sub-Tree Kernel and KSVD
➖
Neural Sound Synthesis and Representation
🆔
Title
Repo
Paper
2678
GANStrument: Adversarial Instrument Sound Synthesis with Pitch-Invariant Instance Conditioning
2555
I Hear Your True Colors: Image Guided Audio Generation
1261
Grad-StyleSpeech: Any-Speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
3085
Voice Conversion using Feature Specific Loss Function based Self-Attentive Generative Adversarial Network
1268
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
6748
Decorrelating Feature Spaces for Learning General-Purpose Audio Representations
4904
Continuous Descriptor-based Control for Deep Audio Synthesis
5786
Rigid-Body Sound Synthesis with Differentiable Modal Resonators
5349
Exploring Approaches to Multi-Task Automatic Synthesizer Programming
➖
6710
Speech Time-Scale Modification with GANs
➖
4339
Full-Band General Audio Synthesis with Score-based Diffusion
4443
Is Quality Enough? Integrating Energy Consumption in a Large-Scale Evaluation of Neural Audio Synthesis Models
➖
Deep Learning for Audio and Music Applications
🆔
Title
Repo
Paper
896
Controllable Music Inpainting with Mixed-Level and Disentangled Representation
1991
HIPI: A Hierarchical Performer Identification Model based on Symbolic Representation of Music
➖
207
Chord-Conditioned Melody Harmonization with Controllable Harmonicity
1878
Jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research
5273
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
1442
An Improved Optimal Transport Kernel Embedding Method with Gating Mechanism for Singing Voice Separation and Speaker Identification
➖
3448
Tempo vs. Pitch: Understanding Self-Supervised Tempo Estimation
1995
Adversarial Permutation Invariant Training for Universal Sound Separation
1379
Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining
➖
4727
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming
1375
SPADE: Self-Supervised Pretraining for Acoustic Disentanglement
➖
1615
On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors
Machine Learning for Image and Video Processing
🆔
Title
Repo
Paper
1011
IoU-Aware Multi-Expert Cascade Network via Dynamic Ensemble for Long-Tailed Object Detection
➖
1622
Efficient Compressed Video Action Recognition via Late Fusion with a Single Network
➖
1649
Amicable Aid: Perturbing Images to Improve Classification Performance
➖
3861
Spatial Cross-Attention for Transformer-based Image Captioning
➖
3879
Towards Hyperbolic Regularizers for Point Cloud Part Segmentation
➖
5265
Clip4VideoCap: Rethinking CLIP for Video Captioning with Multiscale Temporal Fusion and Commonsense Knowledge
➖
6356
Learning Silhouettes with Group Sparse Autoencoders
5042
Deep Learning for Lagrangian Drift Simulation at The Sea Surface