• Stars
    355
  • Rank 119,027 (Top 3 %)
  • Language
    Python
  • License
    MIT License
  • Created about 1 year ago
  • Updated 4 days ago

Repository Details

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

ICASSP-2023-Papers


ICASSP 2023 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 conference. Explore the latest advancements in acoustics, speech and signal processing. Code included. ⭐ the repository to support the advancement of audio and signal processing!

ICASSP 2023


PDF version of the ICASSP 2023 Conference Programme, which lists all accepted full papers along with their presentation mode and time.


Other collections of the best AI conferences

❗ The conference table is kept up to date.

Conference Year
Computer Vision (CV)
CVPR 2023
ICCV 2023
Speech/Signal Processing (SP/SigProc)
INTERSPEECH 2023
ISMIR 2023

Contributors



Contributions to improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to create pull requests, open issues or contact me via email. Your participation is crucial to making this repository even better.


Papers

Audio for Multimedia and Multimodal Processing

🆔 Title Repo Paper
647 Diverse and Vivid Sound Generation from Text Descriptions GitHub Page IEEE Xplore
arXiv
2248 EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound GitHub Page
GitHub
IEEE Xplore
arXiv
784 I See What You Hear: A Vision-inspired Method to Localize Words IEEE Xplore
arXiv
6119 Incorporating Lip Features Into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion IEEE Xplore
6787 UAVM: Towards Unifying Audio and Visual Models (SPS Journal Paper) GitHub IEEE Xplore
arXiv

Drone-vs-Bird Detection Grand Challenge at ICASSP23

🆔 Title Repo Paper
6834 High-Speed Drone Detection based on Yolo-v8 IEEE Xplore
6863 S-Feature Pyramid Network and Attention Model for Drone Detection IEEE Xplore
6881 Drone-vs-Bird: Drone Detection using Yolov7 with CSRT Tracker IEEE Xplore

Human Identification and Face Recognition

🆔 Title Repo Paper
530 EMCLR: Expectation Maximization Contrastive Learning Representations IEEE Xplore
711 Boosting Person Re-Identification with Viewpoint Contrastive Learning and Adversarial Training IEEE Xplore
812 Top-K Visual Tokens Transformer: Selecting Tokens for Visible-infrared Person Re-Identification IEEE Xplore
2531 Frequency-aware Attentional Feature Fusion for Deepfake Detection IEEE Xplore
5309 Recursive Joint Attention for Audio-Visual Fusion in Regression based Emotion Recognition GitHub IEEE Xplore
arXiv
3475 Multi-Stream Facial Adaptive Network for Expression Recognition from a Single Image GitHub IEEE Xplore

Self-Supervised Learning Methods

🆔 Title Repo Paper
429 PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under Adversarial Attack GitHub IEEE Xplore
arXiv
2579 Enhancing Representation Learning with Deep Classifiers in Presence of Shortcut GitHub IEEE Xplore
730 K2NN: Self-Supervised Learning with Hierarchical Nearest Neighbors for Remote Sensing IEEE Xplore
4453 TriNet: Stabilizing Self-Supervised Learning from Complete or Slow Collapse GitHub IEEE Xplore
arXiv
1629 On Minimal Variations for Unsupervised Representation Learning IEEE Xplore
arXiv
740 Adaptive Data Augmentation for Contrastive Learning IEEE Xplore
arXiv

ASR with Constrained Resource

🆔 Title Repo Paper
690 De'HuBERT: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition IEEE Xplore
arXiv
1948 Masked Token Similarity Transfer for Compressing Transformer-based ASR Models IEEE Xplore
2888 Unsupervised Fine-Tuning Data Selection for ASR using Self-Supervised Speech Models IEEE Xplore
arXiv
3250 CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition IEEE Xplore
arXiv
3712 Context-aware Fine-Tuning of Self-Supervised Speech Models IEEE Xplore
arXiv
6449 Data2vec-Aqc: Search for the Right Teaching Assistant in the Teacher-Student Training Setup GitHub IEEE Xplore
arXiv

ASR: Multilingual Speech Recognition

🆔 Title Repo Paper
2417 Hierarchical Softmax for End-to-End Low-Resource Multilingual Speech Recognition GitHub IEEE Xplore
arXiv
4510 Improving Massively Multilingual ASR With Auxiliary CTC Objectives GitHub Page
GitHub
IEEE Xplore
arXiv
4777 Massively Multilingual Shallow Fusion with Large Language Models IEEE Xplore
arXiv
5465 UML: A Universal Monolingual Output Layer for Multilingual ASR IEEE Xplore
arXiv
5744 Investigation Into Phone-based Subword Units for Multilingual End-to-End Speech Recognition IEEE Xplore
6221 Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities IEEE Xplore
arXiv

Adaptive Signal Processing

🆔 Title Repo Paper
1224 A Compensated Shrinkage Affine Projection Algorithm for Debiased Sparse Adaptive Filtering IEEE Xplore
1761 Dynamic Selection of p-Norm in Linear Adaptive Filtering via Online Kernel-based Reinforcement Learning IEEE Xplore
arXiv
2511 Neural Network Models with Integrated Training and Adaptation for Nonlinear Acoustic System Identification IEEE Xplore
3895 Neural Mode Estimation IEEE Xplore
5352 Adaptive ECCM for Mitigating Smart Jammers IEEE Xplore
arXiv
6529 Differentiable Adaptive Short-Time Fourier Transform with Respect to the Window Length IEEE Xplore

6G Integrated Sensing and Communication (ISAC) from Theory to Practice - A Signal Processing Perspective

🆔 Title Repo Paper
3049 6G Integrated Sensing and Communication - Sensing Assisted Environmental Reconstruction and Communication IEEE Xplore
3325 Neurally Augmented State Space Model for Simultaneous Communication and Tracking with Low Complexity Receivers IEEE Xplore
3456 Multi-View Millimeter-Wave Imaging Over Wireless Cellular Network IEEE Xplore
3803 Joint Data Association, NLOS Mitigation, and Clutter Suppression for Networked Device-Free Sensing in 6G Cellular Network IEEE Xplore
arXiv
4255 Integrating the Sensing and Radio Communications Channel Modelling from Radar Mutual Interference IEEE Xplore
5326 Active Beam Tracking with Reconfigurable Intelligent Surface IEEE Xplore

Applications to Physiological Signals, Audio, and Speech

🆔 Title Repo Paper
5872 ClassA Entropy for the Analysis of Structural Complexity of Physiological Signals IEEE Xplore
1034 Unobtrusive Respiratory Monitoring System for Intensive Care IEEE Xplore
4381 Improved WiFi-based Respiration Tracking via Contrast Enhancement IEEE Xplore
4851 Joint Angle and Respiration Estimation for Passive and Device-Free Respiration Monitoring IEEE Xplore
3418 Implementing Continuous HRTF Measurement in Near-Field IEEE Xplore
arXiv
5094 SeliNet: A Lightweight Model for Single Channel Speech Separation IEEE Xplore
5196 Adaptive Time-Scale Modification for Improving Speech Intelligibility based on Phoneme Clustering for Streaming Services IEEE Xplore
3109 Cutting through the Noise: An Empirical Comparison of Psychoacoustic and Envelope-based Features for Machinery Fault Detection IEEE Xplore
arXiv
4835 Cochlear Decomposition: A Novel Bio-Inspired Multiscale Analysis Framework IEEE Xplore
2458 Design and Performance of the Low-Power Noise Reduction Algorithm of the Med-EL Sonnet 2™ Cochlear Implant Audio Processor IEEE Xplore
6491 Modulo EEG Signal Recovery using Transformers IEEE Xplore
454 Knowledge-Graph Augmented Music Representation for Genre Classification IEEE Xplore

Super Resolution

🆔 Title Repo Paper
275 PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution GitHub IEEE Xplore
arXiv
326 Raising the Limit of Image Rescaling using Auxiliary Encoding IEEE Xplore
arXiv
1431 Kernel Estimation and Deconvolution for Blind Image Super-Resolution IEEE Xplore
1555 A Comprehensive Comparison of Projections in Omnidirectional Super-Resolution IEEE Xplore
arXiv
1900 Long-Short Attention Network for the Spectral Super-Resolution of Multispectral Images GitHub IEEE Xplore
2363 Multi-Level Fusion for Burst Super-Resolution with Deep Permutation-Invariant Conditioning IEEE Xplore
2684 Frequency Reciprocal Action and Fusion for Single Image Super-Resolution IEEE Xplore
2777 FCIR: Rethink Aerial Image Super Resolution with Fourier Analysis GitHub IEEE Xplore
Pdf
2962 A Content-based Multi-Scale Network for Single Image Super-Resolution IEEE Xplore
3053 Learning to Explain: A Gradient-based Attribution Method for Interpreting Super-Resolution Networks IEEE Xplore
3140 CNN Filter for RPR-based SR in VVC with Wavelet Decomposition IEEE Xplore
3555 Local to Global Prior Learning for Blind Unsupervised Image Super-Resolution IEEE Xplore

Denoising

🆔 Title Repo Paper
5974 Rain2Avoid: Self-Supervised Single Image Deraining IEEE Xplore
5479 A Progressive Image Dehazing Framework with Inter and Intra Contrastive Learning IEEE Xplore
5267 Graph-based Point Cloud Color Denoising with 3-Dimensional Patch-based Similarity IEEE Xplore
2310 CAENet: using Collaborative Attention Transformer and Add-Boost Strategy for Single Image Deraining IEEE Xplore
1791 SFEMGN: Image Denoising with Shallow Feature Enhancement Network and Multi-Scale ConvGRU IEEE Xplore
1554 Affinity Learning with Blind-Spot Self-Supervision for Image Denoising IEEE Xplore
1473 SAR Image Despeckling with Residual-in-Residual Dense Generative Adversarial Network IEEE Xplore
1211 Uncer2Natural: Uncertainty-aware Unsupervised Image Denoising IEEE Xplore
553 HPFTN: Hierarchical Progressive Fusion Transformer Network for Video Denoising IEEE Xplore
398 Subspace Modeling enabled High-Sensitivity X-Ray Chemical Imaging IEEE Xplore
arXiv
274 MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing IEEE Xplore
arXiv
117 Hyperspectral Image Denoising via Nonlocal Rank Residual Modeling GitHub IEEE Xplore

Semantic Segmentation

🆔 Title Repo Paper
190 LoG-CAN: Local-Global Class-aware Network for Semantic Segmentation of Remote Sensing Images GitHub IEEE Xplore
arXiv
406 WUDA: Unsupervised Domain Adaptation based on Weak Source Domain Labels GitHub IEEE Xplore
arXiv
555 Class-aware Contextual Information for Semantic Segmentation IEEE Xplore
1132 Semi-Supervised Semantic Segmentation with Structured Output Space Adaption IEEE Xplore
1170 PRRD: Pixel-Region Relation Distillation for Efficient Semantic Segmentation IEEE Xplore
2521 Spatial Correlation Fusion Network for Few-Shot Segmentation IEEE Xplore
3306 Exploring Vision Transformer Layer Choosing for Semantic Segmentation IEEE Xplore
arXiv
3941 Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation IEEE Xplore
6357 Progressive Refinement Learning based on Feature Cross Perception for Residential Areas Semantic Segmentation IEEE Xplore
1599 Lightweight Portrait Segmentation via Edge-optimized Attention GitHub IEEE Xplore
3857 A Dynamic Cross-Scale Transformer with Dual-Compound Representation for 3D Medical Image Segmentation IEEE Xplore
3793 LABANet: Lead-Assisting Backbone Attention Network for Oral Multi-Pathology Segmentation IEEE Xplore

Object Segmentation

🆔 Title Repo Paper
3473 Robust Video Object Segmentation with Restricted Attention IEEE Xplore
3501 Stacking-based Attention Temporal Convolutional Network for Action Segmentation IEEE Xplore
2436 VLKP: Video Instance Segmentation with Visual-Linguistic Knowledge Prompts IEEE Xplore
4867 Automatic Error Detection in Integrated Circuits Image Segmentation: A Data-Driven Approach IEEE Xplore
arXiv
3745 TransWnet: Integrating Transformers Into CNNs via Row and Column Attention for Abdominal Multi-Organ Segmentation IEEE Xplore
5844 Active Perception System for Enhanced Visual Signal Recovery using Deep Reinforcement Learning IEEE Xplore
302 OAFormer: Learning Occlusion Distinguishable Feature for Amodal Instance Segmentation IEEE Xplore
698 Encoder-Decoder Graph Convolutional Network for Automatic Timed-Up-and-Go and Sit-to-Stand Segmentation IEEE Xplore
758 Meta++ Network for Few-Shot Aerospace Crack Segmentation IEEE Xplore
1764 IAST: Instance Association Relying on Spatio-Temporal Features for Video Instance Segmentation GitHub IEEE Xplore
2469 Continual Cell Instance Segmentation of Microscopy Images IEEE Xplore

Deep Learning for Image and Video Processing

🆔 Title Repo Paper
5397 Spammer Detection on Short Video Applications: A New Challenge and Baselines IEEE Xplore
814 Weakly- and Semi-Supervised Object Localization IEEE Xplore
2503 Balanced Mixup Loss for Long-Tailed Visual Recognition IEEE Xplore
4130 On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks IEEE Xplore
arXiv
2813 Invariant Adversarial Imitation Learning from Visual Inputs IEEE Xplore
6423 SPECTRANET-SO(3): Learning Satellite Orientation from Optical Spectra by Implicitly Modeling Mutually Exclusive Probability Distributions on the Rotation Manifold IEEE Xplore
3097 Structured-Anchor Projected Clustering for Hyperspectral Images IEEE Xplore
140 Learning Sparse Auto-Encoders for Green AI Image Coding IEEE Xplore
arXiv
643 Learning to Generate 3D Representations of Building Roofs using Single-View Aerial Imagery IEEE Xplore
arXiv
4843 Robust Monocular Localization of Drones by Adapting Domain Maps to Depth Prediction Inaccuracies IEEE Xplore
arXiv
5940 Large Dimensional Analysis of LS-SVM Transfer Learning: Application to PolSAR Classification IEEE Xplore
Pdf
5062 SMUG: Towards Robust MRI Reconstruction by Smoothed Unrolling GitHub IEEE Xplore
arXiv

Graph based Learning

🆔 Title Repo Paper
715 Graph-Graph Context Dependency Attention for Graph Edit Distance IEEE Xplore
3882 Topology Uncertainty Modeling for Imbalanced Node Classification on Graphs IEEE Xplore
589 CPD-GAN: Cascaded Pyramid Deformation GAN for Pose Transfer IEEE Xplore
5321 Space-Time Graph Neural Networks with Stochastic Graph Perturbations IEEE Xplore
arXiv
6793 Untrained Graph Neural Networks for Denoising IEEE Xplore
arXiv
5846 Learning on Graphs under Label Noise IEEE Xplore
arXiv
2906 Select the Best: Enhancing Graph Representation with Adaptive Negative Sample Selection IEEE Xplore
2586 Learning with Multigraph Convolutional Filters IEEE Xplore
arXiv
2164 Self-Supervised Guided Hypergraph Feature Propagation for Semi-Supervised Classification with Missing Node Features IEEE Xplore
arXiv
3752 Incorporating Reliability in Graph Information Propagation by Fluid Dynamics Diffusion: a Case of Multimodal Semi-Supervised Deep Learning IEEE Xplore
5159 GraphMAD: Graph Mixup for Data Augmentation using Data-Driven Convex Clustering GitHub IEEE Xplore
arXiv
3724 Time-Varying Signals Recovery via Graph Neural Networks IEEE Xplore
arXiv

Learning from Multimodal Data

🆔 Title Repo Paper
3546 Multimodal Knowledge Distillation for Arbitrary-Oriented Object Detection in Aerial Images IEEE Xplore
1234 Hierarchical Spatial-Temporal Transformer with Motion Trajectory for Individual Action and Group Activity Recognition IEEE Xplore
693 Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-Linked Inputs IEEE Xplore
arXiv
1571 Towards Robust Audio-based Vehicle Detection via Importance-Aware Audio-Visual Learning IEEE Xplore
841 Hierarchical Multi-Task Learning for Fabric Component Analysis Based on NIR Spectral Signals IEEE Xplore
1706 Cross Modality Knowledge Distillation for Robust Pedestrian Detection in Low Light and Adverse Weather Conditions IEEE Xplore
6375 Data Leakage in Cross-Modal Retrieval Training: A Case Study IEEE Xplore
arXiv
5825 Difficulty-Aware Data Augmentor for Scene Text Recognition IEEE Xplore
461 TinyOOD: Effective Out-of-Distribution Detection for TinyML IEEE Xplore
4211 A Principled Approach to Model Validation in Domain Generalization GitHub IEEE Xplore
arXiv
4220 Scale-Adaptive Tiny Object Detection Enhanced by Across-Scale and Shape-Preserved Semantic Location IEEE Xplore
3735 Audio-Visual Inpainting: Reconstructing Missing Visual Information with Sound IEEE Xplore

Matrix/Tensor Factorization and Completion

🆔 Title Repo Paper
507 Learn Topological Representation with Flexible Manifold Layer GitHub IEEE Xplore
1438 Tensorized LSSVMs for Multitask Regression IEEE Xplore
arXiv
3571 A Bayesian Perspective for Determinant Minimization based Robust Structured Matrix Factorization IEEE Xplore
arXiv
5045 Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees IEEE Xplore
687 Transductive Matrix Completion with Calibration for Multi-Task Learning IEEE Xplore
arXiv
1668 Projected Hierarchical ALS for Generalized Boolean Matrix Factorization IEEE Xplore
2934 Robust Binary Component Decompositions IEEE Xplore
3897 Multi-Resolution Convolutional Dictionary Learning for Riverbed Dynamics Modeling IEEE Xplore
2388 PARAFAC2-based Coupled Matrix and Tensor Factorizations GitHub IEEE Xplore
ResearchGate
arXiv
6088 Deep Plug-and-Play for Tensor Robust Principal Component Analysis IEEE Xplore
6125 Geometric Matrix Completion with Collaborative Routing between Capsules IEEE Xplore
3256 Enrollment Rate Prediction in Clinical Trials based on CDF Sketching and Tensor Factorization Tools IEEE Xplore

ASR - Improve Latency, Efficiency, and Accuracy

🆔 Title Repo Paper
900 Multi-blank Transducers for Speech Recognition GitHub IEEE Xplore
arXiv
1642 Diagonal State Space Augmented Transformers for Speech Recognition IEEE Xplore
arXiv
1661 TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty IEEE Xplore
arXiv
3385 Towards Accurate and Real-Time End-of-Speech Estimation IEEE Xplore
Amazon Science
3999 Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization IEEE Xplore
arXiv
4330 Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding GitHub IEEE Xplore
arXiv
5058 Powerful and Extensible WFST Framework for RNN-Transducer Losses IEEE Xplore
arXiv
5337 Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation IEEE Xplore
arXiv
5434 Improving Non-Autoregressive Speech Recognition with Autoregressive Pretraining IEEE Xplore
5558 Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture IEEE Xplore
arXiv
5607 Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition IEEE Xplore
arXiv
5824 Fast and Parallel Decoding for Transducer GitHub IEEE Xplore
arXiv

ASR: Domain Adaptation and Robust Training

🆔 Title Repo Paper
505 SAN: A Robust End-to-End ASR Model Architecture IEEE Xplore
arXiv
1604 Explanations for Automatic Speech Recognition IEEE Xplore
arXiv
1674 On-the-Fly Text Retrieval for End-to-End ASR Adaptation IEEE Xplore
Amazon Science
arXiv
2397 Unsupervised Model-based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition IEEE Xplore
arXiv
3258 Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-To-End Automated Speech Recognition IEEE Xplore
Amazon Science
3600 Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR IEEE Xplore
arXiv
3973 WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-aware Weaving IEEE Xplore
4139 Joint Discriminator and Transfer based Fast Domain Adaptation for End-to-End Speech Recognition IEEE Xplore
5424 Improving Fairness and Robustness in End-to-End Speech Recognition Through Unsupervised Clustering IEEE Xplore
arXiv
5491 Improving Fast-Slow Encoder based Transducer with Streaming Deliberation IEEE Xplore
arXiv
5496 Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy IEEE Xplore
arXiv
5902 Improving Accented Speech Recognition with Multi-Domain Training IEEE Xplore
arXiv

ASR: New Models

🆔 Title Repo Paper
179 UCONV-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition GitHub Code IEEE Xplore
arXiv
876 A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale IEEE Xplore
arXiv
1356 Improving Contextual Biasing with Text Injection IEEE Xplore
1655 Structured State Space Decoder for Speech Recognition and Synthesis IEEE Xplore
arXiv
3365 JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition IEEE Xplore
arXiv
3368 Variable Attention Masking for Configurable Transformer Transducer Speech Recognition IEEE Xplore
arXiv
3499 Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers IEEE Xplore
arXiv
3926 Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames IEEE Xplore
arXiv
4365 Understanding Shared Speech-Text Representations IEEE Xplore
arXiv
4534 Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition IEEE Xplore
arXiv
2237 Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR IEEE Xplore
arXiv
5384 Modular Conformer Training for Flexible End-to-End ASR IEEE Xplore

ASR: Noise Robustness

🆔 Title Repo Paper
1897 On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems GitHub IEEE Xplore
arXiv
1919 Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition GitHub IEEE Xplore
arXiv
1929 MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition IEEE Xplore
arXiv
1971 Robust Data2vec: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning IEEE Xplore
arXiv
2040 Robust Audio-Visual ASR with Unified Cross-Modal Attention IEEE Xplore
3292 HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit BERT for Robust Speech Recognition IEEE Xplore
4124 Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition GitHub IEEE Xplore
arXiv
4680 RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness IEEE Xplore
arXiv
5455 Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers IEEE Xplore
arXiv
5504 On the Effectiveness of Monoaural Target Source Extraction for Distant End-to-End Automatic Speech Recognition IEEE Xplore
6389 Noise-aware Target Extension with Self-Distillation for Robust Speech Recognition IEEE Xplore

Audio Signal Restoration and Editing

🆔 Title Repo Paper
5003 AERO: Audio Super Resolution in the Spectral Domain WEB Page
GitHub
IEEE Xplore
arXiv
1768 UPGLADE: Unplugged Plug-and-Play Audio Declipper based on Consensus Equilibrium of DNN and Sparse Optimization IEEE Xplore
Pdf
2121 Improving Performance of Real-Time Full-Band Blind Packet-Loss Concealment with Predictive Network GitHub IEEE Xplore
arXiv
4388 Faster than Fast: Accelerating the Griffin-Lim Algorithm IEEE Xplore
arXiv
3726 Improving Phase-Vocoder-based Time Stretching by Time-Directional Spectrogram Squeezing GitHub Page IEEE Xplore
Pdf
6288 Extreme Audio Time Stretching using Neural Synthesis IEEE Xplore
arXiv

Epilepsy Detection Grand Challenge

🆔 Title Repo Paper
7015 Lightweight Machine Learning for Seizure Detection on Wearable Devices IEEE Xplore
Pdf
7021 Pretrained Transformers for Seizure Detection IEEE Xplore
7022 Towards Interpretable Seizure Detection using Wearables IEEE Xplore
7033 Optimization of the Deep Neural Networks for Seizure Detection IEEE Xplore

Deep Learning Theory

🆔 Title Repo Paper
2465 MSFormer: Multi-Scale Transformer with Neighborhood Consensus for Feature Matching IEEE Xplore
3498 Decoupled Visual Causality for Robust Detection IEEE Xplore
2500 Semantics-Disentangled Contrastive Embedding for Generalized Zero-Shot Learning IEEE Xplore
4730 Dynamic Scalable Self-Attention Ensemble for Task-Free Continual Learning IEEE Xplore
2125 Ultimate Negative Sampling for Contrastive Learning IEEE Xplore
3936 An Application of Quantum Mechanics to Attention Methods in Computer Vision IEEE Xplore

Neural Architecture Search

🆔 Title Repo Paper
3492 Search for Efficient Deep Visual-Inertial Odometry Through Neural Architecture Search GitHub IEEE Xplore
4072 Receptive Field Reliant Zero-Cost Proxies for Neural Architecture Search IEEE Xplore
4346 ZO-DARTS: Differentiable Architecture Search with Zeroth-Order Approximation IEEE Xplore
2675 Performing Neural Architecture Search without Gradients GitHub IEEE Xplore
796 Neural Architecture of Speech IEEE Xplore
1461 BHE-DARTS: Bilevel Optimization based on Hypergradient Estimation for Differentiable Architecture Search IEEE Xplore

Expressive and Controllable TTS

🆔 Title Repo Paper
2625 Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts GitHub Page IEEE Xplore
arXiv
4768 Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis GitHub Page IEEE Xplore
arXiv
4776 Ensemble Prosody Prediction for Expressive Speech Synthesis WEB Page IEEE Xplore
arXiv
5782 Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features GitHub Page IEEE Xplore
arXiv
5970 High-Acoustic Fidelity Text to Speech Synthesis with Fine-Grained Control of Speech Attributes IEEE Xplore
6203 Embedding a Differentiable Mel-Cepstral Synthesis Filter to a Neural Speech Synthesis System GitHub IEEE Xplore
arXiv

Keyword Spotting

🆔 Title Repo Paper
1848 Disentangled Training with Adversarial Examples for Robust Small-Footprint Keyword Spotting IEEE Xplore
Facebook
Pdf
3578 Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition IEEE Xplore
arXiv
5025 Fixed-Point Quantization Aware Training for On-Device Keyword-Spotting IEEE Xplore
arXiv
5106 To Wake-Up or Not to Wake-Up: Reducing Keyword False Alarm by Successive Refinement IEEE Xplore
arXiv
5584 Transcription Free Filler Word Detection with Neural Semi-CRFs GitHub IEEE Xplore
arXiv
6078 The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis GitHub IEEE Xplore
arXiv

Detection and Classification

🆔 Title Repo Paper
657 Passive Detection of Rank-One Gaussian Signals for Known Channel Subspaces and Arbitrary Noise IEEE Xplore
Pdf
2389 False Alarm Regulation for Off-Grid Target Detection with the Matched Filter IEEE Xplore
2536 Data-Driven Quickest Change Detection in Markov Models IEEE Xplore
arXiv
3510 Quickest Change Detection with Leave-one-Out Density Estimation IEEE Xplore
arXiv
4778 Identifying Coordination in a Cognitive Radar Network - A Multi-Objective Inverse Reinforcement Learning Approach IEEE Xplore
arXiv
4815 Improved Small Sample Hypothesis Testing using the Uncertain Likelihood Ratio IEEE Xplore

Advances in Signal Processing and Machine Learning for Non-Intrusive Load Monitoring

🆔 Title Repo Paper
2170 A Wavelet Scattering Approach for Load Identification with Limited Amount of Training Data IEEE Xplore
2653 Applying Symmetrical Component Transform for Industrial Appliance Classification in Non-Intrusive Load Monitoring IEEE Xplore
Pdf
3326 ContiNILM: A Continual Learning Scheme for Non-Intrusive Load Monitoring IEEE Xplore
5853 Improving Knowledge Distillation for Non-Intrusive Load Monitoring through Explainability Guided Learning IEEE Xplore
Pdf
6414 Improved Appliance Transient Feature Extraction via Template Matching IEEE Xplore

Machine Learning Applications

🆔 Title Repo Paper
6355 Causal Discovery and Causal Inference based Counterfactual Fairness in Machine Learning IEEE Xplore
4965 Benchmarking Convolutional Neural Network Inference on Low-Power Edge Devices IEEE Xplore
1115 Code-Enhanced Fine-Grained Semantic Matching for Tag Recommendation in Software Information Sites IEEE Xplore
394 Robust Dominant Periodicity Detection for Time Series with Missing Data IEEE Xplore
arXiv
3994 Dynamic Split Computing for Efficient Deep Edge Intelligence WEB Page IEEE Xplore
arXiv
5723 Dense Adversarial Transfer Learning based on Class-Invariance IEEE Xplore
4620 VAN-ICP: GPU-Accelerated Approximate Nearest Neighbor Search for ICP Registration via Voxel Dilation GitHub IEEE Xplore
5776 Clustering-based Supervised Contrastive Learning for Identifying Risk Items on Heterogeneous Graph IEEE Xplore
4052 Multiresolution Signal Processing of Financial Market Objects IEEE Xplore
arXiv
1752 Hierarchical Multi-Agent Reinforcement Learning with Intrinsic Reward Rectification IEEE Xplore
3493 An Antispoofing Approach in Biometric Authentication System for a Smartcard IEEE Xplore
3576 Unsupervised Domain Adaptation via Subspace Interpolating Deep Dictionary Learning: A Case Study in Machine Inspection IEEE Xplore

Classification

🆔 Title Repo Paper
283 Multi-Modal Domain Generalization for Cross-Scene Hyperspectral Image Classification IEEE Xplore
1056 Hierarchical Transformer for Multi-Label Trailer Genre Classification IEEE Xplore
1236 S3I-PointHop: SO(3)-Invariant PointHop for 3D Point Cloud Classification IEEE Xplore
arXiv
1302 Sample-Aware Knowledge Distillation for Long-Tailed Learning IEEE Xplore
1562 Laryngeal Leukoplakia Classification via Dense Multiscale Feature Extraction in White Light Endoscopy Images IEEE Xplore
1904 Long-Tailed Recognition with Causal Invariant Transformation IEEE Xplore
2199 STACKMAPS: A Visualization Technique for Diabetic Retinopathy Grading IEEE Xplore
ResearchGate
2904 Gender-Cartoon: Image Cartoonization Method based on Gender Classification IEEE Xplore
3167 Extracting the Brain-Like Representation by an Improved Self-Organizing Map for Image Classification GitHub IEEE Xplore
arXiv
3888 DDN: Dynamic Aggregation Enhanced Dual-Stream Network for Medical Image Classification IEEE Xplore
4696 LGViT: Local-Global Vision Transformer for Breast Cancer Histopathological Image Classification IEEE Xplore
5583 Learning a Weight Map for Weakly-Supervised Localization IEEE Xplore

Human Posture Estimation

🆔 Title Repo Paper
301 Interweaved Graph and Attention Network for 3D Human Pose Estimation GitHub IEEE Xplore
arXiv
3696 Learning 3D Human Pose and Shape Estimation using Uncertainty-Aware Body Part Segmentation IEEE Xplore
3841 Monocular 3D Human Pose Estimation based on Global Temporal-Attentive and Joints-Attention in Video GitHub IEEE Xplore
4380 EVOPOSE: A Recursive Transformer for 3D Human Pose Estimation with Kinematic Structure Priors IEEE Xplore
arXiv
142 HTNet: Human Topology Aware Network for 3D Human Pose Estimation GitHub IEEE Xplore
arXiv
1107 Improving Occluded Human Pose Estimation via Linked Joints IEEE Xplore
5121 Efficient and Effective Multi-Camera Pose Estimation with Weighted M-Estimate Sample Consensus IEEE Xplore
5668 AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation GitHub IEEE Xplore
arXiv
5750 FlowPose: Conditional Normalizing Flows for 3D Human Pose and Shape Estimation from Monocular Videos IEEE Xplore
6050 Animal Re-Identification Algorithm for Posture Diversity GitHub IEEE Xplore
6322 Retrieval-based Natural 3D Human Motion Generation IEEE Xplore
2453 Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-Temporal Masked Transformers IEEE Xplore
arXiv

Human Reconstruction

🆔 Title Repo Paper
4237 Time-Frequency Awareness Network for Human Mesh Recovery from Videos GitHub IEEE Xplore
2028 Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model IEEE Xplore
arXiv
4667 GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from a 2D Pose GitHub IEEE Xplore
arXiv
5538 Real-Time Human Reconstruction based on Human Pose Prior and Epipolar Refinement IEEE Xplore
642 Efficient Feature Fusion for Learning-based Photometric Stereo IEEE Xplore
2442 Volumetric 3D Reconstruction with Window-Wise Global Feature Aggregation IEEE Xplore
4008 Stereoscopic Video Retargeting based on Camera Motion Classification IEEE Xplore
4893 Detail-Aware Uncalibrated Photometric Stereo IEEE Xplore
Pdf
5712 SDRNet: Shape Decoupled Regression Network for 3D Face Reconstruction IEEE Xplore
1119 Binary Image Fast Perfect Recovery from Sparse 2D-DFT Coefficients IEEE Xplore
1175 HQP-MVS: High-Quality Plane Priors Assisted Multi-View Stereo for Low-Textured Areas IEEE Xplore
3183 Dynamic Multi-View Scene Reconstruction using Neural Implicit Surface IEEE Xplore
arXiv

Face Recognition

🆔 Title Repo Paper
3959 LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition IEEE Xplore
arXiv
4254 Quaternion Orthogonal Transformer for Facial Expression Recognition in the Wild GitHub IEEE Xplore
arXiv
3490 Privacy Preserving Face Recognition with Lensless Camera IEEE Xplore
3649 MaskDUL: Data Uncertainty Learning in Masked Face Recognition GitHub IEEE Xplore
4814 Cov Loss: Covariance-based Loss for Deep Face Recognition IEEE Xplore
5674 Boosting Face Recognition Performance with Synthetic Data and Limited Real Data IEEE Xplore
2762 A Dual-Branch Adaptive Distribution Fusion Framework for Real-World Facial Expression Recognition GitHub IEEE Xplore
4199 Efficient Practices for Profile-to-Frontal Face Synthesis and Recognition IEEE Xplore
4208 Learning Causal Representations for Generalizable Face Anti-Spoofing IEEE Xplore
2767 Self-Paced Partial Domain-aware Learning for Face Anti-Spoofing IEEE Xplore
746 Context-aware Face Clustering with Graph Convolutional Networks IEEE Xplore

Source Separation, ICA, and Sparsity

🆔 Title Repo Paper
193 A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments IEEE Xplore
arXiv
524 On the Minimum Perimeter Criterion for Bounded Component Analysis IEEE Xplore
4129 Joint Unmixing and Demosaicing Methods for Snapshot Spectral Images IEEE Xplore
5036 Identifiable Bounded Component Analysis via Minimum Volume Enclosing Parallelotope IEEE Xplore
5587 Balanced Deep CCA for Bird Vocalization Detection GitHub IEEE Xplore
arXiv
1692 Independent Vector Analysis with Multivariate Gaussian Model: A Scalable Method by Multilinear Regression IEEE Xplore
3184 Activity-Informed Industrial Audio Anomaly Detection via Source Separation IEEE Xplore
6717 Double Nonstationarity: Blind Extraction of Independent Nonstationary Vector/Component from Nonstationary Mixtures - Algorithms IEEE Xplore
arXiv
6798 Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning using the Generalized Hyperbolic Prior IEEE Xplore
arXiv
5426 MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation WEB Page
GitHub
IEEE Xplore
arXiv
674 Hybrid Transformers for Music Source Separation GitHub IEEE Xplore
arXiv
5141 Dictionary Learning on Graph Data with Weisfieler-Lehman Sub-Tree Kernel and KSVD IEEE Xplore

Neural Sound Synthesis and Representation

🆔 Title Repo Paper
2678 GANStrument: Adversarial Instrument Sound Synthesis with Pitch-Invariant Instance Conditioning GitHub Page
GitHub
IEEE Xplore
arXiv
2555 I Hear Your True Colors: Image Guided Audio Generation WEB Page
GitHub
IEEE Xplore
arXiv
1261 Grad-StyleSpeech: Any-Speaker Adaptive Text-to-Speech Synthesis with Diffusion Models GitHub Page IEEE Xplore
arXiv
3085 Voice Conversion using Feature Specific Loss Function based Self-Attentive Generative Adversarial Network GitHub IEEE Xplore
1268 TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion GitHub Page
GitHub
IEEE Xplore
arXiv
6748 Decorrelating Feature Spaces for Learning General-Purpose Audio Representations GitHub
GitHub
IEEE Xplore
4904 Continuous Descriptor-based Control for Deep Audio Synthesis GitHub Page
GitHub
IEEE Xplore
arXiv
5786 Rigid-Body Sound Synthesis with Differentiable Modal Resonators GitHub Page
GitHub
IEEE Xplore
arXiv
5349 Exploring Approaches to Multi-Task Automatic Synthesizer Programming IEEE Xplore
6710 Speech Time-Scale Modification with GANs IEEE Xplore
4339 Full-Band General Audio Synthesis with Score-based Diffusion GitHub Page IEEE Xplore
arXiv
4443 Is Quality Enough? Integrating Energy Consumption in a Large-Scale Evaluation of Neural Audio Synthesis Models IEEE Xplore

Deep Learning for Audio and Music Applications

🆔 Title Repo Paper
896 Controllable Music Inpainting with Mixed-Level and Disentangled Representation GitHub IEEE Xplore
1991 HIPI: A Hierarchical Performer Identification Model based on Symbolic Representation of Music IEEE Xplore
207 Chord-Conditioned Melody Harmonization with Controllable Harmonicity GitHub IEEE Xplore
arXiv
1878 Jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research GitHub IEEE Xplore
arXiv
5273 Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects GitHub Page
GitHub
IEEE Xplore
arXiv
1442 An Improved Optimal Transport Kernel Embedding Method with Gating Mechanism for Singing Voice Separation and Speaker Identification IEEE Xplore
3448 Tempo vs. Pitch: Understanding Self-Supervised Tempo Estimation GitHub IEEE Xplore
arXiv
1995 Adversarial Permutation Invariant Training for Universal Sound Separation WEB Page IEEE Xplore
arXiv
1379 Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining IEEE Xplore
arXiv
4727 Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming GitHub IEEE Xplore
arXiv
1375 SPADE: Self-Supervised Pretraining for Acoustic Disentanglement IEEE Xplore
arXiv
1615 On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors GitHub Page
GitHub
IEEE Xplore
arXiv

Machine Learning for Image and Video Processing

🆔 Title Repo Paper
1011 IoU-Aware Multi-Expert Cascade Network via Dynamic Ensemble for Long-Tailed Object Detection IEEE Xplore
Pdf
1622 Efficient Compressed Video Action Recognition via Late Fusion with a Single Network IEEE Xplore
1649 Amicable Aid: Perturbing Images to Improve Classification Performance IEEE Xplore
arXiv
3861 Spatial Cross-Attention for Transformer-based Image Captioning IEEE Xplore
Pdf
3879 Towards Hyperbolic Regularizers for Point Cloud Part Segmentation IEEE Xplore
5265 Clip4VideoCap: Rethinking CLIP for Video Captioning with Multiscale Temporal Fusion and Commonsense Knowledge IEEE Xplore
6356 Learning Silhouettes with Group Sparse Autoencoders GitHub IEEE Xplore
Pdf
5042 Deep Learning for Lagrangian Drift Simulation at The Sea Surface GitHub IEEE Xplore
arXiv
2382 Difference Guided VHR Remote Sensing Image Change Detection IEEE Xplore
2696 Adaptive Submanifold-Preserving Sparse Regression for Feature Selection and Multiclass Classification IEEE Xplore
6814 Learning Multiscale Convolutional Dictionaries for Image Reconstruction GitHub Page
GitHub
IEEE Xplore
arXiv
7162 Impact of PolSAR Pre-Processing and Balancing Methods on Complex-Valued Neural Networks Segmentation Tasks IEEE Xplore
arXiv
HAL Science

ASR: Text Adaptation

🆔 Title Repo Paper
209 Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation IEEE Xplore
arXiv
1007 AdapITN: A Fast, Reliable, and Dynamic Adaptive Inverse Text Normalization GitHub
Hugging Face
IEEE Xplore
ResearchGate
1373 Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models IEEE Xplore
arXiv
1628 Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data IEEE Xplore
1672 Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis IEEE Xplore
arXiv
2409 Slot-triggered Contextual Biasing for Personalized Speech Recognition using Neural Transducers IEEE Xplore
Pdf
3355 Fine-grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding IEEE Xplore
4612 Gated Contextual Adapters for Selective Contextual Biasing in Neural Transducers IEEE Xplore
Amazon Science
4830 Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation IEEE Xplore
arXiv
4970 Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax IEEE Xplore
arXiv
5596 Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation IEEE Xplore
arXiv
6116 Factorized AED: Factorized Attention-based Encoder-Decoder for Text-Only Domain Adaptive ASR IEEE Xplore

ASR: Training Methods

🆔 Title Repo Paper
3731 Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition IEEE Xplore
arXiv
112 Reducing the GAP Between Streaming and Non-Streaming Transducer-based ASR by Adaptive Two-Stage Knowledge Distillation IEEE Xplore
arXiv
164 Alignment Entropy Regularization IEEE Xplore
arXiv
392 From English to more Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition IEEE Xplore
arXiv
1499 Neural Transducer Training: Reduced Memory Consumption with Sample-Wise Computation IEEE Xplore
arXiv
2433 Towards Domain Generalisation in ASR with Elitist Sampling and Ensemble Knowledge Distillation IEEE Xplore
arXiv
2677 Accelerating RNN-T Training and Inference using CTC Guidance IEEE Xplore
arXiv
3382 Resource-Efficient Transfer Learning from Speech Foundation Model using Hierarchical Feature Fusion IEEE Xplore
arXiv
3917 Robust Knowledge Distillation from RNN-T Models with Noisy Training Labels using Full-Sum Loss IEEE Xplore
arXiv
5520 More Speaking or more Speakers? IEEE Xplore
arXiv
5845 Federated Learning for ASR based on Wav2Vec 2.0 IEEE Xplore
arXiv
6343 Estimating Shapley Values of Training Utterances for Automatic Speech Recognition Models IEEE Xplore

ASR: VAD and Other Topics

🆔 Title Repo Paper
691 Real-Time Speech Interruption Analysis: from Cloud to Client Deployment IEEE Xplore
arXiv
2005 Audio-to-Intent using Acoustic-Textual Subword Representations from End-to-End ASR IEEE Xplore
arXiv
2615 Adaptive Endpointing with Deep Contextual Multi-Armed Bandits IEEE Xplore
arXiv
2616 Dynamic Speech Endpoint Detection with Regression Targets IEEE Xplore
arXiv
2665 Speaker Change Detection for Transformer Transducer ASR IEEE Xplore
arXiv
4769 Less is more: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types IEEE Xplore
4865 SG-VAD: Stochastic Gates based Speech Activity Detection GitHub IEEE Xplore
arXiv
5523 Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss IEEE Xplore
arXiv
5787 Unsupervised Voice Type Discrimination Score Adaptation using X-Vector Clusters IEEE Xplore
6269 Multilingual Word Error Rate Estimation: E-Wer3 IEEE Xplore
arXiv
5792 Multilingual Query-by-Example Keyword Spotting with Metric Learning and Phoneme-to-Embedding Mapping IEEE Xplore
arXiv
7177 Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise IEEE Xplore
arXiv
836 Keyword-Specific Acoustic Model Pruning for Open Vocabulary Keyword Spotting IEEE Xplore
5030 Self-Supervised Speech Representation Learning for Keyword-Spotting with Light-Weight Transformers IEEE Xplore
arXiv
5579 Lightweight Feature Encoder for Wake-Up Word Detection based on Self-Supervised Speech Representation IEEE Xplore
arXiv
5649 VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting arXiv
1378 Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers GitHub IEEE Xplore
arXiv
1518 Continual Learning for On-Device Speech Recognition using Disentangled Conformers IEEE Xplore
arXiv
1986 Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting IEEE Xplore
arXiv
3390 Locale Encoding for Scalable Multilingual Keyword Spotting Models IEEE Xplore
arXiv
3531 Small-Footprint Slimmable Networks for Keyword Spotting IEEE Xplore
arXiv
3615 Metric Learning for User-Defined Keyword Spotting WEB Page
GitHub
IEEE Xplore
arXiv
3928 WeKws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit GitHub IEEE Xplore
arXiv
4822 Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting IEEE Xplore
arXiv

Automatic Audio Captioning and Retrieval

🆔 Title Repo Paper
662 A Novel Metric for Evaluating Audio Caption Similarity IEEE Xplore
arXiv
5376 On Negative Sampling for Contrastive Audio-Text Retrieval IEEE Xplore
arXiv
2001 Audio-Text Models do not yet Leverage Natural Language IEEE Xplore
arXiv
4981 Improving Audio Captioning using Semantic Similarity Metrics IEEE Xplore
arXiv
4900 SPICE+: Evaluation of Automatic Audio Captioning Systems with Pre-trained Language Models IEEE Xplore
HAL Science
6766 Local Information Assisted Attention-Free Decoder for Audio Captioning GitHub IEEE Xplore
arXiv

Auditory EEG Decoding Challenge

🆔 Title Repo Paper
6832 HappyQuokka System for ICASSP 2023 Auditory EEG Challenge GitHub IEEE Xplore
arXiv
6855 Relate Auditory Speech to EEG by Shallow-Deep Attention-based Network IEEE Xplore
arXiv
6859 Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response IEEE Xplore
6861 Relating EEG Recordings to Speech using Envelope Tracking and the Speech-FFR IEEE Xplore
arXiv
6882 Decoding Auditory EEG Responses using an Adapted WaveNet IEEE Xplore

Image Restoration

🆔 Title Repo Paper
564 MRNet: Multi-Refinement Network for Dual-Pixel Images Defocus Deblurring IEEE Xplore
5802 Joint Compression and Demosaicking For Satellite Images IEEE Xplore
HAL Science
1157 Decontamination Transformer for Blind Image Inpainting GitHub Page
GitHub
IEEE Xplore
Pdf
658 Exploration into Translation-Equivariant Image Quantization IEEE Xplore
arXiv
2562 Tensor Decomposition based Latent Feature Clustering for Hyperspectral Band Selection IEEE Xplore

Interpretable and Explainable Machine Learning

Will soon be added

Language Modeling

Will soon be added

Language Modeling and Spoken Language Understanding

Will soon be added

Estimation Theory and Methods

Will soon be added

AI Security and Privacy in Speech and Audio Processing

🆔 Title Repo Paper
673 Privacy-Enhanced Federated Learning Against Attribute Inference Attack for Speech Emotion Recognition IEEE Xplore
2009 Privacy-Preserving Occupancy Estimation IEEE Xplore
3761 Federated Intelligent Terminals Facilitate Stuttering Monitoring GitHub IEEE Xplore
ResearchGate
4942 Beyond Neural-on-Neural Approaches to Speaker Gender Protection GitHub IEEE Xplore
arXiv
6129 Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling IEEE Xplore
arXiv

Binaural Audio; Multichannel Source Separation

🆔 Title Repo Paper
1755 Spatially Informed Independent Vector Analysis for Source Extraction based on the Convolutive Transfer Function Model IEEE Xplore
2514 Fast Online Source Steering Algorithm for Tracking Single Moving Source using Online Independent Vector Analysis IEEE Xplore
Pdf
4589 Online Binaural Speech Separation of Moving Speakers with a Wavesplit Network IEEE Xplore
arXiv
5759 Convolutive NTF for Ambisonic Source Separation under Reverberant Conditions IEEE Xplore
4677 On the Relevance of the Differences between HRTF Measurement Setups for Machine Learning IEEE Xplore
arXiv
6362 Neural Fourier Shift for Binaural Speech Rendering WEB Page
GitHub
IEEE Xplore
arXiv
1620 Global HRTF Interpolation via Learned Affine Transformation of Hyper-Conditioned Features WEB Page
GitHub
IEEE Xplore
arXiv
4790 HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields GitHub IEEE Xplore
arXiv
5041 Learning to Personalize Equalization for High-Fidelity Spatial Audio Reproduction IEEE Xplore
6719 A Data-Driven Approach to Audio Decorrelation IEEE Xplore
6777 Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms IEEE Xplore
arXiv

Image/Video Caption Generation

🆔 Title Repo Paper
6029 End-to-End Non-Autoregressive Image Captioning GitHub IEEE Xplore
337 Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning IEEE Xplore
450 I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning IEEE Xplore
arXiv
972 Video Captioning via Relation-Aware Graph Learning GitHub IEEE Xplore
1192 Improving Image Captioning with Control Signal of Sentence Quality IEEE Xplore
arXiv
5827 Background Disturbance Mitigation for Video Captioning via Entity-Action Relocation IEEE Xplore
5304 Motion-Aware Video Paragraph Captioning via Exploring Object-Centered Internal Knowledge IEEE Xplore
2203 Associative Learning Network for Coherent Visual Storytelling IEEE Xplore
6772 Shot Noise Analysis for Differential Sampling in Indirect Time of Flight Cameras IEEE Xplore

Flow Estimation

Will soon be added

Image/Video Retrieval

Will soon be added

Transfer Learning

Will soon be added

Learning Theory and Algorithms

Will soon be added

Distributed and Federated Learning

Will soon be added

Machine Learning for Telecommunications

Will soon be added

Dialog and Multimodal Processing of Language

Will soon be added

Discourse and Dialog

Will soon be added

Emerging Topics in Speech Synthesis

Will soon be added

Audio and Text Segmentation, Tagging and Parsing

Will soon be added

Diffusion-based Generative Models for Audio and Speech

🆔 Title Repo Paper
5245 Cold Diffusion for Speech Enhancement IEEE Xplore
arXiv
5709 Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration WEB Page
GitHub
IEEE Xplore
arXiv
2264 Unsupervised Vocal Dereverberation with Diffusion-based Generative Models GitHub Page IEEE Xplore
arXiv
5637 Solving Audio Inverse Problems with a Diffusion Model GitHub IEEE Xplore
arXiv
5778 DiffPhase: Generative Diffusion-based STFT Phase Retrieval WEB Page
GitHub
IEEE Xplore
arXiv
3196 Optimal Transport in Diffusion Modeling for Conversion Tasks in Audio Domain IEEE Xplore

Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech: a Signal Processing Grand Challenge

Will soon be added

Model Pruning and Compression

Will soon be added

Image Recognition and Detection

🆔 Title Repo Paper
907 Data-Aware Zero-Shot Neural Architecture Search for Image Recognition IEEE Xplore
3890 CFFMixer: Multi-Dimensional Feature Fusion for Object Detection IEEE Xplore
1242 SANet: Spatial Attention Network with Global Average Contrast Learning for Infrared Small Target GitHub IEEE Xplore
736 Logovit: Local-Global Vision Transformer for Object Re-Identification GitHub IEEE Xplore
319 ProContEXT: Exploring Progressive Context Transformer for Tracking GitHub IEEE Xplore
arXiv
3268 Pair DETR: Toward Faster Convergent DETR IEEE Xplore
arXiv

Machine Learning Methods for Language

Will soon be added

Machine Translation and Dialog System

Will soon be added

Radar Waveform Design: Recent Advances and New Emerging Applications

Will soon be added

Conversational Healthcare Interfaces

Will soon be added

Computer Vision Applications

Will soon be added

Domain-Specific Detection

Will soon be added

Temporal Video Analysis and Detection

Will soon be added

Object Detection

Will soon be added

Deep Learning for Speech and Audio Processing

Will soon be added

Deep Learning for Speech and Language Processing

Will soon be added

Language Modeling and Representation Learning

Will soon be added

Lightweight TTS and TTS Analysis

Will soon be added

Machine Translation for Spoken and Written Language

🆔 Title Repo Paper
683 Improving Speech-to-Speech Translation through Unlabeled Text IEEE Xplore
arXiv
1867 A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation IEEE Xplore
arXiv
3026 Decoupled Non-Parametric Knowledge Distillation for End-to-End Speech Translation IEEE Xplore
arXiv
3135 Joint Pre-training with Speech and Bilingual Text for Direct Speech-to-Speech Translation GitHub Page
GitHub
IEEE Xplore
arXiv
3822 LEAPT: Learning Adaptive Prefix-to-Prefix Translation for Simultaneous Machine Translation IEEE Xplore
arXiv
3889 Enhancing Speech-To-Speech Translation with Multiple TTS Targets IEEE Xplore
arXiv
4196 Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation GitHub IEEE Xplore
arXiv
4387 Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation IEEE Xplore
arXiv
4983 Efficient Speech Translation with Dynamic Latent Perceivers GitHub IEEE Xplore
arXiv
5169 Joint Training and Decoding for Multilingual End-to-End Simultaneous Speech Translation GitHub IEEE Xplore
5381 Enhancing Ontology Translation through Cross-Lingual Agreement IEEE Xplore
6523 M3ST: Mix at Three Levels for Speech Translation IEEE Xplore
arXiv

Music Audio Synthesis and Modeling

Will soon be added

Spoken Language Understanding Grand Challenge

Will soon be added

Image Segmentation

Will soon be added

Multi-Speaker ASR

Will soon be added

Multimodal Processing of Language and Language Systems

🆔 Title Repo Paper
1158 Prefix Tuning for Automated Audio Captioning GitHub Page
GitHub
IEEE Xplore
arXiv
1648 C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval GitHub IEEE Xplore
arXiv
2096 The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR EdAcc IEEE Xplore
arXiv
2768 Adaptive Knowledge Distillation between Text and Speech Pre-trained Models IEEE Xplore
arXiv
6140 A Processing Framework to Access Large Quantities of Whispered Speech Found in ASMR GitHub Page
GitHub
IEEE Xplore
arXiv
567 Cross-Modal Mutual Learning for Cued Speech Recognition IEEE Xplore
arXiv
1886 SLBERT: A Novel Pre-Training Framework for Joint Speech and Language Modeling IEEE Xplore
2190 Cross-Modal Adversarial Contrastive Learning for Multi-Modal Rumor Detection IEEE Xplore
arXiv
2884 Multiple Contrastive Learning for Multimodal Sentiment Analysis IEEE Xplore
3666 Token2vec: A Joint Self-Supervised Pre-Training Framework using Unpaired Speech and Text IEEE Xplore
arXiv
3714 DAIS: The Delft Database of EEG Recordings of Dutch Articulated and Imagined Speech IEEE Xplore
4409 A Token-Level Contrastive Framework for Sign Language Translation GitHub IEEE Xplore
arXiv
4801 Sign Language Recognition via Deformable 3D Convolutions and Modulated Graph Convolutional Networks IEEE Xplore
Pdf
4837 LAST: Scalable Lattice-based Speech Modelling in JAX GitHub IEEE Xplore
arXiv
4989 M-SpeechCLIP: Leveraging Large-Scale, Pre-trained Models for Multilingual Speech to Image Retrieval IEEE Xplore
arXiv
5014 Using Emotion Embeddings to Transfer Knowledge between Emotions, Languages, and Annotation Formats GitHub IEEE Xplore
arXiv
5146 Speech-Text based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition IEEE Xplore
arXiv

Tracking

Will soon be added

Radar-Assisted Perception (RAP)

Will soon be added

Data Driven and Machine Learning based Room Acoustic Modeling

Will soon be added

Sensing Applications

Will soon be added

Computational Imaging

Will soon be added

Anomaly Detection

Will soon be added

Deep Neural Network

Will soon be added

Deep Learning

Will soon be added

Deep and Sequential Learning

Will soon be added

Machine Learning for Time Series Analysis

Will soon be added

Multilingual Speech Recognition and Identification

Will soon be added

Quantum Computing for Machine Learning and Signal Processing

Will soon be added

Sound Event Detection

Will soon be added

Brain Connectivity

Will soon be added

Speech Signal Improvement Signal Processing Grand Challenge 2023

Will soon be added

Anonymization and Data Privacy

Will soon be added

Natural Language Processing

Will soon be added

Pronunciation and Fluency Assessment

Will soon be added

Edge Learning for Emerging Wireless Technologies

Will soon be added

Acoustic Sensor Array Processing and Sound Source Localization

Will soon be added

Representation Learning

Will soon be added

Adversarial Machine Learning

🆔 Title Repo Paper
987 Backdoor Defense via Suppressing Model Shortcuts GitHub IEEE Xplore
arXiv

Target Detection and Classification

Will soon be added

Spatial Processing for Audio and Speech

Will soon be added

Brain Computer Interfaces

Will soon be added

Acoustic Echo Cancellation Signal Processing Grand Challenge 2023

Will soon be added

DoA Estimation

Will soon be added

Speaker Recognition: Scoring, Fairness, Privacy

Will soon be added

Speaker Recognition: Verification, Diarization, Anti-Spoofing

🆔 Title Repo Paper
3059 Pushing the Limits of Self-Supervised Speaker Verification using Regularized Distillation Framework GitHub IEEE Xplore
arXiv

Recent Advances in Robust Learning for Modern Computational Imaging

Will soon be added

Signal Processing and Machine Learning for Networked Autonomous Agents

Will soon be added

Active Noise Control, Echo Reduction and Feedback Reduction

Will soon be added

Anomaly Detection and Representation Learning for Audio Classification

Will soon be added

Data Processing

Will soon be added

Perceptual Assessment

Will soon be added

Machine Learning for Recommendation, Search and other Applications

Will soon be added

Reinforcement Learning

Will soon be added

Pattern Recognition and Classification

Will soon be added

Sparsity, Compressed Sensing, and Tensor Decomposition

Will soon be added

Adversarial Machine Learning and Information Theoretic Security

Will soon be added

Resource Constrained ASR

Will soon be added

Singing Voice Synthesis/Conversion and Pretrained TTS

Will soon be added

Medical Image Reconstruction

Will soon be added

L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality

Will soon be added

Multimedia Forensics

Will soon be added

MIMO Radars and Waveform Design

Will soon be added

Speech Dysarthria

Will soon be added

Speech Emotion Recognition: General Topics

🆔 Title Repo Paper
2490 Multi-Scale Receptive Field Graph Model for Emotion Recognition in Conversations GitHub IEEE Xplore
3918 MGAT: Multi-Granularity Attention based Transformers for Multi-Modal Emotion Recognition IEEE Xplore
4523 Achieving Fair Speech Emotion Recognition via Perceptual Fairness IEEE Xplore
5023 Personalized Task Load Prediction in Speech Communication WEB Page IEEE Xplore
arXiv
5075 DWFormer: Dynamic Window Transformer for Speech Emotion Recognition GitHub IEEE Xplore
arXiv
5730 Multi-View Learning for Speech Emotion Recognition with Categorical Emotion, Categorical Sentiment, and Dimensional Scores IEEE Xplore
Microsoft
Pdf
540 Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations GitHub IEEE Xplore
arXiv
563 Emotion Recognition in Conversation from Variable-Length Context IEEE Xplore
1423 Knowledge-Aware Graph Convolutional Network with Utterance-Specific Window Search for Emotion Recognition in Conversations IEEE Xplore
1611 Masking Speech Contents by Random Splicing: is Emotional Expression Preserved? IEEE Xplore
ResearchGate
3129 Multi-Local Attention for Speech-based Depression Detection IEEE Xplore
Pdf
3130 Daily Mental Health Monitoring from Speech: A Real-World Japanese Dataset and Multitask Learning Analysis IEEE Xplore
ResearchGate
3830 SDTN: Speaker Dynamics Tracking Network for Emotion Recognition in Conversation IEEE Xplore
4065 Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition GitHub IEEE Xplore
arXiv
5683 Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP IEEE Xplore
arXiv
5711 EMix: A Data Augmentation Method for Speech Emotion Recognition IEEE Xplore
6131 A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition IEEE Xplore
arXiv
6316 Automatic Classification of Vocal Intensity Category from Speech IEEE Xplore

Intelligent and Semantic Communications for 5G Mobile Networks and Beyond

Will soon be added

Audio and Speech Quality Measurements

Will soon be added

Acoustic Modeling; Auditory Modeling for Hearing Instruments

Will soon be added

Anonymization, Data Privacy, and Biometrics

Will soon be added

Object Recognition

Will soon be added

Identification Detection

Will soon be added

Tracking, Data Fusion, and Sensor Networks

🆔 Title Repo Paper
268 Deep Fusion of Multi-Object Densities using Transformer GitHub IEEE Xplore
arXiv
6240 Nonnegative Block-Term Decomposition with the β-Divergence: Joint Data Fusion and Blind Spectral Unmixing GitHub IEEE Xplore
2238 Robust Subspace Tracking with Contamination via α-Divergence GitHub IEEE Xplore
ResearchGate
2321 Wireless Location Tracking via Complex-Domain Super MDS with Time Series Self-Localization Information IEEE Xplore
2463 Angle-of-Arrival Target Tracking using a Mobile UAV in External Signal-Denied Environment IEEE Xplore
2821 A Distributed Adaptive Algorithm for Non-Smooth Spatial Filtering Problems IEEE Xplore
arXiv
2937 A Computationally Efficient Algorithm for Distributed Adaptive Signal Fusion based on Fractional Programs IEEE Xplore
Pdf
3217 Data Driven Joint Sensor Fusion and Regression based on Geometric Mean Squared Error IEEE Xplore
4043 Sensor Selection for Angle of Arrival Estimation based on the Two-Target Cramér-Rao Bound GitHub IEEE Xplore
4149 Clustered Greedy Algorithm for Large-Scale Sensor Selection IEEE Xplore

Speaker Recognition: Neural Network Architecture

Will soon be added

Speech Analysis

Will soon be added

Speaker Recognition: Anti-Spoofing and Verification

🆔 Title Repo Paper
5447 SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing GitHub IEEE Xplore
arXiv

Bayesian Signal Processing

Will soon be added

Speaker Recognition: Verification and Diarization

Will soon be added

Learning on Graphs for Biology and Medicine

🆔 Title Repo Paper
2914 Deep Spatio-Temporal Multiplex Graph Learning for Cardiac Imaging Classification IEEE Xplore
4165 Graph Signal Processing for Neuroimaging to Reveal Dynamics of Brain Structure-Function Coupling IEEE Xplore
4375 Multiple Signed Graph Learning for Gene Regulatory Network Inference IEEE Xplore
4599 Predicting Brain Age using Transferable Covariance Neural Networks IEEE Xplore
arXiv
6456 Spatial Graph Signal Interpolation with an Application for Merging BCI Datasets with Various Dimensionalities GitHub IEEE Xplore
arXiv

Learning from Neuroimaging Data

Will soon be added

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech

Will soon be added

Quality Assessment and Anomaly Detection

Will soon be added

Human-Centric Multimedia and Human-Machine Interaction

Will soon be added

Speech Emotion Recognition: Transfer Learning

🆔 Title Repo Paper
457 A Generalized Subspace Distribution Adaptation Framework for Cross-Corpus Speech Emotion Recognition IEEE Xplore
3755 Fast Yet Effective Speech Emotion Recognition with Self-Distillation GitHub IEEE Xplore
arXiv
3954 Domain Adaptation without Catastrophic Forgetting on a Small-Scale Partially-Labeled Corpus for Speech Emotion Recognition IEEE Xplore
4547 Phonetic Anchor-based Transfer Learning to Facilitate Unsupervised Cross-Lingual Speech Emotion Recognition IEEE Xplore
Pdf
4559 Zero-Shot Speech Emotion Recognition using Generative Learning with Reconstructed Prototypes IEEE Xplore
4858 Unsupervised Domain Adaptation for Preference Learning based Speech Emotion Recognition IEEE Xplore
Pdf

Multi-Antenna Communications and Sensing

Will soon be added

Quantum Machine Learning Algorithms and Applications on NISQ Devices

Will soon be added

Neural Speech and Audio Coding: Emerging Challenges and Opportunities

Will soon be added

Medical and Environmental Acoustics; Audio Security

Will soon be added

Classification of Acoustic Scenes and Events

Will soon be added

Learning from EEG Data

Will soon be added

Physiological Signal Processing

Will soon be added

Speech Production, Perception, and Psychoacoustics

Will soon be added

Watermarking, Data Hiding and Human Factors in Security

Will soon be added

3D Point Cloud/Stereo Video

Will soon be added

Face Processing

Will soon be added

MIMO Radars and MIMO Communications

Will soon be added

Speaker Recognition: Diarization

Will soon be added

Estimation, Detection, and Classification

Will soon be added

Model Lightweight and Video Compression

Will soon be added

Subspace and Manifold Learning

🆔 Title Repo Paper
2651 Generative Modeling based Manifold Learning for Adaptive Filtering Guidance IEEE Xplore Amazon Science
684 Tensor Completion for Efficient and Accurate Hyperparameter Optimisation in Large-Scale Statistical Learning IEEE Xplore
903 CO-NET: Classification-Oriented Point Cloud Sampling via Informative Feature Learning and Non-Overlapped Local Adjustment IEEE Xplore
2091 Deep Survival Analysis and Counterfactual Inference using Balanced Representations IEEE Xplore
3045 Feature Space Recovery for Incomplete Multi-View Clustering IEEE Xplore HAL Science
4602 Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs IEEE Xplore arXiv

Speech Enhancement - Diffusion and Other Generative Models

🆔 Title Repo Paper
2594 Cross-domain Diffusion based Speech Enhancement for Very Noisy Speech GitHub Page IEEE Xplore
3643 SRTNet: Time Domain Speech Enhancement via Stochastic Refinement GitHub IEEE Xplore arXiv
4671 Diffusion-based Generative Speech Source Separation GitHub IEEE Xplore arXiv
4716 SEPDIFF: Speech Separation based on Denoising Diffusion Model IEEE Xplore
5798 Fast and Efficient Speech Enhancement with Variational Autoencoders IEEE Xplore arXiv
6105 Metric-oriented Speech Enhancement using Diffusion Probabilistic Model IEEE Xplore arXiv

ICASSP2023 General Meeting Understanding and Generation (MUG) Challenge

Will soon be added

Signal Processing for Smart City Applications and the Internet of Things

Will soon be added

Symbol-Level Precoding: Recent Advance and New Applications in 6G and Beyond

Will soon be added

Graphical Inference and Modeling in Dynamical Systems

Will soon be added

Deep Learning-based Source Separation

Will soon be added

Medical Image Segmentation

Will soon be added

Bioinformatics

Will soon be added

Cybersecurity, Hardware and Network Security

Will soon be added

Multi-Antenna Communications and Intelligent Reflecting Surfaces

Will soon be added

Multimedia Compression and Quality

Will soon be added

Multimedia Analysis, Synthesis, and Learning

Will soon be added

DoA Estimation and Beamforming

Will soon be added

Speech Emotion Recognition: Multimodality

Will soon be added

Speech Emotion Recognition: Neural Architectures

Will soon be added

Optimization Methods for Signal Processing

Will soon be added

5th DNS Challenge at IEEE ICASSP 2023

Will soon be added

Signal Processing and Learning over Dynamic Graphs

Will soon be added

Human Action Recognition

Will soon be added

Deep Generative Model

🆔 Title Repo Paper
1565 String-based Molecule Generation via Multi-Decoder VAE IEEE Xplore arXiv
4161 Graph Contrastive Learning with Learnable Graph Augmentation IEEE Xplore
3180 Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution GitHub Page GitHub IEEE Xplore arXiv
5068 Evaluation of Categorical Generative Models - Bridging the Gap Between Real and Synthetic Data IEEE Xplore arXiv
6053 Diffusion Probabilistic Modeling for Fine-Grained Urban Traffic Flow Inference with Relaxed Structural Constraint IEEE Xplore
4977 Single-Shot Domain Adaptation via Target-aware Generative Augmentations GitHub IEEE Xplore arXiv

Multimodal Signal Processing and Analysis

Will soon be added

Speech Enhancement - Self-Supervised Learning

🆔 Title Repo Paper
915 Perceive and Predict: Self-Supervised Speech Representation based Loss Functions for Speech Enhancement IEEE Xplore arXiv
2006 DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks IEEE Xplore
3343 Speech Separation with Large-Scale Self-Supervised Learning IEEE Xplore arXiv
3511 Self-Supervised Learning-based Source Separation for Meeting Data WEB Page IEEE Xplore arXiv
4456 An Adapter based Multi-Label Pre-training for Speech Separation and Enhancement IEEE Xplore arXiv
5785 Self-Supervised Learning for Speech Enhancement Through Synthesis GitHub IEEE Xplore arXiv

Distributed and Reliable Signal Processing and Communications

Will soon be added

Resource-Efficient Real-time Neural Speech Separation

Will soon be added

Multichannel Speech Enhancement, Dereverberation, and System Identification

Will soon be added

Multilabel Acoustic Event Classification

Will soon be added

Deep Learning for Medical Imaging

🆔 Title Repo Paper
1384 Coarse-to-Fine Covid-19 Segmentation via Vision-Language Alignment GitHub IEEE Xplore arXiv

Machine/Deep Learning Methodologies for Multimedia

Will soon be added

Human-Centric Multimedia

Will soon be added

Source Localization and Separation

Will soon be added

Speech Enhancement /Audio-Visual, Multi-Channel, and Other

Will soon be added

Speech Enhancement - Separation and Target Speech Extraction

🆔 Title Repo Paper
3175 Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation GitHub IEEE Xplore arXiv

Speech Enhancement - Single Channel

Will soon be added

Machine Learning Applications to Communications

Will soon be added

Aspects in Image Generation/Analysis

Will soon be added

Multi-Antenna and Multi-Carrier Communications

Will soon be added

Signal Filtering, Restoration, Enhancement, and Reconstruction

Will soon be added

ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids

Will soon be added

Image and Video Enhancement

Will soon be added

Speech Recognition-training/adaptation

Will soon be added

Decentralized Wireless Systems and Energy Harvesting

Will soon be added

Robust Learning and Inference

Will soon be added

Music Classification and Transcription

Will soon be added

Music Information Retrieval

Will soon be added

Deep Learning for Medical Image Segmentation

Will soon be added

Detection and Classification in Medical Imaging

Will soon be added

Image Coding/Compression

Will soon be added

Audio-Visual Signal Processing and Analysis

Will soon be added

Various Aspects in Speech and Language Processing

Will soon be added

Speech Recognition: Modeling and Context

Will soon be added

Speech Recognition: Self-Supervised Models

Will soon be added

Channel State Estimation

Will soon be added

Signal Processing over Graphs and Networks

Will soon be added

Signal Processing over Networks

Will soon be added

Applications to Vision, Speech, and Robotics

🆔 Title Repo Paper
6443 LMBAO: A Landmark Map for Bundle Adjustment Odometry in Lidar Slam IEEE Xplore arXiv
1069 Residual Squeeze-and-Excitation U-Shaped Network for Minutia Extraction in Contactless Fingerprint Images IEEE Xplore
1603 TSPTQ-ViT: Two-Scaled Post-Training Quantization for Vision Transformer IEEE Xplore arXiv
3925 Low-Complexity Low-Rank Approximation SVD for Massive Matrix in Tensor Train Format IEEE Xplore
2043 DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech GitHub IEEE Xplore arXiv
3040 Cooperative Five Degrees of Freedom Motion Estimation for a Swarm of Autonomous Vehicles IEEE Xplore

Person Identification and Relapse Detection from Continuous Recordings of Biosignals

Will soon be added

Vision and Language Model

Will soon be added

TTS: AM and Vocoder

Will soon be added

Signal Processing Education

Will soon be added

Signal Processing and Systems for Remote Biometrics

Will soon be added

Signal Processing for RIS-Enabled Smart Wireless Environments

Will soon be added

Multimodal Learning

Will soon be added

Video Coding/Compression

Will soon be added

Object Tracking

Will soon be added

Image Generation

Will soon be added

Spoken Language Understanding

Will soon be added

Optimization and Machine Learning for Communications

Will soon be added

Sparse/Low-Dimensional Signal Processing

Will soon be added

Signal Processing Theory and Methods

Will soon be added

Radar/Array Signal Processing. Networks and Communications

Will soon be added

Applications to Communications

Will soon be added

The First Pathloss Radio Map Prediction Challenge

Will soon be added

Human Video Generation and Editing

Will soon be added

Point Cloud Processing

Will soon be added

Multimedia Databases and Information Retrieval

Will soon be added

Voice and Style Conversion

Will soon be added

Synergy between Human and Machine Approaches to Sound/Scene Recognition and Processing

Will soon be added

Topological and Simplicial Data Processing

Will soon be added

Unsupervised Deep Learning of Image Priors for Inverse Problems

Will soon be added

Self-Supervised Learning and Data-Efficiency for Speech and Audio

🆔 Title Repo Paper
5842 Audio Signal Enhancement with Learning from Positive and Unlabelled Data GitHub IEEE Xplore arXiv

Sound Event Detection and Localization; Bioacoustic Event Detection

Will soon be added

Aspects in Machine Learning

Will soon be added

Aspects in Image/Video Processing and Analysis

🆔 Title Repo Paper
2133 ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal GitHub Page GitHub IEEE Xplore arXiv

Learning Algorithms and Applications

Will soon be added

Optimization Methods in Machine Learning

Will soon be added

Applications of Machine Learning

Will soon be added

Sensing, Computing, and Semantic Communications

Will soon be added

Sparsity and Low-Rank Models

Will soon be added

Signal Processing over Graphs

Will soon be added

Target Source Extraction

Will soon be added

Music Generation and Arrangement

Will soon be added

Multimodal Information based Speech Processing (MISP) 2022 Challenge

Will soon be added

Image Retrieval and Classification

Will soon be added

Variational Inference and Approximate Bayesian Techniques

Will soon be added

Spatial Audio Recording and Reproduction

Will soon be added

Speech Modeling and Audio Coding

Will soon be added

Audio Processing and Analysis

Will soon be added

Image/Video Enhancement

Will soon be added

Zero or Few-Shot Learning

Will soon be added

Acoustic and Microphone Array Processing

Will soon be added

Speech and Language Disorders

Will soon be added

Various Aspects in Speech and Speaker Recognition

Will soon be added

Sampling Theory, Compressed and Non-uniform Sampling

Will soon be added

Show and Tell Demos: Session

🆔 Title Repo Paper
7049 Generating Sound Effects, Music, Speech, and Beyond, with Text
7059 DisCoHeadTV: Disentangled Control of Head Pose and Facial Expressions for Text-to-Video Synthesis
7064 Intelligent Dialogue-based Tutoring System for Second Language Reading Comprehension
7068 Optimize for my Voice with Speaker Identification Pdf

Rising Stars Workshop

Will soon be added


More Repositories

1

ICCV-2023-Papers

ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
Python
900
star
2

INTERSPEECH-2023-24-Papers

INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
630
star
3

CVPR-2023-24-Papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
Python
380
star
4

NewEraAI-Papers

The repository provides links to collections of influential and interesting research papers from top AI conferences, with open-source code to promote reproducibility and provide detailed implementation insights beyond the scope of the article. Stay up to date with the latest advances in AI research!
Python
87
star
5

AAAI-2024-Papers

AAAI 2024 Papers: Explore a comprehensive collection of innovative research papers presented at one of the premier artificial intelligence conferences. Seamlessly integrate code implementations for better understanding. ⭐ experience the forefront of progress in artificial intelligence with this repository!
Python
76
star
6

WACV-2024-Papers

WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
Python
69
star
7

EMNLP-2023-Papers

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!
Python
65
star
8

OIDv6

Download single or multiple classes from the Open Images V6 dataset (OIDv6)
Python
43
star
9

terminal_docs

Commands for working in the terminal
6
star
10

OpenAV

An open-source library for recognition of speech commands in the user dictionary using audiovisual data of the speaker
Python
3
star
11

docs_js

JavaScript documentation
JavaScript
2
star
12

TfObjDet

TensorFlow Object Detection API
Jupyter Notebook
1
star
13

Awesome-Speech-Enhancement

Read articles, explore effectiveness metrics for speech enhancement methodologies. Seamlessly integrate code implementations for better understanding, and stay at the forefront of advances in speech enhancement with this repository! Don't forget to ⭐ if you find it helpful.
Jupyter Notebook
1
star
14

download_google_images

Bulk download of images from Google Images
Python
1
star
15

LeetCode

Solutions for Leetcode
Python
1
star