INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. ⭐ the repository to support the advancement of speech technology!
*This count includes repositories on GitHub, GitLab, and Hugging Face, as well as distributions on PyPI, but excludes links to web pages and GitHub Pages sites.
The PDF version of the INTERSPEECH 2023 Conference Programme lists all accepted full papers, their presentation order, and their designated presentation times.
Other collections of the best AI conferences
❗ The conference table will be kept up to date.
| Conference | Year |
|---|---|
| **Computer Vision (CV)** | |
| CVPR | 2023 |
| ICCV | 2023 |
| **Speech (SP)** | |
| ICASSP | 2023 |
Contributors
Contributions that improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to create pull requests, open issues, or contact me via email. Your participation is crucial to making this repository even better.
Papers
List of sections
- Resources for Spoken Language Processing
- Speech Synthesis: Prosody and Emotion
- Statistical Machine Translation
- Self-Supervised Learning in ASR
- Prosody
- Speech Production
- Dysarthric Speech Assessment
- Speech Coding: Transmission
- Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation
- Analysis of Speech and Audio Signals
- Speech Recognition: Architecture, Search, and Linguistic Components
- Speech Recognition: Technologies and Systems for New Applications
- Lexical and Language Modeling for ASR
- Language Identification and Diarization
- Speech Quality Assessment
- Feature Modeling for ASR
- Interfacing Speech Technology and Phonetics
- Speech Synthesis: Multilinguality
- Speech Emotion Recognition
- Spoken Dialog Systems and Conversational Analysis
- Speech Coding and Enhancement
- Paralinguistics
- Speech Enhancement and Denoising
- Speech Synthesis: Evaluation
- End-to-End Spoken Dialog Systems
- Biosignal-enabled Spoken Communication
- Neural-based Speech and Acoustic Analysis
- DiGo - Dialog for Good: Speech and Language Technology for Social Good
- Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources, and Evaluation
- Speech, Voice, and Hearing Disorders
- Spoken Term Detection and Voice Search
- Models for Streaming ASR
- Source Separation
- Speech Perception
- Phonetics and Phonology: Languages and Varieties
- Speaker and Language Identification
- Speech Synthesis and Voice Conversion
- Speech and Language in Health: from Remote Monitoring to Medical Conversations
- Novel Transformer Models for ASR
- Speaker Recognition
- Cross-lingual and Multilingual ASR
- Voice Conversion
- Pathological Speech Analysis
- Multimodal Speech Emotion Recognition
- Phonetics, Phonology, and Prosody
- Speech Coding: Privacy
- Analysis of Neural Speech Representations
- End-to-end ASR
- Spoken Language Understanding, Summarization, and Information Retrieval
- Invariant and Robust Pre-trained Acoustic Models
- Speech Synthesis: Representation Learning
- Speech Perception, Production, and Acquisition
- Acoustic Model Adaptation for ASR
- Speech Synthesis: Expressivity
- Multi-modal Systems
- Question Answering from Speech
- Multi-talker Methods in Speech Processing
- Sociophonetics
- Speaker and Language Diarization
- Anti-Spoofing for Speaker Verification
- Speech Coding: Intelligibility
- New Computational Strategies for ASR Training and Inference
- MERLIon CCS Challenge: Multilingual Everyday Recordings - Language Identification On Code-Switched Child-Directed Speech
- Health-Related Speech Analysis
- Automatic Audio Classification and Audio Captioning
- Speech Synthesis
- Speech Synthesis: Controllability and Adaptation
- Search Methods and Decoding Algorithms for ASR
- Speech Signal Analysis
- Connecting Speech-science and Speech-technology for Children's Speech
- Dialog Management
- Speech Activity Detection and Modeling
- Multilingual Models for ASR
- Speech Enhancement and Bandwidth Expansion
- Articulation
- Neural Processing of Speech and Language: Encoding and Decoding the Diverse Auditory Brain
- Perception of Paralinguistics
- Technologies for Child Speech Processing
- Speech Synthesis: Multilinguality; Evaluation
- Show and Tell: Health Applications and Emotion Recognition
- Show and Tell: Speech Tools, Speech Enhancement, Speech Synthesis
- Show and Tell: Language Learning and Educational Resources
- Show and Tell: Media and Commercial Applications
Resources for Spoken Language Processing
Speech Synthesis: Prosody and Emotion
Statistical Machine Translation
Self-Supervised Learning in ASR
Prosody
Speech Production
Dysarthric Speech Assessment
Speech Coding: Transmission
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation
Analysis of Speech and Audio Signals
Speech Recognition: Architecture, Search, and Linguistic Components
Speech Recognition: Technologies and Systems for New Applications
Lexical and Language Modeling for ASR
Language Identification and Diarization
Speech Quality Assessment
Feature Modeling for ASR
Interfacing Speech Technology and Phonetics
Speech Synthesis: Multilinguality
Speech Emotion Recognition
Spoken Dialog Systems and Conversational Analysis
Speech Coding and Enhancement
Paralinguistics
Speech Enhancement and Denoising
Speech Synthesis: Evaluation
End-to-End Spoken Dialog Systems
Biosignal-enabled Spoken Communication
Neural-based Speech and Acoustic Analysis
DiGo - Dialog for Good: Speech and Language Technology for Social Good
Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources, and Evaluation
Speech, Voice, and Hearing Disorders
Spoken Term Detection and Voice Search
Models for Streaming ASR
Source Separation
Speech Perception
Phonetics and Phonology: Languages and Varieties
Speaker and Language Identification
Speech Synthesis and Voice Conversion
Speech and Language in Health: from Remote Monitoring to Medical Conversations
Novel Transformer Models for ASR
Speaker Recognition
Cross-lingual and Multilingual ASR
Voice Conversion
Pathological Speech Analysis
Multimodal Speech Emotion Recognition
Phonetics, Phonology, and Prosody
Speech Coding: Privacy
Analysis of Neural Speech Representations
End-to-end ASR
Spoken Language Understanding, Summarization, and Information Retrieval
Invariant and Robust Pre-trained Acoustic Models
Speech Synthesis: Representation Learning
Speech Perception, Production, and Acquisition
Acoustic Model Adaptation for ASR
Speech Synthesis: Expressivity
Multi-modal Systems
Question Answering from Speech
Multi-talker Methods in Speech Processing
Sociophonetics
Speaker and Language Diarization
Anti-Spoofing for Speaker Verification
Speech Coding: Intelligibility
New Computational Strategies for ASR Training and Inference
MERLIon CCS Challenge: Multilingual Everyday Recordings - Language Identification On Code-Switched Child-Directed Speech
Health-Related Speech Analysis
Automatic Audio Classification and Audio Captioning
Speech Synthesis
Speech Synthesis: Controllability and Adaptation
Search Methods and Decoding Algorithms for ASR
Speech Signal Analysis
Connecting Speech-science and Speech-technology for Children's Speech
Dialog Management
Speech Activity Detection and Modeling
Multilingual Models for ASR
Speech Enhancement and Bandwidth Expansion
Articulation
Neural Processing of Speech and Language: Encoding and Decoding the Diverse Auditory Brain
Perception of Paralinguistics
Technologies for Child Speech Processing
Speech Synthesis: Multilinguality; Evaluation
Show and Tell: Health Applications and Emotion Recognition
Show and Tell: Speech Tools, Speech Enhancement, Speech Synthesis
Show and Tell: Language Learning and Educational Resources
Show and Tell: Media and Commercial Applications