Technical Program - INTERSPEECH 2020

Technical Program

ASR neural network architectures I [Mon-1-1]
Time: 19:15-20:15 (GMT+8), October 26
Room: 1

Mon-1-1-1 On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
Author: Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao and Shujie Liu

Mon-1-1-2 SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
Author: Zhifu Gao, ShiLiang Zhang, Ming Lei and Ian McLoughlin

Mon-1-1-3 Contextual RNN-T for Open Domain ASR
Author: Mahaveer Jain, Yatharth Saraf, Gil Keren, Jay Mahadeokar, Geoffrey Zweig and Florian Metze

Mon-1-1-4 ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
Author: Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu Han, Tao Lei and Tao Ma

Mon-1-1-5 Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity
Author: Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti and Jae-sun Seo

Mon-1-1-6 BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example
Author: Timo Lohrenz and Tim Fingscheidt

Mon-1-1-7 Relative Positional Encoding for Speech Recognition and Direct Translation
Author: Ngoc-Quan Pham, Thanh-Le Ha, Tuan Nam Nguyen, Thai Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues and Alexander Waibel

Mon-1-1-8 Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
Author: Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou and Takuya Yoshioka

Mon-1-1-9 Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework
Author: Takashi Fukuda and Samuel Thomas

Mon-1-1-10 Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition
Author: Jinhwan Park and Wonyong Sung

Multi-channel speech enhancement [Mon-1-2]
Time: 19:15-20:15 (GMT+8), October 26
Room: 2

Mon-1-2-1 Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-channel Speech Recognition
Author: Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang and Longshuai Xiao

Mon-1-2-2 Neural Spatio-Temporal Beamformer for Target Speech Separation
Author: Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu and Dong Yu

Mon-1-2-3 Online directional speech enhancement using geometrically constrained independent vector analysis
Author: Li Li, Kazuhito Koishida and Shoji Makino

Mon-1-2-4 End-to-End Multi-Look Keyword Spotting
Author: Meng Yu, Xuan Ji, Bo Wu, Dan Su and Dong Yu

Mon-1-2-5 Differential Beamforming for Uniform Circular Array with Directional Microphones
Author: Weilong Huang and Jinwei Feng

Mon-1-2-6 Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement
Author: Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi and Chin-Hui Lee

Mon-1-2-7 An End-to-end Architecture of Online Multi-channel Speech Separation
Author: Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo and Lei Xie

Mon-1-2-8 Mentoring-Reverse Mentoring for Unsupervised Multi-channel Speech Source Separation
Author: Yu Nakagome, Masahito Togami, Tetsuji Ogawa and Tetsunori Kobayashi

Mon-1-2-9 Computationally efficient and versatile framework for joint optimization of blind speech separation and dereverberation
Author: Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada and Shoko Araki

Mon-1-2-10 A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-channel Speech Recognition in the CHiME-6 Challenge
Author: Yan-Hui Tu, Jun Du, Lei Sun, Feng Ma, Jia Pan and Chin-Hui Lee

Speech processing in the brain [Mon-1-3]
Time: 19:15-20:15 (GMT+8), October 26
Room: 3

Mon-1-3-1 Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation
Author: Youssef Hmamouche, Laurent Prévot, Magalie Ochs and Thierry Chaminade

Mon-1-3-2 Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals
Author: Di Zhou, Gaoyan Zhang, Jianwu Dang, Shuang Wu and Zhuo Zhang

Mon-1-3-3 Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell
Author: Chongyuan Lian, Tianqi Wang, Mingxiao Gu, Manwa Lawrence Ng, Feiqi Zhu, Lan Wang and Nan Yan

Mon-1-3-4 Congruent Audiovisual Speech Enhances Cortical Envelope Tracking during Auditory Selective Attention
Author: Zhen Fu and Jing Chen

Mon-1-3-5 Contribution of RMS-level-based speech segments to target speech decoding under noisy conditions
Author: Lei Wang, Ed X. Wu and Fei Chen

Mon-1-3-6 Cortical Oscillatory Hierarchy for Natural Sentence Processing
Author: Bin Zhao, Jianwu Dang, Gaoyan Zhang and Masashi Unoki

Mon-1-3-7 Comparing EEG analyses with different epoch alignments in an auditory lexical decision experiment
Author: Louis ten Bosch, Kimberley Mulder and Lou Boves

Mon-1-3-8 Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait
Author: Tanya Talkar, Sophia Yuditskaya, James Williamson, Adam Lammert, Hrishikesh Rao, Daniel Hannon, Anne O'Brien, Gloria Vergara-Diaz, Richard DeLaura, Douglas Sturim, Gregory Ciccarelli, Ross Zafonte, Jeffrey Palmer, Paolo Bonato and Thomas Quatieri

Speech Signal Representation [Mon-1-4]
Time: 19:15-20:15 (GMT+8), October 26
Room: 4

Mon-1-4-1 Towards Learning a Universal Non-Semantic Representation of Speech
Author: Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Félix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel and Yinnon Haviv

Mon-1-4-2 Poetic Meter Classification Using i-vector-MTF Fusion
Author: Rajeev Rajan, Aiswarya Vinod and Ben P. Babu

Mon-1-4-3 Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism
Author: Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin and Yanlu Xie

Mon-1-4-4 Automatic Analysis of Speech Prosody in Dutch
Author: Na Hu, Berit Janssen, Judith Hanssen, Carlos Gussenhoven and Aoju Chen

Mon-1-4-5 Learning Voice Representation Using Knowledge Distillation For Automatic Voice Casting
Author: Adrien Gresse, Mathias Quillot, Richard Dufour and Jean-Francois Bonastre

Mon-1-4-6 Enhancing formant information in spectrographic display of speech
Author: Bayya Yegnanarayana, Anand Medabalimi and Vishala Pannala

Mon-1-4-7 Unsupervised Methods for Evaluating Speech Representations
Author: Michael Gump, Wei-Ning Hsu and James Glass

Mon-1-4-8 Robust pitch regression with voiced/unvoiced classification in nonstationary noise environments
Author: Dung Tran, Uros Batricevic and Kazuhito Koishida

Mon-1-4-9 Nonlinear ISA with Auxiliary Variables for Learning Speech Representations
Author: Amrith Setlur, Barnabas Poczos and Alan W Black

Mon-1-4-10 Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals
Author: Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi and Hiroshi Saruwatari

Speech Synthesis: Neural Waveform Generation I [Mon-1-5]
Time: 19:15-20:15 (GMT+8), October 26
Room: 5

Mon-1-5-1 Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders
Author: Yang Ai and Zhenhua Ling

Mon-1-5-2 FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction
Author: Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen and Shan Liu

Mon-1-5-3 VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Author: Jinhyeok Yang, Junmo Lee, Young-Ik Kim, Hoon-Young Cho and Injung Kim

Mon-1-5-4 Lightweight LPCNet-based Neural Vocoder with Tensor Decomposition
Author: Hiroki Kanagawa and Yusuke Ijima

Mon-1-5-5 WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU
Author: Po-chun Hsu and Hung-yi Lee

Mon-1-5-6 What the future brings: investigating the impact of lookahead for incremental neural TTS
Author: Brooke Stephenson, Laurent Besacier, Laurent Girin and Thomas Hueber

Mon-1-5-7 Fast and lightweight on-device TTS with Tacotron2 and LPCNet
Author: Vadim Popov, Stanislav Kamenev, Mikhail Kudinov, Sergey Repyevsky, Tasnima Sadekova, Vitalii Bushaev, Vladimir Kryzhanovskiy and Denis Parkhomenko

Mon-1-5-8 Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed
Author: Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang, Xiaodong He and Bowen Zhou

Mon-1-5-9 Can Auditory Nerve models tell us what’s different about WaveNet vocoded speech?
Author: Sébastien Le Maguer and Naomi Harte

Mon-1-5-10 Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions
Author: Dipjyoti Paul, Yannis Pantazis and Yannis Stylianou

Mon-1-5-11 Neural Homomorphic Vocoder
Author: Zhijun Liu, Kuan Chen and Kai Yu

Automatic Speech Recognition for Non-Native Children's Speech [Mon-SS-1-6]
Time: 19:15-20:15 (GMT+8), October 26
Room: 6

Mon-SS-1-6-1 The NTNU System at the Interspeech 2020 Non-Native Children’s Speech ASR Challenge
Author: Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng and Berlin Chen

Mon-SS-1-6-2 Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children’s Speech
Author: Roberto Gretter, Marco Matassoni, Daniele Falavigna, Keelan Evanini and Chee Wee (Ben) Leong

Mon-SS-1-6-3 Non-Native Children's Automatic Speech Recognition: the INTERSPEECH 2020 Shared Task ALTA Systems
Author: Kate Knill, Linlin Wang, Yu Wang, Xixin Wu and Mark Gales

Mon-SS-1-6-4 Data augmentation using prosody and false starts to recognize non-native children's speech
Author: Hemant Kathania, Mittul Singh, Tamás Grósz and Mikko Kurimo

Mon-SS-1-6-5 UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech
Author: Mostafa Shahin, Renée Lu, Julien Epps and Beena Ahmed

Speaker Diarization [Mon-1-7]
Time: 19:15-20:15 (GMT+8), October 26
Room: 7

Mon-1-7-1 End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
Author: Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue and Kenji Nagamatsu

Mon-1-7-2 Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
Author: Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev and Aleksei Romanenko

Mon-1-7-3 Online Speaker Diarization with Relation Network
Author: Xiang Li, Yucheng Zhao, Chong Luo and Wenjun Zeng

Mon-1-7-4 New advances in speaker diarization
Author: Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata and Ron Hoory

Mon-1-7-5 Self-Attentive Similarity Measurement Strategies in Speaker Diarization
Author: Qingjian Lin, Yu Hou and Ming Li

Mon-1-7-6 Speaker attribution with voice profiles by graph-based semi-supervised learning
Author: Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz and Michael Brudno

Mon-1-7-7 Deep Self-Supervised Hierarchical Clustering for Speaker Diarization
Author: Prachi Singh and Sriram Ganapathy

Mon-1-7-8 Spot the conversation: speaker diarisation in the wild
Author: Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras and Andrew Zisserman

Noise robust and distant speech recognition [Mon-1-8]
Time: 19:15-20:15 (GMT+8), October 26
Room: 8

Mon-1-8-1 Learning Contextual Language Embeddings for Monaural Multi-talker Speech Recognition
Author: Wangyou Zhang and Yanmin Qian

Mon-1-8-2 Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
Author: Zhihao Du, Jiqing Han and Xueliang Zhang

Mon-1-8-3 Anti-aliasing regularization in stacking layers
Author: Antoine Bruguier, Ananya Misra, Arun Narayanan and Rohit Prabhavalkar

Mon-1-8-4 Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
Author: Andrei Andrusenko, Aleksandr Laptev and Ivan Medennikov

Mon-1-8-5 End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming
Author: Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe and Yanmin Qian

Mon-1-8-6 Quaternion Neural Networks for Multi-channel Distant Speech Recognition
Author: Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas Lane and Mohamed Morchid

Mon-1-8-7 Improved Guided Source Separation Integrated with a Strong Back-end for the CHiME-6 Dinner Party Scenario
Author: Hangting Chen, Pengyuan Zhang, Qian Shi and Zuozhen Liu

Mon-1-8-8 Neural Speech Separation Using Spatially Distributed Microphones
Author: Dongmei Wang, Zhuo Chen and Takuya Yoshioka

Mon-1-8-9 Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones
Author: Shota Horiguchi, Yusuke Fujita and Kenji Nagamatsu

Mon-1-8-10 Simulating realistically-spatialised simultaneous speech using video-driven speaker detection and the CHiME-5 dataset
Author: Jack Deadman and Jon Barker

Speech in Multimodality (MULTIMODAL) [Mon-1-9]
Time: 19:15-20:15 (GMT+8), October 26
Room: 9

Mon-1-9-1 Toward Silent Paralinguistics: Speech-to-EMG – Retrieving Articulatory Muscle Activity from Speech
Author: Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn Schuller, Tanja Schultz, Alberto Abad and Isabel Trancoso

Mon-1-9-2 Multimodal Deception Detection using Automatically Extracted Acoustic, Visual, and Lexical Features
Author: Jiaxuan Zhang, Sarah Ita Levitan and Julia Hirschberg

Mon-1-9-3 Multi-modal Attention for Speech Emotion Recognition
Author: Zexu Pan, Zhaojie Luo, Jichen Yang and Haizhou Li

Mon-1-9-4 WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition
Author: Guang Shen, Riwei Lai, Rui Chen, Yu Zhang, Kejia Zhang, Qilong Han and Hongtao Song

Mon-1-9-5 A Multi-scale Fusion Framework for Bimodal Speech Emotion Recognition
Author: Ming Chen and Xudong Zhao

Mon-1-9-6 Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
Author: Pengfei Liu, Kun Li and Helen Meng

Mon-1-9-7 Multi-modal embeddings using multi-task learning for emotion recognition
Author: Aparna Khare, Srinivas Parthasarathy and Shiva Sundaram

Mon-1-9-8 Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network
Author: Jeng-Lin Li and Chi-Chun Lee

Mon-1-9-9 Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition
Author: Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang and Rongjun Li

Speech, Language, and Multimodal Resources [Mon-1-10]
Time: 19:15-20:15 (GMT+8), October 26
Room: 10

Mon-1-10-1 ATCSpeech: a Multilingual pilot-controller Speech Corpus from Real Air Traffic Control Environment
Author: Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Min Ruan, Dan Li, Zhongping Yang, Xiping Wu and Yi Lin

Mon-1-10-2 Developing an Open-Source Corpus of Yoruba Speech
Author: Alexander Gutkin, Isin Demirsahin, Oddur Kjartansson, Clara Rivera and Kọ́lá Túbọ̀sún

Mon-1-10-3 ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers
Author: Jung-Woo Ha, Kihyun Nam, Jin Gu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Hyeji Kim, Eunmi Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung and Sunghun Kim

Mon-1-10-4 LAIX Corpus of Chinese Learner English Towards A Benchmark for L2 English ASR
Author: Huan Luan, Jiahong Yuan, Hui Lin and Yanhong Wang

Mon-1-10-5 Design and Development of a Human-Machine Dialog Corpus for the Automated Assessment of Conversational English Proficiency
Author: Vikram Ramanarayanan

Mon-1-10-6 CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment
Author: Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee and Michael Chi-Fai Tong

Mon-1-10-7 FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics
Author: Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja and Mikko Kurimo

Mon-1-10-8 DiPCo - Dinner Party Corpus
Author: Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Bjorn Hoffmeister, Jan Trmal, Maurizio Omologo and Roland Maas

Mon-1-10-9 Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews
Author: Bo Wang, Yue Wu, Niall Taylor, Terry Lyons, Maria Liakata, Alejo J Nevado-Holgado and Kate Saunders

Mon-1-10-10 FT Speech: Danish Parliament Speech Corpus
Author: Andreas Søeborg Kirkedal, Marija Stepanović and Barbara Plank

Speech Emotion Recognition I (SER I) [Mon-2-1]
Time: 20:30-21:30 (GMT+8), October 26
Room: 1

Mon-2-1-1 Enhancing Transferability of Black-box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models
Author: Zhao Ren, Jing Han, Nicholas Cummins and Björn Schuller

Mon-2-1-2 End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model
Author: Han Feng, Sei Ueno and Tatsuya Kawahara

Mon-2-1-3 Improving Speech Emotion Recognition Using Graph Attentive Bi-directional Gated Recurrent Unit Network
Author: Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin and Chi-Chun Lee

Mon-2-1-4 An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition
Author: Adria Mallol-Ragolta, Nicholas Cummins and Björn Schuller

Mon-2-1-5 Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition
Author: Kusha Sridhar and Carlos Busso

Mon-2-1-6 Augmenting Generative Adversarial Networks for Speech Emotion Recognition
Author: Siddique Latif, Muhammad Asim, Rajib Rana, Sara Khalifa, Raja Jurdak and Björn Schuller

Mon-2-1-7 Speech Emotion Recognition ‘in the wild’ Using an Autoencoder
Author: Vipula Dissanayake, Haimo Zhang, Mark Billinghurst and Suranga Nanayakkara

Mon-2-1-8 Emotion Profile Refinery for Speech Emotion Classification
Author: Shuiyang Mao, P. C. Ching and Tan Lee

Mon-2-1-9 Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation
Author: Sung-Lin Yeh, Yun-Shao Lin and Chi-Chun Lee

ASR neural network architectures and training I [Mon-2-2]
Time: 20:30-21:30 (GMT+8), October 26
Room: 2

Mon-2-2-1 Fast and Slow Acoustic Model
Author: Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil and Jian Wu

Mon-2-2-2 Self-Distillation for Improving CTC-Transformer-based ASR Systems
Author: Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara and Marc Delcroix

Mon-2-2-3 Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard
Author: Zoltán Tüske, George Saon, Kartik Audhkhasi and Brian Kingsbury

Mon-2-2-4 Improving Speech Recognition using GAN-based Speech Synthesis and Contrastive Unspoken Text Selection
Author: Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Gary Wang, Bhuvana Ramabhadran and Pedro Moreno

Mon-2-2-5 PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR
Author: Yiwen Shao, Yiming Wang, Dan Povey and Sanjeev Khudanpur

Mon-2-2-6 CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency
Author: Keyu An, Hongyu Xiang and Zhijian Ou

Mon-2-2-7 CTC-synchronous Training for Monotonic Attention Model
Author: Hirofumi Inaguma, Masato Mimura and Tatsuya Kawahara

Mon-2-2-8 Continual Learning for Multi-Dialect Acoustic Models
Author: Brady Houston and Katrin Kirchhoff

Mon-2-2-9 SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition
Author: Xingchen Song, Zhiyong Wu, Yiheng Huang, Dan Su and Helen Meng

Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation [Mon-2-3]
Time: 20:30-21:30 (GMT+8), October 26
Room: 3

Mon-2-3-1 RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications
Author: Adriana Stan

Mon-2-3-2 Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer
Author: Yuan Shangguan, Katie Knister, Yanzhang He, Ian McGraw and Francoise Beaufays

Mon-2-3-3 Statistical Testing on ASR Performance via Blockwise Bootstrap
Author: Zhe Liu and Fuchun Peng

Mon-2-3-4 Sentence Level Estimation of Psycholinguistic Norms Using Joint Multidimensional Annotations
Author: Anil Ramakrishna and Shrikanth Narayanan

Mon-2-3-5 Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System
Author: Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge and Zhi-Jie Yan

Mon-2-3-6 Confidence measures in encoder-decoder models for speech recognition
Author: Alejandro Woodward, Clara Bonnín, David Varas, Issey Masuda, Elisenda Bou-Balust and Juan Carlos Riveiro

Mon-2-3-7 Word Error Rate Estimation Without ASR Output: e-WER2
Author: Ahmed Ali and Steve Renals

Mon-2-3-8 An evaluation of manual and semi-automatic laughter annotation
Author: Bogdan Ludusan and Petra Wagner

Mon-2-3-9 Understanding Racial Disparities in Automatic Speech Recognition: the case of habitual "be"
Author: Joshua Martin and Kevin Tang

Phonetics and Phonology [Mon-2-4]
Time: 20:30-21:30 (GMT+8), October 26
Room: 4

Mon-2-4-1 Secondary phonetic cues in the production of the nasal short-a system in California English
Author: Georgia Zellou, Rebecca Scarborough and Renee Kemp

Mon-2-4-2 Acoustic properties of strident fricatives at the edges: implications for consonant discrimination
Author: Lorenzo Maselli, Leo Varnet and Maria Giavazzi

Mon-2-4-3 Processes and Consequences of Co-articulation in Mandarin V1N.(C2)V2 Context: Phonology and Phonetics
Author: Mingqiong Luo

Mon-2-4-4 Voicing Distinction of Obstruents in the Hangzhou Wu Chinese Dialect
Author: Yang Yue and Fang Hu

Mon-2-4-5 The phonology and phonetics of Kaifeng Mandarin vowels
Author: Lei Wang

Mon-2-4-6 Microprosodic variability in plosives in German and Austrian German
Author: Margaret Zellers and Barbara Schuppler

Mon-2-4-7 Er-suffixation in Southwestern Mandarin: An EMA and ultrasound study
Author: Jing Huang, Feng-fan Hsieh and Yueh-chin Chang

Mon-2-4-8 Electroglottographic-Phonetic Study on Korean Phonation Induced by Tripartite Plosives in Yanbian Korean
Author: Yinghao Li and Jinghua Zhang

Mon-2-4-9 Modeling Global Body Configurations in American Sign Language
Author: Nicholas Wilkins, Beck Cordes Galbraith and Ifeoma Nwogu

Topics in ASR I [Mon-2-5]
Time: 20:30-21:30 (GMT+8), October 26
Room: 5

Mon-2-5-1 Augmenting Turn-taking Prediction with Wearable Eye Activity During Conversation
Author: Hang Li, Siyuan Chen and Julien Epps

Mon-2-5-2 CAM: Uninteresting Speech Detector
Author: Weiyi Lu, Yi Xu, Peng Yang and Belinda Zeng

Mon-2-5-3 Mixed Case Contextual ASR Using Capitalization Masks
Author: Diamantino Caseiro, Pat Rondon, Quoc-Nam Le The and Petar Aleksic

Mon-2-5-4 Speech Recognition and Multi-Speaker Diarization of Long Conversations
Author: Huanru Henry Mao, Shuyang Li, Julian McAuley and Garrison Cottrell

Mon-2-5-5 Investigation of Data Augmentation Techniques for Disordered Speech Recognition
Author: Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu and Helen Meng

Mon-2-5-6 A Real-time Robot-based Auxiliary System for Risk Evaluation of COVID-19 Infection
Author: Wenqi Wei, Jianzong Wang, Jiteng Ma, Ning Cheng and Jing Xiao

Mon-2-5-7 An Utterance Verification System for Word Naming Therapy in Aphasia
Author: David Barbera, Mark Huckvale, Victoria Fleming, Emily Upton, Henry Coley-Fisher, Ian Shaw, William Latham, Alexander Paul Leff and Jenny Crinion

Mon-2-5-8 Exploiting Cross Domain Visual Feature Generation for Disordered Speech Recognition
Author: Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu and Helen Meng

Mon-2-5-9 Joint prediction of punctuation and disfluency in speech transcripts
Author: Binghuai Lin and Liyuan Wang

Mon-2-5-10 Focal Loss for Punctuation Prediction
Author: Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ye Bai and Cunhang Fan

Large-Scale Evaluation of Short-Duration Speaker Verification (SdSV) [Mon-SS-2-6]
Time: 20:30-21:30 (GMT+8), October 26
Room: 6

Mon-SS-2-6-1 Improving X-vector and PLDA for Text-dependent Speaker Verification
Author: Zhuxin Chen and Yue Lin

Mon-SS-2-6-2 SdSV Challenge 2020: Large-Scale Evaluation of Short‐Duration Speaker Verification
Author: Hossein Zeinali, Kong Aik Lee, Md Jahangir Alam and Lukas Burget

Mon-SS-2-6-3 The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020
Author: Tao Jiang, Miao Zhao, Lin Li and Qingyang Hong

Mon-SS-2-6-4 Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020
Author: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han and Nam Soo Kim

Mon-SS-2-6-5 The TalTech Systems for the Short-duration Speaker Verification Challenge 2020
Author: Tanel Alumäe and Jörgen Valk

Mon-SS-2-6-6 Investigation of NICT submission for short-duration speaker verification challenge 2020
Author: Peng Shen, Xugang Lu and Hisashi Kawai

Mon-SS-2-6-7 Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization
Author: Jenthe Thienpondt, Brecht Desplanques and Kris Demuynck

Mon-SS-2-6-8 BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020
Author: Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Vesely, Lukas Burget, Oldrich Plchot, Ondrej Glembek, Ondrej Novotny and Pavel Matejka

Mon-SS-2-6-9 Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification
Author: Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu and Abeer Alwan

Voice Conversion and Adaptation I [Mon-2-7]
Time: 20:30-21:30 (GMT+8), October 26
Room: 7

Mon-2-7-1 Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning
Author: Jing-Xuan Zhang, Zhen-Hua Ling and Li-Rong Dai

Mon-2-7-2 Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition
Author: Shaojin Ding, Guanlong Zhao and Ricardo Gutierrez-Osuna

Mon-2-7-3 Non-parallel Many-to-many Voice Conversion with PSR-StarGAN
Author: Yanping Li, Dongxiang Xu, Yan Zhang, Yang Wang and Binbin Chen

Mon-2-7-4 TTS Skins: Speaker Conversion via ASR
Author: Adam Polyak, Lior Wolf and Yaniv Taigman

Mon-2-7-5 GAZEV: GAN-Based Zero Shot Voice Conversion over Non-parallel Speech Corpus
Author: Zining Zhang, Bingsheng He and Zhenjie Zhang

Mon-2-7-6 Spoken Content and Voice Factorization for Few-shot Speaker Adaptation
Author: Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen and Rongxiu Zhong

Mon-2-7-7 Unsupervised Cross-Domain Singing Voice Conversion
Author: Adam Polyak, Lior Wolf, Yossi Adi and Yaniv Taigman

Mon-2-7-8 Attention-Based Speaker Embeddings for One-Shot Voice Conversion
Author: Tatsuma Ishihara and Daisuke Saito

Mon-2-7-9 Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training
Author: Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu and Guanglu Wan

Acoustic Event Detection [Mon-2-8]
Time: 20:30-21:30 (GMT+8), October 26
Room: 8

Mon-2-8-1 Gated Multi-head Attention Pooling for Weakly Labelled Audio Tagging
Author: Sixin Hong, Yuexian Zou and Wenwu Wang

Mon-2-8-2 Environmental Sound Classification with Parallel Temporal-spectral Attention
Author: Helin Wang, Yuexian Zou, Dading Chong and Wenwu Wang

Mon-2-8-3 Contrastive Predictive Coding of Audio with an Adversary
Author: Luyu Wang, Kazuya Kawakami and Aaron van den Oord

Mon-2-8-4 Memory Controlled Sequential Self Attention for Sound Recognition
Author: Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian and Emmanouil Benetos

Mon-2-8-5 Dual Stage Learning based Dynamic Time-Frequency Mask Generation for Audio Event Classification
Author: Donghyeon Kim, Jaihyun Park, David Han and Hanseok Ko

Mon-2-8-6 An Effective Perturbation based Semi-Supervised Learning Method for Sound Event Detection
Author: Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin and Lin Liu

Mon-2-8-7 A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling
Author: Chieh-Chi Kao, Bowen Shi, Ming Sun and Chao Wang

Mon-2-8-8 Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging
Author: Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun and Chao Wang

Mon-2-8-9 Two-stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-token Connectionist Temporal Classification
Author: Inyoung Park and Hong Kook Kim

Mon-2-8-10 SpeechMix - Augmenting Deep Sound Recognition using Hidden Space Interpolations
Author: Amit Jindal, Narayanan Elavathur Ranganatha, Aniket Didolkar, Arijit Ghosh Chowdhury, Di Jin, Ramit Sawhney and Rajiv Ratn Shah

Spoken Language Understanding I [Mon-2-9]
Time: 20:30-21:30 (GMT+8), October 26
Room: 9

Mon-2-9-1 End-to-End Neural Transformer Based Spoken Language Understanding
Author: Martin Radfar, Athanasios Mouchtaris and Jimmy Kunnzmann

Mon-2-9-2 Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding
Author: Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen and Kai Yu

Mon-2-9-3 Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Author: Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui and Ariya Rastrow

Mon-2-9-4 Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
Author: Pavel Denisov and Ngoc Thang Vu

Mon-2-9-5 Context Dependent RNNLM for Automatic Transcription of Conversations
Author: Srikanth Raj Chetupalli and Sriram Ganapathy

Mon-2-9-6 Improving End-to-End Speech-to-Intent Classification with Reptile
Author: Yusheng Tian and Philip John Gorinski

Mon-2-9-7 Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation
Author: Won Ik Cho, Donghyun Kwak, Jiwon Yoon and Nam Soo Kim

Mon-2-9-8 Towards an ASR error robust Spoken Language Understanding System
Author: Weitong Ruan, Yaroslav Nechaev, Luoxin Chen, Chengwei Su and Imre Kiss

Mon-2-9-9 End-to-End Spoken Language Understanding Without Full Transcripts
Author: Hong-Kwang Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory and Luis Lastras

Mon-2-9-10 Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study
Author: Karthik Gopalakrishnan, Behnam Hedayatnia, Longshaokan Wang, Yang Liu and Dilek Hakkani-Tur

DNN architectures for Speaker Recognition [Mon-2-10]
Time: 20:30-21:30 (GMT+8), October 26
Room: 10

Mon-2-10-1 AutoSpeech: Neural Architecture Search for Speaker Recognition
Author: Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha and Zhangyang Wang

Mon-2-10-2 Densely Connected Time Delay Neural Network for Speaker Verification
Author: Ya-Qi Yu and Wu-Jun Li

Mon-2-10-3 Phonetically-Aware Coupled Network For Short Duration Text-independent Speaker Verification
Author: Siqi Zheng, Hongbin Suo and Yun Lei

Mon-2-10-4 Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention
Author: Myunghun Jung, Youngmoon Jung, Jahyun Goo and Hoi Rin Kim

Mon-2-10-5 Vector-based attentive pooling for text-independent speaker verification
Author: Yanfeng Wu, Chenkai Guo, Hongcan Gao, Xiaolei Hou and Jing Xu

Mon-2-10-6 Self-attention encoding and pooling for speaker recognition
Author: Pooyan Safari, Miquel India and Javier Hernando

Mon-2-10-7 ARET: Aggregated Residual Extended Time-delay Neural Networks for Speaker Verification
Author: Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Longbiao Wang, Meng Liu, Lin Zhang, Jiayu Jin and Junhai Xu

Mon-2-10-8 Adversarial Separation Network for Speaker Recognition
Author: Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee and Jianguo Wei

Mon-2-10-9 Text-Independent Speaker Verification with Dual Attention Network
Author: Jingyu Li and Tan Lee

Mon-2-10-10 Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification
Author: Xiaoyang Qu, Jianzong Wang and Jing Xiao

Cross/multi-lingual and code-switched speech recognition^[Mon-3-1]
Time:   21:45-22:45(GMT+8), October 26
Room: 1

Mon-3-1-1 Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?
Author: Jialu Li and Mark Hasegawa-Johnson

Mon-3-1-2 Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages
Author: Martha Yifiru Tachbelie, Solomon Teferra Abate and Tanja Schultz

Mon-3-1-3 Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning
Author: Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi and Takahiro Shinozaki

Mon-3-1-4 Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition
Author: Xinyuan Zhou, Emre Yilmaz, Yanhua Long, Yijie Li and Haizhou Li

Mon-3-1-5 Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages
Author: Solomon Teferra Abate, Martha Yifiru Tachbelie and Tanja Schultz

Mon-3-1-6 Multilingual Jointly Trained Acoustic and Written Word Embeddings
Author: Yushi Hu, Shane Settle and Karen Livescu

Mon-3-1-7 Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks
Author: Chia-Yu Li and Ngoc Thang Vu

Mon-3-1-8 Data Augmentation for Code-switch Language Modeling by Fusing Multiple Text Generation Methods
Author: Xinhui Hu, Qi Zhang, Lei Yang, Binbin Gu and Xinkang Xu

Mon-3-1-9 A 43 Language Multilingual Punctuation Prediction Neural Network Model
Author: Xinxing Li and Edward Lin

Mon-3-1-10 Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition
Author: Jisung Wang, Kim Jihwan, Sangki Kim and Yeha Lee

Anti-spoofing and Liveness Detection^[Mon-3-2]
Time:   21:45-22:45(GMT+8), October 26
Room: 2

Mon-3-2-1 Multi-Task Siamese Neural Network for Improving Replay Attack Detection
Author: Patrick von Platen, Fei Tao and Gokhan Tur

Mon-3-2-2 POCO: a Voice Spoofing and Liveness Detection Corpus based on Pop Noise
Author: Kosuke Akimoto, Seng Pei Liew, Sakiko Mishima, Ryo Mizushima and Kong Aik Lee

Mon-3-2-3 Dual-adversarial domain adaptation for generalized replay attack detection
Author: Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian and Kai Yu

Mon-3-2-4 Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection
Author: Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung and Ha-Jin Yu

Mon-3-2-5 Competency Evaluation in Voice Mimicking Using Acoustic Cues
Author: Abhijith G., Adarsh S., Akshay Prasannan and Rajeev Rajan

Mon-3-2-6 Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks
Author: Zhenzong Wu, Rohan Kumar Das, Jichen Yang and Haizhou Li

Mon-3-2-7 Spoofing Attack Detection using the Non-linear Fusion of Sub-band Classifiers
Author: Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas Evans and Massimiliano Todisco

Mon-3-2-8 Investigating Light-ResNet Architecture for Spoofing Detection under Mismatched Conditions
Author: Prasanth Parasu, Julien Epps, Kaavya Sriskandaraja and Gajan Suthokumar

Mon-3-2-9 Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection
Author: Zhenchun Lei, Yingen Yang, Changhong Liu and Jihua Ye

Noise reduction and intelligibility^[Mon-3-3]
Time:   21:45-22:45(GMT+8), October 26
Room: 3

Mon-3-3-1 Lightweight Online Noise Reduction on Embedded Devices using Hierarchical Recurrent Neural Networks
Author: Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante Banuelos, Pascal Zobel and Andreas Maier

Mon-3-3-2 SEANet: A Multi-modal Speech Enhancement Network
Author: Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas and Dominik Roblek

Mon-3-3-3 Lite Audio-Visual Speech Enhancement
Author: Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo and Hsin-Min Wang

Mon-3-3-4 ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication
Author: Christian Bergler, Manuel Schmitt, Andreas Maier, Simeon Smeele, Volker Barth and Elmar Nöth

Mon-3-3-5 A Deep Learning Approach to Active Noise Control
Author: Hao Zhang and DeLiang Wang

Mon-3-3-6 Improving Speech Intelligibility through Speaker Dependent and Independent Spectral Style Conversion
Author: Tuan Dinh, Alexander Kain and Kris Tjaden

Mon-3-3-7 End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks
Author: Mathias Bach Pedersen, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen and Jesper Jensen

Mon-3-3-8 Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-based ASR System
Author: Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani and Toshio Irino

Mon-3-3-9 Automatic Estimation of Intelligibility Measure for Consonants in Speech
Author: Ali Abavisani and Mark Hasegawa-Johnson

Mon-3-3-10 Large scale evaluation of importance maps in automatic speech recognition
Author: Viet Anh Trinh and Michael Mandel

Acoustic Scene Classification^[Mon-3-4]
Time:   21:45-22:45(GMT+8), October 26
Room: 4

Mon-3-4-1 Neural Architecture Search on Acoustic Scene Classification
Author: Jixiang Li, Chuming Liang, Bo Zhang, Zhao Wang, Fei Xiang and Xiangxiang Chu

Mon-3-4-2 Acoustic Scene Classification using Audio Tagging
Author: Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Seung-bin Kim and Ha-Jin Yu

Mon-3-4-3 ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification
Author: Liwen Zhang, Jiqing Han and Ziqiang Shi

Mon-3-4-4 Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network
Author: Jivitesh Sharma, Ole-Christoffer Granmo and Morten Goodwin

Mon-3-4-5 Acoustic Scene Analysis with Multi-head Attention Networks
Author: Weimin Wang, Weiran Wang, Ming Sun and Chao Wang

Mon-3-4-6 Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification
Author: Hu Hu, Sabato Marco Siniscalchi, Yannan Wang and Chin-Hui Lee

Mon-3-4-7 An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances
Author: Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Bai Xue, Jun Du and Chin-Hui Lee

Mon-3-4-8 Attention-Driven Projections for Soundscape Classification
Author: Dhanunjaya Varma Devalraju, Muralikrishna H, Padmanabhan Rajan and Dileep Aroor Dinesh

Mon-3-4-9 Computer Audition for Continuous Rainforest Occupancy Monitoring: The Case of Bornean Gibbons' Call Detection
Author: Panagiotis Tzirakis, Alexander Shiarella, Robert Ewers and Björn Schuller

Mon-3-4-10 Deep Learning Based Open Set Acoustic Scene Classification
Author: Zuzanna Kwiatkowska, Beniamin Kalinowski, Michał Kośmider and Krzysztof Rykaczewski

Singing Voice Computing and Processing in Music^[Mon-3-5]
Time:   21:45-22:45(GMT+8), October 26
Room: 5

Mon-3-5-1 Singing Synthesis: With a Little Help from my Attention
Author: Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa and Thomas Drugman

Mon-3-5-2 Peking Opera Synthesis via Duration Informed Attention Network
Author: Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang and Dong Yu

Mon-3-5-3 DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System
Author: Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li and Dong Yu

Mon-3-5-4 Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music
Author: Yuanbo Hou, Frank Soong, Jian Luan and Shengchen Li

Mon-3-5-5 Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music
Author: Haohe Liu, Lei Xie, Jian Wu and Geng Yang

Acoustic model adaptation for ASR^[Mon-3-7]
Time:   21:45-22:45(GMT+8), October 26
Room: 7

Mon-3-7-1 Continual Learning in Automatic Speech Recognition
Author: Samik Sadhu and Hynek Hermansky

Mon-3-7-2 Speaker Adaptive Training for Speech Recognition Based on Attention-over-Attention Mechanism
Author: Genshun Wan, Jia Pan, Qingran Wang, Jianqing Gao and Zhongfu Ye

Mon-3-7-3 Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator
Author: Yan Huang, Jinyu Li, Lei He, Wenning Wei, William Gale and Yifan Gong

Mon-3-7-4 Speech Transformer with Speaker Aware Persistent Memory
Author: Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng and Bin Ma

Mon-3-7-5 Adaptive Speaker Normalization for CTC-Based Speech Recognition
Author: Fenglin Ding, Wu Guo, Bin Gu, Zhenhua Ling and Jun Du

Mon-3-7-6 Unsupervised Domain Adaptation Under Label Space Mismatch for Speech Classification
Author: Akhil Mathur, Nadia Berthouze and Nicholas D. Lane

Mon-3-7-7 Learning Fast Adaptation on Cross-Accented Speech Recognition
Author: Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu and Pascale Fung

Mon-3-7-8 Black-box Adaptation of ASR for Accented Speech
Author: Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi and Sunita Sarawagi

Mon-3-7-9 Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation
Author: Mehmet Ali Tugtekin Turan, Emmanuel Vincent and Denis Jouvet

Mon-3-7-10 Frame-wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering
Author: Ryu Takeda and Kazunori Komatani

Singing and Multimodal Synthesis^[Mon-3-8]
Time:   21:45-22:45(GMT+8), October 26
Room: 8

Mon-3-8-1 Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer
Author: Jie Wu and Jian Luan

Mon-3-8-2 Prediction of Head Motion from Speech Waveforms with a Canonical-Correlation-Constrained Autoencoder
Author: Jinhong Lu and Hiroshi Shimodaira

Mon-3-8-3 XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System
Author: Peiling Lu, Jie Wu, Jian Luan, Xu Tan and Li Zhou

Mon-3-8-4 Stochastic Talking Face Generation Using Latent Distribution Matching
Author: Ravindra Yadav, Ashish Sardana, Vinay Namboodiri and Rajesh Hegde

Mon-3-8-5 Speech-to-singing Conversion based on Boundary Equilibrium GAN
Author: Da-Yi Wu and Yi-Hsuan Yang

Mon-3-8-6 Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image
Author: Shunsuke Goto, Kotaro Onishi, Yuki Saito, Kentaro Tachibana and Koichiro Mori

Mon-3-8-7 Speech Driven Talking Head Generation via Attentional Landmarks Based Representation
Author: Wang Wentao, Wang Yan, Li Teng, Sun Jianqing, Liu Qiongsong and Liang Jiaen

Intelligibility-enhancing Speech Modification^[Mon-3-9]
Time:   21:45-22:45(GMT+8), October 26
Room: 9

Mon-3-9-1 Optimization and evaluation of an intelligibility-improving signal processing approach (IISPA) for the Hurricane Challenge 2.0 with FADE
Author: Marc René Schädler

Mon-3-9-2 iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
Author: Haoyu Li, Szu-wei Fu, Yu Tsao and Junichi Yamagishi

Mon-3-9-3 Intelligibility-enhancing speech modifications – The Hurricane Challenge 2.0
Author: Jan Rennies, Henning Schepker, Cassia Valentini-Botinhao and Martin Cooke

Mon-3-9-4 Exploring listeners' speech rate preferences
Author: Olympia Simantiraki and Martin Cooke

Mon-3-9-5 Adaptive compressive onset-enhancement for improved speech intelligibility in noise and reverberation
Author: Felicitas Bederna, Henning Schepker, Christian Rollwage, Simon Doclo, Arne Pusch, Jörg Bitzer and Jan Rennies

Mon-3-9-6 A Sound Engineering Approach to Near End Listening Enhancement
Author: Carol Chermaz and Simon King

Mon-3-9-7 Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion
Author: Dipjyoti Paul, Muhammed Shifas PV, Yannis Pantazis and Yannis Stylianou

Human speech production I^[Mon-3-10]
Time:   21:45-22:45(GMT+8), October 26
Room: 10

Mon-3-10-1 Two different mechanisms of movable mandible for vocal-tract model with flexible tongue
Author: Takayuki Arai

Mon-3-10-2 Improve the performance of acoustic-to-articulatory inversion by dynamically removing the training loss of noncritical portions of articulatory channels
Author: Qiang Fang

Mon-3-10-3 Speaker conditioned acoustic-to-articulatory inversion using x-vectors
Author: Aravind Illa and Prasanta Ghosh

Mon-3-10-4 Coarticulation as synchronised sequential target approximation: An EMA study
Author: Zirui Liu, Yi Xu and Feng-fan Hsieh

Mon-3-10-5 Improved Model for Vocal Folds with a Polyp with Potential Application
Author: Jônatas Santos, Jugurta Montalvão and Israel Santos

Mon-3-10-6 Regional Resonance of the Lower Vocal Tract and its Contribution to Speaker Characteristics
Author: Lin Zhang, Kiyoshi Honda, Jianguo Wei and Seiji Adachi

Mon-3-10-7 Air-tissue boundary segmentation in real time Magnetic Resonance Imaging video using 3-D convolutional neural network
Author: Renuka Mannem, Navaneetha Gaddam and Prasanta Ghosh

Mon-3-10-8 An investigation of the virtual lip trajectories during the production of bilabial stops and nasal at different speaking rates
Author: Tilak Purohit and Prasanta Ghosh

Speech Translation and multilingual/multimodal learning^[Tue-1-1]
Time:   19:15-20:15(GMT+8), October 27
Room: 1

Tue-1-1-1 A DNN-HMM-DNN Hybrid Model for Discovering Word-like Units from Spoken Captions and Image Regions
Author: Liming Wang and Mark Hasegawa-Johnson

Tue-1-1-2 Efficient Wait-k Models for Simultaneous Machine Translation
Author: Maha Elbayad, Laurent Besacier and Jakob Verbeek

Tue-1-1-3 Investigating Self-supervised Pre-training for End-to-end Speech Translation
Author: Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Yannick Estève and Laurent Besacier

Tue-1-1-4 Contextualized Translation of Automatically Segmented Speech
Author: Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Mauro Cettolo and Marco Turchi

Tue-1-1-5 Self-Training for End-to-End Speech Translation
Author: Juan Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti and Yun Tang

Tue-1-1-6 Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing
Author: Marcello Federico, Yogesh Virkar, Robert Enyedi and Roberto Barra-Chicote

Tue-1-1-7 Pair Expansion for Learning Multilingual Semantic Embeddings using Disjoint Visually-grounded Speech Audio Datasets
Author: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath and James Glass

Tue-1-1-8 Self-Supervised Representations Improve End-to-End Speech Translation
Author: Anne Wu, Changhan Wang, Juan Pino and Jiatao Gu

Speaker Recognition I^[Tue-1-2]
Time:   19:15-20:15(GMT+8), October 27
Room: 2

Tue-1-2-1 Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms
Author: Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim and Ha-Jin Yu

Tue-1-2-2 Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances
Author: Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung and Hoi Rin Kim

Tue-1-2-3 An Adaptive X-vector Model for Text-independent Speaker Verification
Author: Bin Gu, Wu Guo, Jun Du, Zhenhua Ling and Fenglin Ding

Tue-1-2-4 Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions
Author: Santi Prieto

Tue-1-2-5 Sum-Product Networks for Robust Automatic Speaker Identification
Author: Aaron Nicolson and Kuldip K. Paliwal

Tue-1-2-6 Segment Aggregation for short utterances speaker verification using raw waveforms
Author: Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim and Ha-Jin Yu

Tue-1-2-7 Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition
Author: Shai Rozenberg, Hagai Aronowitz and Ron Hoory

Tue-1-2-8 Speaker Re-identification with Speaker Dependent Speech Enhancement
Author: Yanpei Shi, Qiang Huang and Thomas Hain

Tue-1-2-9 Blind speech signal quality estimation for speaker verification systems
Author: Galina Lavrentyeva, Marina Volkova, Anastasia Avdeeva, Sergey Novoselov, Artem Gorlanov, Tseren Andzukaev, Artem Ivanov and Alexandr Kozlov

Tue-1-2-10 Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification
Author: Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng

Spoken Language Understanding II^[Tue-1-3]
Time:   19:15-20:15(GMT+8), October 27
Room: 3

Tue-1-3-1 Modeling ASR Ambiguity for Neural Dialogue State Tracking
Author: Vaishali Pal, Fabien Guillot, Manish Shrivastava, Jean-Michel Renders and Laurent Besacier

Tue-1-3-2 ASR Error Correction with Augmented Transformer for Entity Retrieval
Author: Haoyu Wang, Shuyan Dong, Yue Liu, James Logan, Ashish Kumar Agrawal and Yang Liu

Tue-1-3-3 Large-Scale Transfer Learning for Low-resource Spoken Language Understanding
Author: Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng and Jing Xiao

Tue-1-3-4 Data balancing for boosting performance of low-frequency classes in Spoken Language Understanding
Author: Judith Gaspers, Quynh Do and Fabian Triefenbach

Tue-1-3-5 An Interactive Adversarial Reward Learning-based Spoken Language Understanding System
Author: Yu Wang, Yilin Shen and Hongxia Jin

Tue-1-3-6 Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding
Author: Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee and Shang-Wen Li

Tue-1-3-7 Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training
Author: Shota Orihashi, Mana Ihori, Tomohiro Tanaka and Ryo Masumura

Tue-1-3-8 Deep F-measure Maximization for End-to-End Speech Understanding
Author: Leda Sari and Mark Hasegawa-Johnson

Tue-1-3-9 An Effective Domain Adaptive Post-Training Method for BERT in Response Selection
Author: Taesun Whang, Dongyub Lee, Chanhee Lee, Kisu Yang, Dongsuk Oh and Heuiseok Lim

Tue-1-3-10 Confidence measure for speech-to-concept end-to-end spoken language understanding
Author: Antoine Caubrière, Yannick Estève, Antoine Laurent and Emmanuel Morin

Human speech processing^[Tue-1-4]
Time:   19:15-20:15(GMT+8), October 27
Room: 4

Tue-1-4-1 Attention to indexical information improves voice recall
Author: Grant McGuire and Molly Babel

Tue-1-4-2 Categorization of Whistled Consonants by French Speakers
Author: Anaïs Tran Ngoc, Julien Meyer and Fanny Meunier

Tue-1-4-3 Whistled vowel identification by French listeners
Author: Anaïs Tran Ngoc, Julien Meyer and Fanny Meunier

Tue-1-4-4 F0 slope as a cue to speech segmentation in French
Author: Maria del Mar Cordero, Fanny Meunier, Nicolas Grimault, Stéphane Pota and Elsa Spinelli

Tue-1-4-5 Does French listeners’ ability to use accentual information at the word level depend on the ear of presentation?
Author: Amandine Michelas and Sophie Dufour

Tue-1-4-6 A perceptual study of the five level tones in Hmu (Xinzhai variety)
Author: Wen Liu

Tue-1-4-7 Mandarin and English Adults’ Cue-weighting of Lexical Stress
Author: Zhen Zeng, Karen Mattock, Liquan Liu, Varghese Peter, Alba Tuninetti and Feng-Ming Tsao

Tue-1-4-8 Age-related differences of tone perception in Mandarin-speaking seniors
Author: Yan Feng, Gang Peng and William Shi-Yuan Wang

Tue-1-4-9 Social and functional pressures in vocal alignment: Differences for human and voice-AI interlocutors
Author: Georgia Zellou and Michelle Cohn

Tue-1-4-10 Identifying Important Time-frequency Locations in Continuous Speech Utterances
Author: Hassan Salami Kavaki and Michael Mandel

Feature extraction and distant ASR^[Tue-1-5]
Time:   19:15-20:15(GMT+8), October 27
Room: 5

Tue-1-5-1 Raw Sign and Magnitude Spectra for Multi-head Acoustic Modelling
Author: Erfan Loweimi, Peter Bell and Steve Renals

Tue-1-5-2 Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations
Author: Purvi Agrawal and Sriram Ganapathy

Tue-1-5-3 A Deep 2D Convolutional Network for Waveform-based Speech Recognition
Author: Dino Oglic, Zoran Cvetkovic, Peter Bell and Steve Renals

Tue-1-5-4 Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Author: Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz and Gerhard Rigoll

Tue-1-5-5 An alternative to MFCCs for ASR
Author: Pegah Ghahremani, Hossein Hadian, Sanjeev Khudanpur, Hynek Hermansky and Dan Povey

Tue-1-5-6 Phase based spectro-temporal features for building a robust ASR system
Author: Anirban Dutta, Gudmalwar Ashishkumar and Ch. V. Rama Rao

Tue-1-5-7 Deep Scattering Power Spectrum Features for Robust Speech Recognition
Author: Neethu Mariam Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell and Steve Renals

Tue-1-5-8 FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition
Author: Titouan Parcollet, Xinchi Qiu and Nicholas Lane

Tue-1-5-9 Bandpass Noise Generation and Augmentation for Unified ASR
Author: Kshitiz Kumar, Bo Ren, Yifan Gong and Jian Wu

Tue-1-5-10 Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition
Author: Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar and Sriram Ganapathy

Voice Privacy Challenge^[Tue-SS-1-6]
Time:   19:15-20:15(GMT+8), October 27
Room: 6

Tue-SS-1-6-1 Introducing the VoicePrivacy Initiative
Author: Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-F