Morning Sessions (9:00 - 12:30)
Albert Zeyer has been a Ph.D. student in the Human Language Technology Group at RWTH Aachen University, Germany, since 2014, under the supervision of Prof. Hermann Ney. He received the Diplom (M.Sc.) in Mathematics and the Diplom (M.Sc.) in Computer Science from RWTH Aachen University in 2013. His research focuses on neural networks in general; his passion for neural networks and connectionism dates back to 1996, when he began his first studies of the field. His recent work covers recurrent networks, attention models, and end-to-end models in general, with applications in speech recognition, translation, and language modeling, where he has achieved many state-of-the-art results. Albert started developing software in 1995 and has published a variety of open-source projects since then. The TensorFlow-based software RETURNN, which he has developed as the main architect for his Ph.D. research, is now widely used by his teammates at RWTH Aachen University and beyond. He has given lectures at the university and a workshop at eBay, partly covering the same content as this tutorial.
Nick Rossenbach is a Ph.D. student in the Human Language Technology Group at RWTH Aachen University, Germany.
Parnia Bahar holds a Master's degree in Electrical Engineering from Stuttgart University and is currently a Ph.D. student in the Human Language Technology Group at RWTH Aachen University, Germany, under the supervision of Prof. Dr. Hermann Ney. Her areas of research are human language technology, machine learning, and neural networks, with a focus on designing end-to-end neural translation models for both spoken and written language, as well as recognition systems. She develops her models using RETURNN. She has authored or co-authored papers at high-ranking international conferences such as ACL, EMNLP, and ICASSP, and systems developed with her collaboration have consistently ranked among the best in shared tasks at WMT and IWSLT. She also gives lectures at the university and technical and scientific talks at workshops, supervises theses in her field of interest, and works on various research projects.
André Merboldt is a master's student in the Human Language Technology Group at RWTH Aachen University, Germany, under the supervision of Prof. Dr. Hermann Ney. He has worked at the chair since 2017, where he also wrote his bachelor's thesis on end-to-end models for speech recognition. Since then, his focus has been on investigating and designing attention and transducer models for ASR using the RETURNN software.
Ralf Schlüter serves as Academic Director and Senior Lecturer in the Department of Computer Science of the Faculty of Computer Science, Mathematics and Natural Sciences at RWTH Aachen University. He leads the Automatic Speech Recognition Group at the Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition. He studied physics at RWTH Aachen University and Edinburgh University and received his Diploma in Physics (1995), PhD in Computer Science (2000), and Habilitation in Computer Science (2019), each from RWTH Aachen University. Dr. Schlüter works on all aspects of automatic speech recognition and has led the scientific work of the Lehrstuhl Informatik 6 in this area in many large national and international research projects, e.g., EU-Bridge and TC-STAR (EU), Babel (US-IARPA), and Quaero (French OSEO).


Tatsuya Kawahara received the B.E. in 1987, the M.E. in 1989, and the Ph.D. in 1995, all in information science, from Kyoto University, Kyoto, Japan. From 1995 to 1996, he was a Visiting Researcher at Bell Laboratories, Murray Hill, NJ, USA. Currently, he is a Professor and the Dean of the School of Informatics, Kyoto University. He was also an Invited Researcher at ATR and NICT.
Kristiina Jokinen is a Senior Researcher at the AI Research Center, AIST Tokyo Waterfront, an Adjunct Professor at the University of Helsinki, and a Life Member of Clare Hall, University of Cambridge. She received her first degree from the University of Helsinki and her PhD from the University of Manchester, UK. She was awarded a JSPS Fellowship to conduct research at the Nara Institute of Science and Technology, and has also been an Invited Researcher at ATR and a Visiting Professor at Doshisha University, Kyoto.

Hung-yi Lee received the M.S. and Ph.D. degrees from National Taiwan University (NTU), Taipei, Taiwan, in 2010 and 2012, respectively. From September 2012 to August 2013, he was a postdoctoral fellow at the Research Center for Information Technology Innovation, Academia Sinica. From September 2013 to July 2014, he was a visiting scientist at the Spoken Language Systems Group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is currently an associate professor in the Department of Electrical Engineering of National Taiwan University, with a joint appointment in the university's Department of Computer Science & Information Engineering. His research focuses on machine learning (especially deep learning), spoken language understanding, and speech recognition. He runs a YouTube channel teaching deep learning (in Mandarin) with more than 4M views and 50k subscribers.
Ngoc Thang Vu received his Diploma (2009) and PhD (2014) degrees in computer science from the Karlsruhe Institute of Technology, Germany. From 2014 to 2015, he worked at Nuance Communications as a senior research scientist and at Ludwig-Maximilian University Munich as an acting professor in computational linguistics. In 2015, he was appointed assistant professor at the University of Stuttgart, Germany. Since 2018, he has been a full professor at the Institute for Natural Language Processing in Stuttgart. His main research interests are natural language processing (esp. speech, natural language understanding, and dialog systems) and machine learning (esp. deep learning) for low-resource settings.
Shang-Wen Li is a Senior Applied Scientist at Amazon AWS AI. His research in human language technology focuses on spoken language understanding, natural language generation, and dialog management. His recent interest is data augmentation for low-resource conversational bots. He earned his Ph.D. from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), advised by Professor Victor Zue, and received his M.S. and B.S. from National Taiwan University. Before joining Amazon AWS, he worked at Amazon Alexa and Apple Siri on dialog management for error recovery.


Yu Tsao received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1999 and 2001, respectively, and the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2008. From 2009 to 2011, he was a Researcher with the National Institute of Information and Communications Technology, Tokyo, Japan, where he engaged in research and product development in automatic speech recognition for multilingual speech-to-speech translation. He is currently an Associate Research Fellow with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan. His research interests include speech and speaker recognition, acoustic and language modeling, audio coding, and bio-signal processing. He is currently an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing and the IEICE Transactions on Information and Systems. Dr. Tsao received the Academia Sinica Career Development Award in 2017, the National Innovation Award in 2018 and 2019, and the Outstanding Elite Award of the Chung Hwa Rotary Educational Foundation 2019-2020.
Fei Chen received the B.Sc. and M.Phil. degrees from the Department of Electronic Science and Engineering, Nanjing University, in 1998 and 2001, respectively, and the Ph.D. degree from the Department of Electronic Engineering, The Chinese University of Hong Kong, in 2005. He continued his research as a postdoctoral researcher and senior research fellow at the University of Texas at Dallas (supervised by Prof. Philipos Loizou) and The University of Hong Kong, and joined Southern University of Science and Technology (SUSTech) as a faculty member in 2014. Dr. Chen leads the speech processing research group at SUSTech, with research focused on speech perception, speech intelligibility modeling, speech enhancement, and assistive hearing technology. He has published over 80 journal papers and over 80 conference papers in venues including IEEE journals and conferences, Interspeech, and the Journal of the Acoustical Society of America. He received the best presentation award at the 9th Asia Pacific Conference of Speech, Language and Hearing, and a 2011 National Organization for Hearing Research Foundation Research Award in the United States. Dr. Chen currently serves as an associate editor or editorial board member of Frontiers in Psychology, Biomedical Signal Processing and Control, and Physiological Measurement.

Afternoon Sessions (14:00 - 17:30)
Prof. Moore (http://staffwww.dcs.shef.ac.uk/people/R.K.Moore/) has over 40 years’ experience in Speech Technology R&D and, although an engineer by training, much of his research has been based on insights from human speech perception and production. As Head of the UK Government's Speech Research Unit from 1985 to 1999, he was responsible for the development of the Aurix range of speech technology products and the subsequent formation of 20/20 Speech Ltd. Since 2004 he has been Professor of Spoken Language Processing at the University of Sheffield, and he also holds Visiting Chairs at the Bristol Robotics Laboratory and University College London Psychology & Language Sciences. Prof. Moore was President of the European/International Speech Communication Association from 1997 to 2001, General Chair of INTERSPEECH-2009, and an ISCA Distinguished Lecturer during 2014-15. In 2017 he organised the first international workshop on ‘Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR)’. Prof. Moore is the current Editor-in-Chief of Computer Speech & Language, and in 2016 he was awarded the LREC Antonio Zampolli Prize for "Outstanding Contributions to the Advancement of Language Resources & Language Technology Evaluation within Human Language Technologies".
Jianfeng Gao (primary contact) is a Partner Research Manager at Microsoft Research AI, Redmond, and an IEEE Fellow. He leads the development of AI systems for natural language understanding, vision language processing, dialogue, and business applications. He frequently gives tutorials on similar or related topics at conferences and summer schools, including tutorials on “deep learning for NLP and IR” at ICASSP 2014, HLT-NAACL 2015, IJCAI 2016, and the International Summer School on Deep Learning 2017 in Bilbao, as well as tutorials on “neural approaches to conversational AI” at ACL 2018, SIGIR 2018, and ICML 2019.
Chenyan Xiong is a Senior Researcher at Microsoft Research AI, Redmond. His research lies at the intersection of information retrieval, natural language processing, and deep learning, with a current focus on long-form text understanding, conversational information access, and neural information retrieval. Before joining Microsoft Research AI, Chenyan obtained his Ph.D. from the Language Technologies Institute, Carnegie Mellon University, in 2018.
Paul Bennett is a Sr. Principal Research Manager at Microsoft Research AI, where he leads the Information Data Sciences group. His published research has focused on a variety of topics surrounding the use of machine learning in information retrieval – including ensemble methods and the combination of information sources, calibration, consensus methods for noisy supervision labels, active learning and evaluation, supervised classification and ranking, crowdsourcing, behavioral modeling and analysis, and personalization. Paul gave the tutorial on “Machine Learning and IR: Recent Successes and New Opportunities” at ICML 2009 and ECIR 2010.


Kyu Jeong Han received his PhD from USC in 2009 and is currently a Principal Speech Scientist at ASAPP Inc., focusing on deep learning technologies for speech applications. Dr. Han has held research positions at IBM, Ford, Capio.ai (acquired by Twilio), and JD.com. He is actively involved in the speech community as well, serving as a reviewer for IEEE, ISCA, and ACL journals and conferences, and as a member of the Speech and Language Processing Technical Committee of the IEEE SPS since 2019. He also serves on the Organizing Committee of IEEE SLT-2020. In 2018, he won the ISCA Award for the Best Paper Published in Computer Speech & Language 2013-2017.
Tae Jin Park received his B.S. degree in electrical engineering and M.S. degree in electrical engineering and computer science from Seoul National University, Seoul, South Korea, in 2010 and 2012, respectively. In 2012, he joined the Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea, as a researcher. He is currently a Ph.D. candidate in the Signal Analysis and Interpretation Laboratory (SAIL) at the University of Southern California (USC). He is interested in machine learning and speech signal processing, with a focus on speaker diarization.
Dr. D. Dimitriadis is a Principal Researcher at Microsoft, WA, where he leads the Federated Learning research project. He previously worked as a Researcher at IBM Research, NY, and AT&T Labs, NJ, and as a lecturer (P.D. 407/80) in the School of ECE, NTUA, Greece. He is a Senior Member of the IEEE. He was part of the Program Committee for the Multi-Learn'17 Workshop and the Organizing Committees for IEEE SLT'18 and ICASSP'23, and has served as a session chair at multiple conferences. Dr. Dimitriadis has published more than 60 papers in peer-reviewed scientific journals and conferences, with over 1500 citations. He received his PhD degree from NTUA in February 2005 with the thesis "Non-Linear Speech Processing, Modulation Models and Applications to Speech Recognition"; his major was in DSP with a specialization in speech processing.


Vikram Ramanarayanan is a Senior Research Scientist in the Speech and NLP Group of Educational Testing Service R&D, based out of the San Francisco office, where he is also the Office Manager. He also holds an Assistant Adjunct Professor appointment in the Department of Otolaryngology - Head and Neck Surgery at the University of California, San Francisco. His work at ETS on dialog and multimodal systems with applications to language learning and behavioral assessment won the prestigious ETS Presidential Award. Vikram's research interests lie in applying scientific knowledge to interdisciplinary engineering problems in speech, language, and vision, and in turn using engineering approaches to drive scientific understanding. He holds M.S. and Ph.D. degrees in Electrical Engineering from the University of Southern California, Los Angeles, and is a Fellow of the USC Sidney Harman Academy for Polymathic Study and a Senior Member of the IEEE. Vikram’s work has won two Best Paper awards at top international conferences and has resulted in over 75 publications in refereed international journals and conferences and 10 patents filed. Webpage: http://www.vikramr.com/.
Klaus Zechner (Ph.D., Carnegie Mellon University) is a Senior Research Scientist leading a team of speech scientists within the Natural Language Processing and Speech Group in the Research and Development Division of Educational Testing Service (ETS) in Princeton, New Jersey, USA. Since joining ETS in 2002, he has been pioneering research and development of technologies for automated scoring of non-native speech, and since 2011 he has been leading large annual R&D projects dedicated to the continuous improvement of automated speech scoring technology. He holds around 20 patents on technology related to SpeechRater®, an automated speech scoring system he and his team have been developing at ETS. SpeechRater is currently used operationally as a contributory scoring system, along with human raters, for the TOEFL® iBT Speaking assessment, and as the sole scoring system for the TOEFL® Practice Online (TPO) Speaking assessment and the TOEFL MOOC; it is also licensed by multiple external clients to support English language learning. Klaus Zechner has authored more than 80 peer-reviewed publications in journals, book chapters, conference and workshop proceedings, and research reports. In 2019, a book on automated speaking assessment, for which he was the main editor, was published by Routledge; it provides an overview of the current state of the art in automated speech scoring of spontaneous non-native speech. Webpage: https://www.researchgate.net/profile/Klaus_Zechner
Keelan Evanini is a Research Director at Educational Testing Service in Princeton, NJ. His research interests include automated assessment of non-native spoken English for large-scale assessments, automated feedback in computer assisted language learning applications, and spoken dialog systems. He leads a team of research scientists that conducts foundational research into automated speech scoring and spoken dialog technology. He also leads a team of research engineers that focuses on applied engineering and capability implementation for ETS automated scoring engines. He received his Ph.D. in Linguistics from the University of Pennsylvania in 2009 under the supervision of Bill Labov, and has worked at ETS Research since then. He has published over 70 papers in peer-reviewed journals and conference proceedings, has been awarded 9 patents, and is a senior member of the IEEE. Webpage: http://evanini.com/keelan.html

