Home
About

About the Conference Welcome from the Chair Conference Committees Area Chairs Organizers ISCA
Calls

Papers Surveys Satellite Workshops Tutorials Show & Tell Special Sessions & Challenges Areas & Topics Important Dates
Authors

Author Resources Submission Policy ISCA Ethics Paper Submission Presentation Guidelines
Program

Program at a Glance Technical Program Presentation Videos Presentation Guidelines Keynotes Satellite Workshops Tutorials Special Sessions & Challenges Show & Tell
Student Information

Student Events Travel Grants
Venue & Travel

Conference Venue & Accommodations Transportations Visa About Shanghai
Registration

Registration Overview & Fees ISCA Membership ISCA Code of Conduct Online Registration
Sponsorships & Exhibition

Sponsors Virtual Booth Satellite Events Acknowledgement
Contact

Contact Us

Program

Program at a Glance

Technical Program

Presentation Videos

Presentation Guidelines

Satellite Workshops

Special Sessions & Challenges

Singing and Multimodal Synthesis

Position: Home > Program > Technical Program > Monday 21:45-22:45(GMT+8), October 26 > Singing and Multimodal Synthesis >

Mon-3-8-1 Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

Jie Wu(Xiaoice, Software Technology Center Asia, Microsoft) and Jian Luan(Microsoft)

Abstract: This paper presents a high quality singing synthesizer that is able to model a voice with limited available recordings. Based on the sequence-to-sequence singing model, we design a multi-singer framework to leverage all the existing singing data of different singers. To attenuate the issue of musical score unbalance among singers, we incorporate an adversarial task of singer classification to make encoder output less singer dependent. Furthermore, we apply multiple random window discriminators (MRWDs) on the generated acoustic features to make the network be a GAN. Both objective and subjective evaluations indicate that the proposed synthesizer can generate higher quality singing voice than baseline (4.12 vs 3.53 in MOS). Especially, the articulation of high-pitched vowels is significantly enhanced.

Paper

prev No More

next Mon-3-8-2 PREDICTION OF HEAD MOTION FROM SPEECH WAVEFORMS WITH A CANONICAL-CORRELATION-CONSTRAINED AUTOENCODER

About

About the Conference

Welcome from the Chair

Conference Committees

Calls

Satellite Workshops

Special Sessions & Challenges

Important Dates

Program

Program at a Glance

Technical Program

Presentation Videos

Presentation Guidelines

Satellite Workshops

Special Sessions & Challenges

Student Information

Venue & Travel

Conference Venue & Accommodations

Transportations

Sponsorships & Exhibition

Satellite Events

Acknowledgement