SP-P6: Speech Synthesis using Neural Networks I |
| Session Type: Poster |
| Time: Wednesday, March 23, 13:30 - 15:30 |
| Location: Poster Area H |
| Session Chair: Frank Soong, Microsoft |
| |
|
SP-P6.1: FROM HMMS TO DNNS: WHERE DO THE IMPROVEMENTS COME FROM? |
| Oliver Watts; University of Edinburgh |
| Gustav Eje Henter; University of Edinburgh |
| Thomas Merritt; University of Edinburgh |
| Zhizheng Wu; University of Edinburgh |
| Simon King; University of Edinburgh |
| |
|
SP-P6.2: DEEP BELIEF NETWORK-BASED POST-FILTERING FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS |
| Ya-Jun Hu; University of Science and Technology of China |
| Zhen-Hua Ling; University of Science and Technology of China |
| Li-Rong Dai; University of Science and Technology of China |
| |
|
SP-P6.3: A KL DIVERGENCE AND DNN APPROACH TO CROSS-LINGUAL TTS |
| Feng-Long Xie; Harbin Institute of Technology |
| Frank K. Soong; Microsoft Research Asia |
| Haifeng Li; Harbin Institute of Technology |
| |
|
SP-P6.4: GATING RECURRENT MIXTURE DENSITY NETWORKS FOR ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS |
| Wenfu Wang; Institute of Automation, Chinese Academy of Sciences |
| Shuang Xu; Institute of Automation, Chinese Academy of Sciences |
| Bo Xu; Institute of Automation, Chinese Academy of Sciences |
| |
|
SP-P6.5: WAVELET-BASED DECOMPOSITION OF F0 AS A SECONDARY TASK FOR DNN-BASED SPEECH SYNTHESIS WITH MULTI-TASK LEARNING |
| Manuel Sam Ribeiro; The Centre for Speech Technology Research, University of Edinburgh |
| Oliver Watts; The Centre for Speech Technology Research, University of Edinburgh |
| Junichi Yamagishi; The Centre for Speech Technology Research, University of Edinburgh |
| Robert A. J. Clark; The Centre for Speech Technology Research, University of Edinburgh |
| |
|
SP-P6.6: SPEAKER ADAPTIVE MODEL BASED ON BOLTZMANN MACHINE FOR NON-PARALLEL TRAINING IN VOICE CONVERSION |
| Toru Nakashika; The University of Electro-Communications |
| Yasuhiro Minami; The University of Electro-Communications |
| |
|
SP-P6.7: A DEEP AUTO-ENCODER BASED LOW-DIMENSIONAL FEATURE EXTRACTION FROM FFT SPECTRAL ENVELOPES FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS |
| Shinji Takaki; National Institute of Informatics |
| Junichi Yamagishi; National Institute of Informatics |
| |
|
SP-P6.8: SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS |
| Yuchen Fan; Microsoft Corporation |
| Yao Qian; Educational Testing Service Research |
| Frank K. Soong; Microsoft Corporation |
| Lei He; Microsoft Corporation |
| |
|
SP-P6.9: LEARNING CROSS-LINGUAL INFORMATION WITH MULTILINGUAL BLSTM FOR SPEECH SYNTHESIS OF LOW-RESOURCE LANGUAGES |
| Quanjie Yu; Tsinghua University |
| Peng Liu; Tsinghua University |
| Zhiyong Wu; Tsinghua University |
| Shiyin Kang; The Chinese University of Hong Kong |
| Helen Meng; The Chinese University of Hong Kong |
| Lianhong Cai; Tsinghua University |
| |
|
SP-P6.10: LOW LEVEL DESCRIPTORS BASED DBLSTM BOTTLENECK FEATURE FOR SPEECH DRIVEN TALKING AVATAR |
| Xinyu Lan; Tsinghua University |
| Xu Li; Tsinghua University |
| Yishuang Ning; Tsinghua University |
| Zhiyong Wu; Tsinghua University |
| Helen Meng; The Chinese University of Hong Kong |
| Jia Jia; Tsinghua University |
| Lianhong Cai; Tsinghua University |
| |