2369
Yoshimura T., Tokuda K., Masuko T., Kobayashi T., Kitamura T. (1999). Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis // http://www.sp.nitech.ac.jp/~zen/yossie/mypapers/euro_hungary99.pdf
2370
Imai S., Sumita K., Furuichi C. (1983). Mel Log Spectrum Approximation (MLSA) Filter for Speech Synthesis / Electronics and Communications in Japan, Vol. 66-A, No. 2, 1983 // https://doi.org/10.1002/ecja.4400660203
2371
Отрадных Ф. П. (1953). Эпизод из жизни академика А. А. Маркова / Историко-математические исследования. № 6. С. 495—508 // http://pyrkov-professor.ru/default.aspx?tabid=195&ArticleId=44
2372
Chen S.-H., Hwang S.-H., Wang Y.-R. (1998). An RNN-based prosodic information synthesizer for Mandarin text-to-speech / IEEE Transactions on Speech and Audio Processing, Vol. 6, No. 3, pp. 226—239 // https://doi.org/10.1109/89.668817
2373
Zen H., Senior A., Schuster M. (2013). Statistical parametric speech synthesis using deep neural networks / Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013 // https://doi.org/10.1109/ICASSP.2013.6639215
2374
Kang S., Qian X., Meng H. (2013). Multi-distribution deep belief network for speech synthesis / Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013 // https://doi.org/10.1109/ICASSP.2013.6639225
2375
Ling Z.-H., Deng L., Yu D. (2013). Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis / IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21(10), pp. 2129—2139 // https://doi.org/10.1109/tasl.2013.2269291
2376
Lu H., King S., Watts O. (2013). Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis / Proceedings of the 8th ISCA Speech Synthesis Workshop (SSW), 2013 // http://ssw8.talp.cat/papers/ssw8_PS3-3_Lu.pdf
2377
Qian Y., Fan Y., Hu W., Soong F. K. (2014). On the training aspects of deep neural network (DNN) for parametric TTS synthesis / Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014 // https://doi.org/10.1109/ICASSP.2014.6854318
2378
Fan Y., Qian Y., Xie F., Soong F. K. (2014). TTS synthesis with bidirectional LSTM based recurrent neural networks / Interspeech 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14—18, 2014 // https://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_1964.pdf
2379
Fernandez R., Rendel A., Ramabhadran B., Hoory R. (2015). Using Deep Bidirectional Recurrent Neural Networks for Prosodic-Target Prediction in a Unit-Selection Text-to-Speech System / Interspeech 2015, 16th Annual Conference of the International Speech Communication Association, 2015 // https://www.isca-speech.org/archive/interspeech_2015/i15_1606.html
2380
Wu Z., Valentini-Botinhao C., Watts O., King S. (2015). Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis / Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015 // https://doi.org/10.1109/ICASSP.2015.7178814
2381
Zen H. (2015). Acoustic Modeling in Statistical Parametric Speech Synthesis — From HMM to LSTM-RNN / Proceedings of the First International Workshop on Machine Learning in Spoken Language Processing (MLSLP2015), Aizu, Japan, 19–20 September 2015 // https://research.google/pubs/pub43893/
2382
Merritt T., Clark R. A. J., Wu Z., Yamagishi J., King S. (2016). Deep neural network-guided unit selection synthesis / 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) // https://doi.org/10.1109/ICASSP.2016.7472658
2383
Holschneider M., Kronland-Martinet R., Morlet J., Tchamitchian P. (1989). A real-time algorithm for signal analysis with the help of the wavelet transform / Combes J.-M., Grossmann A., Tchamitchian P. (1989). Wavelets: Time-Frequency Methods and Phase Space. Springer Berlin Heidelberg // https://books.google.ru/books?id=3R74CAAAQBAJ
2384
Dutilleux P. (1989). An implementation of the “algorithme à trous” to compute the wavelet transform / Combes J.-M., Grossmann A., Tchamitchian P. (1989). Wavelets: Time-Frequency Methods and Phase Space. Springer Berlin Heidelberg // https://books.google.ru/books?id=3R74CAAAQBAJ
2385
Yu F., Koltun V. (2016). Multi-scale context aggregation by dilated convolutions // http://arxiv.org/abs/1511.07122
2386
Chen L.-C., Papandreou G., Kokkinos I., Murphy K., Yuille A. L. (2015). Semantic image segmentation with deep convolutional nets and fully connected CRFs // http://arxiv.org/abs/1412.7062
2387
van den Oord A., Dieleman S., Zen H., Simonyan K., Vinyals O., Graves A., Kalchbrenner N., Senior A., Kavukcuoglu K. (2016). WaveNet: A generative model for raw audio // https://arxiv.org/pdf/1609.03499.pdf
2388
van den Oord A., Dieleman S. (2016). WaveNet: A generative model for raw audio // https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
2389
van den Oord A., Li Y., Babuschkin I., Simonyan K., Vinyals O., Kavukcuoglu K., van den Driessche G., Lockhart E., Cobo L. C., Stimberg F., Casagrande N., Grewe D., Noury S., Dieleman S., Elsen E., Kalchbrenner N., Zen H., Graves A., King H., Walters T., Belov D., Hassabis D. (2017). Parallel WaveNet: Fast High-Fidelity Speech Synthesis // https://arxiv.org/abs/1711.10433
2390
Jin Z., Finkelstein A., Mysore G. J., Lu J. (2018). FFTNet: A Real-Time Speaker-Dependent Neural Vocoder / 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) // https://doi.org/10.1109/ICASSP.2018.8462431
2391
Kalchbrenner N., Elsen E., Simonyan K., Noury S., Casagrande N., Lockhart E., Stimberg F., van den Oord A., Dieleman S., Kavukcuoglu K. (2018). Efficient Neural Audio Synthesis // https://arxiv.org/abs/1802.08435
2392
Prenger R., Valle R., Catanzaro B. (2018). WaveGlow: A Flow-based Generative Network for Speech Synthesis // https://arxiv.org/abs/1811.00002
2393
Valin J.-M., Skoglund J. (2018). LPCNet: Improving Neural Speech Synthesis Through Linear Prediction // https://arxiv.org/abs/1810.11846
2394
Govalkar P., Fischer J., Zalkow F., Dittmar C. (2019). A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction / 10th ISCA Speech Synthesis Workshop, 20—22 September 2019, Vienna, Austria // https://doi.org/10.21437/SSW.2019-2
2395
Wang Y., Skerry-Ryan RJ, Stanton D., Wu Y., Weiss