MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES

Abstract

Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbors (kNN) algorithm and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words containing positionally and/or contextually conditioned allophones was created for that purpose. Each word was audio-video recorded by a group of 16 native and non-native speakers, from which the speech of seven native speakers and phonology experts was selected for analysis. For the purpose of the present study, a sub-list of 103 words containing the English alveolar lateral phoneme /l/ was compiled. The list includes 'dark' (velarized) allophonic realizations (which occur before a consonant or at the end of a word before silence) and 52 'clear' allophonic realizations (which occur before a vowel), as well as voicing variants. The recorded signals were segmented into allophones and parametrized using a set of descriptors originating from the MPEG-7 standard, dedicated time-based parameters, and modified MFCC features proposed by the authors. The ANN, kNN and SOM classifiers were employed to automatically detect the two types of allophones. Various feature sets were tested to achieve the best performance of the automatic methods. In the final experiment, a selected feature set was used for the automatic evaluation of dark /l/ pronunciation by non-native speakers.
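
As a compact illustration of the pipeline described in the abstract, the sketch below summarizes each segmented /l/ allophone with plain MFCC statistics and trains a kNN classifier to separate 'dark' from 'clear' realizations. This is not the authors' implementation: the paper's feature set (MPEG-7 descriptors, time-based parameters, modified MFCCs) is replaced here by standard MFCC means and deviations, and the synthetic signals, labels and k = 3 are illustrative assumptions.

```python
# Minimal sketch of dark/clear /l/ classification (an illustrative
# stand-in, not the authors' implementation).
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def mfcc_vector(signal, sr, n_mfcc=13):
    """Summarize a segmented allophone as MFCC means and standard
    deviations (the paper uses modified MFCC features instead)."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder data: random signals stand in for segmented /l/ allophones.
sr = 16000
rng = np.random.default_rng(0)
segments = [rng.standard_normal(sr // 4) for _ in range(40)]
labels = np.array([0, 1] * 20)  # 0 = 'dark' /l/, 1 = 'clear' /l/

X = np.stack([mfcc_vector(s, sr) for s in segments])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels)

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 is an arbitrary choice
knn.fit(X_train, y_train)
print("dark/clear accuracy:", knn.score(X_test, y_test))
```

With real segmented recordings in place of the random signals, the same loop can be repeated over different feature subsets to compare classifiers, mirroring the feature-set comparison described in the abstract.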

Citations

  • CrossRef: 12

  • Web of Science: 0

  • Scopus: 17


Full text

Publication version:
Accepted or Published Version
License:
Creative Commons: CC-BY-NC-ND

Keywords

Detailed information

Category:
Journal publication
Type:
journal articles
Published in:
International Journal of Applied Mathematics and Computer Science, Vol. 29, pages 393-405,
ISSN: 1641-876X
Language:
English
Year of publication:
2019
Bibliographic description:
Piotrowska M., Korvel G., Kostek B., Ciszewski T., Czyżewski A.: MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES // International Journal of Applied Mathematics and Computer Science, Vol. 29, iss. 2 (2019), pp. 393-405
DOI:
10.2478/amcs-2019-0029
Sources of funding:
Verification:
Politechnika Gdańska
