Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results


The goal of this research is to find a set of acoustic parameters that capture the differences between Polish and Lithuanian consonants. To identify these differences, an acoustic analysis is performed and each phoneme sound is described as a vector of acoustic parameters. The parameters, drawn both from the speech domain and from the music information retrieval area, are time- and frequency-domain descriptors. English is used as an auxiliary language in the experiments. In the first part of the experiments, Lithuanian and Polish speech samples are analyzed, features are extracted, and the most discriminating ones are determined. In the second part, automatic classification of Lithuanian/English, Polish/English, and Lithuanian/Polish phonemes is performed.
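As a rough illustration of the idea of describing a phoneme sound as a vector of time- and frequency-domain descriptors and then ranking features by how well they discriminate two languages, a minimal sketch follows. The abstract does not name the parameters or the selection criterion, so everything here is an assumption: zero-crossing rate and spectral centroid stand in for the descriptors, a Fisher-style ratio stands in for the discrimination measure, and all function names are hypothetical.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Time-domain descriptor (assumed example): share of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Frequency-domain descriptor (assumed example): magnitude-weighted mean frequency in Hz."""
    n = len(frame)
    total = 0.0
    weighted = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        # Naive DFT coefficient; adequate for short illustrative frames.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        magnitude = abs(coeff)
        total += magnitude
        weighted += magnitude * k * sample_rate / n
    return weighted / total if total > 0 else 0.0


def feature_vector(frame, sample_rate):
    """Describe one phoneme frame as a vector of acoustic parameters."""
    return [zero_crossing_rate(frame), spectral_centroid(frame, sample_rate)]


def fisher_ratio(values_a, values_b):
    """Two-class separability of a single feature: squared mean gap over summed variances."""
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    var_a = sum((v - mean_a) ** 2 for v in values_a) / len(values_a)
    var_b = sum((v - mean_b) ** 2 for v in values_b) / len(values_b)
    # Small constant guards against division by zero for constant features.
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)
```

In such a setup, each feature would be scored by `fisher_ratio` over the values it takes for the two languages' phoneme samples, and the highest-scoring features would be kept for the classification stage.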


Publication version
Accepted or Published Version
Creative Commons: CC-BY-SA


Details

Publication in a journal
journal articles
Published in:
Archives of Acoustics, vol. 44, pages 693-707,
ISSN: 0137-5075
Year of publication: 2019
Bibliographic description:
Korvel G., Kurasova O., Kostek B.: Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results // Archives of Acoustics, Vol. 44, iss. 4 (2019), pp. 693-707
Digital object identifier (DOI): 10.24425/aoa.2019.129725
Politechnika Gdańska
