Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results
Abstract
The goal of this research is to find a set of acoustic parameters that capture the differences between Polish and Lithuanian consonants. To identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as vectors of acoustic parameters. Parameters known from the speech domain, as well as those from the music information retrieval area, are employed. These parameters are time- and frequency-domain descriptors. English is used as an auxiliary language in the experiments. In the first part of the experiments, Lithuanian and Polish language samples are analyzed, features are extracted, and the most discriminating ones are determined. In the second part, automatic classification of Lithuanian/English, Polish/English, and Lithuanian/Polish phonemes is performed.
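The idea of describing a phoneme as a vector of time- and frequency-domain descriptors, and then ranking descriptors by how well they separate two languages, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the parameters used, so zero-crossing rate and spectral centroid stand in as one descriptor of each kind, and the Fisher ratio is an assumed ranking criterion rather than the authors' method. All function names are hypothetical.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Time-domain descriptor: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Frequency-domain descriptor: magnitude-weighted mean frequency.

    Uses a naive DFT over the non-negative frequency bins to stay dependency-free.
    """
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        magnitudes.append(abs(coeff))
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(magnitudes)) / total


def feature_vector(frame, sample_rate):
    """Describe one phoneme segment as a vector of acoustic parameters."""
    return [zero_crossing_rate(frame), spectral_centroid(frame, sample_rate)]


def fisher_ratio(values_a, values_b):
    """Two-class separability of a single feature (assumed criterion).

    Squared difference of class means divided by the sum of class variances.
    """
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    var_a = sum((v - mean_a) ** 2 for v in values_a) / len(values_a)
    var_b = sum((v - mean_b) ** 2 for v in values_b) / len(values_b)
    denom = var_a + var_b
    return (mean_a - mean_b) ** 2 / denom if denom > 0 else math.inf


def synthetic_tone(freq_hz, sample_rate=8000, n=64, phase=0.1):
    """Synthetic stand-in for a recorded segment; not real speech data."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + phase) for i in range(n)]
```

In practice each feature would be computed over many labeled phoneme segments per language, and the features with the highest separability score would be kept as inputs to a classifier.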
Citations
- CrossRef: 0
- Web of Science: 0
- Scopus: 3
Details
- Category: Articles
- Type: journal articles
- Published in: Archives of Acoustics, vol. 44, iss. 4, pages 693-707, ISSN: 0137-5075
- Language: English
- Publication year: 2019
- Bibliographic description: Korvel G., Kurasova O., Kostek B.: Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results // Archives of Acoustics, Vol. 44, iss. 4 (2019), pp. 693-707
- DOI: 10.24425/aoa.2019.129725
- Verified by: Gdańsk University of Technology