Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets

Marta Zielonka; Artur Piastowski; Andrzej Czyżewski; Paweł Nadachowski; Maksymilian Operlejn; Kamil Kaczor

doi:10.3390/electronics11223831

Abstract

Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNNs), were applied to recognize emotions from spectrograms and mel-spectrograms. This study compares the two representations to determine which better captures emotion and how large the resulting difference in classification performance is. The experiments showed that mel-spectrograms are the better-suited input for training CNN-based speech emotion recognition (SER) models. Five popular datasets were used: Crowdsourced Emotional Multimodal Actors Dataset (CREMA-D), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Surrey Audio-Visual Expressed Emotion (SAVEE), Toronto Emotional Speech Set (TESS), and The Interactive Emotional Dyadic Motion Capture (IEMOCAP). Six emotion classes were used: happiness, anger, sadness, fear, disgust, and neutral. However, some experiments recognized only four emotions because of the characteristics of the IEMOCAP dataset. Classification performance was compared across datasets, and an attempt was made to develop a universal model trained on all datasets; this approach reached an accuracy of 55.89% when recognizing four emotions. The most accurate six-emotion model was trained on a combination of four datasets (CREMA-D, RAVDESS, SAVEE, TESS) and achieved 57.42% accuracy. A further study demonstrated that improper division of data into training and test sets significantly influences the test accuracy of CNNs. The problem of inappropriate data division, which affected the results of studies known from the literature, was therefore addressed extensively. The experiments also employed the popular ResNet18 architecture to demonstrate the reliability of the results and to show that these problems are not unique to the custom CNN architecture proposed in the experiments. Finally, the label correctness of the CREMA-D dataset was studied using a questionnaire prepared for this purpose.
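
The abstract does not say what form the improper data division took. One common form is leakage, where clips from the same speaker or recording end up in both the training and test sets. The following is a minimal sketch of a group-wise split that avoids this, assuming each sample carries a speaker identifier; the function name `group_split`, the `(speaker_id, clip_name)` tuples, and the 20% test fraction are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def group_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that no group (e.g. a speaker) appears in both sets.

    Illustrative only: this is an assumed leakage-free split, not the
    procedure used by the authors.
    """
    # Collect samples under their group key.
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)

    # Shuffle the group keys (not the samples) with a fixed seed.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Reserve whole groups for the test set.
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [s for k in keys if k not in test_keys for s in groups[k]]
    test = [s for k in keys if k in test_keys for s in groups[k]]
    return train, test


# Hypothetical data: 20 clips spread evenly over 5 speakers.
samples = [(f"spk{i % 5}", f"clip{i}.wav") for i in range(20)]
train, test = group_split(samples, group_of=lambda s: s[0])

train_speakers = {speaker for speaker, _ in train}
test_speakers = {speaker for speaker, _ in test}
print(train_speakers.isdisjoint(test_speakers))
```

Splitting by group key rather than by individual clip means that a model is always evaluated on speakers it has not heard during training, which is the kind of separation a random per-clip split does not guarantee.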

Details

Category: Journal publication
Type: Journal article
Published in: Electronics, no. 11, ISSN: 2079-9292
Language: English
Year of publication: 2022
Bibliographic description: Zielonka M., Piastowski A., Czyżewski A., Nadachowski P., Operlejn M., Kaczor K.: Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets // Electronics, iss. 11, 3831 (2022), pp. 1-12
DOI: 10.3390/electronics11223831
Funding sources: Statutory activity/subsidy
Verified by: Politechnika Gdańska
