Detecting Lombard Speech Using Deep Learning Approach

Krzysztof Kąkol; Grazina Korvel; Gintautas Tamulevicius; Bożena Kostek

doi:10.3390/s23010315

Abstract

Robust detection of Lombard speech in noise is challenging. This study proposes a strategy for detecting Lombard speech using a machine learning approach, aimed at applications such as public address systems that work in near real time. The paper starts with background on the Lombard effect. Then, the assumptions of the work on Lombard speech detection are outlined. The proposed framework combines convolutional neural networks (CNNs) with various two-dimensional (2D) speech signal representations. To reduce the computational cost without abandoning the 2D representation-based approach, a strategy for threshold-based averaging of the Lombard effect detection results is introduced. The pseudocode of the averaging process is also included. A series of experiments is performed to determine the most effective network structure and 2D speech signal representation. Investigations are carried out on German and Polish recordings containing Lombard speech. All 2D speech signal representations are tested with and without augmentation. Here, augmentation means using the alpha channel to store additional data: the gender of the speaker, the F0 frequency, and the first two MFCCs. The experimental results show that Lombard and neutral speech recordings can be clearly distinguished with high detection accuracy. It is also demonstrated that the proposed speech detection process is capable of working in near real time. These are the key contributions of this work.
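
The threshold-based averaging mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pseudocode, which is not reproduced on this page. It assumes that a CNN emits one Lombard probability per analysis frame, that the most recent scores are averaged over a sliding window, and that the mean is compared against a fixed threshold. The function name `smooth_lombard_decisions` and the parameters `window` and `threshold` are hypothetical.

```python
from collections import deque
from typing import Iterable, List


def smooth_lombard_decisions(
    frame_scores: Iterable[float],
    window: int = 5,
    threshold: float = 0.5,
) -> List[bool]:
    """Average the most recent per-frame scores and threshold the mean.

    Assumption: each score is a per-frame Lombard probability in [0, 1].
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds only the last `window` scores
    decisions = []
    for score in frame_scores:
        recent.append(score)
        mean_score = sum(recent) / len(recent)
        # True means "Lombard speech detected" for the current frame
        decisions.append(mean_score >= threshold)
    return decisions


# Hypothetical per-frame scores: one isolated low value, then a real drop
scores = [0.9, 0.2, 0.8, 0.9, 0.1, 0.1, 0.1]
decisions = smooth_lombard_decisions(scores, window=3, threshold=0.5)
print(decisions)
```

With this kind of averaging, a single outlier frame does not flip the decision, while a sustained change in the scores does, at the cost of a delay of a few frames.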

Citations

CrossRef: 1
Web of Science: 0
Scopus: 2

Publication version: Accepted or Published Version
Details

Category: Journal publication
Type: Journal article
Published in: SENSORS no. 23, ISSN: 1424-8220
Language: English
Year of publication: 2023
Bibliographic description: Kąkol K., Korvel G., Tamulevicius G., Kostek B.: Detecting Lombard Speech Using Deep Learning Approach // SENSORS, iss. 23, 315 (2022), pp. 1-20
DOI: 10.3390/s23010315
Funding sources: No-cost publication
Verified by: Politechnika Gdańska

Publications that may interest you

Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning
K. Kąkol, 2023

A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
G. Tamulevicius, G. Korvel, A. B. Yayak + 3 more authors, 2020

Investigating Feature Spaces for Isolated Word Recognition
G. Korvel, G. Tamulevicius, P. Treigys + 2 more authors, 2018

An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
G. Korvel, O. Kurasova, B. Kostek, 2019

Authors (4)

Krzysztof Kąkol, MSc Eng.
Grazina Korvel
Gintautas Tamulevicius, PhD
Prof. Bożena Kostek, PhD, DSc, Eng.
