An audio-visual corpus for multimodal automatic speech recognition

Andrzej Czyżewski; Bożena Kostek; Piotr Bratoszewski; Józef Kotus; Marcin Szykulski

doi:10.1007/s10844-016-0438-z

Abstract

A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras and a depth imaging stream from a Time-of-Flight camera, accompanied by audio recorded using both a microphone array and a microphone built into a mobile computer. For applications related to training AVSR systems, every utterance was manually labeled, and the resulting label files were added to the corpus repository. Owing to the inclusion of recordings made in noisy conditions, the corpus can also be used for testing the robustness of speech recognition systems in the presence of acoustic background noise. The process of building the corpus, including the recording, labeling and post-processing phases, is described in the paper. Results achieved with the developed audio-visual automatic speech recognition (ASR) engine, trained and tested with the material contained in the corpus, are presented and discussed together with comparative test results obtained with a state-of-the-art commercial ASR engine. In order to demonstrate the practical use of the corpus, it is made publicly available.

Citations

CrossRef: 57
Web of Science: 0
Scopus: 75

Full text

Downloaded 234 times

Publication version: Accepted or Published Version

Keywords

MODALITY CORPUS · ENGLISH LANGUAGE CORPUS · SPEECH RECOGNITION · AVSR

Details

Category: Journal publication
Type: article in a journal listed in the JCR
Published in: JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, vol. 49, no. 2, pages 167-192
ISSN: 0925-9902
Language: English
Year of publication: 2017
Bibliographic description: Czyżewski A., Kostek B., Bratoszewski P., Kotus J., Szykulski M.: An audio-visual corpus for multimodal automatic speech recognition // JOURNAL OF INTELLIGENT INFORMATION SYSTEMS. Vol. 49, no. 2 (2017), pp. 167-192
DOI: 10.1007/s10844-016-0438-z
Verified by: Politechnika Gdańska

Related datasets

Research data: MODALITY corpus - SPEAKER 35 - COMMANDS C1
Research data: MODALITY corpus - SPEAKER 21 - SEQUENCE S6
Research data: MODALITY corpus - SPEAKER 21 - COMMANDS C5
Research data: MODALITY corpus - SPEAKER 21 - SEQUENCE S4
Research data: MODALITY corpus - SPEAKER 10 - SEQUENCE S1
Research data: MODALITY corpus - SPEAKER 01 - SEQUENCE S2
Research data: MODALITY corpus - SPEAKER 39 - COMMANDS C1
Research data: MODALITY corpus - SPEAKER 01 - SEQUENCE S3
Research data: MODALITY corpus - SPEAKER 01 - COMMANDS C3
Research data: MODALITY corpus - SPEAKER 21 - SEQUENCE S2

See all (159)

Viewed 410 times

Publications you may be interested in

Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej

A. Czyżewski,
B. Kostek,
T. Ciszewski
+ 1 more author

2013

Multimodal English corpus for automatic speech recognition

B. Kunka,
A. Kupryjanow,
P. Dalka
+ 5 more authors

2013

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

2016

A comparative study of English viseme recognition methods and algorithms

2018
