An audio-visual corpus for multimodal automatic speech recognition

Andrzej Czyżewski; Bożena Kostek; Piotr Bratoszewski; Józef Kotus; Marcin Szykulski

doi:10.1007/s10844-016-0438-z


Abstract

A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras and a depth imaging stream from a Time-of-Flight camera, accompanied by audio recorded using both a microphone array and a microphone built into a mobile computer. For applications related to AVSR system training, every utterance was manually labeled, and the resulting label files were added to the corpus repository. Owing to the inclusion of recordings made in noisy conditions, the corpus can also be used to test the robustness of speech recognition systems in the presence of acoustic background noise. The paper describes the process of building the corpus, including the recording, labeling and post-processing phases. Results achieved with the audio-visual automatic speech recognition (ASR) engine developed by the authors, trained and tested with the material contained in the corpus, are presented and discussed together with comparative test results obtained with a state-of-the-art commercial ASR engine. To demonstrate its practical use, the corpus has been made publicly available.


Keywords

MODALITY CORPUS · ENGLISH LANGUAGE CORPUS · SPEECH RECOGNITION · AVSR

Details

Category:: Articles
Type:: article in a journal listed in the JCR
Published in:: JOURNAL OF INTELLIGENT INFORMATION SYSTEMS vol. 49, no. 2, pages 167-192
ISSN: 0925-9902
Language:: English
Publication year:: 2017
Bibliographic description:: Czyżewski A., Kostek B., Bratoszewski P., Kotus J., Szykulski M.: An audio-visual corpus for multimodal automatic speech recognition // JOURNAL OF INTELLIGENT INFORMATION SYSTEMS. Vol. 49, no. 2 (2017), pp. 167-192
DOI:: 10.1007/s10844-016-0438-z
Verified by:: Gdańsk University of Technology

Referenced datasets

dataset MODALITY corpus - SPEAKER 01 - COMMANDS C5
dataset MODALITY corpus - SPEAKER 01 - COMMANDS C6
dataset MODALITY corpus - SPEAKER 01 - SEQUENCE S4
dataset MODALITY corpus - SPEAKER 01 - SEQUENCE S6
dataset MODALITY corpus - SPEAKER 01 - COMMANDS C4
dataset MODALITY corpus - SPEAKER 01 - SEQUENCE S2
dataset MODALITY corpus - SPEAKER 01 - SEQUENCE S3
dataset MODALITY corpus - SPEAKER 01 - COMMANDS C3
dataset MODALITY corpus - SPEAKER 01 - COMMANDS C2
dataset MODALITY corpus - SPEAKER 01 - SEQUENCE S5

159 referenced datasets in total.


Related publications

Language material for English audiovisual speech recognition system development

A. Czyżewski,
B. Kostek,
T. Ciszewski,
and 1 other author

2013

Multimodal English corpus for automatic speech recognition

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

A comparative study of English viseme recognition methods and algorithm