Abstract
A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras and a depth imaging stream from a Time-of-Flight camera, accompanied by audio recorded using both a microphone array and a microphone built into a mobile computer. For applications related to AVSR system training, every utterance was manually labeled, and the resulting label files were added to the corpus repository. Owing to the inclusion of recordings made in noisy conditions, the corpus can also be used for testing the robustness of speech recognition systems in the presence of acoustic background noise. The process of building the corpus, including the recording, labeling and post-processing phases, is described in the paper. Results achieved with the developed audio-visual automatic speech recognition (ASR) engine, trained and tested with the material contained in the corpus, are presented and discussed together with comparative test results obtained with a state-of-the-art commercial ASR engine. To demonstrate its practical use, the corpus is made publicly available.
Details
- Category:
- Articles
- Type:
- article in a journal listed in JCR
- Published in:
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, vol. 49, no. 2, pages 167-192, ISSN: 0925-9902
- Language:
- English
- Publication year:
- 2017
- Bibliographic description:
- Czyżewski A., Kostek B., Bratoszewski P., Kotus J., Szykulski M.: An audio-visual corpus for multimodal automatic speech recognition // JOURNAL OF INTELLIGENT INFORMATION SYSTEMS. Vol. 49, no. 2 (2017), pp. 167-192
- DOI:
- 10.1007/s10844-016-0438-z
- Verified by:
- Gdańsk University of Technology
Referenced datasets
- dataset MODALITY corpus - SPEAKER 35 - COMMANDS C1
- dataset MODALITY corpus - SPEAKER 21 - SEQUENCE S6
- dataset MODALITY corpus - SPEAKER 21 - COMMANDS C5
- dataset MODALITY corpus - SPEAKER 21 - SEQUENCE S4
- dataset MODALITY corpus - SPEAKER 10 - SEQUENCE S1
- dataset MODALITY corpus - SPEAKER 01 - SEQUENCE S2
- dataset MODALITY corpus - SPEAKER 39 - COMMANDS C1
- dataset MODALITY corpus - SPEAKER 01 - SEQUENCE S3
- dataset MODALITY corpus - SPEAKER 01 - COMMANDS C3
- dataset MODALITY corpus - SPEAKER 21 - SEQUENCE S2