Vocalic Segments Classification Assisted by Mouth Motion Capture

Sebastian Cygert; Grzegorz Szwoch; Szymon Zaporowski; Andrzej Czyżewski

doi:10.1109/hsi.2018.8430943

Abstract

Visual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e., lip reading) can enhance recognition accuracy in a multi-modal approach. To this end, motion capture markers were placed on speakers' faces to obtain lip-tracking data during speech. Different parameterization strategies were tested, and the accuracy of phoneme recognition in different experiments was analyzed. The obtained results and further challenges related to the bi-modal feature extraction process and the use of decision systems are discussed.

Citations

CrossRef: 1
Web of Science: 0
Scopus: 3

Keywords

LIP-READING, FACIAL MOTION CAPTURE, SPEECH RECOGNITION, VOCALIC SEGMENTS

Details

Category: Conference activity
Type: publication in a peer-reviewed collective volume (including conference proceedings)
Title of issue: 2018 11th International Conference on Human System Interaction (HSI), pages 318-324
Language: English
Publication year: 2018
Bibliographic description: Cygert S., Szwoch G., Zaporowski S., Czyżewski A.: Vocalic Segments Classification Assisted by Mouth Motion Capture // 2018 11th International Conference on Human System Interaction (HSI), 2018, pp. 318-324
DOI: 10.1109/hsi.2018.8430943
Verified by: Gdańsk University of Technology

Authors (4)

Sebastian Cygert dr inż.

Grzegorz Szwoch dr hab. inż.

Szymon Zaporowski mgr inż.

Andrzej Czyżewski prof. dr hab. inż.
