Bimodal classification of English allophones employing acoustic speech signal and facial motion capture
Abstract
A method for the automatic transcription of English speech into the International Phonetic Alphabet (IPA) is developed and studied. The principal objective of the study is to evaluate to what extent visual data related to lip reading can improve the recognition accuracy of transcribed English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip tracking data synchronized with the audio signal. 32 markers were used, 20 of which were placed on the speaker's inner lips and 4 on a special cap, which served as the point of reference and stabilized the facial motion capture (FMC) image during post-processing. Speech samples were simultaneously recorded as a list of approximately 300 words in which all English consonantal and vocalic allophones were represented. Different parameterization strategies were tested and the accuracy of vocalic segments
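The bimodal setup described above combines audio-derived features with lip-marker trajectories for allophone classification. The abstract does not specify the parameterization or classifier, so the following is only a minimal sketch of such audio-visual feature fusion: it assumes 13 audio coefficients per frame (an MFCC-like dimensionality), 20 lip markers tracked in 3-D, simple feature concatenation, and a toy nearest-centroid classifier standing in for whatever model the authors actually used.

```python
import numpy as np

def fuse_features(audio_feats, marker_xyz):
    """Concatenate per-frame audio features with flattened lip-marker coordinates.

    audio_feats: (frames, n_audio) array of acoustic coefficients.
    marker_xyz:  (frames, n_markers, 3) array of lip-marker positions.
    Returns a (frames, n_audio + n_markers*3) fused feature matrix.
    """
    markers_flat = marker_xyz.reshape(marker_xyz.shape[0], -1)
    return np.hstack([audio_feats, markers_flat])

class NearestCentroid:
    """Toy classifier: assign each frame to the closest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        # Euclidean distance from every sample to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Synthetic example: 2 allophone classes, 13 audio coefficients per frame,
# 20 lip markers in 3-D (matching the 20 lip markers mentioned in the abstract).
rng = np.random.default_rng(0)
X_audio = rng.normal(size=(40, 13))
X_markers = rng.normal(size=(40, 20, 3))
y = np.repeat([0, 1], 20)
X_audio[y == 1] += 1.5  # shift one class so the toy data is separable

X = fuse_features(X_audio, X_markers)       # (40, 13 + 60) fused features
clf = NearestCentroid().fit(X, y)
acc = (clf.predict(X) == y).mean()
```

Early (feature-level) fusion as shown here is only one option; late fusion, where separate audio and visual classifiers vote, is a common alternative in audio-visual speech recognition.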
Citations: 2 (CrossRef), 0 (Web of Science), 0 (Scopus)
Authors (3): A. Czyżewski, S. Zaporowski, B. Kostek
Full text is not available in the portal.
Details
- Category: Articles
- Type: article in a journal distinguished in the JCR
- Published in: Journal of the Acoustical Society of America, vol. 144, issue 3, pp. 1801-1802, ISSN: 0001-4966
- Language: English
- Publication year: 2018
- Bibliographic description: Czyżewski A., Zaporowski S., Kostek B.: Bimodal classification of English allophones employing acoustic speech signal and facial motion capture // Journal of the Acoustical Society of America, vol. 144, iss. 3 (2018), pp. 1801-1802
- DOI: 10.1121/1.5067951
- Verified by: Gdańsk University of Technology