Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

Abstract

A method for automatic transcription of English speech into the International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent visual data related to lip reading can enhance the recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip tracking data synchronized with the audio signal. Thirty-two markers were used, 20 of which were placed on the speaker's inner lips and 4 on a special cap, which served as the point of reference and stabilized the facial motion capture (FMC) image during post-processing. Speech samples were simultaneously recorded as a list of approximately 300 words in which all English consonantal and vocalic allophones were represented. Different parameterization strategies were tested and the accuracy of vocalic segments
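The role of the cap markers as a point of reference can be illustrated with a minimal sketch. It is not the authors' post-processing pipeline, which the abstract does not describe. It assumes each frame is a dictionary of 3D marker positions and removes head translation only, by expressing every non-reference marker relative to the centroid of the reference markers. The function name `stabilize_frame` and the marker names (`cap_1`, `lip_1`, and so on) are hypothetical.

```python
from statistics import fmean


def stabilize_frame(markers, reference_ids):
    """Return marker positions relative to the reference-marker centroid.

    markers: dict mapping a marker name to its (x, y, z) position in one frame.
    reference_ids: names of the reference markers (here, the cap markers).
    Assumption: only head translation is removed; head rotation is not
    compensated, which a full rigid alignment would also handle.
    """
    refs = [markers[name] for name in reference_ids]
    # Average each coordinate axis over the reference markers.
    centroid = tuple(fmean(axis) for axis in zip(*refs))
    return {
        name: tuple(c - o for c, o in zip(pos, centroid))
        for name, pos in markers.items()
        if name not in reference_ids
    }


# Hypothetical frame: four cap markers and one lip marker.
cap = ["cap_1", "cap_2", "cap_3", "cap_4"]
frame = {
    "cap_1": (0.0, 0.0, 0.0),
    "cap_2": (2.0, 0.0, 0.0),
    "cap_3": (0.0, 2.0, 0.0),
    "cap_4": (2.0, 2.0, 0.0),
    "lip_1": (1.0, 0.5, 3.0),
}
# The same frame after the whole head has moved by (5, 5, 5).
moved = {name: tuple(c + 5.0 for c in pos) for name, pos in frame.items()}

stable_a = stabilize_frame(frame, cap)
stable_b = stabilize_frame(moved, cap)
```

In this toy case the lip marker has the same stabilized coordinates in both frames, so the head displacement no longer shows up in the lip trajectory.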

Citations

  • CrossRef: 1

  • Web of Science: 0

  • Scopus: 0

Full text

Full text is not available in the portal.

Details

Category: Articles
Type: article in a journal listed in the JCR
Published in: Journal of the Acoustical Society of America, vol. 144, iss. 3, pages 1801-1802
ISSN: 0001-4966
Language: English
Publication year: 2018
Bibliographic description: Czyżewski A., Zaporowski S., Kostek B.: Bimodal classification of English allophones employing acoustic speech signal and facial motion capture // Journal of the Acoustical Society of America. Vol. 144, iss. 3 (2018), pp. 1801-1802
DOI: 10.1121/1.5067951
Verified by: Gdańsk University of Technology
