Search results for: corpora - Bridge of Knowledge

An audio-visual corpus for multimodal automatic speech recognition
Publication
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017
A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available to download
Computer-assisted pronunciation training—Speech synthesis is almost all you need
Publication
- D. Korzekwa
- J. Lorenzo-Trueba
- T. Drugman
- B. Kostek
- SPEECH COMMUNICATION - Year 2022
The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available to download
Constructing a Dataset of Speech Recordings with Lombard Effect
Publication
- D. Weber
- S. Zaporowski
- D. Korzekwa
- Year 2020
The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Selected from a set of 165 sentences, 10, evaluated as having the highest possibility to occur in the context of the Lombard effect, were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...
