total: 18
Search results for: asr system
-
The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish
Publication: The article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...
-
Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition
Publication: The problem of accurately differentiating between the speaker's utterance and the noise parts in a speech signal is considered. The influence of applying voice activity detection to speech signals on the accuracy of an automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...
-
Examining Influence of Distance to Microphone on Accuracy of Speech Recognition
Publication: The problem of controlling a machine by a distant-talking speaker without the need for handheld or body-worn equipment is considered. A laboratory setup is introduced for examining the performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...
-
Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions
Publication: Automatic speech recognition (ASR) is under constant development, especially for cases in which speech is casually produced, acquired in various environmental conditions, or recorded in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is the main focus of the presented work. ASR is widely implemented in mobile device technology, but...
-
Cost-effective methods of fabricating thin rare-earth element layers on SOC interconnects based on low-chromium ferritic stainless steel and exposed to air, humidified air or humidified hydrogen atmospheres
Publication: Most oxidation studies involving interconnects are conducted in air under isothermal conditions, but during real-life solid oxide cell (SOC) operation, cells are also exposed to a mixture of hydrogen and water vapor. For this study, an Fe–16Cr low-chromium ferritic stainless steel was coated with different reactive element oxides – Gd2O3, CeO2, Ce0.9Y0.1O2 – using an array of methods: dip coating, electrodeposition and spray pyrolysis....
-
Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning
Publication: Text-to-Speech (TTS) synthesis can be used to generate training data for building Automatic Speech Recognition (ASR) models. Access to medical speech data is limited because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in medical language. Reinforcement Learning (RL), in the...
-
An audio-visual corpus for multimodal automatic speech recognition
Publication: A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...
-
Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor
Publication: Spatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result, the signal-to-noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...
-
La0.6Sr0.4Co0.2Fe0.8O3-δ oxygen electrodes for solid oxide cells prepared by polymer precursor and nitrates solution infiltration into gadolinium doped ceria backbone
Publication: Infiltration is a method that can be applied to electrode preparation. In this paper, an oxygen electrode is prepared solely by the infiltration of La0.6Sr0.4Co0.2Fe0.8O3‐δ (LSCF) into a Ce0.8Gd0.2O2-δ (CGO) backbone. The use of a polymer precursor as an infiltrating medium, instead of an aqueous nitrate salt solution, is presented. It is shown that the polymer forms the single-phase perovskite at 600 °C, contrary to the nitrates...
-
Vocalic Segments Classification Assisted by Mouth Motion Capture
Publication: Visual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in a multi-modal approach. For that purpose, motion capture markers were placed on speakers' faces to obtain lip-tracking data during speaking. Different parameterization strategies were tested...