Search results for: DYSARTHRIA DETECTION, SPEECH RECOGNITION, SPEECH SYNTHESIS, INTERPRETABLE DEEP LEARNING MODELS - MOST Wiedzy

  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publication
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Year 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Full text available for download in the portal

  • Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning

    Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition (ASR) models. Access to medical speech data is limited because it is sensitive and difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...

    Full text available for download in the portal

  • A survey of automatic speech recognition deep models performance for Polish medical terms

    Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

    Full text available for download from an external service

  • Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition

    Publication

    - JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2018

    convolutional neural network (CNN), which is a class of deep, feed-forward artificial neural networks. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. This choice was based on the fact that CNNs perform well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...

  • Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition

    Publication

    - Biomedical Signal Processing and Control - Year 2023

    Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

    Full text available for download from an external service

  • SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM

    The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

    Full text available for download in the portal

  • Speech Analytics Based on Machine Learning

    In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

    Full text available for download from an external service

  • USING NEURAL NETWORKS FOR THE SYNTHESIS OF SPEECH EXPRESSING EMOTIONS (WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE)

    Publication

    This article presents an analysis of speech-based emotion recognition solutions and the possibility of using them, by means of neural networks, in the synthesis of emotional speech. Current solutions for emotion recognition in speech and methods of speech synthesis using neural networks are presented. A significant increase in interest in, and use of, deep learning is currently observed in applications related to...

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication

    - Year 2018

    Researchers have given much attention to the speech processing task in automatic speech recognition (ASR) over the past decades. The study investigates the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation...

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication
    • P. Treigys
    • G. Korvel
    • G. Tamulevicius
    • J. Bernataviciene
    • B. Kostek

    - Year 2020

    The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

    Full text available for download from an external service

  • Deep neural networks for data analysis

    Online Courses
    • K. Draszawka

    The aim of the course is to familiarize students with the methods of deep learning for advanced data analysis. Typical areas of application of these types of methods include: image classification, speech recognition and natural language understanding.

  • Language Models in Speech Recognition

    Publication

    - Year 2022

    This chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

    Full text available for download from an external service

  • Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

    Publication

    In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...

    Full text available for download from an external service

  • Automated detection of pronunciation errors in non-native English speech employing deep learning

    Publication

    - Year 2023

    Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

    Full text available for download in the portal

  • Orken Mamyrbayev Professor

    People

    1.  Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2.  Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...

  • Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training

    In the paper we tackle the bi-objective execution time and power consumption optimization problem concerning the execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

    Full text available for download in the portal

  • Detecting Lombard Speech Using Deep Learning Approach

    Publication
    • K. Kąkol
    • G. Korvel
    • G. Tamulevicius
    • B. Kostek

    - SENSORS - Year 2023

    Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

    Full text available for download in the portal

  • Performance Analysis of the OpenCL Environment on Mobile Platforms

    Publication

    Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

    Full text available for download from an external service

  • Visual Lip Contour Detection for the Purpose of Speech Recognition

    Publication

    A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...

  • Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

    Publication

    The problem of accurately differentiating between the speaker's utterance and the noise parts in a speech signal is considered. The influence of utilizing voice activity detection in speech signals on the accuracy of an automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

  • Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

    Publication

    - IEEE Access - Year 2020

    The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

    Full text available for download in the portal

  • Noise profiling for speech enhancement employing machine learning models

    Publication

    - Journal of the Acoustical Society of America - Year 2022

    This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

    Full text available for download in the portal

  • Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems

    The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions, and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description...

    Full text available for download in the portal

  • Transient detection for speech coding applications

    Signal quality in speech codecs may be improved by selecting transients from speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

    Full text available for download from an external service

  • Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

    Publication
    • K. Kąkol

    - Year 2023

    The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

    Full text available for download in the portal

  • Examining Influence of Distance to Microphone on Accuracy of Speech Recognition

    The problem of controlling a machine by the distant-talking speaker without a necessity of handheld or body-worn equipment usage is considered. A laboratory setup is introduced for examination of performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

    Full text available for download from an external service

  • Multimodal English corpus for automatic speech recognition

    A multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously; therefore, the corpus makes it possible to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

  • Interpretable deep learning approach for classification of breast cancer - a comparative analysis of multiple instance learning models

    Breast cancer is the most frequent female cancer. Its early diagnosis increases the chances of a complete cure for the patient. Suitably designed deep learning algorithms can be an excellent tool for quick screening analysis and support radiologists and oncologists in diagnosing breast cancer. The design of a deep learning-based system for automated breast cancer diagnosis is not easy due to the lack of annotated data, especially...

    Full text available for download from an external service

  • An audio-visual corpus for multimodal automatic speech recognition

    A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

    Full text available for download in the portal

  • Computer-assisted pronunciation training—Speech synthesis is almost all you need

    Publication

    - SPEECH COMMUNICATION - Year 2022

    The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

    Full text available for download in the portal

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publication

    - Year 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Full text available for download in the portal

  • Speech synthesis controlled by eye gazing

    Publication

    A method of communication based on eye gaze controlling is presented. Investigations of using gaze tracking have been carried out in various context applications. The solution proposed in the paper could be referred to as ''talking by eyes'' providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

  • Silence/noise detection for speech and music signals

    Publication

    - Year 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signal processing, i.e., noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. Examples of different types of noisy signals are presented.

  • A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

    Publication
    • G. Tamulevicius
    • G. Korvel
    • A. B. Yayak
    • P. Treigys
    • J. Bernataviciene
    • B. Kostek

    - Electronics - Year 2020

    In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

    Full text available for download in the portal

  • Virtual keyboard controlled by eye gaze employing speech synthesis

    Publication

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

    Full text available for download from an external service

  • Comparison of Language Models Trained on Written Texts and Speech Transcripts in the Context of Automatic Speech Recognition

    Publication
    • S. Dziadzio
    • A. Nabożny
    • A. Smywiński-Pohl
    • B. Ziółko

    - Year 2015

    Full text available for download from an external service

  • Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets

    Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...

    Full text available for download in the portal

  • Deep Learning: A Case Study for Image Recognition Using Transfer Learning

    Publication

    - Year 2021

    Deep learning (DL) is a rising star of machine learning (ML) and artificial intelligence (AI) domains. Until 2006, many researchers had attempted to build deep neural networks (DNN), but most of them failed. In 2006, it was proven that deep neural networks are one of the most crucial inventions for the 21st century. Nowadays, DNN are being used as a key technology for many different domains: self-driven vehicles, smart cities,...

    Full text available for download from an external service

  • Artur Gańcza, PhD Eng.

    I received the M.Sc. degree from the Gdańsk University of Technology (GUT), Gdańsk, Poland, in 2019. I am currently a Ph.D. student at GUT, with the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics. My professional interests include speech recognition, system identification, adaptive signal processing and linear algebra.

  • Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

    Publication
    • D. Korzekwa
    • J. Lorenzo-trueba
    • T. Drugman
    • S. Calamaro
    • B. Kostek

    - Year 2021

    We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

    Full text available for download in the portal

  • EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY

    Publication

    The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...

  • Training of Deep Learning Models Using Synthetic Datasets

    Publication

    - Year 2022

    In order to solve increasingly complex problems, the complexity of Deep Neural Networks also needs to be constantly increased, and therefore training such networks requires more and more data. Unfortunately, obtaining such massive real-world training data to optimize neural network parameters is a challenging and time-consuming task. To solve this problem, we propose an easy-to-use and general approach to training deep learning...

    Full text available for download from an external service

  • Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set

    Publication

    - Applied Sciences-Basel - Year 2023

    This work deals with automatically detecting stuttering and its subclasses. An effective classification of stuttering along with its subclasses could find wide application in determining the severity of stuttering by speech therapists, preliminary patient diagnosis, and enabling communication with the previously mentioned voice assistants. The first part of this work provides an overview of examples of classical and deep learning...

    Full text available for download in the portal

  • Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej

    Publication

    - Year 2013

    The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training database of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...

  • Deep learning techniques for biometric security: A systematic review of presentation attack detection systems

    Publication

    - ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE - Year 2024

    Biometric technology, including finger vein, fingerprint, iris, and face recognition, is widely used to enhance security in various devices. In the past decade, significant progress has been made in improving biometric systems, thanks to advancements in deep convolutional neural networks (DCNN) and computer vision (CV), along with large-scale training datasets. However, these systems have become targets of various attacks, with...

    Full text available for download from an external service

  • Estimation of the excitation variances of speech and noise AR-models for enhanced speech coding

    Publication

    - Year 2001

    Full text available for download from an external service

  • Speech recognition system for hearing impaired people.

    Publication

    - Year 2005

    The paper presents the results of research in the field of speech recognition. The system under development, which uses visual and acoustic data, will facilitate training in correct speaking for people after cochlear implant surgery and for other people with severe hearing impairment. Active Shape Models were used to determine visual parameters based on an analysis of the shape and movement of the lips in video recordings. The acoustic parameters are based on...

  • Optimized Deep Learning Model for Flood Detection Using Satellite Images

    Publication
    • A. Stateczny
    • H. D. Praveena
    • R. H. Krishnappa
    • K. R. Chythanya
    • B. B. Babysarojam

    - Remote Sensing - Year 2023

    The increasing amount of rain produces a number of issues in Kerala, particularly in urban regions where the drainage system is frequently unable to handle a significant amount of water in such a short duration. Meanwhile, standard flood detection results are inaccurate for complex phenomena and cannot handle enormous quantities of data. In order to overcome those drawbacks and enhance the outcomes of conventional flood detection...

    Full text available for download in the portal